The storage requirements for each of the data types supported by MySQL are listed here by category.
The maximum size of a row in a MyISAM table is
65,534 bytes. Each BLOB and
TEXT column accounts for only five to nine
bytes toward this size.
Important: For tables using the
NDBCluster storage engine, there is the factor
of 4-byte alignment to be taken into
account when calculating storage requirements. This means that all
NDB data storage is done in multiples of 4
bytes. Thus, a column value that — in a table using a
storage engine other than NDB — would
take 15 bytes for storage, requires 16 bytes in an
NDB table. This requirement applies in addition
to any other considerations that are discussed in this section.
For example, in NDBCluster tables, the
TINYINT, SMALLINT,
MEDIUMINT, and INTEGER
(INT) column types each require 4 bytes storage
per record.
In addition, when calculating storage requirements for Cluster
tables, you must remember that every table using the
NDBCluster storage engine requires a primary
key; if no primary key is defined by the user, then a
“hidden” primary key will be created by
NDB. This hidden primary key consumes 31-35
bytes per table record.
You may also find the ndb_size.pl utility to
be useful in such cases. This Perl script connects to a current
MySQL (non-Cluster) database and creates a report on how much
space that database would require if it used the
NDBCluster storage engine. See
Section 15.9.14, “ndb_size.pl — NDBCluster Size Requirement Estimator”, for more
information.
Storage Requirements for Numeric Types
| Data Type | Storage Required |
TINYINT |
1 byte |
SMALLINT |
2 bytes |
MEDIUMINT |
3 bytes |
INT, INTEGER
|
4 bytes |
BIGINT |
8 bytes |
FLOAT( |
4 bytes if 0 <= p <= 24, 8 bytes if 25
<= p <= 53 |
FLOAT |
4 bytes |
DOUBLE [PRECISION], REAL
|
8 bytes |
DECIMAL(,
NUMERIC(
|
Varies; see following discussion |
In MySQL versions up to and including 4.1,
DECIMAL columns are represented as strings and
their storage requirements are:
M+2 bytes, if
D > 0
bytes, if
M+1D = 0
D+2, if
M <
D
Storage Requirements for Date and Time Types
| Data Type | Storage Required |
DATE |
3 bytes |
DATETIME |
8 bytes |
TIMESTAMP |
4 bytes |
TIME |
3 bytes |
YEAR |
1 byte |
Storage Requirements for String Types
| Data Type | Storage Required |
CHAR( |
bytes, 0 <=
255 |
VARCHAR( |
L+1 bytes, where
and 0
<= 255 |
BINARY( |
bytes, 0 <=
255 |
VARBINARY( |
L+1 bytes, where
and 0
<= 255 |
TINYBLOB, TINYTEXT
|
L+1 bytes, where L
< 28
|
BLOB, TEXT
|
L+2 bytes, where L
< 216
|
MEDIUMBLOB, MEDIUMTEXT
|
L+3 bytes, where L
< 224
|
LONGBLOB, LONGTEXT
|
L+4 bytes, where L
< 232
|
ENUM(' |
1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum) |
SET(' |
1, 2, 3, 4, or 8 bytes, depending on the number of set members (64 members maximum) |
For the CHAR, VARCHAR, and
TEXT types, L and
M in the preceding table should be
interpreted as number of bytes before MySQL 4.1 and as number of
characters thereafter. Lengths for these types in columns
specifications indicate number of characters from MySQL 4.1 on.
The number of extra bytes for recording lengths for
variable-length data types is unchanged. For example,
L+1 bytes to store a
TINYTEXT value before MySQL 4.1 becomes
L characters + 1 byte to store the
length as of MySQL 4.1.
VARCHAR and the BLOB and
TEXT types are variable-length types. For each,
the storage requirements depend on the actual length of column
values (represented by L in the
preceding table), rather than on the type's maximum possible size.
For example, a VARCHAR(10) column can hold a
string with a maximum length of 10. The actual storage required is
the length of the string (L), plus one
byte to record the length of the string. For the string
'abcd', L is 4 and
the storage requirement is five bytes.
As of MySQL 4.1, to calculate the number of
bytes used to store a particular
CHAR, VARCHAR, or
TEXT column value, you must take into account
the character set used for that column. In particular, when using
the utf8 Unicode character set, you must keep
in mind that not all utf8 characters use the
same number of bytes. For a breakdown of the storage used for
different categories of utf8 characters, see
Section 10.7, “Unicode Support”.
Note: The NDB
Cluster engine supports only fixed-width columns. This
means that a VARCHAR column from a table in a
MySQL Cluster will behave almost as if it were of type
CHAR (except that each record still has one
extra byte overhead). For example, in an NDB
table, each record in a column declared as
VARCHAR(100) will take up 101 bytes for
storage, regardless of the length of the string actually stored in
any given record.
The BLOB and TEXT types
require 1, 2, 3, or 4 bytes to record the length of the column
value, depending on the maximum possible length of the type. See
Section 11.4.3, “The BLOB and TEXT Types”.
TEXT and BLOB columns are
implemented differently in the NDB Cluster
storage engine, wherein each record in a TEXT
column is made up of two separate parts. One of these is of fixed
size (256 bytes), and is actually stored in the original table.
The other consists of any data in excess of 256 bytes, which is
stored in a hidden table. The records in this second table are
always 2,000 bytes long. This means that the size of a
TEXT column is 256 if
size <= 256 (where
size represents the size of the
record); otherwise, the size is 256 +
.
size + (2000 -
(size - 256) % 2000)
The size of an ENUM object is determined by the
number of different enumeration values. One byte is used for
enumerations with up to 255 possible values. Two bytes are used
for enumerations having between 256 and 65,535 possible values.
See Section 11.4.4, “The ENUM Type”.
The size of a SET object is determined by the
number of different set members. If the set size is
N, the object occupies
( bytes,
rounded up to 1, 2, 3, 4, or 8 bytes. A N + 7) / 8SET can
have a maximum of 64 members. See Section 11.4.5, “The SET Type”.

User Comments
I cannot see why a TIMESTAMP type uses 4 bytes where a DATETIME type uses 8. Elsewhere in the Manual, it states that the ranges are identical, and in some cases the variables are treated identically. If this is not a typo, someone should document the reason for the large difference in storage requirements.
My tip is that if the above table is accurate, it is more efficient to use type TIMESTAMP over type DATETIME, saving 4 bytes per use. Even using seperate DATE and TIME columns will save 2 bytes per use.
TIMESTAMP is an integral number of seconds since epoch (Jan 1, 1970). Its range is limited to about 1970-2038 (2**31 seconds). DATETIME has a much larger range: 1000-9999 A.D. I think it's entirely plausible that a MySQL server (or database) will last to 2038 -- our mainframes running COBOL lasted into the third millenium.
This is quite an interesting topic. But then, to save space, you would need the date, and the year columns, which will be 4 bytes long as well. If "time of event" is also needed, add the time column type, and you're left with 7 bytes instead of 8.
But, how far into the future can the date / year column types go? And wouldn't it be possiblem for the server developers to update the timestamp column type?
Unix Timestamp columns are actually more space-efficient than regular columns. Right now, MySQL stores TIMESTAMPs with 4 bytes, which is pretty much equivalent to a signed integer. This gives you a range of roughly 2^32 / 2 seconds, which is equal to the 68 years that a MySQL timestamp spans. As the need for post-2038 dates increases, it will probably be pretty easy in the MySQL code to turn TIMESTAMP columns into the equivalent of a unsigned integer, which will last into the year 2106. Applying this concept even further, if MySQL were to store timestamps internally as an 8 byte unsigned integer (same size as a current Date/Time Column) or if a user was to use an unsigned bigint column to store their timestamps, instead of MySQL's TIMESTAMP field, it would be able to store timestamps through the middle of the year 584,542,048,060. The reason why DATETIME, TIME, and DATE columns store data so inefficiently is due to the nature of dates. Times are rather hard to fit into base 1, which is how binary numbers are stored. If you want to store just the minute of a time, you need to be able to store values between 1 and 59. However, the smallest number of bits (binary digits) required to store that data is 6. However, 6 bits are able to store values between 1 and 64. Since the values between 61 and 64 are never needed or used, the ability to have those numbers amounts to wasted space. With the exception of the year information, hours, minutes, seconds, months and days, do not require a number of values exactly equal to a power of 2, and thus waste data. When the data is formed in a unix timestamp, no such loss occurs as the number of seconds since 1970 can be equal to any possible integral value.
Had a lot of trouble finding the maximum table size in bytes for capacity planning. More specifically it was InnoDB tables that I had a problem with. Average row size is good, but I wanted maximum row size.
I checked several products and could not find what I wanted. Some of the tables I deal with are 300+ fields and so manual calculation was not practical.
So I wrote a little perl script that does it. Thought it might be of some use, so I include it here...it does all field types except enum/set types. It does not calculate anything regarding index size.
Just do a mysqldump -d (just the schema) of your DB to a file, and run this perl script specifying the schema file as the only argument.
----------------------------------------------------------------
#!/usr/bin/perl
use Data::Dumper;
use strict;
$| = 1;
my %DataType =
("TINYINT"=>1,
"SMALLINT"=>2,
"MEDIUMINT"=>3,
"INT"=>4,
"BIGINT"=>8,
"FLOAT"=>'if ($M <= 24) {return 4;} else {return 8;}',
"DOUBLE"=>8,
"DECIMAL"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"NUMERIC"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"DATE"=>3,
"DATETIME"=>8,
"TIMESTAMP"=>4,
"TIME"=>3,
"YEAR"=>1,
"CHAR"=>'$M',
"VARCHAR"=>'$M+1',
"TINYBLOB"=>'$M+1',
"TINYTEXT"=>'$M+1',
"BLOB"=>'$M+2',
"TEXT"=>'$M+2',
"MEDIUMBLOB"=>'$M+3',
"MEDIUMTEXT"=>'$M+3',
"LONGBLOB"=>'$M+4',
"LONGTEXT"=>'$M+4');
my $D;
my $M;
my $dt;
my $fieldCount = 0;
my $byteCount = 0;
my $fieldName;
open (TABLEFILE,"< $ARGV[0]");
LOGPARSE:while (<TABLEFILE>)
{
chomp;
if ( $_ =~ s/create table[ ]*([a-zA-Z_]*).*/$1/i )
{
print "Fieldcount: $fieldCount Bytecount: $byteCount\n" if $fieldCount;
$fieldCount = 0;
$byteCount = 0;
print "\nTable: $_\n";
next;
}
next if $_ !~ s/(.*)[ ]+(TINYINT[ ]*\(*[0-9,]*\)*|SMALLINT[ ]*\(*[0-9,]*\)*|MEDIUMINT[ ]*\(*[0-9,]*\)*|INT[ ]*\(*[0-9,]*\)*|BIGINT[ ]*\(*[0-9,]*\)*|FLOAT[ ]*\(*[0-9,]*\)*|DOUBLE[ ]*\(*[0-9,]*\)*|DECIMAL[ ]*\(*[0-9,]*\)*|NUMERIC[ ]*\(*[0-9,]*\)*|DATE[ ]*\(*[0-9,]*\)*|DATETIME[ ]*\(*[0-9,]*\)*|TIMESTAMP[ ]*\(*[0-9,]*\)*|TIME[ ]*\(*[0-9,]*\)*|YEAR[ ]*\(*[0-9,]*\)*|CHAR[ ]*\(*[0-9,]*\)*|VARCHAR[ ]*\(*[0-9,]*\)*|TINYBLOB[ ]*\(*[0-9,]*\)*|TINYTEXT[ ]*\(*[0-9,]*\)*|BLOB[ ]*\(*[0-9,]*\)*|TEXT[ ]*\(*[0-9,]*\)*|MEDIUMBLOB[ ]*\(*[0-9,]*\)*|MEDIUMTEXT[ ]*\(*[0-9,]*\)*|LONGBLOB[ ]*\(*[0-9,]*\)*|LONGTEXT[ ]*\(*[0-9,]*\)*).*/$2/gix;
$fieldName=$1;
$_=uc;
$D=0;
($D = $_) =~ s/.*\,([0-9]+).*/$1/g if ( $_ =~ m/\,/ );
$_ =~ s/\,([0-9]*)//g if ( $_ =~ m/\,/ );
($M = $_) =~ s/[^0-9]//g;
$M=0 if ! $M;
($dt = $_) =~ s/[^A-Za-z_]*//g;
print "$fieldName $_:\t".eval($DataType{"$dt"})." bytes\n";
++$fieldCount;
$byteCount += eval($DataType{"$dt"});
}
print "Fieldcount: $fieldCount Bytecount: $byteCount\n";
A timestamp uses less space, but keep in mind that when you make an UPDATE to your row, the first timestamp you have, WILL be updated with the current time.
Automatic updating of the first TIMESTAMP column in a table occurs under any of the following conditions:
* You explicitly set the column to NULL.
* The column is not specified explicitly in an INSERT or LOAD DATA INFILE statement.
* The column is not specified explicitly in an UPDATE statement and some other column changes value. An UPDATE that sets a column to the value it already has does not cause the TIMESTAMP column to be updated; if you set a column to its current value, MySQL ignores the update for efficiency.
http://dev.mysql.com/doc/mysql/en/TIMESTAMP_pre-4.1.html
Previously: "I cannot see why a TIMESTAMP....." A timestamp is always "now" or with regard to the *nix epoch, is always in the future. Therefore, the db never has to store a date "before" the *nix epoch. My guess is a time stamp is stored as *nix "seconds from the epoch". Currently, *nix stores this as a 4 byte integer (to be 8 byte in the future at some point). However, DATETIME types have to hold any date, even those before the *nix epoch. Hence, I guess they can be very large for BC dates or even larger for dates close to the end of the Universe!
Oh, and don't use TIMESTAMP over DATETIME to save space if you ever plan on running an update query on the record! Why? rtfm
IRT: Rudi Ahlers on June 6 2004 8:22am
The DATE type already contains the year portion. And the supported range is '1000-01-01' to '9999-12-31'. (storage: 3 bytes)
TIME values may range from '-838:59:59' to '838:59:59'. (storage: 3 bytes).
This means that the DATETIME type should require 6 bytes of storage, instead of 8. The 2 remaining bytes "could" be used for storing fractions of a second.
Notice the BLOB's max size is only 64K. MySQL's JDBC trunicated my .wav file at 64K w/ no Exception being generated. Look at a bigger BLOB like MEDIUMBLOB.
Follow up on previous comment. Hibernate told me that the BLOB column had been trunicated but using the JDBC driver did not generate an Exception.
I suppose the 8 bytes size of DATETIME is for better performances (CPU like 2's power memory alignments). If you use DATETIME instead of two separated columns DATE and TIME, it may be because you need to process them at once for comparing, etc... So, you loose few memory space but you increase processing performances.
If you think that "a byte is a byte" and should not be lost this way, have a look at M$ implementation of SYSTEMTIME struct (somewhere in VC98/Include/WTYPES.h):
typedef struct _SYSTEMTIME
{
WORD wYear;
WORD wMonth;
WORD wDayOfWeek;
WORD wDay;
WORD wHour;
WORD wMinute;
WORD wSecond;
WORD wMilliseconds;
} SYSTEMTIME;
And yes, WORD type is a 16bits unsigned word... :o).
This thing takes twice the DATETIME memory space!
I suppose some peoples did not want to make the "640KB is enough" mistake again.
Here's a modification of Marc's script above that also handles ENUM's. Enjoy.
#!/usr/bin/perl
use Data::Dumper;
use strict;
$| = 1;
my %DataType =
("TINYINT"=>1, "SMALLINT"=>2, "MEDIUMINT"=>3,
"INT"=>4, "BIGINT"=>8,
"FLOAT"=>'if ($M <= 24) {return 4;} else {return 8;}',
"DOUBLE"=>8,
"DECIMAL"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"NUMERIC"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"DATE"=>3, "DATETIME"=>8, "TIMESTAMP"=>4, "TIME"=>3, "YEAR"=>1,
"CHAR"=>'$M', "VARCHAR"=>'$M+1',
"ENUM"=>1,
"TINYBLOB"=>'$M+1', "TINYTEXT"=>'$M+1',
"BLOB"=>'$M+2', "TEXT"=>'$M+2',
"MEDIUMBLOB"=>'$M+3', "MEDIUMTEXT"=>'$M+3',
"LONGBLOB"=>'$M+4', "LONGTEXT"=>'$M+4');
my ($D, $M, $dt);
my $fieldCount = 0;
my $byteCount = 0;
my $fieldName;
open (TABLEFILE,"< $ARGV[0]");
LOGPARSE:while (<TABLEFILE>) {
chomp;
if ( $_ =~ s/create table[ ]`*([a-zA-Z_]*).*`/$1/i ) {
print "Fieldcount: $fieldCount Bytecount: $byteCount\n" if $fieldCount;
$fieldCount = 0;
$byteCount = 0;
print "\nTable: $_\n";
next;
}
next if $_ !~ s/(.*)[ ]+(TINYINT[ ]*\(*[0-9,]*\)*|SMALLINT[ ]*\(*[0-9,]*\)*|MEDIUMINT[ ]*\(*[0-9,]*\)*|INT[ ]*\(*[0-9,]*\)*|BIGINT[ ]*\(*[0-9,]*\)*|FLOAT[ ]*\(*[0-9,]*\)*|DOUBLE[ ]*\(*[0-9,]*\)*|DECIMAL[ ]*\(*[0-9,]*\)*|NUMERIC[ ]*\(*[0-9,]*\)*|DATE[ ]*\(*[0-9,]*\)*|DATETIME[ ]*\(*[0-9,]*\)*|TIMESTAMP[ ]*\(*[0-9,]*\)*|TIME[ ]*\(*[0-9,]*\)*|YEAR[ ]*\(*[0-9,]*\)*|CHAR[ ]*\(*[0-9,]*\)*|VARCHAR[ ]*\(*[0-9,]*\)*|TINYBLOB[ ]*\(*[0-9,]*\)*|TINYTEXT[ ]*\(*[0-9,]*\)*|ENUM[ ]*\(*['A-Za-z_,]*\)*|BLOB[ ]*\(*[0-9,]*\)*|TEXT[ ]*\(*[0-9,]*\)*|MEDIUMBLOB[ ]*\(*[0-9,]*\)*|MEDIUMTEXT[ ]*\(*[0-9,]*\)*|LONGBLOB[ ]*\(*[0-9,]*\)*|LONGTEXT[ ]*\(*[0-9,]*\)*).*/$2/gix;
$fieldName=$1;
$_=uc;
$D=0;
($D = $_) =~ s/.*\,([0-9]+).*/$1/g if ( $_ =~ m/\,/ );
$_ =~ s/\,([0-9]*)//g if ( $_ =~ m/\,/ );
($M = $_) =~ s/[^0-9]//g;
$M=0 if ! $M;
($dt = $_) =~ s/\(.*\)//g;
$dt =~ s/[^A-Za-z_]*//g;
print "$fieldName $_:\t".eval($DataType{"$dt"})." bytes\n";
++$fieldCount;
$byteCount += eval($DataType{"$dt"});
}
print "Fieldcount: $fieldCount Bytecount: $byteCount\n";
"A timestamp uses less space, but keep in mind that when you make an UPDATE to your row, the first timestamp you have, WILL be updated with the current time."
It doesn't have to be so. You can create the table with all timestamp columns defaulted to 0000-00-00 00:00:00, not CURRENT_TIMESTAMP
Add your own comment.