Monday, April 02, 2012 

Unicode, things to know.

http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF

Unicode currently defines just under 100,000 characters, but has space for 1,114,112 code points. They are organized into 17 “planes” of 2^16 (65,536) characters, numbered 0 through 16. Plane 0 is called the “Basic Multilingual Plane” or BMP.
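
The plane arithmetic above can be checked with a small Python sketch. The names PLANES, PLANE_SIZE and plane_of are made up for illustration; they are not from the linked article.

```python
# Hypothetical helper names, for illustration only.
PLANES = 17
PLANE_SIZE = 2 ** 16  # 65,536 code points per plane

# 17 planes of 65,536 code points each.
total_code_points = PLANES * PLANE_SIZE


def plane_of(code_point: int) -> int:
    """Return the plane number (0 through 16) that a code point falls in."""
    if not 0 <= code_point < total_code_points:
        raise ValueError("not a Unicode code point")
    # Each plane spans 2^16 code points, so the plane is the value above bit 16.
    return code_point >> 16


print(total_code_points)     # 17 * 65536
print(plane_of(ord("A")))    # an ASCII letter sits in plane 0, the BMP
print(plane_of(0x1F600))     # an emoji code point outside the BMP
```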

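UTF-32: each character takes 32 bits (four bytes). To find the byte order, the first character is U+FEFF (the byte order mark); if you read it byte-swapped instead (FFFE followed by zeros), the data is in the opposite byte order and you basically reorder accordingly.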
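
A minimal sketch of that check in Python, assuming the data really is UTF-32 and starts with U+FEFF. The function name utf32_byte_order is hypothetical.

```python
def utf32_byte_order(data: bytes) -> str:
    """Guess the byte order of UTF-32 data from its first four bytes."""
    bom = data[:4]
    if bom == b"\x00\x00\xfe\xff":
        # U+FEFF stored most significant byte first.
        return "big-endian"
    if bom == b"\xff\xfe\x00\x00":
        # U+FEFF stored least significant byte first.
        return "little-endian"
    return "unknown"


# The "utf-32-be" and "utf-32-le" codecs do not add a mark themselves,
# so U+FEFF is put at the start of the text explicitly here.
print(utf32_byte_order("\ufeffhi".encode("utf-32-be")))
print(utf32_byte_order("\ufeffhi".encode("utf-32-le")))
```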


UTF-16: characters beyond the BMP (the astral planes) are encoded using surrogate blocks, basically two 16-bit units per character. When you look at a sixteen-bit quantity, you can tell right away whether it's an ordinary BMP character or half of an astral-plane character (surrogate block), and if so, which half. For byte ordering, U+FEFF is used as the first character, as in UTF-32. UTF-16BE and UTF-16LE are the names of the variants with a fixed big-endian or little-endian order.
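
Here is a rough Python sketch of how an astral-plane code point splits into two surrogates, and how a single 16-bit unit can be classified on sight. The ranges are the standard surrogate blocks (D800 to DBFF for the first half, DC00 to DFFF for the second); the function names are made up for illustration.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above the BMP into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only astral-plane code points need surrogates")
    offset = code_point - 0x10000       # 20 bits of payload remain
    high = 0xD800 + (offset >> 10)      # top 10 bits go in the first half
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go in the second half
    return high, low


def classify_unit(unit: int) -> str:
    """Tell what a single 16-bit UTF-16 unit is, without any context."""
    if 0xD800 <= unit <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= unit <= 0xDFFF:
        return "low surrogate"
    return "BMP character"


high, low = to_surrogates(0x1F600)
print(hex(high), hex(low))
print(classify_unit(high), "/", classify_unit(low), "/", classify_unit(ord("A")))
```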

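UCS-2: a fixed-width 16-bit encoding, used by JavaScript. It has no surrogate blocks (compare UTF-16 above), so it can only represent BMP characters, which it handles without any problem. In practice a JavaScript string is a sequence of 16-bit units, so an astral-plane character shows up as two units (a surrogate pair).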

UTF-8: Characters whose value is less than 128 (i.e. ASCII) are encoded as themselves in one byte; the high-order bit will always be zero. (Which means that a pure ASCII text is actually UTF-8 as it sits.) The rest have their bits ripped apart and dealt out into several (from two to four) bytes as follows:
  • The first byte has a bunch of high-order one bits telling you how many bytes are used to encode the character, followed by a zero bit.
  • The rest of the bytes each begin with a single one bit followed by a zero bit.
  • The bits of the character are dealt out in the space left over after these signaling bits.
Suppose a character is encoded in two bytes. Then the first byte has two one bits and a zero bit, leaving five bits of payload. The second has a one, a zero, and six bits of payload. Thus there are eleven bits of payload, and the biggest character that can squeeze into two bytes in UTF-8 is U+07FF, which is 11 ones.
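
The bit-dealing described above can be sketched in Python as follows. This is a simplified illustration with a hypothetical function name; it does not reject surrogate code points the way a strict encoder would.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point as UTF-8 by dealing its bits out by hand."""
    if code_point < 0:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:
        # ASCII: one byte, high-order bit zero.
        return bytes([code_point])
    if code_point < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx (5 + 6 = 11 payload bits).
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx (16 payload bits).
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:
        # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21 payload bits).
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("not a Unicode code point")


# U+07FF is the largest value that fits in two bytes; U+0800 needs three.
print(utf8_encode(0x07FF).hex(), utf8_encode(0x0800).hex())
# Cross-check against Python's built-in encoder.
print(utf8_encode(0x20AC) == "\u20ac".encode("utf-8"))
```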

For UTF-8 the unit of encoding is the byte, so there are no byte-ordering issues.




 

Differences between myisam and innodb

Here is a table with some differences between InnoDB and MyISAM. Feel free to suggest additions or changes.

https://docs.google.com/spreadsheet/ccc?key=0AjNzDDTodZ1zdE9uZ3RiVmRSa0hvNjkxMkN5U193Z2c