Tuesday, April 21, 2009

google-base character encoding

Character encoding on google-base is critical to successful data-feed-files. The character encoding must be the same in all of the following places:

  1. all characters on your website;
  2. the settings of all tools used to create or edit your data-feed-file;
  3. the encoding declared within the data-feed-file;
  4. all characters within the data-feed-file itself;
  5. the encoding detected by google-base during processing.
Any mismatch among these may cause google-base processing to fail.

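The best thing to do is to always use only the US-ASCII printable characters, which are encoded identically in the common ASCII-compatible encodings (UTF-8, ISO-8859-1, Windows-1252) and so should work whichever of those is declared or detected.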
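
As a rough way to check a feed against this advice, the sketch below scans text for anything outside the printable US-ASCII range (0x20 to 0x7E). The function name and the choice to tolerate tab, carriage return and newline (so that tab-delimited feeds pass) are assumptions for illustration, not part of any google-base tool.

```python
def find_non_ascii_printable(text, allowed_controls="\t\r\n"):
    """Return (line, column, character) for every character that is not
    printable US-ASCII. Line and column numbers start at 1."""
    problems = []
    # Split on "\n" only, so unusual line separators are reported, not hidden.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            is_printable_ascii = 0x20 <= ord(ch) <= 0x7E
            if not is_printable_ascii and ch not in allowed_controls:
                problems.append((line_no, col, ch))
    return problems


if __name__ == "__main__":
    # Hypothetical two-line, tab-delimited feed with two raw characters in it.
    sample = "title\tprice\nCaf\u00e9\t5 \u00a3\n"
    for line_no, col, ch in find_non_ascii_printable(sample):
        print("line %d, column %d: U+%04X" % (line_no, col, ord(ch)))
```

An empty result means every character is printable US-ASCII or one of the tolerated control characters.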

The worst thing to do is to type or paste a character that looks OK on your machine or website but cannot be represented in the declared or detected encoding.
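
To illustrate how a character can look fine locally and still break elsewhere, here is a minimal sketch that tests whether the raw bytes of a feed decode under the encoding the feed declares. The function name is hypothetical, and this is not how google-base itself detects encodings; it only shows the kind of mismatch described above.

```python
def decodes_cleanly(raw_bytes, declared_encoding):
    """Return True if raw_bytes can be decoded with declared_encoding."""
    try:
        raw_bytes.decode(declared_encoding)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # A pound sign saved by a tool set to Windows-1252 is the single byte 0xA3,
    # which is not valid UTF-8 on its own.
    saved_as_cp1252 = "\u00a35".encode("cp1252")
    print(decodes_cleanly(saved_as_cp1252, "utf-8"))   # False
    print(decodes_cleanly(saved_as_cp1252, "cp1252"))  # True
```

A clean decode is necessary but not sufficient: ISO-8859-1 accepts every byte value, so a mis-declared file can decode without error and still show the wrong characters.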

If you must use a character outside the US-ASCII printable characters, use the following chart to find a Unicode numeric-entity for it. If you can use neither a numeric-entity nor a US-ASCII printable character, simply leave that character out of your data-feed-file.

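use ascii    or numeric-entity    never raw character
gbp          &#163;               £
(c)          &#169;               ©
(r)          &#174;               ®
^2           &#178;               ²
1/4          &#188;               ¼
1/2          &#189;               ½
3/4          &#190;               ¾
i            &#237;               í
-            &#8211;              –
-            &#8212;              —
'            &#8217;              ’
"            &#8220;              “
"            &#8221;              ”
*            &#8226;              •
...          &#8230;              …
(sm)         &#8480;              ℠
(tm)         &#8482;              ™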
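
The two options in the chart can be applied mechanically. The sketch below assumes the feed text is already a correctly decoded Python string; the function names and the `ASCII_SUBSTITUTES` table are illustrative, with the table copied from the chart above. Python's built-in `xmlcharrefreplace` error handler produces decimal numeric-entities of the form `&#NNN;`.

```python
# ASCII stand-ins taken from the chart; keys are written as escapes so this
# source file itself stays within printable US-ASCII.
ASCII_SUBSTITUTES = {
    "\u00a3": "gbp", "\u00a9": "(c)", "\u00ae": "(r)", "\u00b2": "^2",
    "\u00bc": "1/4", "\u00bd": "1/2", "\u00be": "3/4", "\u00ed": "i",
    "\u2013": "-", "\u2014": "-", "\u2019": "'", "\u201c": '"',
    "\u201d": '"', "\u2022": "*", "\u2026": "...", "\u2120": "(sm)",
    "\u2122": "(tm)",
}


def to_numeric_entities(text):
    """Replace every non-ASCII character with its decimal numeric-entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_ascii_substitutes(text):
    """Replace characters listed in the chart with their ASCII stand-ins.
    Characters not in the chart are left untouched."""
    return "".join(ASCII_SUBSTITUTES.get(ch, ch) for ch in text)


if __name__ == "__main__":
    title = "Caf\u00e9 Mug\u2122 \u2013 \u00a35"
    print(to_numeric_entities(title))
    print(to_ascii_substitutes(title))
```

Running the substitutes first and the entity conversion second would give ASCII stand-ins where the chart has one and numeric-entities for everything else.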