Plain text document encoding

5/4/2023

(1) ASCII is a subset of UTF-8 in the sense that if a file can be decoded successfully using ASCII, then it can be decoded successfully using UTF-8.

(2) Are the two terms in findreplace ever going to include non-ASCII characters? Note that an answer of "yes" would indicate that the goal of writing an output file in the same character set as the input may be difficult/impossible to achieve.

(3) Why not write ALL output files in the SAME handle-all-Unicode-characters encoding?

(5) What other character sets do you reasonably expect to need to handle?

(6) Which of the four possibilities (UTF-16LE / UTF-16BE) x (BOM / no BOM) are you calling UTF-16? Note that I'm deliberately not trying to infer anything from the presence of 'utf-16' in your code.

(7) Note that chardet doesn't detect UTF-16LE or UTF-16BE without a BOM. chardet has other blind spots with non-*x and older charsets.
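
To make the points about BOMs and a single output encoding concrete, here is a minimal sketch using only the Python standard library. It is not the original poster's code: the function names sniff_bom, decode_text and rewrite_as_utf8 are hypothetical, and the choice of UTF-8 as both the no-BOM fallback and the output encoding is an assumption made for illustration. A file without a BOM that is not valid UTF-8 will raise UnicodeDecodeError here, which is the point where a detector such as chardet, or knowledge of the expected character sets, would have to take over.

```python
import codecs

# BOMs are checked longest-first. UTF-32 is listed only because the
# UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw: bytes):
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


def decode_text(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw by its BOM if it has one, else with the fallback codec.

    The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM, so it
    does not appear in the returned string.
    """
    return raw.decode(sniff_bom(raw) or fallback)


def rewrite_as_utf8(raw: bytes, old: str, new: str) -> bytes:
    """Decode, do the find/replace on text, and always encode as UTF-8."""
    return decode_text(raw).replace(old, new).encode("utf-8")
```

Because the replacement is done on decoded text and the result is always written as UTF-8, non-ASCII search or replacement terms cannot fail to encode, which sidesteps the difficulty raised in (2). Pure-ASCII input needs no special case, since it decodes identically under the UTF-8 fallback.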
