(1) ASCII is a subset of UTF-8 in the sense that if a file can be decoded successfully using ASCII, then it can be decoded successfully using UTF-8.

(2) Are the two terms in findreplace ever going to include non-ASCII characters? Note that an answer of "yes" would indicate that the goal of writing an output file in the same character set as the input may be difficult/impossible to achieve.

(3) Why not write ALL output files in the SAME handle-all-Unicode-characters encoding?

(5) What other character sets do you reasonably expect to need to handle?

(6) Which of the four possibilities (UTF-16LE / UTF-16BE) x (BOM / no BOM) are you calling UTF-16? Note that I'm deliberately not trying to infer anything from the presence of 'utf-16' in your code.

(7) Note that chardet doesn't detect UTF-16xE (i.e. UTF-16LE or UTF-16BE) without a BOM. chardet has other blind-spots with non-*x and older charsets.
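
To make points (6) and (7) concrete, here is a minimal sketch of checking for a BOM yourself before falling back to any detector. It uses only the BOM constants from the standard `codecs` module; the helper name `sniff_bom` and the choice of returned codec names are my own assumptions, not anything from your code or from chardet.

```python
import codecs

# Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so the UTF-32 marks are tested first.
# Caveat: a UTF-16LE file whose first character after the BOM is
# U+0000 would be misread as UTF-32LE by this simple check.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Hypothetical helper: return a Python codec name if `data`
    starts with a known BOM, otherwise None.

    The returned codecs ("utf-16", "utf-32", "utf-8-sig") consume
    the BOM while decoding, so the mark does not leak into the text.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


sample = "find me"

# Python's "utf-16" codec writes a BOM; "utf-16-le" / "utf-16-be" do not.
with_bom = sample.encode("utf-16")
without_bom = sample.encode("utf-16-le")

assert sniff_bom(with_bom) == "utf-16"
assert sniff_bom(without_bom) is None  # the case point (7) warns about

# Point (1): bytes that decode as ASCII decode identically as UTF-8.
ascii_bytes = sample.encode("ascii")
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")
```

When `sniff_bom` returns None you are in the "no BOM" half of the grid in point (6), and you would need either outside knowledge of the file's origin or a heuristic to choose between UTF-16LE, UTF-16BE, and everything else.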