rejetto,
Now let's turn to UTF-16, which you once mentioned as the format of filenames on Windows.
(It's my fault for mentioning ANSI every time; the main problem might not be there.)
The Unicode Standard PDF might be useful.
I referred to its Figure 2.11 and made a draft to simulate the conversion from UTF-8 to UTF-16, then found something suspicious.
I'll send the draft tomorrow.
My computer will stay on tonight for your further testing.
Edit: I attached that draft.
Figure 2.11:
       | A    | Ω     | 語       |
UTF-8  | 41   | CE A9 | E8 AA 9E |
UTF-16 | 0041 | 03A9  | 8A9E     |
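The values in the figure can be reproduced with a few lines of Python. This is only a sketch to check the table; the helper names `hex_bytes` and `encodings` are made up for illustration, and I use big-endian UTF-16 here just so the digits read in the same order as in the figure (Windows itself stores UTF-16 little-endian).

```python
def hex_bytes(data):
    """Format raw bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

def encodings(ch):
    """Return (UTF-8 hex, UTF-16 big-endian hex) for one character."""
    return hex_bytes(ch.encode("utf-8")), hex_bytes(ch.encode("utf-16-be"))

# The three sample characters from Figure 2.11.
for ch in "AΩ語":
    utf8, utf16 = encodings(ch)
    print(f"{ch}  UTF-8: {utf8:<9} UTF-16: {utf16}")
```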
I did the conversion in reverse, from UTF-8 to UTF-16:
We can see that the Chinese character takes 3 bytes in UTF-8, but 2 in UTF-16.
The omega always takes 2 bytes.
So, if there are only Greek symbols (2 bytes each) in a filename, it will be fine;
if there is an odd number of Chinese characters (3 bytes each), it will be bad, even if the number of multi-byte chars in a chunk is even.
I think the problem is in the byte counter or something similar: the last odd byte is left orphaned and gets attached to the following single-byte char, so both of them get corrupted.
The above is not 100% certain; it's only for reference.
Filenames in the draft:
语文.txt
语Ω.txt
语文书.txt
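To make the odd/even guess above concrete, here is a small sketch that counts how many UTF-8 bytes the multi-byte characters take in each draft filename and whether that count is odd. The function names `multibyte_byte_count` and `report` are hypothetical, and this only illustrates the byte arithmetic; it does not show what the program actually does internally.

```python
def multibyte_byte_count(name):
    """Total UTF-8 bytes used by the non-ASCII characters in name."""
    return sum(len(ch.encode("utf-8")) for ch in name if ord(ch) > 0x7F)

def report(name):
    """Return (multi-byte UTF-8 bytes, parity, UTF-16 code units) for name."""
    n = multibyte_byte_count(name)
    parity = "odd" if n % 2 else "even"
    utf16_units = len(name.encode("utf-16-le")) // 2
    return n, parity, utf16_units

# The three filenames from the draft.
for name in ["语文.txt", "语Ω.txt", "语文书.txt"]:
    n, parity, units = report(name)
    print(f"{name}: {n} multi-byte UTF-8 bytes ({parity}), {units} UTF-16 units")
```

If the guess is right, the names whose multi-byte byte count is odd would be the ones that come out corrupted, while the even one would survive.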