2.4 beta unicode problem
NaitLee:
I had already tested with various browsers before, but all of them have problems...
Your uploads were fine, and I see the file 哲学.ppt now shows up correctly in my browser too...
But the file 生活处处有哲学.ppt, which was the original filename in the problem test, still gets corrupted. Could you try that one as well?
Sorry for giving you a filename that does not trigger the problem...
A discovery:
Multi-byte ANSI characters show something interesting:
these characters almost always take 2 bytes in ANSI,
but in UTF-8 they take 3 bytes.
So I found that if the number of such characters is odd, the upload fails with an orphaned non-printable byte. If it is even, it succeeds.
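The byte counts described above can be checked with a short Python sketch. It assumes "ANSI" here means code page 936 (GBK), the usual default on Simplified Chinese Windows; the thread does not state the code page. The helper name `byte_lengths` is made up for illustration.

```python
def byte_lengths(name):
    """Count characters and encoded bytes for a filename stem."""
    return {
        "chars": len(name),
        # Assumption: the "ANSI" code page is GBK (cp936).
        "ansi_gbk": len(name.encode("gbk")),
        "utf8": len(name.encode("utf-8")),
    }

# The two filename stems mentioned in this thread.
for stem in ("哲学", "生活处处有哲学"):
    info = byte_lengths(stem)
    parity = "odd" if info["utf8"] % 2 else "even"
    print(stem, info, parity, "number of UTF-8 bytes")
```

With 3 bytes per character, an odd number of Chinese characters gives an odd number of UTF-8 bytes, which matches the odd/even pattern reported above.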
rejetto:
ok, now i can see the problem on your hfs.
Still not on mine, but at least I will be able to investigate some more.
I'll be out for a few hours now.
NaitLee:
rejetto,
Now let's turn to UTF-16, which you once mentioned as the format of filenames on Windows.
(It's my fault for mentioning ANSI every time; the main problem might not be there.)
The Unicode Standard PDF might be useful.
I referred to its Figure 2.11 and made a draft to simulate the conversion from UTF-8 to UTF-16, then found something suspicious.
I'll send the draft tomorrow.
I'll keep my computer on tonight for your further testing.
Edit:
I attached that draft.
Figure 2.11:
        A     Ω      語
UTF-8   41    CE A9  E8 AA 9E
UTF-16  0041  03A9   8A9E
I did the conversion in reverse, from UTF-8 to UTF-16:
We can see the Chinese character takes 3 bytes in UTF-8, but 2 in UTF-16.
The omega always takes 2 bytes.
So, if there are only Greek symbols (2 bytes) in the filename, they will be fine;
if there is an odd number of Chinese characters (3 bytes), even if the total number of multi-byte chars in a chunk is even, they will go bad.
I think it's a problem with a byte counter or something similar, leaving the last odd byte orphaned and attached to the following single-byte char, so that both get corrupted.
The above is not 100% certain, only for reference.
Filenames in draft:
语文.txt
语Ω.txt
语文书.txt
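The values in Figure 2.11 and the byte counts of the draft filenames can be reproduced with a small Python sketch. It only shows the encodings themselves; it does not model what HFS does internally, and the helper name `hex_bytes` is made up for illustration.

```python
def hex_bytes(text, encoding):
    """Return the encoded bytes of `text` as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

# The three characters from Figure 2.11.
# UTF-16 is shown big-endian so the hex reads like the code unit value.
for ch in "AΩ語":
    print(ch, "UTF-8:", hex_bytes(ch, "utf-8"), "| UTF-16:", hex_bytes(ch, "utf-16-be"))

# The draft filenames: total byte count in each encoding.
for name in ("语文.txt", "语Ω.txt", "语文书.txt"):
    utf8_len = len(name.encode("utf-8"))
    utf16_len = len(name.encode("utf-16-le"))  # 2 bytes per character here
    parity = "odd" if utf8_len % 2 else "even"
    print(name, "UTF-8 bytes:", utf8_len, f"({parity})", "UTF-16 bytes:", utf16_len)
```

Each ASCII character adds 1 byte in UTF-8, so the parity of the whole name also depends on the ".txt" part, not only on the Chinese characters.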
rejetto:
--- Quote from: NaitLee on June 01, 2020, 02:35:02 PM ---
Now let's turn to UTF-16, which you once mentioned as the format of filenames on Windows.
--- End quote ---
Windows is using UTF16 for its API. There's nothing to "turn to".
HFS is then transmitting over the net using UTF-8.
I'm trying to understand more about the problem based on the little I have.
Your intuition about the number of chars being odd is correct.
rejetto:
it took me hours but now i have a VM with XP in chinese, and the problem is reproduced there.
Of course I can't read any of the prompts XP shows me, so I go on blindly.