rejetto forum

2.4 beta unicode problem


NaitLee

  • Occasional poster
  • Posts: 72
  • Computer brained boy
I had already tested with various browsers before, and all of them show the problem...

Your uploads were fine, and the file 哲学.ppt now shows correctly in my browser too...

But the file 生活处处有哲学.ppt, which is the original filename from my problem test, gets corrupted. Could you try that one as well, please?
Sorry for giving you a filename that doesn't trigger the problem...

A discovery:
 Multi-byte ANSI characters have an interesting property:
 these characters almost always take 2 bytes in ANSI,
 but in UTF-8 they take 3 bytes.
 So I found that if the number of such characters is odd, the upload fails and leaves an orphaned non-printable byte. If it is even, the upload succeeds.
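The byte counts behind this observation can be checked with a short Python sketch. It assumes the ANSI code page in question is the Simplified Chinese one (cp936, which Python handles with its `gbk` codec) and only looks at the filename stems; the helper `byte_counts` is made up for illustration and does not reproduce the upload bug itself.

```python
# Compare how many bytes each test filename stem takes in the Chinese ANSI
# code page (cp936, Python's "gbk" codec) versus UTF-8.
def byte_counts(stem: str) -> dict[str, int]:
    return {
        "chars": len(stem),
        "gbk": len(stem.encode("gbk")),     # 2 bytes per Chinese character
        "utf8": len(stem.encode("utf-8")),  # 3 bytes per Chinese character
    }

for stem in ["哲学", "生活处处有哲学"]:
    counts = byte_counts(stem)
    parity = "odd" if counts["utf8"] % 2 else "even"
    print(stem, counts, "UTF-8 length is", parity)
```

Since every Chinese character here is 3 bytes in UTF-8, the UTF-8 length is odd exactly when the character count is odd, which lines up with the failing 7-character name and the working 2-character one.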
Thanks for noticing me :D , I'm just someone normal like others here :D
But don't forget to check out my template ;P


rejetto

  • Administrator
  • Tireless poster
  • Posts: 13260
OK, now I can see the problem on your HFS.
Still not on mine, but at least I'll be able to investigate some more.
I'll be out for a few hours now.


NaitLee
rejetto,

Now let's turn to UTF-16, which you once mentioned is the format of filenames on Windows.
(It's my fault for mentioning ANSI every time; the main problem might not be there.)
The Unicode Standard PDF might be useful.

I referred to its Figure 2.11 and made a draft simulating the conversion from UTF-8 to UTF-16, and found something suspicious.
I'll send the draft tomorrow.
My computer will stay on tonight for your further testing.

Edit:
I attached that draft.

Figure 2.11 (excerpt):
            A      Ω        語
  UTF-8     41     CE A9    E8 AA 9E
  UTF-16    0041   03A9     8A9E
My draft does the conversion in the reverse direction, from UTF-8 to UTF-16:
We can see that the Chinese character takes 3 bytes in UTF-8 but only 2 in UTF-16.
The omega takes 2 bytes in both.
So if a filename contains only Greek symbols (2 bytes each), it will be fine;
if it contains an odd number of Chinese characters (3 bytes each), it will be corrupted, even if the total number of multi-byte characters in a chunk is even.
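The Figure 2.11 values can be regenerated with Python's built-in codecs as a sanity check. `utf-16-be` is used so the output shows plain big-endian 16-bit code units without a byte-order mark; the helper name `encodings` is just for illustration.

```python
# Regenerate the Figure 2.11 excerpt: UTF-8 bytes and UTF-16 code units.
chars = ["A", "\u03a9", "\u8a9e"]  # A, GREEK CAPITAL LETTER OMEGA, 語

def encodings(ch: str) -> tuple[str, str]:
    utf8 = ch.encode("utf-8").hex(" ").upper()
    # For characters in the Basic Multilingual Plane this is a single
    # 16-bit code unit (4 hex digits); no byte-order mark with utf-16-be.
    utf16 = ch.encode("utf-16-be").hex().upper()
    return utf8, utf16

for ch in chars:
    utf8, utf16 = encodings(ch)
    print(f"{ch}  UTF-8: {utf8:<9} UTF-16: {utf16}")
```

The output shows the same asymmetry described above: the Chinese character grows from 2 bytes (one UTF-16 unit) to 3 bytes in UTF-8, while Ω is 2 bytes either way.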

I think it's a problem with the byte counter or something similar, which leaves the last odd byte orphaned and attached to the following single-byte character, so both of them get corrupted.

The above is not 100% certain; it's only for reference.

Filenames in draft:
语文.txt
语Ω.txt
语文书.txt
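One way to test this guess is a toy model in Python: assume (hypothetically) that some code walks the UTF-8 bytes as if they were a double-byte ANSI string, treating every byte of 0x80 or above as a lead byte that consumes the next one. The function `swallows_ascii` is invented for this sketch and is not HFS code; it only checks whether the model agrees with the draft filenames, plus one Greek-only name added for comparison.

```python
# Toy model (hypothetical, NOT HFS's actual code): read UTF-8 bytes as if they
# were a double-byte ANSI string, where each byte >= 0x80 swallows the next byte.
def swallows_ascii(data: bytes) -> bool:
    """Return True if a high 'lead' byte gets paired with an ASCII byte."""
    i = 0
    while i < len(data):
        if data[i] >= 0x80:
            if i + 1 < len(data) and data[i + 1] < 0x80:
                return True  # orphan high byte glued to e.g. the '.' of '.txt'
            i += 2
        else:
            i += 1
    return False

# The three draft filenames, plus a Greek-only name added for comparison.
for name in ["语文.txt", "语Ω.txt", "语文书.txt", "ΩΩΩ.txt"]:
    raw = name.encode("utf-8")
    print(name, len(raw) - len(".txt"), "non-ASCII bytes ->",
          "corrupted" if swallows_ascii(raw) else "ok")
```

Under this model 语文.txt and ΩΩΩ.txt come out fine, while 语Ω.txt (5 bytes) and 语文书.txt (9 bytes) lose the dot of their extension. That matches the observed pattern, but it is only an illustration; the real cause inside HFS may differ.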
« Last Edit: June 02, 2020, 12:25:11 AM by NaitLee »


rejetto
Quote from NaitLee: "Now let's turn to utf-16, once you mentioned, as the format of filenames on Windows."

Windows uses UTF-16 for its API. There's nothing to "turn to".
HFS then transmits over the network using UTF-8.
I'm trying to understand more of the problem based on the little I have.
Your intuition about the number of characters being odd is correct.
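The path described here (UTF-16 from the Windows API, UTF-8 on the network) can be sketched in Python. The URL percent-encoding step and the helper `utf16le_to_url` are assumptions for illustration, not a description of how HFS is actually implemented.

```python
from urllib.parse import quote, unquote

# Hypothetical sketch: the Windows wide-character API hands over a filename as
# UTF-16 (little-endian in memory); a server re-encodes it as UTF-8 before
# sending it, here shown percent-encoded as it might appear in a URL.
def utf16le_to_url(raw_utf16le: bytes) -> str:
    name = raw_utf16le.decode("utf-16-le")  # UTF-16 code units -> str
    return quote(name, encoding="utf-8")    # str -> UTF-8 bytes -> %XX escapes

windows_name = "生活处处有哲学.ppt".encode("utf-16-le")
url_part = utf16le_to_url(windows_name)
print(len(windows_name), "UTF-16 bytes ->", url_part)
print(unquote(url_part))  # decoding the UTF-8 escapes gives the name back
```

The 7 Chinese characters are 14 bytes in UTF-16 but 21 bytes in UTF-8, so any step that counts or splits bytes assuming one encoding while holding the other can end up off by an odd number.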
« Last Edit: June 01, 2020, 03:25:04 PM by rejetto »


rejetto
It took me hours, but now I have a VM with Chinese Windows XP, and the problem reproduces there.
Of course I can't read any of the prompts XP shows me, so I'm going on blindly.


rejetto
I had to split the topic because this bug has nothing to do with the translation of HFS.

Anyway, after hours of work I finally found where the bug was and fixed it.
You'll see it in the next release.

I could only do this after testing many, many builds on the Chinese Windows. I don't know how long it would have taken otherwise -_-
I really appreciate your help anyway, and be happy because the bug is gone :)


NaitLee
Quote from rejetto: "I really appreciated your help anyway, and be happy because the bug is gone :)"

Glad to hear that! Good job :D