rejetto forum

Url Encoding

parade · 6 · 6072


parade

  • Tireless poster
  • Posts: 138
Hello,

Please have a look at this thread, which I started in the ToG subforum:
http://www.rejetto.com/forum/index.php?topic=5200.0

First I have to say that this topic is very new to me and I am just looking for the reason behind this problem, so don't blame me if I am looking in the wrong direction. I looked up what URL encoding is on the web, and then I looked at the URL encoding in HFS.

If I have an MP3 with this filename:
ÄÖÜß.mp3
I would expect to have to use this encoding: %c4%d6%dc%df.mp3
As far as I can see, HFS uses this encoding: %C3%84%C3%96%C3%9C%C3%9F.mp3
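Both strings above can be reproduced with a short Python sketch (Python is just a convenient choice here; nothing in this thread uses it). It percent-encodes the same filename twice with the standard library's `urllib.parse.quote`: once from its ISO-8859-1 (Latin-1) bytes, which gives the form expected above, and once from its UTF-8 bytes, which gives the form HFS produces. Hex digits in percent-encoding are case-insensitive, so `%c4` and `%C4` denote the same byte.

```python
from urllib.parse import quote, unquote

filename = "ÄÖÜß.mp3"

# Percent-encode the ISO-8859-1 (Latin-1) bytes: one byte per character here.
latin1_form = quote(filename, encoding="latin-1")

# Percent-encode the UTF-8 bytes: two bytes per accented character here.
utf8_form = quote(filename, encoding="utf-8")

print(latin1_form)  # %C4%D6%DC%DF.mp3
print(utf8_form)    # %C3%84%C3%96%C3%9C%C3%9F.mp3

# Decoding only round-trips when both ends agree on the character encoding.
print(unquote(utf8_form) == filename)                        # True
print(unquote(latin1_form, encoding="latin-1") == filename)  # True
```

If the Latin-1 form is decoded as UTF-8 (the default for `unquote`), those bytes are not valid UTF-8 and the filename comes out garbled, which is why client and server have to agree on one encoding.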

Why?

Greetings
parade


Foggy

  • Tireless poster
  • Posts: 806
That is a very good question, and I'm going to take a stab in the dark and say it has to do with the character encoding used in HFS.


TSG

  • Operator
  • Tireless poster
  • Posts: 1935
Quote from Foggy: "That is a very good question, and I'm going to take a stab in the dark and say it has to do with the character encoding used in HFS."

That's the conclusion I came to also. It was brought up somewhere else recently.


MarkV

  • Tireless poster
  • Posts: 764
It may be because of Unicode; Unicode is 2 bytes per character...


parade

  • Tireless poster
  • Posts: 138
Quote from MarkV: "It may be because of Unicode; Unicode is 2 bytes per character..."

At the same time you wrote this, I found this ;)

http://www.w3.org/International/O-URL-code.html


rejetto

  • Administrator
  • Tireless poster
  • Posts: 13523
Exactly, that's the standard.

It is not correct to say Unicode is 2 bytes (maybe it once was).
Unicode gives a number to every symbol. Then there are several ways to pass those numbers as bytes. The one MarkV describes is called UTF-16.
In URLs, UTF-8 is used instead.
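To illustrate the distinction rejetto draws, here is a minimal Python sketch (again a language chosen for illustration, not something from the thread) that prints each sample character's Unicode code point and its bytes in UTF-8 and in UTF-16. The helper name `encodings` and the sample characters are hypothetical choices for this example. UTF-16 uses 2 bytes for the characters discussed in this thread but 4 bytes (a surrogate pair) for characters beyond U+FFFF, such as the emoji sample, so "2 bytes per character" does not hold in general; UTF-8 uses 1 to 4 bytes per character.

```python
from typing import Tuple


def encodings(ch: str) -> Tuple[int, bytes, bytes]:
    """Hypothetical helper: return (code point, UTF-8 bytes, UTF-16 bytes) for one character."""
    # "utf-16-be" = big-endian UTF-16 without a byte-order mark
    return ord(ch), ch.encode("utf-8"), ch.encode("utf-16-be")


# "A", "Ä", "ß", "€", and U+1F600 (an emoji outside the 16-bit range)
samples = ["A", "Ä", "ß", "€", "\U0001F600"]

for ch in samples:
    code_point, utf8, utf16 = encodings(ch)
    print(f"U+{code_point:04X}  "
          f"UTF-8 [{len(utf8)}]: {utf8.hex(' '):<12}  "
          f"UTF-16 [{len(utf16)}]: {utf16.hex(' ')}")
```

In a URL, percent-encoding is applied to the UTF-8 bytes, so each of `Ä`, `Ö`, `Ü` and `ß` becomes two `%XX` triplets, matching the link HFS generates.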