rejetto forum
Software => HFS ~ HTTP File Server => Programmers corner => Topic started by: LeoNeeson on May 12, 2014, 06:58:48 AM
-
I think I may have a workaround for this Unicode problem. The idea is hard to explain, but I'll try my best. I will need input directly from Rejetto (http://www.rejetto.com/forum/profile/?u=1). First, I need to know a few things about HFS internals, so this is my first question:
- Is HFS able to internally "read" any file whose name contains Unicode characters? By "read", I mean open and read the file at a low level, no matter what the file name is.
-
File handling is not a problem.
Even if support for it were not complete, it could easily be added.
-
Well, in that case, my idea may work. I've noticed that some servers send the file name only once the download has started, in the response headers. So I thought this could solve the "unicode problem" once and for all.
You could add an option (for users with Unicode problems) to generate and use generic URLs instead of URLs with the file name embedded. For example:
Instead of this:
http://myserver.com/folder1/folder2/file123.rar
http://myserver.com/folder52/folder62/file123.rar
Give something like:
http://myserver.com/download/333445289345343 (some random number)
http://myserver.com/download/845600032348881 (some random number)
...and when the download starts, send the real file name (file123.rar)
I've seen this on a lot of servers. For example, when you download a subtitle from addic7ed.com, this is exactly what happens.
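To make the idea a bit more concrete, here is a rough Python sketch of it. This is not HFS code; the names `register_file`, `resolve` and `content_disposition` and the in-memory table are made up for illustration. It assumes the server keeps a mapping from an opaque token to the real path, and that the real name is sent in a `Content-Disposition` header using the `filename*` form from RFC 6266, with a plain ASCII `filename` as a fallback for older clients.

```python
import secrets
from urllib.parse import quote

# Hypothetical in-memory table: opaque token -> real path on disk.
_downloads = {}


def register_file(path):
    """Return an ASCII-only URL path that stands in for the real file."""
    token = secrets.token_hex(8)  # random, contains no part of the file name
    _downloads[token] = path
    return "/download/" + token


def resolve(url_path):
    """Map a generic URL path back to the real path, or None if unknown."""
    token = url_path.rsplit("/", 1)[-1]
    return _downloads.get(token)


def content_disposition(filename):
    """Build a Content-Disposition value that can carry a Unicode name."""
    # ASCII fallback: anything non-ASCII, plus quote and backslash, becomes "_".
    fallback = "".join(
        c if c.isascii() and c not in '"\\' else "_" for c in filename
    )
    # filename* carries the real name as percent-encoded UTF-8.
    encoded = quote(filename, safe="")
    return "attachment; filename=\"{}\"; filename*=UTF-8''{}".format(
        fallback, encoded
    )


if __name__ == "__main__":
    url = register_file("folder1/folder2/file123.rar")
    print(url)  # something like /download/<16 hex digits>
    print(content_disposition("file123.rar"))
```

The URL then never contains the file name, so only the header has to deal with Unicode.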
Here is the addic7ed.com example.
The URL was: http://addic7ed.com/original/88150/1
and when the download has started, it automatically sends
the real file name: "Chicago PD - 01x14 - The Docks.LOL.English.HI.C.orig.Addic7ed.com.srt"
Check the log of my FlashGet:
Thu May 15 03:05:15 2014 Connecting to www.addic7ed.com:80
Thu May 15 03:05:15 2014 Connecting to www.addic7ed.com [IP=94.23.9.195:80]
Thu May 15 03:05:16 2014 Connected
Thu May 15 03:05:16 2014 GET /original/88150/1 HTTP/1.1
Thu May 15 03:05:16 2014 Host: www.addic7ed.com
Thu May 15 03:05:16 2014 Accept: */*
Thu May 15 03:05:16 2014 Referer: http://www.addic7ed.com/serie/Chicago_PD/1/14/The_Docks
Thu May 15 03:05:16 2014 User-Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.17...
Thu May 15 03:05:16 2014 Pragma: no-cache
Thu May 15 03:05:16 2014 Cache-Control: no-cache
Thu May 15 03:05:16 2014 Connection: close
Thu May 15 03:05:17 2014 HTTP/1.1 200 OK
Thu May 15 03:05:17 2014 Server: nginx
Thu May 15 03:05:17 2014 Date: Thu, 15 May 2014 06:05:12 GMT
Thu May 15 03:05:17 2014 Content-Type: text/srt; charset=
Thu May 15 03:05:17 2014 Transfer-Encoding: chunked
Thu May 15 03:05:17 2014 Connection: close
Thu May 15 03:05:17 2014 X-Powered-By: PHP/5.3.3
Thu May 15 03:05:17 2014 Set-Cookie: PHPSESSID=2qg3b7dgpipjptfeknpsmq0br7; path=/
Thu May 15 03:05:17 2014 Expires: Thu, 19 Nov 1981 08:52:00 GMT
Thu May 15 03:05:17 2014 Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Thu May 15 03:05:17 2014 Pragma: no-cache
Thu May 15 03:05:17 2014 Content-Disposition: attachment; filename="Chicago PD - 01x14 - The Docks.LOL.English.HI.C.orig.Addic7ed.com.srt"
Thu May 15 03:05:17 2014 Starting to receive data!
These are the important lines:
Thu May 15 03:05:16 2014 GET /original/88150/1 HTTP/1.1
Thu May 15 03:05:17 2014 Content-Disposition: attachment; filename="Chicago PD - 01x14 - The Docks.LOL.English.HI.C.orig.Addic7ed.com.srt"
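For anyone who wants to check this from the client side, here is a small Python sketch that pulls the file name out of a `Content-Disposition` value like the one in the log. The helper name `filename_from_header` is mine; it relies only on the standard library's `email.message.Message.get_filename()`, which is just one possible way to parse the header.

```python
from email.message import Message


def filename_from_header(value):
    """Return the file name carried by a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = value
    # get_filename() reads the "filename" parameter and strips the quotes.
    return msg.get_filename()


if __name__ == "__main__":
    header = (
        'attachment; filename="Chicago PD - 01x14 - The Docks.LOL.'
        'English.HI.C.orig.Addic7ed.com.srt"'
    )
    print(filename_from_header(header))
```

The requested URL (/original/88150/1) plays no part in this: the name comes from the response header alone.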
If you need another example to analyze, just ask me.
Do you think this can fix the problem and make HFS unicode compatible?... :)
(I'm just trying to help)
-
What you have described is URL encoding, which will fix Unicode issues...
Possible idea and way to fix it (the code behind it is unknown to me):
HFS would have some character chart info from a source like this:
http://www.w3schools.com/tags/ref_urlencode.asp
with which URL file paths are encoded and decoded...
I think they are already working on this, Leo...
---------
URL Encoding Reference
Character URL-encoding (Windows-1252 byte values)
space %20
! %21
" %22
# %23
$ %24
% %25
& %26
' %27
( %28
) %29
* %2A
+ %2B
, %2C
- %2D
. %2E
/ %2F
0 %30
1 %31
2 %32
3 %33
4 %34
5 %35
6 %36
7 %37
8 %38
9 %39
: %3A
; %3B
< %3C
= %3D
> %3E
? %3F
@ %40
A %41
B %42
C %43
D %44
E %45
F %46
G %47
H %48
I %49
J %4A
K %4B
L %4C
M %4D
N %4E
O %4F
P %50
Q %51
R %52
S %53
T %54
U %55
V %56
W %57
X %58
Y %59
Z %5A
[ %5B
\ %5C
] %5D
^ %5E
_ %5F
` %60
a %61
b %62
c %63
d %64
e %65
f %66
g %67
h %68
i %69
j %6A
k %6B
l %6C
m %6D
n %6E
o %6F
p %70
q %71
r %72
s %73
t %74
u %75
v %76
w %77
x %78
y %79
z %7A
{ %7B
| %7C
} %7D
~ %7E
DEL %7F
€ %80
(unused) %81
‚ %82
ƒ %83
„ %84
… %85
† %86
‡ %87
ˆ %88
‰ %89
Š %8A
‹ %8B
Œ %8C
(unused) %8D
Ž %8E
(unused) %8F
(unused) %90
‘ %91
’ %92
“ %93
” %94
• %95
– %96
— %97
˜ %98
™ %99
š %9A
› %9B
œ %9C
(unused) %9D
ž %9E
Ÿ %9F
NBSP %A0
¡ %A1
¢ %A2
£ %A3
¤ %A4
¥ %A5
¦ %A6
§ %A7
¨ %A8
© %A9
ª %AA
« %AB
¬ %AC
SHY %AD
® %AE
¯ %AF
° %B0
± %B1
² %B2
³ %B3
´ %B4
µ %B5
¶ %B6
· %B7
¸ %B8
¹ %B9
º %BA
» %BB
¼ %BC
½ %BD
¾ %BE
¿ %BF
À %C0
Á %C1
 %C2
à %C3
Ä %C4
Å %C5
Æ %C6
Ç %C7
È %C8
É %C9
Ê %CA
Ë %CB
Ì %CC
Í %CD
Î %CE
Ï %CF
Ð %D0
Ñ %D1
Ò %D2
Ó %D3
Ô %D4
Õ %D5
Ö %D6
× %D7
Ø %D8
Ù %D9
Ú %DA
Û %DB
Ü %DC
Ý %DD
Þ %DE
ß %DF
à %E0
á %E1
â %E2
ã %E3
ä %E4
å %E5
æ %E6
ç %E7
è %E8
é %E9
ê %EA
ë %EB
ì %EC
í %ED
î %EE
ï %EF
ð %F0
ñ %F1
ò %F2
ó %F3
ô %F4
õ %F5
ö %F6
÷ %F7
ø %F8
ù %F9
ú %FA
û %FB
ü %FC
ý %FD
þ %FE
ÿ %FF
URL Encoding Reference
The ASCII device control characters %00-%1F were originally designed to control hardware devices. Control characters have no place inside a URL.
ASCII Character Description URL-encoding
NUL null character %00
SOH start of header %01
STX start of text %02
ETX end of text %03
EOT end of transmission %04
ENQ enquiry %05
ACK acknowledge %06
BEL bell (ring) %07
BS backspace %08
HT horizontal tab %09
LF line feed %0A
VT vertical tab %0B
FF form feed %0C
CR carriage return %0D
SO shift out %0E
SI shift in %0F
DLE data link escape %10
DC1 device control 1 %11
DC2 device control 2 %12
DC3 device control 3 %13
DC4 device control 4 %14
NAK negative acknowledge %15
SYN synchronize %16
ETB end transmission block %17
CAN cancel %18
EM end of medium %19
SUB substitute %1A
ESC escape %1B
FS file separator %1C
GS group separator %1D
RS record separator %1E
US unit separator %1F
---------------
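One caveat about the table above: the entries from %80 to %FF are single-byte Windows-1252 values, while most clients today percent-encode the UTF-8 bytes of a character instead, so the same character can end up with two different encodings. A quick Python sketch of the difference (the helper names `encode_name` and `decode_name` are just for illustration, and I don't know which of the two HFS expects):

```python
from urllib.parse import quote, unquote


def encode_name(name, encoding="utf-8"):
    """Percent-encode one path segment using the given character encoding."""
    return quote(name, safe="", encoding=encoding)


def decode_name(text, encoding="utf-8"):
    """Reverse encode_name; the encoding must match the one used to encode."""
    return unquote(text, encoding=encoding)


if __name__ == "__main__":
    print(encode_name("é"))            # %C3%A9 (two UTF-8 bytes)
    print(encode_name("é", "cp1252"))  # %E9    (matches the table)
    print(encode_name("€"))            # %E2%82%AC
    print(encode_name("€", "cp1252"))  # %80
    print(decode_name("%E9", "cp1252"))
```

So a lookup chart alone is not enough: both sides also have to agree on which character encoding the percent-escapes represent.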
-
It's not only about URL encoding; it's more about how HFS talks to the download client when the download starts.
Personally, I don't really need Unicode support at all, but I think it would help a lot of people around the world. I'm just trying to offer ideas to make HFS better, because I think it's great, and maybe this can give users another way to handle Unicode files. I hope this helps...
-
Sorry, but that's not the problem I have with HFS 2 and Unicode.
It's all about the GUI, not the networking part and not the file handling.
HFS 3 already handles Unicode correctly. Sadly, I don't have enough time to complete it.