rejetto forum

Possible Unicode workaround to HFS 2.xx...


Offline LeoNeeson

  • Tireless poster
  • ****
    • Posts: 842
  • Status: On hiatus (sporadically here)
    • twitter.com/LeoNeeson
I think I may have a workaround for this Unicode problem. I have the idea in my head, and although it's hard to explain, I'll do my best. I will need Rejetto's direct input. First, I need to know some internal details about HFS, so this is my first question:

- Is HFS able to internally "read" any file whose name contains Unicode characters? By "read" I mean reading the file's contents at a low level, whatever the file name is.
HFS in Spanish (HFS en Español) / How to compile HFS (Tutorial)
» Currently taking a break, until HFS v2.4 gets its stable version.


Offline rejetto

  • Administrator
  • Tireless poster
  • *****
    • Posts: 13510
File handling is not a problem.
Even if support for that is not complete, it could easily be done.


Offline LeoNeeson

Well, in that case, my idea may work. I've discovered that some servers send the file name after the download has started, in the HTTP response headers. So I thought this could solve the "Unicode problem" once and for all.

You could add an option (for users with Unicode problems) to generate and use generic URLs instead of URLs with the file name embedded. For example:

Code:
Instead of this:
http://myserver.com/folder1/folder2/file123.rar
http://myserver.com/folder52/folder62/file123.rar

Give something like:
http://myserver.com/download/333445289345343 (some random number)
http://myserver.com/download/845600032348881 (some random number)

...and when the download starts, send the real file name (file123.rar)
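
To make the idea concrete, here is a minimal Python sketch of the generic-URL mapping. It is not HFS code: the class name DownloadRegistry, the /download/ prefix, and the token length are all assumptions made up for illustration.

```python
import secrets
from typing import Dict, Optional


class DownloadRegistry:
    """Hypothetical in-memory map from opaque tokens to real file paths."""

    PREFIX = "/download/"  # assumed URL prefix, taken from the example above

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}

    def register(self, real_path: str) -> str:
        """Store the real path and return an ASCII-only URL path for it."""
        token = secrets.token_hex(8)  # 16 hex digits, unrelated to the file name
        self._paths[token] = real_path
        return self.PREFIX + token

    def resolve(self, url_path: str) -> Optional[str]:
        """Return the real path for a generic URL, or None if unknown."""
        if not url_path.startswith(self.PREFIX):
            return None
        return self._paths.get(url_path[len(self.PREFIX):])


registry = DownloadRegistry()
url = registry.register("folder1/folder2/文件123.rar")
print(url)                    # an ASCII-only path under /download/
print(registry.resolve(url))  # folder1/folder2/文件123.rar
```

The URL never contains the file name, so no Unicode has to survive the request line. The real name only has to be sent back in the response.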

I've seen this on a lot of servers. For example, this is exactly what happens when you download a subtitle from addic7ed.com.


Here is the addic7ed.com example.
The URL was:
Code:
http://addic7ed.com/original/88150/1
and when the download started, the server automatically sent the real file name: "Chicago PD - 01x14 - The Docks.LOL.English.HI.C.orig.Addic7ed.com.srt"

Check my FlashGet log:
Code:
Thu May 15 03:05:15 2014 Connecting www.addic7ed.com:80
Thu May 15 03:05:15 2014 Connecting www.addic7ed.com [IP=94.23.9.195:80]
Thu May 15 03:05:16 2014 Connected
Thu May 15 03:05:16 2014 GET /original/88150/1 HTTP/1.1
Thu May 15 03:05:16 2014 Host: www.addic7ed.com
Thu May 15 03:05:16 2014 Accept: */*
Thu May 15 03:05:16 2014 Referer: http://www.addic7ed.com/serie/Chicago_PD/1/14/The_Docks
Thu May 15 03:05:16 2014 User-Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.17...
Thu May 15 03:05:16 2014 Pragma: no-cache
Thu May 15 03:05:16 2014 Cache-Control: no-cache
Thu May 15 03:05:16 2014 Connection: close
Thu May 15 03:05:17 2014 HTTP/1.1 200 OK
Thu May 15 03:05:17 2014 Server: nginx
Thu May 15 03:05:17 2014 Date: Thu, 15 May 2014 06:05:12 GMT
Thu May 15 03:05:17 2014 Content-Type: text/srt; charset=
Thu May 15 03:05:17 2014 Transfer-Encoding: chunked
Thu May 15 03:05:17 2014 Connection: close
Thu May 15 03:05:17 2014 X-Powered-By: PHP/5.3.3
Thu May 15 03:05:17 2014 Set-Cookie: PHPSESSID=2qg3b7dgpipjptfeknpsmq0br7; path=/
Thu May 15 03:05:17 2014 Expires: Thu, 19 Nov 1981 08:52:00 GMT
Thu May 15 03:05:17 2014 Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Thu May 15 03:05:17 2014 Pragma: no-cache
Thu May 15 03:05:17 2014 Content-Disposition: attachment; filename="Chicago PD - 01x14 - The Docks.LOL.English.HI.C.orig.Addic7ed.com.srt"
Thu May 15 03:05:17 2014 Starting to receive data!

These are the important lines:
Thu May 15 03:05:16 2014 GET /original/88150/1 HTTP/1.1
Thu May 15 03:05:17 2014 Content-Disposition: attachment; filename="Chicago PD - 01x14 - The Docks.LOL.English.HI.C.orig.Addic7ed.com.srt"
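
For reference, here is a Python sketch of how a server could build that header for a non-ASCII name. It assumes the RFC 6266 / RFC 5987 `filename*` form, which the addic7ed.com log above does not use. The helper name content_disposition and the "_" replacement character are hypothetical choices.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an attachment header with an ASCII fallback and a UTF-8 name."""
    # Fallback for old clients: keep printable ASCII, replace everything else
    # (and the characters that would break the quoted string) with "_".
    fallback = "".join(
        c if 32 <= ord(c) < 127 and c not in '"\\' else "_" for c in filename
    )
    # Extended parameter: UTF-8 bytes, percent-encoded, empty language tag.
    encoded = quote(filename, safe="")
    return 'attachment; filename="' + fallback + '"; filename*=UTF-8' + "''" + encoded


print(content_disposition("日本.txt"))
# attachment; filename="__.txt"; filename*=UTF-8''%E6%97%A5%E6%9C%AC.txt
```

Clients that understand `filename*` would save the file under its real name, and older ones would fall back to the plain `filename` value.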


If you need another example to analyze, just ask me.

Do you think this could fix the problem and make HFS Unicode compatible? :)
(I'm just trying to help)
« Last Edit: May 15, 2014, 10:45:20 AM by LeoNeeson »


Offline bmartino1

  • Tireless poster
  • ****
    • Posts: 910
  • I'm only trying to help i mean no offense.
    • My HFS Google Drive Shared Link
What you have described is URL encoding, which will fix Unicode issues...
Possible ideas and a way to fix it (the code behind it is unknown to me):

HFS would have some character chart info from a source like this:
http://www.w3schools.com/tags/ref_urlencode.asp

in which URL file paths are encoded and decoded...

I think they are already working on this, Leo...
---------
URL Encoding Reference
Character   URL-encoding (Windows-1252)
space   %20
!   %21
"   %22
#   %23
$   %24
%   %25
&   %26
'   %27
(   %28
)   %29
*   %2A
+   %2B
,   %2C
-   %2D
.   %2E
/   %2F
0   %30
1   %31
2   %32
3   %33
4   %34
5   %35
6   %36
7   %37
8   %38
9   %39
:   %3A
;   %3B
<   %3C
=   %3D
>   %3E
?   %3F
@   %40
A   %41
B   %42
C   %43
D   %44
E   %45
F   %46
G   %47
H   %48
I   %49
J   %4A
K   %4B
L   %4C
M   %4D
N   %4E
O   %4F
P   %50
Q   %51
R   %52
S   %53
T   %54
U   %55
V   %56
W   %57
X   %58
Y   %59
Z   %5A
[   %5B
\   %5C
]   %5D
^   %5E
_   %5F
`   %60
a   %61
b   %62
c   %63
d   %64
e   %65
f   %66
g   %67
h   %68
i   %69
j   %6A
k   %6B
l   %6C
m   %6D
n   %6E
o   %6F
p   %70
q   %71
r   %72
s   %73
t   %74
u   %75
v   %76
w   %77
x   %78
y   %79
z   %7A
{   %7B
|   %7C
}   %7D
~   %7E
    %7F
€   %80
   %81
‚   %82
ƒ   %83
„   %84
…   %85
†   %86
‡   %87
ˆ   %88
‰   %89
Š   %8A
‹   %8B
Œ   %8C
   %8D
Ž   %8E
   %8F
   %90
‘   %91
’   %92
“   %93
”   %94
•   %95
–   %96
—   %97
˜   %98
™   %99
š   %9A
›   %9B
œ   %9C
   %9D
ž   %9E
Ÿ   %9F
    %A0
¡   %A1
¢   %A2
£   %A3
¤   %A4
¥   %A5
¦   %A6
§   %A7
¨   %A8
©   %A9
ª   %AA
«   %AB
¬   %AC
    %AD
®   %AE
¯   %AF
°   %B0
±   %B1
²   %B2
³   %B3
´   %B4
µ   %B5
¶   %B6
·   %B7
¸   %B8
¹   %B9
º   %BA
»   %BB
¼   %BC
½   %BD
¾   %BE
¿   %BF
À   %C0
Á   %C1
Â   %C2
Ã   %C3
Ä   %C4
Å   %C5
Æ   %C6
Ç   %C7
È   %C8
É   %C9
Ê   %CA
Ë   %CB
Ì   %CC
Í   %CD
Î   %CE
Ï   %CF
Ð   %D0
Ñ   %D1
Ò   %D2
Ó   %D3
Ô   %D4
Õ   %D5
Ö   %D6
×   %D7
Ø   %D8
Ù   %D9
Ú   %DA
Û   %DB
Ü   %DC
Ý   %DD
Þ   %DE
ß   %DF
à   %E0
á   %E1
â   %E2
ã   %E3
ä   %E4
å   %E5
æ   %E6
ç   %E7
è   %E8
é   %E9
ê   %EA
ë   %EB
ì   %EC
í   %ED
î   %EE
ï   %EF
ð   %F0
ñ   %F1
ò   %F2
ó   %F3
ô   %F4
õ   %F5
ö   %F6
÷   %F7
ø   %F8
ù   %F9
ú   %FA
û   %FB
ü   %FC
ý   %FD
þ   %FE
ÿ   %FF

URL Encoding Reference
The ASCII device control characters %00-%1f were originally designed to control hardware devices. Control characters have nothing to do inside a URL.

ASCII Character   Description   URL-encoding
NUL   null character   %00
SOH   start of header   %01
STX   start of text   %02
ETX   end of text   %03
EOT   end of transmission   %04
ENQ   enquiry   %05
ACK   acknowledge   %06
BEL   bell (ring)   %07
BS   backspace   %08
HT   horizontal tab   %09
LF   line feed   %0A
VT   vertical tab   %0B
FF   form feed   %0C
CR   carriage return   %0D
SO   shift out   %0E
SI   shift in   %0F
DLE   data link escape   %10
DC1   device control 1   %11
DC2   device control 2   %12
DC3   device control 3   %13
DC4   device control 4   %14
NAK   negative acknowledge   %15
SYN   synchronize   %16
ETB   end transmission block   %17
CAN   cancel   %18
EM   end of medium   %19
SUB   substitute   %1A
ESC   escape   %1B
FS   file separator   %1C
GS   group separator   %1D
RS   record separator   %1E
US   unit separator   %1F
---------------
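
One caveat about the table above: the values from %80 to %FF are the Windows-1252 single-byte encodings, so they cannot represent characters outside that code page. Here is a small Python sketch of the difference, using the standard urllib.parse.quote (the helper name percent_encode is only for illustration):

```python
from urllib.parse import quote


def percent_encode(text: str, encoding: str = "utf-8") -> str:
    """Percent-encode text after converting it to bytes in the given encoding."""
    return quote(text, safe="", encoding=encoding)


print(percent_encode("ñ", "cp1252"))  # %F1, matches the table
print(percent_encode("ñ"))            # %C3%B1, the UTF-8 form
print(percent_encode("日本"))          # %E6%97%A5%E6%9C%AC

try:
    percent_encode("日本", "cp1252")
except UnicodeEncodeError:
    # Windows-1252 has no bytes for these characters at all.
    print("not representable in Windows-1252")
```

So percent-encoding by itself only helps with Unicode names if both sides agree that the bytes are UTF-8.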
Files I have snagged and share can be found on my google drive:

https://drive.google.com/drive/folders/1qb4INX2pzsjmMT06YEIQk9Nv5jMu33tC?usp=sharing


Offline LeoNeeson

It's not only about URL encoding; it's more about how HFS talks to the download client when the download starts.
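
As a rough sketch of the client side, this is how a download client could recover the file name from the Content-Disposition header. It is Python for illustration only, the function name is hypothetical, and the parsing is simplified: it handles a quoted `filename` and the RFC 5987 `filename*` form, not every edge case.

```python
import re
from typing import Optional
from urllib.parse import unquote


def filename_from_content_disposition(header: str) -> Optional[str]:
    """Return the file name announced by the server, or None if absent."""
    # Prefer the extended form: filename*=charset'language'percent-encoded
    ext = re.search(r"filename\*\s*=\s*([^']+)'[^']*'([^;]+)", header, re.IGNORECASE)
    if ext:
        charset, encoded = ext.group(1), ext.group(2)
        return unquote(encoded.strip(), encoding=charset)
    # Otherwise fall back to the plain quoted form: filename="..."
    plain = re.search(r'filename\s*=\s*"([^"]*)"', header, re.IGNORECASE)
    if plain:
        return plain.group(1)
    return None


header = ('attachment; filename="Chicago PD - 01x14 - The Docks'
          '.LOL.English.HI.C.orig.Addic7ed.com.srt"')
print(filename_from_content_disposition(header))
```

With the header from the FlashGet log this returns the subtitle's real name, even though the requested URL was only /original/88150/1.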

Personally, I don't really need Unicode support at all, but I think it would help a lot of people around the world. I'm just trying to offer ideas to make HFS better, because I think it's great, and maybe this can give users another way to handle Unicode files. I hope this helps...


Offline rejetto

Sorry, but that's not the problem I have with HFS 2 and Unicode.
It's all about the GUI, not the networking part and not the file handling.
HFS 3 already handles Unicode correctly. Sadly, I don't have enough time to complete it.