Kostya
Joined: 12 Dec 2004 Posts: 5
|
Posted: Sun Dec 12, 2004 8:20 am Post subject: Sample program dnote.d fails: "invalid UTF-8 sequence" |
|
|
I built the example notepad from source in dnote.d. When I tried to open a file with Cyrillic characters in the filename, it asserted:
"invalid UTF-8 sequence"
I looked at the code and found that DFL uses the
API in OpenFileDialog.runDialog(), which gets the ASCII filename,
then tries to cast it to char[], which is a UTF-8 string, and then the program crashes.
Why don't you use Windows APIs that end in *W instead of their *A counterparts? Since D does not support ASCII encoding at all...
P.S.: Or did I miss something, being a total newbie to the D language? Please correct me if I'm wrong! |
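The failure described in this post can be reproduced without any Windows API call. The following is a minimal sketch in present-day D, not the 2004 dialect used in the thread. It assumes the filename arrives in the Windows-1251 code page, where the letter Ы is the single byte 0xDB. The helper name `isValidUtf8` is made up for illustration.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper: true when the bytes form well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        // Reinterpreting the bytes as char[] is exactly what the cast does.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "ЫЫЫ" as Windows-1251 bytes (assumed code page): each letter is 0xDB.
    ubyte[] ansiName = [0xDB, 0xDB, 0xDB];
    // The same text as UTF-8: each letter is the pair 0xD0 0xAB.
    ubyte[] utf8Name = [0xD0, 0xAB, 0xD0, 0xAB, 0xD0, 0xAB];

    writeln(isValidUtf8(ansiName)); // the ANSI bytes are rejected
    writeln(isValidUtf8(utf8Name)); // the UTF-8 bytes are accepted
}
```

The byte 0xDB looks like the start of a two-byte UTF-8 sequence, but the byte after it is not a continuation byte, so anything that decodes the string reports an invalid sequence.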
|
|
|
Chris Miller
Joined: 27 Mar 2004 Posts: 514 Location: The Internet
|
Posted: Sun Dec 12, 2004 8:42 pm Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequen |
|
|
Kostya wrote: | Why don't you use Windows APIs that end in *W instead of their *A counterparts? Since D does not support ASCII encoding at all... |
In the bug list in readme.txt it does say
readme.txt wrote: | Doesn't call any unicode Windows API functions. |
Windows 9x doesn't support many of the "W" functions, so for now I've been just using the "A" ones.
Since ASCII is a subset of UTF-8, DFL will work only with ASCII. But I have been contemplating doing what std.file does. It does string conversions based on the OS and calls the "W" functions on NT.
- Chris |
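The approach described in this reply, converting the string according to the OS and calling the "W" function on NT, might look roughly like the sketch below. It is written in present-day D and is not the actual std.file or DFL source. The helper name `pathExists` is hypothetical, and the check relies on the high bit of `GetVersion()` being set on Windows 9x and clear on NT.

```d
import std.stdio : writeln;

version (Windows)
{
    import core.sys.windows.windows;
    import std.utf : toUTF16z;
    import std.windows.charset : toMBSz;

    // Hypothetical helper, for illustration only.
    bool pathExists(string name)
    {
        // High bit clear means an NT-family system with working W functions.
        immutable bool isNT = GetVersion() < 0x8000_0000;
        DWORD attrs;
        if (isNT)
            attrs = GetFileAttributesW(toUTF16z(name)); // UTF-8 -> UTF-16
        else
            attrs = GetFileAttributesA(toMBSz(name));   // UTF-8 -> ANSI code page
        return attrs != 0xFFFF_FFFF; // INVALID_FILE_ATTRIBUTES
    }
}

void main()
{
    version (Windows)
        writeln(pathExists(`C:\`));
    else
        writeln("This sketch only does something on Windows.");
}
```

The point of the design is that the caller always passes UTF-8, and the conversion to whatever the OS expects happens in one place, right before the API call.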
|
|
|
qbert
Joined: 30 Mar 2004 Posts: 209 Location: Dallas, Texas
|
Posted: Tue Dec 14, 2004 11:07 am Post subject: |
|
|
Eh, no one uses Win98.
I'd just go straight with *W!
Q |
|
|
|
Kostya
Joined: 12 Dec 2004 Posts: 5
|
Posted: Tue Dec 14, 2004 2:53 pm Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequen |
|
|
Quote: | Windows 9x doesn't support many of the "W" functions |
But it does support most of them, in fact the most common ones like GetOpenFileNameW, via the Microsoft Layer for Unicode. I will now check whether it supports the W counterparts of all the A functions that DFL uses.
Quote: | , so for now I've been just using the "A" ones. |
That means it only works with US and some European versions of Windows. Russian is not included...
Quote: | Since ASCII is a subset of UTF-8, DFL will work only with ASCII. But I have been contemplating doing what std.file does. It does string conversions based on the OS and calls the "W" functions on NT.
- Chris |
If I can be of help switching the library to the W functions, just ask me. Maybe you have more important to-dos in DFL, but for me it's most important that it works at all! I could make a version that uses Unicode. I think it's unnecessary to make different calls on different OSes, since most of the common W functions are supported by Win9x.
What do you think? |
|
|
|
Kostya
Joined: 12 Dec 2004 Posts: 5
|
Posted: Tue Dec 14, 2004 3:15 pm Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequen |
|
|
Quote: | But I have been contemplating doing what std.file does. It does string conversions based on the OS and calls the "W" functions on NT.
- Chris |
I've just looked at how it works in std.file in Phobos... Well, it fails on Cyrillic characters just the same. (I use WinXP.) To be more precise, I copied the source of the std.file function into my test program and debugged it. It fails whether I set the variable to 1 or 0, i.e. on both the WinXP and the Win98 code paths, in the string conversion functions:
Code: | wchar* namez = std.utf.toUTF16z(name); |
Code: | char* namez = toMBSz(name); |
Should I address this to the D language creators, perhaps? Although it cannot be that D does not support Cyrillic and no one noticed... I'll try to make it work... |
|
|
|
jcc7
Joined: 22 Feb 2004 Posts: 657 Location: Muskogee, OK, USA
|
Posted: Tue Dec 14, 2004 5:34 pm Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequen |
|
|
Kostya wrote: | Although it cannot be that D does not support Cyrillic and no one noticed... I'll try to make it work... |
Are you using a UTF encoding for characters? I'm sure that D wouldn't discriminate against Cyrillic characters. You might be able to find a solution by reading this page: Unicode Issues |
|
|
|
Kostya
Joined: 12 Dec 2004 Posts: 5
|
Posted: Wed Dec 15, 2004 2:50 pm Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequen |
|
|
Quote: | Are you using a UTF encoding for characters? I'm sure that D wouldn't discriminate against Cyrillic characters. You might be able to find a solution by reading this page: Unicode Issues |
Oh, thank you for the links. I get it now. I used an 8-bit encoding instead of UTF-8.
The problem was:
When I selected a file with a Cyrillic name using the "GetOpenFileNameA" function, it returned the name not as UTF-8 but as a locale-specific 8-bit string.
Some bytes in it were at or above 0x80, and this caused the error.
But this is all still very confusing, because Windows does not use UTF-8, only ANSI and UTF-16. And DFL has the same problem:
Code: |
// This is the error:
// The Russian filename "d:\ЫЫЫ.txt" is stored in a file as ANSI bytes (practically what GetOpenFileNameA returns)
char[] c = cast(char[]) read("d:\\111.txt");
// Use it to try to open that file (it exists)
char[] result = cast(char[]) read(c); // ERROR: Invalid UTF-8 Sequence
// This is how I got it to work:
// Read the same filename into an array of 1-byte ANSI characters
byte[] b = cast(byte[]) read("d:\\111.txt");
// Allocate a UTF-16 buffer (one wchar per byte is enough for a single-byte ANSI code page)
wchar[] w = new wchar[b.length];
// Convert ANSI to UTF-16 (code page 0 is CP_ACP). I didn't find any Phobos function that does it
int n = MultiByteToWideChar(0, 0, cast(char*) b, b.length, w, w.length);
char[] result2nd = cast(char[]) read(toUTF8(w[0 .. n])); // ok, only the converted part of w is used
printf("%.*s\n", result2nd); // finally it read the file with the Cyrillic filename
|
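The ANSI to UTF-8 step in the workaround above can also be sketched portably, without MultiByteToWideChar. This is written in present-day D and assumes the ANSI code page is Windows-1251. The helper `cp1251ToUtf8` is hypothetical and deliberately incomplete: it handles only ASCII and the basic Russian letters (bytes 0xC0 to 0xFF, which map to U+0410 to U+044F) and replaces everything else with '?'.

```d
import std.stdio : writeln;
import std.utf : validate;

// Hypothetical helper: partial Windows-1251 to UTF-8 conversion.
string cp1251ToUtf8(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
    {
        if (b < 0x80)
            result ~= cast(char) b;                     // ASCII passes through
        else if (b >= 0xC0)
            result ~= cast(dchar)(0x0410 + (b - 0xC0)); // appended as UTF-8
        else
            result ~= '?';                              // not covered by this sketch
    }
    return result;
}

void main()
{
    // "d:\ЫЫЫ.txt" as Windows-1251 bytes; 0xDB is the letter Ы.
    ubyte[] ansi = [0x64, 0x3A, 0x5C, 0xDB, 0xDB, 0xDB, 0x2E, 0x74, 0x78, 0x74];

    string name = cp1251ToUtf8(ansi);
    validate(name);                 // throws if the result were not valid UTF-8
    assert(name == "d:\\ЫЫЫ.txt");
    writeln(name.length);           // byte length grows: each Ы takes two bytes
}
```

On Windows, the system conversion functions remain the better choice, because they follow whatever code page the machine is actually configured with.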
|
|
|
|
Chris Miller
Joined: 27 Mar 2004 Posts: 514 Location: The Internet
|
|
|
|
Kostya
Joined: 12 Dec 2004 Posts: 5
|
Posted: Sat Dec 18, 2004 5:28 am Post subject: |
|
|
Cool! Everything works fine now. Thanks a lot. |
|
|
|
|