dsource.org - forums

Sample program dnote.d fails: "invalid UTF-8 sequence"

Kostya

Joined: 12 Dec 2004
Posts: 5

Posted: Sun Dec 12, 2004 8:20 am Post subject: Sample program dnote.d fails: "invalid UTF-8 sequence"

I built the example notepad from source in dnote.d. When I tried to open a file with Cyrillic characters in its name, the program failed with:
"invalid UTF-8 sequence"

I looked at the code and found that DFL calls the GetOpenFileNameA API in OpenFileDialog.runDialog(). That call returns the filename in the system's 8-bit (ANSI) encoding, and DFL then casts it to char[], which D treats as a UTF-8 string, so the program crashes.
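
To illustrate the failure described above, here is a minimal sketch in present-day D (not the 2004 compiler used in this thread). It assumes the filename bytes are in Windows-1251, the usual Russian ANSI code page, which the post does not state explicitly. The helper name isValidUtf8 is made up for this example and is not part of DFL or Phobos.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Hypothetical helper: returns true if `bytes` form well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        // std.utf.validate throws UTFException on the first malformed sequence.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // The Cyrillic letter U+042B three times, as Windows-1251 (one byte each) ...
    immutable ubyte[] cp1251 = [0xDB, 0xDB, 0xDB];
    // ... and as UTF-8 (two bytes each).
    immutable ubyte[] utf8 = [0xD0, 0xAB, 0xD0, 0xAB, 0xD0, 0xAB];

    writeln(isValidUtf8(cp1251)); // false
    writeln(isValidUtf8(utf8));   // true
}
```

In UTF-8, the byte 0xDB announces a two-byte sequence, so it must be followed by a continuation byte in the range 0x80 to 0xBF. In the Windows-1251 data it is followed by another 0xDB, which is why casting such bytes to char[] produces an invalid string.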

Why don't you use windows APIs that end in *W instead of their *A counterparts? Since D does not support ASCII encoding at all...

P.S: Or did I miss something, being a total newbie to D language? Please correct me if I'm wrong!

Chris Miller

Joined: 27 Mar 2004
Posts: 514
Location: The Internet

Posted: Sun Dec 12, 2004 8:42 pm Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequence"

Kostya wrote:

Why don't you use windows APIs that end in *W instead of their *A counterparts? Since D does not support ASCII encoding at all...

In the bug list in readme.txt it does say

readme.txt wrote:

Doesn't call any unicode Windows API functions.

Windows 9x doesn't support many of the "W" functions, so for now I've been just using the "A" ones.

Since ASCII is a subset of UTF-8, DFL will work only with ASCII. But I have been contemplating doing what std.file does. It does string conversions based on the OS and calls the "W" functions on NT.
- Chris

qbert

Joined: 30 Mar 2004
Posts: 209
Location: Dallas, Texas

Posted: Tue Dec 14, 2004 11:07 am Post subject:

Eh, no one uses Win98. I'd just go straight with *W! Q

Kostya

Joined: 12 Dec 2004
Posts: 5

Posted: Tue Dec 14, 2004 2:53 pm Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequence"

Quote:

Windows 9x doesn't support many of the "W" functions

But it does support most of them, including the most common ones such as GetOpenFileNameW, via the Microsoft Layer for Unicode. I will now check whether it supports the W counterparts of all the A functions that DFL uses.

Quote:

, so for now I've been just using the "A" ones.

That means it only works with US and some European versions of Windows; Russian is not included...

Quote:

Since ASCII is a subset of UTF-8, DFL will work only with ASCII. But I have been contemplating doing what std.file does. It does string conversions based on the OS and calls the "W" functions on NT.
- Chris

If I can be of help switching the library to the W functions, just ask me. Maybe you have more important to-dos in DFL, but for me the most important thing is that it works at all! I could make a version that uses Unicode. I think it's unnecessary to make different calls on different OSes, since most of the common W functions are supported by Win9x.

What do you think?

Kostya

Joined: 12 Dec 2004
Posts: 5

Posted: Tue Dec 14, 2004 3:15 pm Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequence"

Quote:

But I have been contemplating doing what std.file does. It does string conversions based on the OS and calls the "W" functions on NT.
- Chris

I've just looked at how it works in std.file in Phobos... Well, it fails on Cyrillic characters just the same (I use WinXP). To be more precise, I copied the source of the std.file read() function into my test program and debugged it. It fails whether I set the useWfuncs variable to 1 or 0 (i.e. on both the WinXP and the Win98 code paths), in the string conversion functions:

Code:

wchar* namez = std.utf.toUTF16z(name);

Code:

char* namez = toMBSz(name);

Should I address this to D language creators perhaps? Although this cannot be that D does not support cyrillic and no one noticed... I'll try to make it work...

jcc7

Joined: 22 Feb 2004
Posts: 657
Location: Muskogee, OK, USA

Posted: Tue Dec 14, 2004 5:34 pm Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequence"

Kostya wrote:

Although this cannot be that D does not support cyrillic and no one noticed... I'll try to make it work...

Are you using a UTF-encoding for characters? I'm sure that D wouldn't discriminate against Cyrillic characters. You might be able to find a solution by reading this page: Unicode Issues

Kostya

Joined: 12 Dec 2004
Posts: 5

Posted: Wed Dec 15, 2004 2:50 pm Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequence"

Quote:

Are you using a UTF-encoding for characters? I'm sure that D wouldn't discriminate against Cyrillic characters. You might be able to find a solution by reading this page: Unicode Issues

Oh, thank you for the links. I get it now: I was using an 8-bit encoding instead of UTF-8.

The problem was:

When I selected a file with a Cyrillic name, the GetOpenFileNameA function returned the name not in UTF-8 but in a locale-specific 8-bit encoding.
Some bytes in it were 0x80 or above, which is what caused the error.
This is all still very confusing, though, because Windows does not use UTF-8, only ANSI and UTF-16. And DFL has the same problem:

Code:

// This is the error:
// the Russian filename "d:\ЫЫЫ.txt" is stored in a file (practically, it is what GetOpenFileNameA returns)

char[] c = cast(char[]) read("d:\\111.txt");
// use it to try to open that file (it exists)
char[] result = cast(char[]) read(c); // ERROR: Invalid UTF-8 Sequence

// This is how I got it to work:
// read the Russian filename "d:\ЫЫЫ.txt" stored in a file into an array of 1-byte ANSI characters
byte[] b = cast(byte[]) read("d:\\111.txt");
// allocate the UTF-16 string
wchar[] w = new wchar[b.length];
// convert ANSI to UTF-16; I didn't find any Phobos functions that do it
MultiByteToWideChar(0, 0, cast(char*) b, b.length, w, w.length);

char[] result2nd = cast(char[]) read(toUTF8(w)); // ok
printf("%.*s\n", result2nd); // finally it read the file with the Cyrillic filename
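
The same ANSI-to-UTF-8 step can also be sketched without the Windows API, in present-day D. This is only an illustration under stated assumptions: it assumes the ANSI code page is Windows-1251, it handles only ASCII and the basic Russian letters, and the function name cp1251ToUtf8 is hypothetical (it is not a Phobos or DFL function). On Windows, MultiByteToWideChar as used above covers every code page and is the more general choice.

```d
import std.stdio : writeln;
import std.utf : toUTF8;

/// Hypothetical helper: decodes the ASCII and basic Cyrillic subset of
/// Windows-1251 and returns the text as UTF-8.
string cp1251ToUtf8(const(ubyte)[] bytes)
{
    dchar[] decoded;
    decoded.reserve(bytes.length);
    foreach (b; bytes)
    {
        if (b < 0x80)
            decoded ~= cast(dchar) b;                    // ASCII maps to itself
        else if (b >= 0xC0)
            decoded ~= cast(dchar)(0x0410 + (b - 0xC0)); // U+0410..U+044F are contiguous
        else if (b == 0xA8)
            decoded ~= cast(dchar) 0x0401;               // capital IO
        else if (b == 0xB8)
            decoded ~= cast(dchar) 0x0451;               // small IO
        else
            throw new Exception("byte outside the subset handled by this sketch");
    }
    return toUTF8(decoded);
}

void main()
{
    // "d:\" followed by U+042B three times and ".txt", encoded in Windows-1251.
    immutable ubyte[] ansiName =
        [0x64, 0x3A, 0x5C, 0xDB, 0xDB, 0xDB, 0x2E, 0x74, 0x78, 0x74];

    string utf8Name = cp1251ToUtf8(ansiName);
    assert(utf8Name == "d:\\\u042B\u042B\u042B.txt");
    writeln(utf8Name.length); // 13: each Cyrillic letter takes two bytes in UTF-8
}
```

The resulting string is valid UTF-8, so it can be passed to functions that expect a D string, which is the property the cast in the failing version did not provide.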

Chris Miller

Joined: 27 Mar 2004
Posts: 514
Location: The Internet

Posted: Fri Dec 17, 2004 6:16 am Post subject:

I've started supporting UTF-8, check it out in the snapshot http://www.dprogramming.com/dfl/snapshots/

Kostya

Joined: 12 Dec 2004
Posts: 5

Posted: Sat Dec 18, 2004 5:28 am Post subject:

Cool! Everything works fine now. Thanks a lot.

All times are GMT - 6 Hours