
Sample program dnote.d fails: "invalid UTF-8 sequence"

 
Kostya



Joined: 12 Dec 2004
Posts: 5

Posted: Sun Dec 12, 2004 8:20 am    Post subject: Sample program dnote.d fails: "invalid UTF-8 sequence"

I built the example notepad from source in dnote.d. When I tried to open a file with Cyrillic characters in its filename, it failed with:
"invalid UTF-8 sequence"

I looked at the source and found that DFL uses the
Code:
GetOpenFileNameA

API in OpenFileDialog.runDialog(), which returns the filename as an 8-bit string in the system (ANSI) code page.
DFL then casts it to char[], which D treats as a UTF-8 string, and the program crashes.

Why don't you use Windows APIs that end in *W instead of their *A counterparts? Since D does not support ASCII encoding at all...

P.S. Or did I miss something, being a total newbie to the D language? Please correct me if I'm wrong!
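
For readers following along, the failure described in this post can be reproduced without a file dialog. The sketch below is written for current D, which postdates this thread, and it assumes the filename arrives as Windows-1251 bytes (an assumption about the code page in use). `isValidUtf8` is a hypothetical helper for illustration, not part of DFL or Phobos.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `bytes` form a well-formed UTF-8 string.
bool isValidUtf8(const(ubyte)[] bytes)
{
    // Reinterpret the bytes as characters without converting them,
    // which is the effect of casting an ANSI buffer to char[].
    auto text = cast(const(char)[]) bytes;
    try
    {
        validate(text); // throws UTFException on a malformed sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "d:\ЫЫЫ.txt" in Windows-1251 (assumed code page): Ы is the single byte 0xDB.
    immutable ubyte[] ansiName =
        [0x64, 0x3A, 0x5C, 0xDB, 0xDB, 0xDB, 0x2E, 0x74, 0x78, 0x74];
    // The same name in UTF-8: Ы (U+042B) is the two bytes 0xD0 0xAB.
    immutable ubyte[] utf8Name =
        [0x64, 0x3A, 0x5C, 0xD0, 0xAB, 0xD0, 0xAB, 0xD0, 0xAB, 0x2E, 0x74, 0x78, 0x74];

    writeln("ANSI bytes valid as UTF-8:  ", isValidUtf8(ansiName));
    writeln("UTF-8 bytes valid as UTF-8: ", isValidUtf8(utf8Name));
}
```

The byte 0xDB looks like the start of a two-byte UTF-8 sequence, but the byte after it is not a continuation byte, so validation rejects the ANSI form of the name.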
Chris Miller



Joined: 27 Mar 2004
Posts: 514
Location: The Internet

Posted: Sun Dec 12, 2004 8:42 pm    Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequence"

Kostya wrote:
Why don't you use Windows APIs that end in *W instead of their *A counterparts? Since D does not support ASCII encoding at all...


The bug list in readme.txt does say:
readme.txt wrote:
Doesn't call any unicode Windows API functions.

Windows 9x doesn't support many of the "W" functions, so for now I've been just using the "A" ones.

Since ASCII is a subset of UTF-8, DFL will work only with ASCII. But I have been contemplating doing what std.file does. It does string conversions based on the OS and calls the "W" functions on NT.
- Chris
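
A rough sketch of the approach described above, for illustration only: convert the UTF-8 string according to the OS family, then call the "W" function on NT and the "A" function on 9x. It is written for current D, which postdates this thread. `isNtFamily` and `fileAttributes` are hypothetical names, and detecting the OS family through the high bit of `GetVersion()` is an assumption about how such a check could be done, not a quote of the std.file or DFL source.

```d
import std.stdio : writeln;

/// True when a GetVersion() result denotes an NT-family Windows.
/// The high bit of the result is set only on Windows 95/98/Me.
bool isNtFamily(uint versionInfo)
{
    return (versionInfo & 0x8000_0000) == 0;
}

version (Windows)
{
    import core.sys.windows.windows;
    import std.utf : toUTF16z;
    import std.windows.charset : toMBSz;

    /// Hypothetical helper: attributes of the file at a UTF-8 path,
    /// or 0xFFFF_FFFF if the call fails.
    uint fileAttributes(string utf8Path)
    {
        if (isNtFamily(GetVersion()))
            return GetFileAttributesW(toUTF16z(utf8Path)); // UTF-8 -> UTF-16
        return GetFileAttributesA(toMBSz(utf8Path)); // UTF-8 -> ANSI code page
    }
}

void main()
{
    // Example values only: high bit clear (NT style), then high bit set (9x style).
    writeln(isNtFamily(0x0A28_0105));
    writeln(isNtFamily(0xC000_0A04));
    version (Windows)
        writeln(fileAttributes("C:\\"));
}
```

The trade-off is one extra conversion per call, in exchange for full Unicode filenames on NT while the same binary still runs on 9x.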
qbert



Joined: 30 Mar 2004
Posts: 209
Location: Dallas, Texas

Posted: Tue Dec 14, 2004 11:07 am

Eh, no one uses Win98.

I'd just go straight with *W!

Q
Kostya



Joined: 12 Dec 2004
Posts: 5

Posted: Tue Dec 14, 2004 2:53 pm    Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequence"

Quote:
Windows 9x doesn't support many of the "W" functions


But it does support most of them, including the most common ones such as GetOpenFileNameW, via the Microsoft Layer for Unicode. I will now check whether it provides W versions of all the A functions that DFL uses.

Quote:
, so for now I've been just using the "A" ones.


That means it only works with US and some European versions of Windows, Russian not included...

Quote:
Since ASCII is a subset of UTF-8, DFL will work only with ASCII. But I have been contemplating doing what std.file does. It does string conversions based on the OS and calls the "W" functions on NT.
- Chris


If I can help with switching the library to the W functions, just ask. Maybe you have more important to-dos in DFL, but for me the most important thing is that it works at all! I could make a version that uses Unicode. I think it's unnecessary to make different calls on different OSes, since most of the common W functions are supported on Win9x.

What do you think?
Kostya



Joined: 12 Dec 2004
Posts: 5

Posted: Tue Dec 14, 2004 3:15 pm    Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequence"

Quote:
But I have been contemplating doing what std.file does. It does string conversions based on the OS and calls the "W" functions on NT.
- Chris


I've just looked at how it works in std.file in Phobos... Well, it fails on Cyrillic characters just the same (I use WinXP). To be more precise, I copied the source of the std.file
Code:
read()
function into my test program and debugged it. It fails whether I set the
Code:
useWfuncs
variable to 1 or 0, i.e. on both the WinXP and the Win98 code paths, in the string conversion functions:
Code:
wchar* namez = std.utf.toUTF16z(name);

Code:
char* namez = toMBSz(name);


Should I perhaps report this to the D language authors? Although it can't be that D does not support Cyrillic and no one noticed... I'll try to make it work...
jcc7



Joined: 22 Feb 2004
Posts: 657
Location: Muskogee, OK, USA

Posted: Tue Dec 14, 2004 5:34 pm    Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequence"

Kostya wrote:
Although it can't be that D does not support Cyrillic and no one noticed... I'll try to make it work...
Are you using a UTF-encoding for characters? I'm sure that D wouldn't discriminate against Cyrillic characters. You might be able to find a solution by reading this page: Unicode Issues
Kostya



Joined: 12 Dec 2004
Posts: 5

Posted: Wed Dec 15, 2004 2:50 pm    Post subject: Re: Sample program dnote.d fails: "invalid UTF-8 sequence"

Quote:
Are you using a UTF-encoding for characters? I'm sure that D wouldn't discriminate against Cyrillic characters. You might be able to find a solution by reading this page: Unicode Issues


Oh, thank you for the links. I get it now: I was using an 8-bit encoding instead of UTF-8.

The problem was:

When I selected a file with a Cyrillic name using the "GetOpenFileNameA" function, it returned a string that was not UTF-8 but in a locale-specific 8-bit encoding.
Some bytes in it were 0x80 or above, and this caused the error.
But this is all still very confusing, because Windows does not use UTF-8, only ANSI and UTF-16. And DFL has the same problem:

Code:

import core.sys.windows.windows; // MultiByteToWideChar, CP_ACP
import std.file : read;
import std.stdio : writeln;
import std.utf : toUTF8;

void main()
{
   // This is the error:
   // d:\111.txt holds the Russian filename "d:\ЫЫЫ.txt" as ANSI bytes
   // (practically, it is what GetOpenFileNameA returns).
   char[] c = cast(char[]) read("d:\\111.txt");
   // Use it to try to open that file (it exists).
   try
      read(c); // ERROR: Invalid UTF-8 sequence
   catch (Exception e)
      writeln(e.msg);

   // This is how I got it to work:
   // read the same filename as an array of ANSI 1-byte characters.
   ubyte[] b = cast(ubyte[]) read("d:\\111.txt");
   // Allocate the UTF-16 buffer: at most one wchar per ANSI byte.
   wchar[] w = new wchar[b.length];
   // Convert ANSI to UTF-16. I didn't find any Phobos function that does it.
   int n = MultiByteToWideChar(CP_ACP, 0, cast(char*) b.ptr, cast(int) b.length,
                               w.ptr, cast(int) w.length);

   // Keep only the n wchars actually written, then convert to UTF-8.
   char[] result2nd = cast(char[]) read(toUTF8(w[0 .. n])); // OK
   writeln(result2nd); // finally it reads the file with the Cyrillic filename
}
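
The post above notes that no Phobos function was found for the ANSI conversion step. In current D, which postdates this thread, `std.encoding.transcode` can do that step portably for a handful of fixed code pages. A minimal sketch, assuming the bytes are Windows-1251 rather than whatever the system code page happens to be; `cp1251ToUtf8` is a made-up name for illustration. On Windows, `std.windows.charset.fromMBSz` is an alternative that goes through the system code page instead.

```d
import std.encoding : transcode, Windows1251String;
import std.stdio : writeln;

/// Hypothetical helper: decode Windows-1251 bytes into a UTF-8 string.
string cp1251ToUtf8(const(ubyte)[] raw)
{
    // Label the bytes with their real encoding instead of casting them to char[].
    auto ansi = cast(Windows1251String) raw.idup;
    string utf8;
    transcode(ansi, utf8); // converts code page 1251 -> UTF-8
    return utf8;
}

void main()
{
    // "d:\ЫЫЫ.txt" as Windows-1251 bytes (Ы is 0xDB).
    immutable ubyte[] ansiName =
        [0x64, 0x3A, 0x5C, 0xDB, 0xDB, 0xDB, 0x2E, 0x74, 0x78, 0x74];
    string name = cp1251ToUtf8(ansiName);
    writeln(name.length); // UTF-8 length in bytes, longer than the ANSI form
    writeln(name);
}
```

Unlike the `MultiByteToWideChar` route, this does not depend on the machine's locale, so it only gives the right answer when the input really is Windows-1251.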

Chris Miller



Joined: 27 Mar 2004
Posts: 514
Location: The Internet

Posted: Fri Dec 17, 2004 6:16 am

I've started supporting UTF-8; check it out in the snapshot: http://www.dprogramming.com/dfl/snapshots/
Kostya



Joined: 12 Dec 2004
Posts: 5

Posted: Sat Dec 18, 2004 5:28 am

Cool! Everything works fine now. Thanks a lot.
All times are GMT - 6 Hours