Ticket #723 (new enhancement)

Opened 16 years ago

Last modified 13 years ago

Add single-character toUpper/toLower/toFold to tango.text.Unicode

Reported by: JarrettBillingsley Assigned to: JarrettBillingsley
Priority: normal Milestone: 1.0
Component: Tango Version: 0.99.2 Don
Keywords: triage Cc: JarrettBillingsley

Description

Would be nice to have them. Currently I have to do something like

dchar[1] buf;
dchar lower = toLower([origChar], buf)[0];

Attachments

Unicode.d.patch (2.7 kB) - added by doob on 11/09/10 12:24:50.

Change History

11/02/07 03:12:27 changed by JarrettBillingsley

  • cc set to JarrettBillingsley.

11/02/07 05:54:59 changed by kris

there's this problem where the upper- or lower-case of a character can actually be an array of them, instead of just one. So there's still that to deal with ...

11/02/07 19:27:12 changed by JarrettBillingsley

Ooh, scary. Course, dchar -> dchar isn't an issue. But maybe for the others, you could pass in a wchar[2] or char[4] and it'd return a slice.

11/02/07 20:15:43 changed by kris

for uppercase of a single dchar, the result can be multiple dchars. Weird, I know :)

(follow-up: ↓ 7 ) 11/03/07 03:33:17 changed by kris

  • owner changed from kris to ptriller.
  • milestone set to 0.99.4.

12/16/07 19:51:40 changed by kris

ping Peter ... hope the ticket notification is working again

(in reply to: ↑ 5 ) 12/16/07 21:13:35 changed by ptriller

  • status changed from new to assigned.

Replying to kris: Unicode actually has two kinds of mapping, the "complete" and the "simple". I could add methods for simple mapping, too.

Then a

   dchar toUpper(dchar ch);

would be possible.

I'll look into it.

(in reply to: ↑ description ) 12/16/07 22:54:38 changed by ptriller

I will add simpleToUpper, simpleToLower and simpleToFold functions that use the simple Unicode mapping, which doesn't change the size.

I'll try to fix it in time for the 0.99.4 release, although I can't promise.

12/21/07 14:12:10 changed by larsivi

  • milestone changed from 0.99.4 to 0.99.5.

03/04/08 09:22:17 changed by Jim Panic

  • milestone changed from 0.99.5 to 0.99.6.

04/27/08 09:18:06 changed by larsivi

  • milestone changed from 0.99.6 to 0.99.7.

05/24/08 18:51:59 changed by larsivi

  • keywords set to triage.

07/10/08 10:56:41 changed by larsivi

  • milestone changed from 0.99.7 to 0.99.8.

11/08/08 11:13:46 changed by larsivi

  • owner changed from ptriller to JarrettBillingsley.
  • status changed from assigned to new.

03/29/09 14:59:17 changed by larsivi

  • milestone changed from 0.99.8 to 0.99.9.

06/27/09 16:40:34 changed by JarrettBillingsley

I've implemented this, but I'm wondering what your thoughts are for what the simpleTo* functions should do in the case that the "full" mapping is more or less than 1 character. In these cases I have the functions simply return the original character. If the full mapping is exactly 1 character, or if there is only a simple mapping, it returns that character. I suppose this is fine, since Unicode does allow "simple" mappings as a non-error case, or it seems to anyway. I feel like having these functions possibly throw an exception would be overkill, since they'd probably be used for cases where correctness is not important anyway (i.e. trying not to allocate anything on the heap).
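A minimal sketch of the fallback rule described above, for illustration only. It is not the implementation mentioned in this comment, and `fullUpper` is a hypothetical stand-in for a full Unicode mapping lookup with a two-entry toy table. It covers only the "exactly one character" rule, not the separate case where only a simple mapping exists.

```d
// Illustrative sketch only: not the code from this ticket or the attached patch.
// fullUpper is a hypothetical stand-in for a full Unicode case-mapping lookup;
// its toy table has just two entries.

// Writes the full uppercase mapping of c into buf and returns its length,
// or 0 when the toy table has no entry for c.
size_t fullUpper(dchar c, dchar[] buf)
{
    switch (c)
    {
        case 'a':
            buf[0] = 'A';
            return 1;
        case '\u00DF': // LATIN SMALL LETTER SHARP S, uppercases to "SS"
            buf[0] = 'S';
            buf[1] = 'S';
            return 2;
        default:
            return 0;
    }
}

// Returns the mapped character only when the full mapping is exactly one
// character long; otherwise returns the input unchanged.
dchar simpleToUpper(dchar c)
{
    dchar[3] buf;
    return fullUpper(c, buf[]) == 1 ? buf[0] : c;
}
```

With this rule, a character whose uppercase form expands to two characters (the sharp s here) comes back untouched instead of raising an error, and no heap allocation is needed.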

02/01/10 19:48:42 changed by larsivi

Hai Jarrett - you have these functions for perusal?

02/01/10 19:53:43 changed by JarrettBillingsley

I believe I doooo somewhere. I think they're on my desktop. I may have nuked them though because it didn't look like this was going anywhere, but if I did, they're trivial to reimplement.

02/01/10 20:27:57 changed by larsivi

It appears that your comment slipped between a few set of chairs, sorry - we really should complete this set of functionality.

02/07/10 21:40:13 changed by larsivi

  • milestone changed from 0.99.9 to 1.0.

11/06/10 15:40:07 changed by mwarning

Here is a very simple wrapper for toLower, maybe it's an option?

dchar toSimpleLower(dchar c)
{
	dchar[1] ca = c;
	dchar[8] buf = void; //should be enough, no?
	return toLower(ca, buf)[0];
}

11/09/10 12:24:50 changed by doob

  • attachment Unicode.d.patch added.

11/09/10 12:25:56 changed by doob

I've added a patch as suggestion for an implementation. Not completely sure if it's correct.

11/19/10 04:54:08 changed by kris

thanks doob and mwarning. The tricky thing about these conversions is that sometimes the result will be two (perhaps more?) characters instead. That's why these functions didn't exist to begin with, though they're certainly handy for those cases where the output is a single T rather than a T[]

Any ideas how to address this dilemma? Should we have some "simple" char converters such as those in the patch, and maybe add the complex cases later which return a T[] instead?

(I'm not very keen on special-cases, fwiw)

11/19/10 09:03:13 changed by doob

I think we can have simple char converters. There aren't that many special cases; this link lists all of them (if I've understood everything correctly): http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt I counted 119 special cases.

11/19/10 19:40:16 changed by kris

perhaps the api should be something like:

bool toLower (T src, ref dchar dst)

where the return value indicates whether 'src' is a special-case and populates 'dst' when it's not special? This would at least indicate to the caller that a T[] toLower() is required at that point?
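A rough sketch of how that signature might behave, under stated assumptions: the only special case recognised is U+0130, which SpecialCasing.txt lowercases to two code points, and the non-special path handles ASCII only. A real version would consult the Unicode tables; nothing here is taken from the attached patch.

```d
// Illustrative sketch only. The special-case check and the ASCII-only
// mapping are simplifying assumptions, not Tango code.

// Returns true when src is a special case (its lowercase form is more than
// one character) and leaves dst untouched; otherwise stores the lowercase
// character in dst and returns false.
bool toLower(dchar src, ref dchar dst)
{
    if (src == '\u0130') // LATIN CAPITAL LETTER I WITH DOT ABOVE
        return true;     // caller needs the array-based toLower here

    if (src >= 'A' && src <= 'Z')
        dst = cast(dchar)(src + ('a' - 'A'));
    else
        dst = src;
    return false;
}
```

A caller would test the return value first and fall back to the T[] overload only when it is true.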

11/19/10 19:48:08 changed by kris

or a set of methods?

dchar toLower(T src);                      // returns dchar.init for special-char
dchar toLower(T src, ref bool isSpecial);  // sets a flag for special-char
dchar[] toLowerEx(T src);                  // returns special-char expansion

(follow-up: ↓ 28 ) 11/20/10 18:51:11 changed by doob

A set of methods sounds like a good idea. I like to have a simple function that doesn't require one to add an extra parameter. Is it better to return dchar.init than the original char (src)?

(in reply to: ↑ 27 ) 11/20/10 19:12:21 changed by kris

Replying to doob:

Is it better to return dchar.init than the original char (src)?

At least returning dchar.init would indicate something is awry and, in this case, likely means toLowerEx() is required?
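To make the sentinel idea concrete, here is a small sketch under the same kind of assumptions: a single hard-coded special case (U+0130) and ASCII-only mapping otherwise. The function name `toLowerOrInit` is hypothetical. In D, `dchar.init` is 0xFFFF, a Unicode noncharacter, so it should not collide with a real mapping result.

```d
// Illustrative sketch only: a toy mapping, not the proposed Tango function.

// Returns dchar.init when src is a special case whose lowercase form does
// not fit in a single dchar; otherwise returns the lowercase character.
dchar toLowerOrInit(dchar src)
{
    if (src == '\u0130') // LATIN CAPITAL LETTER I WITH DOT ABOVE
        return dchar.init;

    if (src >= 'A' && src <= 'Z')
        return cast(dchar)(src + ('a' - 'A'));
    return src;
}
```

The caller compares the result against `dchar.init` and switches to the expanding variant when they match. Returning the original character instead would hide that distinction.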

11/21/10 10:45:38 changed by doob

Actually, I don't know which would be best; I'm leaving that decision to you.