The Developer's Library for D

Ticket #877 (new defect)

Opened 7 months ago

Last modified 4 days ago

toFold("I") != toFold("i")

Reported by: JarrettBillingsley
Assigned to: ptriller
Priority: normal
Milestone: 0.99.8
Component: Tango
Version: 0.99.4 Frank
Keywords: triage
Cc: JarrettBillingsley, kris

Description

This means case-insensitive comparisons of strings that contain 'I' or 'i' can fail. In UnicodeData.d, the internalFoldingCaseData table contains the following (0x0049 is 'I', LATIN CAPITAL LETTER I):

 {
   code:0x0049
  ,mapping: [ 0x0069 ]
 },
 {
   code:0x0049
  ,mapping: [ 0x0131 ]
 },

Note the two mappings for the same code. I think one of them overwrites the other when the lookup table is generated. I don't know whether this should instead be a single entry with a two-element mapping array.
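If the generator builds the lookup table by inserting entries in source order, the second 0x0049 entry would replace the first, leaving 'I' folding to U+0131 (LATIN SMALL LETTER DOTLESS I) while 'i' still folds to itself. The D sketch below illustrates that failure mode. FoldingEntry, buildTable and foldChar are hypothetical names, and the last-write-wins insertion is an assumption, since the actual generator and lookup code are not shown in this ticket.

```d
// Sketch only: FoldingEntry, buildTable and foldChar are hypothetical
// stand-ins, not Tango's actual generator or lookup code.
struct FoldingEntry
{
    dchar code;       // the code point being folded
    dstring mapping;  // the code point sequence it folds to
}

// Assumption: the table is built by assigning entries into an associative
// array in source order, so a later entry for the same code replaces an
// earlier one.
dstring[dchar] buildTable(const FoldingEntry[] entries)
{
    dstring[dchar] table;
    foreach (e; entries)
        table[e.code] = e.mapping; // a duplicate code silently overwrites
    return table;
}

// Fold one code point; code points with no entry fold to themselves.
dstring foldChar(const dstring[dchar] table, dchar c)
{
    if (auto p = c in table)
        return *p;
    return [c].idup;
}

void main()
{
    // The two 0x0049 entries quoted from internalFoldingCaseData.
    const entries = [
        FoldingEntry(0x0049, "\u0069"d),
        FoldingEntry(0x0049, "\u0131"d),
    ];
    auto table = buildTable(entries);

    // 'I' now folds to U+0131 (dotless i), while 'i' folds to itself.
    assert(foldChar(table, 'I') == "\u0131"d);
    assert(foldChar(table, 'i') == "i"d);
    assert(foldChar(table, 'I') != foldChar(table, 'i'));
}
```

Under this assumption, whichever duplicate comes last wins, so simply reordering the entries would change the bug rather than fix it.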

I also haven't checked whether there are any other duplicate entries in this table.
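One way to look for other duplicates is to count entries per code. The following is a minimal D sketch that models only the code and mapping fields quoted in this ticket (FoldingEntry and findDuplicateCodes are hypothetical names). Checking the whole table would require loading internalFoldingCaseData itself, which is not shown here.

```d
import std.stdio : writefln;

// Hypothetical mirror of one internalFoldingCaseData entry.
struct FoldingEntry
{
    dchar code;
    dstring mapping;
}

// Excerpt of the entries quoted in this ticket, before any patch.
immutable FoldingEntry[] excerpt = [
    FoldingEntry(0x0049, "\u0069"d),
    FoldingEntry(0x0049, "\u0131"d),
    FoldingEntry(0x004A, "\u006A"d),
    FoldingEntry(0x0130, "\u0069\u0307"d),
    FoldingEntry(0x0130, "\u0069"d),
    FoldingEntry(0x0132, "\u0133"d),
];

// Return each code that has more than one entry, in order of first appearance.
dchar[] findDuplicateCodes(const FoldingEntry[] entries)
{
    size_t[dchar] counts;
    dchar[] firstSeen;
    foreach (e; entries)
    {
        if (e.code !in counts)
        {
            counts[e.code] = 0;
            firstSeen ~= e.code;
        }
        counts[e.code]++;
    }

    dchar[] duplicates;
    foreach (c; firstSeen)
        if (counts[c] > 1)
            duplicates ~= c;
    return duplicates;
}

void main()
{
    foreach (c; findDuplicateCodes(excerpt))
        writefln("duplicate entry for U+%04X", cast(uint) c);
}
```

On this excerpt it reports U+0049 and U+0130, the two codes that the later patches touch.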

Change History

02/06/08 11:27:19 changed by kris

  • owner changed from kris to ptriller.

Thanks, Jarrett.

03/04/08 04:24:06 changed by Jim Panic

  • milestone changed from 0.99.5 to 0.99.6.

04/27/08 05:18:13 changed by larsivi

  • milestone changed from 0.99.6 to 0.99.7.

05/03/08 18:47:00 changed by JarrettBillingsley

  • cc changed from JarrettBillingsley to JarrettBillingsley, kris.

Well, since Mr. Triller seems to have disappeared, I did a little more research into the problem. I found two inconsistencies in the data: duplicate entries for U+0049 and for U+0130. Here is the patch I came up with:

Index: UnicodeData.d
===================================================================
--- UnicodeData.d	(revision 3461)
+++ UnicodeData.d	(working copy)
@@ -107,7 +107,7 @@
     GeneralCategory generalCategory;
     
 //    short canonicalCombiningClass;
-    
+
     //TODO the defaults are not yet set correctly
     
 //    BidiClass bidiClass;
@@ -136,7 +136,7 @@
 //    char [] isoComment;
     
     dchar simpleUpperCaseMapping;
-    
+
     dchar simpleLowerCaseMapping;
     
     dchar simpleTitleCaseMapping;
@@ -107149,13 +107149,9 @@
  },
  {
    code:0x0049
-  ,mapping: [ 0x0069 ]
+  ,mapping: [ 0x0069, 0x0131 ]
  },
  {
-   code:0x0049
-  ,mapping: [ 0x0131 ]
- },
- {
    code:0x004A
   ,mapping: [ 0x006A ]
  },
@@ -107452,10 +107448,6 @@
   ,mapping: [ 0x0069, 0x0307 ]
  },
  {
-   code:0x0130
-  ,mapping: [ 0x0069 ]
- },
- {
    code:0x0132
   ,mapping: [ 0x0133 ]
  },

Hopefully this will be integrated.

05/03/08 18:55:58 changed by JarrettBillingsley

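Orf, actually I think the 0x0049 change in that patch is wrong: the mapping array is a code point sequence (compare the 0x0130 entry, [ 0x0069, 0x0307 ]), so [ 0x0069, 0x0131 ] would fold 'I' to the two-character string "iı" instead of "i". Just to be safe, here's a better patch that only comments out the duplicate entries: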

Index: UnicodeData.d
===================================================================
--- UnicodeData.d	(revision 3461)
+++ UnicodeData.d	(working copy)
@@ -107,7 +107,7 @@
     GeneralCategory generalCategory;
     
 //    short canonicalCombiningClass;
-    
+
     //TODO the defaults are not yet set correctly
     
 //    BidiClass bidiClass;
@@ -136,7 +136,7 @@
 //    char [] isoComment;
     
     dchar simpleUpperCaseMapping;
-    
+
     dchar simpleLowerCaseMapping;
     
     dchar simpleTitleCaseMapping;
@@ -107151,11 +107151,12 @@
    code:0x0049
   ,mapping: [ 0x0069 ]
  },
+// Duplicate entry?
+//  {
+//    code:0x0049
+//   ,mapping: [ 0x0131 ]
+//  },
  {
-   code:0x0049
-  ,mapping: [ 0x0131 ]
- },
- {
    code:0x004A
   ,mapping: [ 0x006A ]
  },
@@ -107451,11 +107452,12 @@
    code:0x0130
   ,mapping: [ 0x0069, 0x0307 ]
  },
+// Duplicate entry?
+//  {
+//    code:0x0130
+//   ,mapping: [ 0x0069 ]
+//  },
  {
-   code:0x0130
-  ,mapping: [ 0x0069 ]
- },
- {
    code:0x0132
   ,mapping: [ 0x0133 ]
  },

05/24/08 14:43:46 changed by larsivi

  • keywords set to triage.

How can we verify this?
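One way to verify the data is against the Unicode Character Database file CaseFolding.txt, which tags each mapping with a status: C (common), F (full), S (simple) or T (special cases for Turkic languages, normally not used otherwise). In that file, U+0049 and U+0130 each have a T line in addition to their C or F line, which matches the duplicates in internalFoldingCaseData. The D sketch below groups statuses by code point over a few lines copied from CaseFolding.txt. statusesByCode is a hypothetical helper, and the assumption that UnicodeData.d was generated from this file is not confirmed in this ticket.

```d
import std.algorithm : sort;
import std.array : split;
import std.conv : to;
import std.stdio : writefln;
import std.string : strip;

// Lines copied from the Unicode Character Database file CaseFolding.txt.
// Format: <code>; <status>; <mapping>; # <name>
// Status C = common, F = full, S = simple, T = Turkic-specific.
immutable string[] caseFoldingExcerpt = [
    "0049; C; 0069; # LATIN CAPITAL LETTER I",
    "0049; T; 0131; # LATIN CAPITAL LETTER I",
    "004A; C; 006A; # LATIN CAPITAL LETTER J",
    "0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
    "0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
    "0132; C; 0133; # LATIN CAPITAL LIGATURE IJ",
];

// Map each code point to the status letters of every line that mentions it.
string[uint] statusesByCode(const string[] lines)
{
    string[uint] result;
    foreach (string line; lines)
    {
        auto data = strip(line);
        if (data.length == 0 || data[0] == '#')
            continue; // skip blank lines and comments
        auto fields = data.split(";");
        if (fields.length < 3)
            continue;
        auto code = to!uint(strip(fields[0]), 16);
        auto status = strip(fields[1]);
        if (auto p = code in result)
            *p ~= status;
        else
            result[code] = status;
    }
    return result;
}

void main()
{
    auto statuses = statusesByCode(caseFoldingExcerpt);
    auto codes = statuses.keys;
    sort(codes);
    foreach (code; codes)
        if (statuses[code].length > 1)
            writefln("U+%04X has statuses %s", code, statuses[code]);
}
```

On this excerpt it reports CT for U+0049 and FT for U+0130. Keeping only the C and F lines leaves exactly the entries the second patch keeps: 0x0049 to [ 0x0069 ] and 0x0130 to [ 0x0069, 0x0307 ].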

07/04/08 17:09:28 changed by larsivi

I fear UnicodeData was generated by a script we don't have... This fix should probably be committed in any case, though.

07/08/08 18:25:29 changed by larsivi

Sent another probe mail to Triller; we do not have the Perl script used to generate UnicodeData.
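Without the original Perl script, a replacement generator might be quite small. The D sketch below is hypothetical (generateEntries is not part of Tango): it reads CaseFolding.txt lines, keeps statuses C and F, skips T, and emits entries in the same layout as internalFoldingCaseData. Choosing C plus F is an inference from the existing table, which already maps U+0130 to the two code points [ 0x0069, 0x0307 ] (a full folding); a simple-folding table would use C plus S instead.

```d
import std.array : split;
import std.format : format;
import std.stdio : write;
import std.string : strip;

// A few lines copied from CaseFolding.txt; a real run would read the whole file.
immutable string[] caseFoldingExcerpt = [
    "0049; C; 0069; # LATIN CAPITAL LETTER I",
    "0049; T; 0131; # LATIN CAPITAL LETTER I",
    "004A; C; 006A; # LATIN CAPITAL LETTER J",
    "0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
    "0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
    "0132; C; 0133; # LATIN CAPITAL LIGATURE IJ",
];

// Hypothetical replacement for the missing generator: emit entries in the
// layout used by internalFoldingCaseData, keeping statuses C and F and
// dropping T (Turkic-only) so that each code appears at most once.
string generateEntries(const string[] lines)
{
    string generated;
    foreach (string line; lines)
    {
        auto data = strip(line);
        if (data.length == 0 || data[0] == '#')
            continue; // skip blank lines and comments
        auto fields = data.split(";");
        if (fields.length < 3)
            continue;
        auto status = strip(fields[1]);
        if (status != "C" && status != "F")
            continue;

        // "0069 0307" becomes "0x0069, 0x0307".
        string mapping;
        foreach (i, hex; split(strip(fields[2])))
            mapping ~= format("%s0x%s", i == 0 ? "" : ", ", hex);
        generated ~= format(" {\n   code:0x%s\n  ,mapping: [ %s ]\n },\n",
                            strip(fields[0]), mapping);
    }
    return generated;
}

void main()
{
    write(generateEntries(caseFoldingExcerpt));
}
```

Fed the full CaseFolding.txt, its output could be diffed against the current table to see whether the duplicate 0x0049 and 0x0130 entries are the only differences.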

07/10/08 06:56:26 changed by larsivi

  • milestone changed from 0.99.7 to 0.99.8.

09/01/08 21:49:11 changed by JarrettBillingsley

Can we just fold in the change? It seems to fix the problem and we can always undo it later.