
Ticket #282 (assigned defect)

Opened 2 years ago

Last modified 4 months ago

tango.text.convert.Utf accepts some invalid UTF sequences

Reported by: Deewiant
Assigned to: kris (accepted)
Priority: normal
Milestone: 1.0
Component: Core Functionality
Version:
Keywords: triage
Cc: deewiant@gmail.com

Description

Attached is a UTF conversion testing module and its output when used with Tango's tango.text.convert.Utf functions. Tango's failing is that it is too liberal in what it accepts. To name a few cases: UTF-16 surrogates encoded in UTF-8 or UTF-32, overlong representations, and UTF-8 continuation bytes that aren't part of a complete sequence.
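For concreteness, here is one sample input per category, checked against a decoder known to be strict. This is only an illustrative sketch in Python, not D and not taken from the attached utf_test.d; the byte strings and the helper name `is_rejected` are my own assumptions.

```python
# One hypothetical sample per category named in the description.
INVALID_UTF8 = {
    "surrogate U+D800 encoded as UTF-8": b"\xed\xa0\x80",
    "overlong encoding of '/' (U+002F)": b"\xc0\xaf",
    "lone continuation byte": b"\x80",
}
INVALID_UTF32_LE = {
    "surrogate U+D800 encoded as UTF-32": b"\x00\xd8\x00\x00",
}


def is_rejected(data: bytes, encoding: str) -> bool:
    """Return True if Python's built-in strict decoder refuses the input."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    for name, raw in INVALID_UTF8.items():
        print(name, "->", "rejected" if is_rejected(raw, "utf-8") else "ACCEPTED")
    for name, raw in INVALID_UTF32_LE.items():
        print(name, "->", "rejected" if is_rejected(raw, "utf-32-le") else "ACCEPTED")
```

A conforming converter should refuse every one of these inputs; the attached results file lists the cases Tango currently lets through.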

The forum topic http://www.dsource.org/projects/tango/forums/topic/13 has a UTF-8 to UTF-32 converter which works for a single code point, returning a dchar. Fixing the toUtf32 functions, at least, should be trivial if that converter is used as a helper function. I haven't looked at the toUtf16 ones much, but they shouldn't be too difficult either.
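To illustrate the helper-function idea, here is a minimal sketch of a strict single-code-point decoder and a whole-string conversion built on top of it. It is written in Python rather than D, it is not the converter from the forum topic, and the names `decode_one` and `to_utf32` are hypothetical.

```python
def decode_one(data: bytes, pos: int = 0):
    """Strictly decode one UTF-8 code point at pos; return (code_point, next_pos)."""
    if pos >= len(data):
        raise ValueError("no input left to decode")
    lead = data[pos]
    if lead < 0x80:
        return lead, pos + 1  # plain ASCII
    # For each lead byte: continuation count, payload bits, smallest legal value.
    if 0xC2 <= lead <= 0xDF:
        need, cp, minimum = 1, lead & 0x1F, 0x80
    elif 0xE0 <= lead <= 0xEF:
        need, cp, minimum = 2, lead & 0x0F, 0x800
    elif 0xF0 <= lead <= 0xF4:
        need, cp, minimum = 3, lead & 0x07, 0x10000
    else:
        # 0x80..0xBF: stray continuation; 0xC0/0xC1: always overlong; 0xF5+: out of range
        raise ValueError("invalid lead byte 0x%02X" % lead)
    if pos + need >= len(data):
        raise ValueError("truncated sequence")
    for i in range(1, need + 1):
        byte = data[pos + i]
        if byte & 0xC0 != 0x80:
            raise ValueError("expected continuation byte, got 0x%02X" % byte)
        cp = (cp << 6) | (byte & 0x3F)
    if cp < minimum:
        raise ValueError("overlong encoding of U+%04X" % cp)
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("UTF-16 surrogate U+%04X encoded in UTF-8" % cp)
    if cp > 0x10FFFF:
        raise ValueError("code point beyond U+10FFFF")
    return cp, pos + need + 1


def to_utf32(data: bytes):
    """Convert a whole UTF-8 byte string to a list of code points via the helper."""
    out, pos = [], 0
    while pos < len(data):
        cp, pos = decode_one(data, pos)
        out.append(cp)
    return out
```

With the validation concentrated in the single-code-point helper, the whole-string function reduces to a loop, which is why the toUtf32 fix should be small.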

Attachments

utf_test.d (26.8 kB) - added by Deewiant on 02/17/07 09:51:42.
UTF conversion testing module
utf_test_tango.txt (8.5 kB) - added by Deewiant on 02/17/07 09:52:49.
Testing module results for Tango 0.95 beta 1

Change History

02/17/07 09:51:42 changed by Deewiant

  • attachment utf_test.d added.

UTF conversion testing module

02/17/07 09:52:49 changed by Deewiant

  • attachment utf_test_tango.txt added.

Testing module results for Tango 0.95 beta 1

02/26/07 07:59:22 changed by larsivi

  • milestone set to 0.96 Beta 2.

03/02/07 19:06:12 changed by kris

  • priority changed from major to normal.
  • status changed from new to assigned.
  • version deleted.
  • milestone changed from 0.96 Beta 2 to 1.0.

03/20/07 05:38:55 changed by Deewiant

  • cc set to deewiant@gmail.com.

05/10/07 00:42:26 changed by kris

haven't forgotten about this :)

05/24/08 15:09:39 changed by larsivi

  • keywords set to triage.