Ticket #980 (closed defect: fixed)

Opened 7 months ago

Last modified 7 months ago

Inconsistent split behavior (both self-consistent and between text.Util and text.Regex)

Reported by: jhouse Assigned to: jascha
Priority: major Milestone: 0.99.6
Component: IO Version: 0.99.5 Jascha
Keywords: Cc:

Description

After pulling my hair out trying to port code to 0.99.5, I wrote a simple test:

unittest {
    char[][] x;

    Cerr("===").newline;
    x = split("", ";");    foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;
    x = split(";", ";");   foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;
    x = split(";;", ";");  foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;
    x = split("a", ";");   foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;
    x = split("a;", ";");  foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;
    x = split("a;;", ";"); foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;

    Cerr("===").newline;
    x = Regex(";").split("");    foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;
    x = Regex(";").split(";");   foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;
    x = Regex(";").split(";;");  foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;
    x = Regex(";").split("a");   foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;
    x = Regex(";").split("a;");  foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;
    x = Regex(";").split("a;;"); foreach(s; x) Cerr("'")(s)("'\t"); Cerr.newline;

    Cerr("===").newline;
}

Output: ===

'a' 'a' 'a' === 'a' 'a' ===

In the first set, items 4 and 5 are indistinguishable. Using Regex.split is even tougher.

Change History

03/13/08 22:09:24 changed by kris

Hiya;

1) porting from where/what? If phobos, I understand it has special-case behavior depending on whether you're splitting lines versus other delimiters?

2) if I understand the issue correctly, it's about what happens when there's no terminating delimiter? If so, there's a conflict with how lines of text are generally treated. That is, does a trailing \n signify another (empty) line, or the end of the prior line? It's not yet clear what the behavior should really be, but we at least want to be consistent.

03/13/08 22:14:56 changed by kris

Hrm ... just ran your test and see what you mean

03/13/08 22:36:14 changed by jhouse

Porting from Tango 0.99.4 to 0.99.5 broke my code. I was using a free-floating split function (from Regex?), but the changes forced me to update the code. tango.text.Util:split from 0.99.5 behaves differently. At first, I thought it was stripping the trailing empty string from the output, but that turns out not to always happen. You could say that I'm writing a basic parser or deserializer... I test my code by deserializing and then reserializing. I hope this explains my problem. Copying/pasting the sample code and adding includes should give good-looking output to help explain the issue. Try mapping the split output to the original data and you'll see it's not a one-to-one mapping.
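
To make the one-to-one mapping concrete, here is a minimal sketch of the round-trip property, written in D1 style. It assumes nothing about Tango's internals: splitAll and joinAll are hypothetical helpers written only for illustration, not the tango.text.Util or tango.text.Regex implementations. The idea is that an input with n delimiters yields n + 1 fields, so joining the fields with the same delimiter gives back the original text.

```d
// Hypothetical helper, not Tango's split: every delimiter closes one
// field, and the text after the last delimiter (possibly empty) is
// always emitted as the final field.
char[][] splitAll(char[] src, char delim)
{
    char[][] fields;
    size_t start = 0;

    foreach (i, c; src)
    {
        if (c == delim)
        {
            fields ~= src[start .. i];
            start = i + 1;
        }
    }
    fields ~= src[start .. $];   // final field, empty if src ends with delim
    return fields;
}

// Hypothetical inverse: puts the delimiter between fields, never after the last.
char[] joinAll(char[][] fields, char delim)
{
    char[] result;

    foreach (i, field; fields)
    {
        if (i > 0)
            result ~= delim;
        result ~= field;
    }
    return result;
}

unittest
{
    // "a" and "a;" produce different field counts, so they stay distinguishable.
    assert(splitAll("a", ';').length == 1);
    assert(splitAll("a;", ';').length == 2);

    // Splitting and re-joining reproduces the input.
    assert(joinAll(splitAll("a;;", ';'), ';') == "a;;");
    assert(joinAll(splitAll("", ';'), ';') == "");
}
```

Under this convention the six test inputs give 1, 2, 3, 1, 2 and 3 fields respectively, which is what a deserialize-then-reserialize check needs.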

03/13/08 22:44:19 changed by kris

SVN has an (experimental) update in tango.text.Util ... Regex not modified though

use -version=995 for backward compatibility

03/13/08 22:45:11 changed by kris

  • owner changed from kris to jascha.

Reassigning to Jascha to see what can be done for Regex.split

03/14/08 12:31:37 changed by jhouse

Thanks Kris. Is it possible to post the output in a readable format? I rolled back to 0.99.4, but if the output looks good, I'll be daring and go to svn version 995.

03/14/08 23:54:37 changed by kris

===
''
''      ''
''      ''      ''
'a'
'a'     ''
'a'     ''      ''
===
''
''
''
''
'a'     ''
'a'     ''
===

03/15/08 06:16:54 changed by jascha

  • status changed from new to closed.
  • resolution set to fixed.

(In [3355]) - split wasn't passing through the input if no match was found (closes #980)

03/15/08 15:42:46 changed by kris

yay \o/

03/15/08 15:43:22 changed by kris

===
''
''      ''
''      ''      ''
'a'
'a'     ''
'a'     ''      ''
===
''
''      ''
''      ''      ''
'a'
'a'     ''
'a'     ''      ''
===