Download Reference Manual
The Developer's Library for D
About Wiki Forums Source Search Contact

Ticket #546 (closed defect: fixed)

Opened 1 year ago

Last modified 1 year ago

toInt("0b",16) != 0x0b

Reported by: eao197 Assigned to: kris
Priority: normal Milestone: 0.99.1 RC4
Component: Core Functionality Version: trunk
Keywords: Cc: sean_k, larsivi

Description

Sample code:

import tango.text.convert.Integer;

void main() {
  assert( 0x0b == toInt( "0b", 16 ) );
}

This is because tango.text.convert.trim tries to detect prefix even in the case when radix is already specified.

This defect is annoying because I found it in a program which parses data hexadecimal dumps in format 'xx xx xx xx ...' (like '0b 0a 03 74 d7 ...'). Yes, I know that '0b' is the prefix for binary form numbers. But I think in the case when radix is precisely specified it is not need to skip any prefix symbols in trim().

So I propose to modify toInt/toLong/trim in such way:

// radix == 0 means autodetection of radix.
long toLong(T, U=uint) (T[] digits, U radix=0)
{return toLong!(T)(digits, radix);}

long toLong(T) (T[] digits, uint radix=0)
{
//... same as now ...
}

uint trim(T, U=uint) (T[] digits, inout bool sign, inout U radix)
{return trim!(T)(digits, sign, radix);}

uint trim(T) (T[] digits, inout bool sign, inout uint radix)
{
        T       c;
        T*      p = digits.ptr;
        int     len = digits.length;

        // strip off whitespace and sign characters
        for (c = *p; len; c = *++p, --len)
             if (c is ' ' || c is '\t')
                {}
             else
                if (c is '-')
                    sign = true;
                else
                   if (c is '+')
                       sign = false;
                else
                   break;

        // strip off a radix specifier also?
        if (radix == 0 && c is '0' && len > 1)
            switch (*++p)
                   {
                   case 'x':
                   case 'X':
                        ++p;
                        radix = 16;
                        break;

                   case 'b':
                   case 'B':
                        ++p;
                        radix = 2;
                        break;

                   case 'o':
                   case 'O':
                        ++p;
                        radix = 8;
                        break;

                   default:
                        radix = 10;
                        break;
                   } 

        // return number of characters eaten
        return (p - digits.ptr);
}

In this implementation detection of radix by examining prefix will be skipped is user specified radix argument.

Change History

07/23/07 11:24:52 changed by eao197

Oops! The trim implementation is incorrent. Need fix:

if( radix == 0 && c is '0' && len > 1 )
  switch(...)
    { ... }

if( radix == 0 )
  radix = 0;

return ...

08/02/07 12:46:19 changed by sean

  • owner changed from sean to kris.

08/02/07 13:39:42 changed by kris

  • priority changed from major to normal.
  • status changed from new to assigned.
  • version set to trunk.
  • milestone set to 1.0.

give us a few days to think about it?

08/02/07 14:50:49 changed by eao197

No problems.

08/28/07 03:17:15 changed by kris

  • cc set to sean_k, larsivi.

The radix provided was intended to be a suggested one, where the content does not explicitly provide one. I think the real problem here is that "0b" is a stupid value to use as a radix indicator :)

We'll have to change the binary radix indicator to something that doesn't interfere with hex values, but I can't think of what it should be ...

08/28/07 03:57:01 changed by eao197

I don't think that assuming radix only as suggested value is a good idea. For example:

toInt("0010", 2)

will be 10 (as a decimal value), not 2 (as a binary value).

And is it good idea to check prefix always? It tooks some time. Could it be unefficient for application with massive use of toInt?

08/28/07 04:12:44 changed by kris

the example would result in 2, since there's no radix descriptor on the number itself. Performance is definitely a consideration, but there's truly minor overhead involved here.

A default radix can be very useful. For example, when reading a stream of numbers like this: "10 20 30 0x10 0b0001100 0o777 15"

08/28/07 04:19:28 changed by eao197

A default radix can be very useful. For example, when reading a stream of numbers like this: "10 20 30 0x10 0b0001100 0o777 15"

I agree. But why not to use special value for default radix? If user doesn't change radix value he allows Tango to detect the correct one. But if user specifies radix then he knows what he wants -- there is no need to do additional actions. IMHO.

I don't like changing 0b prefix to someone else, because it is almost de-facto standard for me.

(follow-up: ↓ 10 ) 08/28/07 04:32:58 changed by kris

You perhaps missed the point completely?

What happens in the above example where numbers are declared in a mixed manner, and the user converts using base-10? For instance, parse ("0o777", 10) ??

The explicit specification on the content must take priority, thus the provided radix is purely a default value for when an explicit radix is not present on the content itself.

BTW: I'd never seen 0b used as a prefix before, and thus conjured it up myself in a fit of bad taste :)

(in reply to: ↑ 9 ) 08/28/07 04:48:52 changed by eao197

Replying to kris:

You perhaps missed the point completely?

No :) I suppose you missed my original proposal to change default value of radix from 10 to 0. So:

toInt( "0o777" )

means that toInt must checks prefix.

What happens in the above example where numbers are declared in a mixed manner, and the user converts using base-10? For instance, parse ("0o777", 10) ??

An exception should be here. User could use something like:

foreach( n; split("10 20 30 0x10 0b0001100 0o777 15") )
  toInt( n ); // or toInt( n, 0 );

If he use toInt(n, 10) then he made some assumptions about numbers. And if he made wrong assumtions he must get an exception.

The explicit specification on the content must take priority, thus the provided radix is purely a default value for when an explicit radix is not present on the content itself.

I just don't see reasons to explicitely specify radix argument if it is unknown which forms of number representations are used.

BTW: I'd never seen 0b used as a prefix before, and thus conjured it up myself in a fit of bad taste :)

Prefix 0b is used, at least, in two programming languages: Ruby and Curl (http://www.curl.com).

08/28/07 14:04:20 changed by sean

For what it's worth, I think number formats should match the language spec whenever possible. So 0x for hex, 0b for binary, etc.

The advantage of favoring the number format is that it allows processing of formatted strings. For example, a data file may be written with leading zeroes in some cases, and it should be possible to process this by explicitly telling the routine what the number format is. If the prefix is the preferred indicator, such data could easily be processed as binary, octal, and decimal, based on the encoding and whether a particular number contains leading zeroes.

08/30/07 04:28:28 changed by kris

  • status changed from assigned to closed.
  • resolution set to fixed.

(In [2526]) After extended discussions, Integer.parse() and Integer.trim() have been adjusted such that an optional radix-prefix in the text must match an explicit radix argument (where both are provided). Set the numeric radix argument to zero in order to retain prior behavior (which it defaults to).

closes #546, and thanks to eao197 for the ticket :)

08/31/07 14:17:10 changed by sean

So does this mean that leading zeroes must be trimmed from numbers where a radix is supplied?

08/31/07 15:09:44 changed by kris

no. Leading zero's are fine by themselves. If a leading zero is followed by a radix specifier, such as 0x, then that radix must match the numeric-radix provided to the function (unless the latter is zero) e.g:

convert ("10") == 10
convert ("Ox10") == 16
convert ("0x10", 16) == 16
convert ("10", 16) == 16
convert ("0x10", 10) == 0

In the latter case, the 0x prefix is not consumed (because the radix specifiers do not match) and thus the numeric parser sees 'x' as the first char. This is outside the valid range of the radix, so parsing halts and 0 is returned. Wrappers, such as toInt(), test the number of chars consumed by the parse and will throw an exception in this particular case.

Apparently, Python operates in the same manner.

09/07/07 04:22:27 changed by larsivi

  • milestone changed from 1.0 to 0.99.1 RC4.