baxissimo
Joined: 23 Oct 2006 Posts: 241 Location: Tokyo, Japan
|
Posted: Sun Apr 20, 2008 11:53 pm Post subject: JSON Parser bug & fix |
|
|
The JSON parser silently drops every \ character together with the character that follows it. So "c:\my\dir" in a JSON string is read as "c:yir", and "C:\\my\\dir" is read as "C:mydir".
This change at json/Parser.d:229 fixes it somewhat:
Code: |
// String
else if (c == '"') {
    bool inEsc = false;
    while (true) {
        getch();
        if (inEsc) {
            switch(c) {
                case '\\': c='\\'; break;
                case '\'': c='\''; break;
                case 'n': c='\n'; break;
                case 'r': c='\r'; break;
                case 'a': c='\a'; break;
                case 'b': c='\b'; break;
                case 'f': c='\f'; break;
                case 't': c='\t'; break;
                case 'v': c='\v'; break;
                default:
                    // TODO: handle more escape codes
                    error("Unknown/unsupported escape sequence");
            }
            buf ~= c;
            inEsc = false;
        }
        else {
            if (c == '\\') {
                inEsc = true;
                continue;
            }
            else if (c == '"') {
                if (source[index - 1] != '\\') {
                    break;
                }
            }
            else if (c == EOF) {
                error("unterminated string literal");
            }
            else {
                if (c == '\n') {
                    ++line;
                }
                buf ~= c;
            }
        }
    }
    tokValue.s = buf;
    return TOK_STRING;
}
|
I say somewhat because it doesn't handle the following from the D grammar:
Code: |
\ EndOfFile
\x HexDigit HexDigit
\ OctalDigit
\ OctalDigit OctalDigit
\ OctalDigit OctalDigit OctalDigit
\u HexDigit HexDigit HexDigit HexDigit
\U HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit
\& NamedCharacterEntity ;
|
But at least it lets you put Windows file paths in your JSON strings! |
|
|
|
baxissimo
|
Posted: Mon Apr 21, 2008 12:24 am Post subject: then again.. |
|
|
Actually, I forgot: it doesn't need to support D's lexical conventions for string literals, it needs to support JSON's. Those are much more limited, with no &namedEntity; escapes and the like. So here's a version that supports all of JSON's escape sequences:
Code: |
// String
else if (c == '"') {
    bool inEsc = false;
    while (true) {
        getch();
        if (inEsc) {
            switch(c) {
                case '"': c='"'; break;
                case '\\': c='\\'; break;
                case '/': c='/'; break;
                case 'b': c='\b'; break;
                case 'f': c='\f'; break;
                case 'n': c='\n'; break;
                case 'r': c='\r'; break;
                case 't': c='\t'; break;
                case 'u': {
                    // 4 hex digits
                    uint val=0;
                    for(int i=0;i<4; i++) {
                        val <<= 4;
                        getch();
                        if (c >= '0' && c <= '9') {
                            val += c-'0';
                        }
                        else if (c >= 'a' && c <= 'f') {
                            val += 10 + c-'a';
                        }
                        else if (c >= 'A' && c <= 'F') {
                            val += 10 + c-'A';
                        }
                        else if (c == EOF) {
                            error("unterminated string literal");
                        }
                        else {
                            error("Non hex digit inside \\uXXXX escape sequence");
                        }
                    }
                    c = cast(dchar)val;
                } break;
                default:
                    // TODO: handle more escape codes
                    error("Illegal JSON escape sequence");
            }
            buf ~= c;
            inEsc = false;
        }
        else {
            if (c == '\\') {
                inEsc = true;
                continue;
            }
            else if (c == '"') {
                break;
            }
            else if (c == EOF) {
                error("unterminated string literal");
            }
            else {
                if (c == '\n') {
                    ++line;
                }
                buf ~= c;
            }
        }
    }
    // Tail of the branch, same as in the first version of this fix
    tokValue.s = buf;
    return TOK_STRING;
}
|
... except that the \\ sequence is the only one I've actually tested.
The \uXXXX support may very well have a bug in it. |
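Since only \\ has been tested, a standalone version of the same escape handling makes the other cases easier to check outside the parser. What follows is a minimal sketch under assumptions, not the parser's actual code: `unescapeJson` and `readHex4` are hypothetical names, it works on the text between the quotes instead of going through `getch()`, and it uses present-day D with Phobos (`std.utf.encode`, `std.exception.enforce`). It also covers one case the \uXXXX branch above does not: JSON writes characters outside the Basic Multilingual Plane as a surrogate pair of two \uXXXX escapes.

```d
import std.exception : enforce;
import std.utf : encode;

// Hypothetical helper, not part of the parser discussed in this thread.
// Decodes the body of a JSON string literal (the text between the quotes).
string unescapeJson(string src)
{
    char[] buf;
    size_t i = 0;

    // Reads exactly four hex digits starting at src[i] and advances i past them.
    uint readHex4()
    {
        enforce(i + 4 <= src.length, "unterminated \\uXXXX escape sequence");
        uint val = 0;
        foreach (k; 0 .. 4)
        {
            immutable char h = src[i++];
            val <<= 4;
            if (h >= '0' && h <= '9') val += h - '0';
            else if (h >= 'a' && h <= 'f') val += 10 + (h - 'a');
            else if (h >= 'A' && h <= 'F') val += 10 + (h - 'A');
            else throw new Exception("Non hex digit inside \\uXXXX escape sequence");
        }
        return val;
    }

    while (i < src.length)
    {
        immutable char c = src[i++];
        if (c != '\\')
        {
            buf ~= c; // ordinary character, copied through unchanged
            continue;
        }
        enforce(i < src.length, "unterminated escape sequence");
        immutable char e = src[i++];
        switch (e)
        {
            case '"':  buf ~= '"';  break;
            case '\\': buf ~= '\\'; break;
            case '/':  buf ~= '/';  break;
            case 'b':  buf ~= '\b'; break;
            case 'f':  buf ~= '\f'; break;
            case 'n':  buf ~= '\n'; break;
            case 'r':  buf ~= '\r'; break;
            case 't':  buf ~= '\t'; break;
            case 'u':
            {
                uint cp = readHex4();
                if (cp >= 0xD800 && cp <= 0xDBFF)
                {
                    // High surrogate: a \uXXXX low surrogate must follow.
                    enforce(i + 2 <= src.length && src[i] == '\\' && src[i + 1] == 'u',
                            "high surrogate not followed by \\uXXXX");
                    i += 2;
                    immutable uint lo = readHex4();
                    enforce(lo >= 0xDC00 && lo <= 0xDFFF, "invalid low surrogate");
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                }
                // Appends the code point to buf as UTF-8; a lone low
                // surrogate is left for encode to reject.
                encode(buf, cast(dchar) cp);
                break;
            }
            default:
                throw new Exception("Illegal JSON escape sequence");
        }
    }
    return buf.idup;
}
```

With this, unescapeJson(`C:\\my\\dir`) gives `C:\my\dir`, and unescapeJson(`\u0041`) gives "A". Throwing plain exceptions stands in for the parser's error() call, whose exact behavior isn't shown in this thread.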
|
|
|
baxissimo
|
Posted: Mon Apr 21, 2008 10:07 am Post subject: And another bug |
|
|
The parser chokes on negative numbers, too.
The fix is really easy, though. In Parser.d, right after the "// Number" comment, change
Code: |
else if (Uni.isDigit(c)) {
|
to this
Code: |
else if (Uni.isDigit(c) || c == '-') {
|
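Letting '-' start the number branch only helps if the scanning code behind it treats the minus as a leading sign and nothing else. The rest of the "// Number" branch isn't shown in this thread, so the following is just a sketch of what JSON's number grammar expects, written as standalone present-day D with Phobos. `startsNumber` and `scanNumber` are hypothetical names, and the sketch is simplified in that it does not reject leading zeros such as "012", which strict JSON forbids.

```d
import std.ascii : isDigit;

// Hypothetical helper: true if c can begin a JSON number.
bool startsNumber(char c)
{
    return isDigit(c) || c == '-';
}

// Hypothetical helper: scans one JSON number at the start of src
// and returns its text; throws if the number is malformed.
string scanNumber(string src)
{
    size_t i = 0;

    // Optional leading minus; JSON has no leading plus.
    if (i < src.length && src[i] == '-') ++i;

    // Integer part: at least one digit is required.
    immutable intStart = i;
    while (i < src.length && isDigit(src[i])) ++i;
    if (i == intStart) throw new Exception("expected digit in number");

    // Optional fraction: '.' followed by at least one digit.
    if (i < src.length && src[i] == '.')
    {
        ++i;
        immutable fracStart = i;
        while (i < src.length && isDigit(src[i])) ++i;
        if (i == fracStart) throw new Exception("expected digit after '.'");
    }

    // Optional exponent: e or E, optional sign, at least one digit.
    if (i < src.length && (src[i] == 'e' || src[i] == 'E'))
    {
        ++i;
        if (i < src.length && (src[i] == '+' || src[i] == '-')) ++i;
        immutable expStart = i;
        while (i < src.length && isDigit(src[i])) ++i;
        if (i == expStart) throw new Exception("expected digit in exponent");
    }

    return src[0 .. i];
}
```

The point of the explicit digit check after the sign is that a bare "-" is reported as an error instead of being accepted as a number.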
|
|
|
|
csauls
Joined: 27 Mar 2004 Posts: 278
|
Posted: Fri May 09, 2008 7:12 am Post subject: |
|
|
Sorry I didn't see this until now. Thanks! And I'll fold those in tonight when I get back home. _________________ Chris Nicholson-Sauls |
|
|
|
csauls
|
Posted: Fri May 09, 2008 2:37 pm Post subject: |
|
|
Done. _________________ Chris Nicholson-Sauls |
|
|
|
|