teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
|
Posted: Wed Feb 07, 2007 12:32 am Post subject: SAX Parser |
|
|
I have just uploaded the new and improved SAX parser and writer to Mango's SVN. It's Tango compatible, and I think it runs pretty damn fast. (Actually, the writer is pretty slow. I know how to speed it up, but don't particularly care right now.) On my computer, it's able to parse a 100MB XML file off disk in 1.35 seconds == 74MB/s, which doesn't actually mean much, but seems like a nice high number to me. The code to generate the file and run the timed test is in the examples area. Parsing smaller files gets comparable speed ~ 64MB/s.
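For anyone who wants to reproduce this kind of figure, here is a minimal timing sketch. It uses Python's standard xml.sax module rather than the Mango parser, so its numbers are not comparable with the ones above; the handler and helper names are made up for illustration. It only shows how the MB/s value is derived: bytes parsed divided by elapsed seconds.

```python
import time
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Counts start-element events so the parse does some observable work."""

    def __init__(self):
        super().__init__()
        self.elements = 0

    def startElement(self, name, attrs):
        self.elements += 1


def throughput_mb_per_s(num_bytes, seconds):
    """Throughput in MB/s, taking 1 MB as 1,000,000 bytes."""
    return num_bytes / 1_000_000 / seconds


def time_parse(data):
    """Parse `data` (bytes) once; return (element count, elapsed seconds)."""
    handler = CountingHandler()
    start = time.perf_counter()
    xml.sax.parseString(data, handler)
    return handler.elements, time.perf_counter() - start


if __name__ == "__main__":
    # A small in-memory sample; a real test would read a large file from disk.
    sample = b"<root>" + b"<item>text</item>" * 100_000 + b"</root>"
    count, elapsed = time_parse(sample)
    rate = throughput_mb_per_s(len(sample), elapsed)
    print(count, "elements at", round(rate, 1), "MB/s")
    # The figure quoted in the post: 100 MB in 1.35 s.
    print(round(throughput_mb_per_s(100 * 1_000_000, 1.35)))  # 74
```

A single timed run is noisy; repeating the parse several times and reporting the best or median time gives a steadier number.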
The API has changed quite a bit, both to accommodate Tango and because I took out some things I decided were unnecessary: StringView and Unistring. Now plain character arrays are used. The parser is templated anyway, and wrapping the strings in the string classes just adds code and time; the client code can do it if it wants to. I also changed the way the XML writer works: it now uses UnicodeBom to do all of the transcoding work. I also templated it to make it easier and more straightforward. I apologize in advance for the code changes you'll have to make, but I figure you're porting to Tango anyway, so this is the time to make API changes.
Have fun, and please compare performance with others. I'm interested to see how I stack up.
~John |
dubuila
Joined: 22 Aug 2006 Posts: 28
|
Posted: Wed Feb 07, 2007 1:49 am Post subject: |
|
|
Hello,
Thanks for the port.
I have just finished my first read-through of the existing code.
My code is small, so I will port it directly to the latest version.
I plan to implement performance measurement in my framework.
I will let you know the results.
Is your 100MB file a real-life sample, or was it built as a simple test?
Would you be interested in building a general, cross-application XML sample test set? I think something like 'TestME' already exists, but I'm not sure.
Regards,
Laurent. |
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Wed Feb 07, 2007 2:15 am Post subject: Re: SAX Parser |
|
|
teqdruid wrote: | I have just uploaded the new and improved SAX parser and writer to Mango's SVN. It's Tango compatible, and I think it runs pretty damn fast |
Nice! |
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
|
Posted: Wed Feb 07, 2007 11:45 am Post subject: |
|
|
dubuila wrote: | Hello,
Thanks for the port.
I have just finished my first read-through of the existing code.
My code is small, so I will port it directly to the latest version.
I plan to implement performance measurement in my framework.
I will let you know the results.
Is your 100MB file a real-life sample, or was it built as a simple test? |
Great, I look forward to hearing the results. The 100MB sample I used was not a real-life sample, which is why I said that the numbers I gave were pretty meaningless. I've been generating samples with the genBigXml example in examples/xml/sax. As you can see, it generates some pretty stupid files.
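To give a rough idea of what a synthetic test file of this kind looks like, here is a small generator sketch in Python. The element names, the record layout, and the generate_xml helper are all invented for illustration; they do not reproduce what genBigXml actually writes. The pretty flag switches indentation and newlines on or off.

```python
def generate_xml(num_records, pretty=True):
    """Build a synthetic XML document as a string.

    The element names and layout are hypothetical; they only stand in
    for the kind of repetitive file a benchmark generator produces.
    """
    newline = "\n" if pretty else ""
    indent = "    " if pretty else ""
    parts = ["<records>", newline]
    for i in range(num_records):
        parts.append(f'{indent}<record id="{i}">{newline}')
        parts.append(f"{indent}{indent}<name>item{i}</name>{newline}")
        parts.append(f"{indent}{indent}<value>{i * 2}</value>{newline}")
        parts.append(f"{indent}</record>{newline}")
    parts.append("</records>")
    parts.append(newline)
    return "".join(parts)


if __name__ == "__main__":
    # Compare the size of the same content with and without formatting.
    print(len(generate_xml(1000, pretty=True)), "bytes pretty")
    print(len(generate_xml(1000, pretty=False)), "bytes compact")
```

Files like this are highly regular, which is one reason throughput measured on them says little about real-world documents.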
Quote: | Would you be interested in building a general, cross-application XML sample test set? I think something like 'TestME' already exists, but I'm not sure.
|
I would be interested in having something like this, but not in building it myself. I'm not actually all that interested in XML parsing; it's just something I need. So now that the parser is working relatively well, I'm planning on moving on. In particular, XML-RPC is in my sights next. I'm sure there are a few bugs in the parser, though, so I'll be around to fix them up.
~John |
dubuila
Joined: 22 Aug 2006 Posts: 28
|
Posted: Wed Feb 07, 2007 5:08 pm Post subject: |
|
|
Hello,
I moved my files, but I'm having trouble with the .obj files.
Sorry, I'm quite a beginner in D, and debugging .obj files is still strange to me:
I have already run into reference problems within Tango with Convert.d and Exception.d from lib/common.
Could you give me some advice on where to look?
(source code at http://hyridia.svn.sourceforge.net/svnroot/hyridia/ )
(It is just the beginning of the code.)
++++++++++++
Command >>> Building Project: crk......
C:\dmd\bin\bud_win_3.04.exe Main.d -D -O -V -V -cleanup -nolib -noautoimport -DCPATHC:\dmd\bin -Tcrk.exe -IC:\Coding\Hyridia;C:\Coding\lib\tango;C:\Coding\lib\mango\trunk;C:\Coding\lib\tango\lib\common
OPTLINK (R) for Win32 Release 7.50B1
Copyright (C) Digital Mars 1989 - 2001 All Rights Reserved
InputReader.obj(InputReader)
Error 42: Symbol Undefined _D6object6Object6toUtf8MFZAa
C:\Coding\lib\tango\lib\common\tango\core\Exception.obj(Exception)
Error 42: Symbol Undefined _D6object9Exception5_ctorMFAaC9ExceptionZC9Exception
C:\Coding\lib\tango\lib\common\tango\core\Exception.obj(Exception)
Error 42: Symbol Undefined _D6object9Exception6toUtf8MFZAa
C:\Coding\lib\tango\tango\io\FileProxy.obj(FileProxy)
Error 42: Symbol Undefined _D5tango3sys5win325Types16WIN32_FIND_DATAW6__initZ
C:\Coding\lib\tango\lib\common\tango\core\Thread.obj(Thread)
Error 42: Symbol Undefined _cr_stackBottom
Finished
++++++++++++++ |
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
|
Posted: Wed Feb 07, 2007 9:18 pm Post subject: |
|
|
dubuila wrote: | Hello,
I moved my files, but I'm having trouble with the .obj files.
Sorry, I'm quite a beginner in D, and debugging .obj files is still strange to me:
I have already run into reference problems within Tango with Convert.d and Exception.d from lib/common.
Could you give me some advice on where to look?
(source code at http://hyridia.svn.sourceforge.net/svnroot/hyridia/ )
(It is just the beginning of the code.)
++++++++++++
Command >>> Building Project: crk......
C:\dmd\bin\bud_win_3.04.exe Main.d -D -O -V -V -cleanup -nolib -noautoimport -DCPATHC:\dmd\bin -Tcrk.exe -IC:\Coding\Hyridia;C:\Coding\lib\tango;C:\Coding\lib\mango\trunk;C:\Coding\lib\tango\lib\common
OPTLINK (R) for Win32 Release 7.50B1
Copyright (C) Digital Mars 1989 - 2001 All Rights Reserved
InputReader.obj(InputReader)
Error 42: Symbol Undefined _D6object6Object6toUtf8MFZAa
C:\Coding\lib\tango\lib\common\tango\core\Exception.obj(Exception)
Error 42: Symbol Undefined _D6object9Exception5_ctorMFAaC9ExceptionZC9Exception
C:\Coding\lib\tango\lib\common\tango\core\Exception.obj(Exception)
Error 42: Symbol Undefined _D6object9Exception6toUtf8MFZAa
C:\Coding\lib\tango\tango\io\FileProxy.obj(FileProxy)
Error 42: Symbol Undefined _D5tango3sys5win325Types16WIN32_FIND_DATAW6__initZ
C:\Coding\lib\tango\lib\common\tango\core\Thread.obj(Thread)
Error 42: Symbol Undefined _cr_stackBottom
Finished
++++++++++++++ |
This is kind of a shot in the dark, but it sort of looks like you're trying to link against the phobos.lib that comes with dmd instead of the phobos.lib that comes with Tango. Have you been able to successfully compile anything with Tango instead of Phobos? |
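One way to see which module a missing symbol belongs to is to decode the front of its mangled name. D-mangled symbols start with "_D" followed by a chain of length-prefixed identifiers, then a type signature. The sketch below (a hypothetical helper in Python, not a full demangler) decodes only the identifier chain. Applied to the log above, it shows that three of the missing symbols live in module object, which is supplied by the runtime library, so the result is at least consistent with the wrong-phobos.lib guess.

```python
def qualified_name(symbol):
    """Return the dotted name encoded at the start of a D-mangled symbol.

    Only the length-prefixed identifier chain after "_D" is decoded; the
    type signature that follows is ignored. Returns None for symbols that
    do not start with "_D" followed by a length (for example C symbols).
    """
    if not symbol.startswith("_D"):
        return None
    pos = 2
    parts = []
    while pos < len(symbol) and symbol[pos].isdigit():
        # Read the decimal length prefix.
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        # Take exactly `length` characters as the identifier.
        parts.append(symbol[end:end + length])
        pos = end + length
    return ".".join(parts) if parts else None


if __name__ == "__main__":
    for sym in (
        "_D6object6Object6toUtf8MFZAa",
        "_D6object9Exception5_ctorMFAaC9ExceptionZC9Exception",
        "_D5tango3sys5win325Types16WIN32_FIND_DATAW6__initZ",
        "_cr_stackBottom",
    ):
        print(sym, "->", qualified_name(sym))
```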
dubuila
Joined: 22 Aug 2006 Posts: 28
|
Posted: Thu Feb 08, 2007 3:51 am Post subject: |
|
|
Hello,
Thanks.
You are right! I had missed the Tango installation and the phobos.lib changes.
I downloaded the latest revision and used the Tango installer.
I got:
Command >>> Building Project: crk......
C:\dmd\bin\build.exe Main.d -D -O -V -V -cleanup -nolib -noautoimport -DCPATHC:\dmd\bin -Tcrk.exe -IC:\Coding\Hyridia;C:\Coding\lib\mango\trunk
C:\Coding\lib\mango\trunk\mango\xml\sax\parser\teqXML.d(1083): function tango.text.convert.UnicodeBom.UnicodeBom!(char).UnicodeBom.decode (void[],char[]) does not match parameter types (void[],char[],uint*)
C:\Coding\lib\mango\trunk\mango\xml\sax\parser\teqXML.d(1083): Error: expected 2 arguments, not 3
C:\Coding\lib\mango\trunk\mango\xml\sax\parser\teqXML.d(137): template instance mango.xml.sax.parser.teqXML.TeqXMLReader!(char) error instantiating
To solve it, I moved the latest revision in to replace the files in:
C:\dmd\tango\tango
Then I got:
Command >>> Building Project: crk......
C:\dmd\bin\build.exe Main.d -D -O -V -V -cleanup -nolib -noautoimport -DCPATHC:\dmd\bin -Tcrk.exe -IC:\Coding\Hyridia;C:\Coding\lib\mango\trunk
C:\dmd\tango\tango\io\Buffer.d(16): module Exception cannot read file 'tango\core\Exception.d'
To solve it:
I moved the tango\lib\common\tango\core\Exception.d file to the C:\dmd\tango\tango\core folder.
I moved the tango\lib\common\tango\core\Thread.d file to the C:\dmd\tango\tango\core folder.
Why is Exception.d outside the main trunk?
And it is now running.
Would it be possible to have a config file in the Tango setup that allows pointing to an alternate directory instead of C:\dmd\tango, for example a custom svn folder?
Regards,
Laurent. |
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
|
Posted: Thu Feb 08, 2007 3:32 pm Post subject: |
|
|
dubuila wrote: |
Would it be possible to have a config file in the Tango setup that allows pointing to an alternate directory instead of C:\dmd\tango, for example a custom svn folder?
|
I think it's just a matter of where your dmd.conf (sc.ini on Windows, I think) is pointed.
~John |
stonecobra
Joined: 25 May 2004 Posts: 48 Location: Rough and Ready, CA
|
Posted: Wed Jun 27, 2007 1:33 pm Post subject: Re: SAX Parser |
|
|
teqdruid wrote: | On my computer, it's able to parse a 100MB XML file off disk in 1.35 seconds == 74MB/s, which doesn't actually mean much, but seems like a nice high number to me. The code to generate the file and run the timed test is in the examples area. Parsing smaller files gets comparable speed ~ 64MB/s. |
John, I just tested with Mango trunk and Tango 0.98, and I get 50MB/sec for your genbigxml file, but only 25MB/sec if I trim out the whitespace via pretty=false in genbigxml.d. It turns out that your genbigxml.d is generating an XML file with about 75% whitespace. This is using a buffer instead of just a file conduit; if I use a file conduit directly, I am only seeing about 14MB/sec. All this on a ThinkPad T60p dual core with a poor HDD |
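A quick way to check a figure like the 75% above is to count whitespace bytes directly. The sketch below is in Python with hypothetical helper names. It counts every space, tab, CR, and LF, including any that appear inside text content, so it slightly overstates pure formatting whitespace. The second helper is one possible way to normalize a throughput figure by the non-whitespace share of the file; it is an interpretation, not something the parser reports.

```python
def whitespace_fraction(data):
    """Fraction of bytes in `data` that are space, tab, CR, or LF."""
    if not data:
        return 0.0
    whitespace = sum(data.count(b) for b in (b" ", b"\t", b"\r", b"\n"))
    return whitespace / len(data)


def non_whitespace_rate(mb_per_s, ws_fraction):
    """Throughput counted over non-whitespace bytes only."""
    return mb_per_s * (1.0 - ws_fraction)


if __name__ == "__main__":
    sample = b"<a>\n  <b/>\n</a>"
    print(whitespace_fraction(sample))
    # 50 MB/s on a file that is 75% whitespace is 12.5 MB/s of
    # non-whitespace content, versus 25 MB/s on the compact file.
    print(non_whitespace_rate(50.0, 0.75))  # 12.5
```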
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
|
Posted: Wed Jun 27, 2007 2:08 pm Post subject: Re: SAX Parser |
|
|
stonecobra wrote: | teqdruid wrote: | On my computer, it's able to parse a 100MB XML file off disk in 1.35 seconds == 74MB/s, which doesn't actually mean much, but seems like a nice high number to me. The code to generate the file and run the timed test is in the examples area. Parsing smaller files gets comparable speed ~ 64MB/s. |
John, I just tested with Mango trunk and Tango 0.98, and I get 50MB/sec for your genbigxml file, but only 25MB/sec if I trim out the whitespace via pretty=false in genbigxml.d. It turns out that your genbigxml.d is generating an XML file with about 75% whitespace. This is using a buffer instead of just a file conduit; if I use a file conduit directly, I am only seeing about 14MB/sec. All this on a ThinkPad T60p dual core with a poor HDD |
Thanks for the input! Like I've said previously, I haven't really done a good speed benchmark, and the numbers I throw out don't mean much.
I'm not surprised to see performance drop so much in very tag-dense XML. Without pretty=true, genbigxml.d outputs very tag-dense XML: about 100%, I guess. So the processor has to do a helluva lot more work, and has to make many, many more virtual function calls (per MB) to the handler. I bet that the parser does pretty well on text-rich XML, that is, XML with low tag density. It never allocates memory, and it uses Intel string-scan instructions to fly by a lot of the text.
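The calls-per-MB point can be illustrated with a small counting experiment. This sketch uses Python's xml.sax rather than the Mango parser, and the TagCounter handler and make_document generator are hypothetical. It counts only start-tag and end-tag callbacks and divides by the document size, so the same tags packed into fewer bytes give a higher callback density.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Counts start-tag and end-tag callbacks made by the parser."""

    def __init__(self):
        super().__init__()
        self.tag_events = 0

    def startElement(self, name, attrs):
        self.tag_events += 1

    def endElement(self, name):
        self.tag_events += 1


def make_document(records, pretty):
    """Build a small synthetic document; the layout is made up."""
    sep = "\n    " if pretty else ""
    end = "\n" if pretty else ""
    body = "".join(f"{sep}<item>{i}</item>" for i in range(records))
    return f"<root>{body}{end}</root>".encode("utf-8")


def tag_events_per_mb(data):
    """Tag callbacks per 1,000,000 bytes of input."""
    handler = TagCounter()
    xml.sax.parseString(data, handler)
    return handler.tag_events / (len(data) / 1_000_000)


if __name__ == "__main__":
    for pretty in (True, False):
        doc = make_document(10_000, pretty)
        print("pretty" if pretty else "compact", round(tag_events_per_mb(doc)))
```

Both documents trigger the same number of tag callbacks in total; only the number of bytes they are spread over differs.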
I'm a bit surprised the buffer helps so much. There's an internal buffer in the parser, so I wouldn't think that double-buffering would help that much. Maybe I'm not interfacing with the conduit as efficiently as I should be. I'll take a look at it next time I go wading through the parser, which I've been meaning to do ever since the last Tango IO update.
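One way to probe a question like this is to feed the same input to an incremental parser with different read sizes and compare timings. The sketch below uses Python's expat binding, not Tango conduits, and count_elements is a hypothetical helper. It shows only the mechanics: the parser sees identical data whatever the chunk size, so any timing difference comes from how the input is read and handed over.

```python
import io
import xml.parsers.expat


def count_elements(stream, chunk_size):
    """Feed `stream` to an expat parser in reads of `chunk_size` bytes.

    Returns the number of start tags seen.
    """
    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # more data may follow
    parser.Parse(b"", True)  # signal end of input
    return count


if __name__ == "__main__":
    data = b"<root>" + b"<item>x</item>" * 500 + b"</root>"
    for size in (7, 4096, 65536):
        print(size, count_elements(io.BytesIO(data), size))
```

To compare read strategies on a real file, the BytesIO object can be replaced with a file opened in binary mode, with and without buffering, and each call timed with time.perf_counter.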
As for your numbers in an absolute sense: do you think these are good numbers? You mentioned a crappy HDD, but if the file's not too big and you've got some RAM, the file is probably cached anyway, the second time around at least. Have you tried any other parsers with this data, by any chance?
Thanks again,
John |