
SAX Parser

 
teqdruid



Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Wed Feb 07, 2007 12:32 am    Post subject: SAX Parser

I have just uploaded the new and improved SAX parser and writer to Mango's SVN. It's Tango compatible, and I think it runs pretty damn fast. (Actually, the writer is pretty slow. I know how to speed it up, but don't particularly care right now.) On my computer, it's able to parse a 100MB XML file off disk in 1.35 seconds == 74MB/s, which doesn't actually mean much, but seems like a nice high number to me. The code to generate the file and run the timed test is in the examples area. Parsing smaller files gets comparable speed ~ 64MB/s.
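For anyone who wants to sanity-check figures like these, here is a minimal sketch of the throughput arithmetic and of a timed SAX parse. It is Python using the standard-library `xml.sax`, not the D code from the examples area, and `CountingHandler`, `throughput_mb_per_s`, and `time_parse` are names made up for this illustration.

```python
import time
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: counts events so the parse does some real work."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.chars = 0

    def startElement(self, name, attrs):
        self.elements += 1

    def characters(self, content):
        self.chars += len(content)


def throughput_mb_per_s(num_bytes, seconds):
    """Bytes parsed per second, expressed in MB/s (1 MB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024) / seconds


def time_parse(data):
    """Parse an in-memory XML document and return (handler, elapsed seconds)."""
    handler = CountingHandler()
    start = time.perf_counter()
    xml.sax.parseString(data, handler)
    elapsed = time.perf_counter() - start
    return handler, elapsed


if __name__ == "__main__":
    # The figure quoted in the post: 100MB parsed in 1.35 seconds.
    print(throughput_mb_per_s(100 * 1024 * 1024, 1.35))
```

Timing an in-memory parse like this leaves out disk reads entirely, so it is not directly comparable to an off-disk measurement.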

The API has changed quite a bit, both to accommodate Tango and because I took out some things I decided were unnecessary: StringView and Unistring. Now plain character arrays are used. The parser is templated anyway, and wrapping the strings in the string classes just adds code and time; the client code can do it if it wants to. I also changed the way the XML writer works: it now uses UnicodeBom to do all of the transcoding work. I also templated it to make it easier and more straightforward. I apologize now for the code changes you'll have to make, but I figure you're porting to Tango anyway, so this is the time to make API changes.

Have fun, and please compare performance with others. I'm interested to see how I stack up.

~John
dubuila



Joined: 22 Aug 2006
Posts: 28

Posted: Wed Feb 07, 2007 1:49 am

Hello,

Thanks for the port.

I have just finished my first read-through of the existing code :)

My code is small, so I will port it directly to the latest version.

I plan to implement performance measurement in my framework.

I will let you know the results.

Is your 100MB file a real-life sample, or a simple test file you built?

Would you be interested in building a general, cross-application XML sample test set? I think something like 'TestME' already exists, but I'm not sure.

Regards,
Laurent.
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Wed Feb 07, 2007 2:15 am    Post subject: Re: SAX Parser

teqdruid wrote:
I have just uploaded the new and improved SAX parser and writer to Mango's SVN. It's Tango compatible, and I think it runs pretty damn fast

Nice!
teqdruid



Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Wed Feb 07, 2007 11:45 am

dubuila wrote:
Is your 100MB file a real-life sample, or a simple test file you built?

Great, I look forward to hearing the results. The 100MB sample I used was not a real-life sample, which is why I said the numbers I gave were pretty meaningless. I've been generating samples with the genBigXml example in examples/xml/sax; as you can see, it generates some pretty stupid files.

Quote:
Would you be interested in building a general, cross-application XML sample test set? I think something like 'TestME' already exists, but I'm not sure.


I would be interested in having something like this, but not in building it myself. I'm not actually all that interested in XML parsing; it's just something I need. So now that the parser is working relatively well, I'm planning on moving on. In particular, Xml-Rpc is in my sights next. I'm sure there are a few bugs in the parser, though, so I'll be around to fix them up.

~John
dubuila



Joined: 22 Aug 2006
Posts: 28

Posted: Wed Feb 07, 2007 5:08 pm

Hello,

I moved my files over, but I'm having trouble with the .obj files.

Sorry, I'm quite new to D, and debugging .obj problems is still strange to me.

I have already run into reference problems within Tango with Convert.d and Exception.d from lib/common.

Could you give me some advice on where to look?

(source code on http://hyridia.svn.sourceforge.net/svnroot/hyridia/ )
(It is just the beginning of the code :) )

++++++++++++
Command >>> Building Project: crk......
C:\dmd\bin\bud_win_3.04.exe Main.d -D -O -V -V -cleanup -nolib -noautoimport -DCPATHC:\dmd\bin -Tcrk.exe -IC:\Coding\Hyridia;C:\Coding\lib\tango;C:\Coding\lib\mango\trunk;C:\Coding\lib\tango\lib\common


OPTLINK (R) for Win32 Release 7.50B1
Copyright (C) Digital Mars 1989 - 2001 All Rights Reserved

InputReader.obj(InputReader)
Error 42: Symbol Undefined _D6object6Object6toUtf8MFZAa
C:\Coding\lib\tango\lib\common\tango\core\Exception.obj(Exception)
Error 42: Symbol Undefined _D6object9Exception5_ctorMFAaC9ExceptionZC9Exception
C:\Coding\lib\tango\lib\common\tango\core\Exception.obj(Exception)
Error 42: Symbol Undefined _D6object9Exception6toUtf8MFZAa
C:\Coding\lib\tango\tango\io\FileProxy.obj(FileProxy)
Error 42: Symbol Undefined _D5tango3sys5win325Types16WIN32_FIND_DATAW6__initZ
C:\Coding\lib\tango\lib\common\tango\core\Thread.obj(Thread)
Error 42: Symbol Undefined _cr_stackBottom


Finished
++++++++++++++
dubuila



Joined: 22 Aug 2006
Posts: 28

Posted: Wed Feb 07, 2007 5:10 pm

Continuing in a second post :)

Thanks for the work on the UTF conversion. It will be much easier to use.

For now, I quickly found these sites with sample XML:

http://www.cs.washington.edu/research/xmldatasets/www/repository.html
http://www.xml-benchmark.org/
http://xml.ascc.net/test/

I'm not sure how to set up a benchmark correctly, but I will press on. I need to do this for my project, and since it will use Mango/Tango, I hope it can help here :)

Regards,
Laurent.
teqdruid



Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Wed Feb 07, 2007 9:18 pm

This is kind of a shot in the dark, but it sort of looks like you're trying to link against the phobos.lib that comes with dmd instead of the phobos.lib that comes with Tango. Have you been able to successfully compile anything with Tango instead of Phobos?
dubuila



Joined: 22 Aug 2006
Posts: 28

Posted: Thu Feb 08, 2007 3:51 am

Hello,

Thanks.

You're right! I had missed the Tango installation and the phobos.lib changes.

I downloaded the latest revision and used the Tango installer.

I got:

Command >>> Building Project: crk......
C:\dmd\bin\build.exe Main.d -D -O -V -V -cleanup -nolib -noautoimport -DCPATHC:\dmd\bin -Tcrk.exe -IC:\Coding\Hyridia;C:\Coding\lib\mango\trunk

C:\Coding\lib\mango\trunk\mango\xml\sax\parser\teqXML.d(1083): function tango.text.convert.UnicodeBom.UnicodeBom!(char).UnicodeBom.decode (void[],char[]) does not match parameter types (void[],char[],uint*)
C:\Coding\lib\mango\trunk\mango\xml\sax\parser\teqXML.d(1083): Error: expected 2 arguments, not 3
C:\Coding\lib\mango\trunk\mango\xml\sax\parser\teqXML.d(137): template instance mango.xml.sax.parser.teqXML.TeqXMLReader!(char) error instantiating

To solve it, I copied the latest revision over the files in:

C:\dmd\tango\tango

Then I got:


Command >>> Building Project: crk......
C:\dmd\bin\build.exe Main.d -D -O -V -V -cleanup -nolib -noautoimport -DCPATHC:\dmd\bin -Tcrk.exe -IC:\Coding\Hyridia;C:\Coding\lib\mango\trunk

C:\dmd\tango\tango\io\Buffer.d(16): module Exception cannot read file 'tango\core\Exception.d'

To solve it:

I moved the tango\lib\common\tango\core\Exception.d file to the C:\dmd\tango\tango\core folder.
I moved the tango\lib\common\tango\core\Thread.d file to the C:\dmd\tango\tango\core folder.

Why is Exception.d outside the main trunk?

And it is now running.

Would it be possible to have a config file in the Tango setup that lets you replace C:\dmd\tango with an alternate directory, such as a custom SVN folder?

Regards,
Laurent.
teqdruid



Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Thu Feb 08, 2007 3:32 pm

dubuila wrote:

Would it be possible to have a config file in the Tango setup that lets you replace C:\dmd\tango with an alternate directory, such as a custom SVN folder?


I think it's just a matter of where your dmd.conf (sc.ini on Windows, I think) is pointed.

~John
stonecobra



Joined: 25 May 2004
Posts: 48
Location: Rough and Ready, CA

Posted: Wed Jun 27, 2007 1:33 pm    Post subject: Re: SAX Parser

teqdruid wrote:
On my computer, it's able to parse a 100MB XML file off disk in 1.35 seconds == 74MB/s, which doesn't actually mean much, but seems like a nice high number to me. The code to generate the file and run the timed test is in the examples area. Parsing smaller files gets comparable speed ~ 64MB/s.


John, I just tested with Mango trunk and Tango 0.98, and I get 50MB/sec for your genbigxml file, but only 25MB/sec if I trim out the whitespace via pretty=false in genbigxml.d. It turns out that your genbigxml.d is generating an XML file with about 75% whitespace. This is using a buffer instead of just a file conduit; if I use a bare file conduit, I only see about 14MB/sec. All this on a ThinkPad T60p dual core with a poor HDD :)
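To make the whitespace point concrete, here is a small sketch of a generator with a `pretty` switch and a function that reports how much of the output is whitespace. It is written in Python and is only a hypothetical stand-in for genbigxml.d: the element names, the indentation width, and the functions `gen_xml` and `whitespace_share` are assumptions, and it makes no attempt to reproduce the 75% figure.

```python
def gen_xml(records, pretty):
    """Hypothetical stand-in for a test-file generator with a 'pretty' flag."""
    newline = "\n" if pretty else ""

    def indent(depth):
        # Four spaces per nesting level, only when pretty-printing.
        return " " * (4 * depth) if pretty else ""

    parts = ["<root>", newline]
    for i in range(records):
        parts += [
            indent(1), "<record>", newline,
            indent(2), f"<id>{i}</id>", newline,
            indent(2), "<name>item</name>", newline,
            indent(1), "</record>", newline,
        ]
    parts.append("</root>")
    return "".join(parts)


def whitespace_share(text):
    """Fraction of characters in the document that are whitespace."""
    return sum(ch.isspace() for ch in text) / len(text)


if __name__ == "__main__":
    for pretty in (True, False):
        doc = gen_xml(1000, pretty)
        print(pretty, len(doc), whitespace_share(doc))
```

Since throughput is measured in MB/s of input, a file padded with indentation can report a higher number while carrying the same number of tags.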
teqdruid



Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Wed Jun 27, 2007 2:08 pm    Post subject: Re: SAX Parser

Thanks for the input! Like I've said previously, I haven't really done a good speed benchmark, and the numbers I throw out don't mean much.

I'm not surprised to see performance drop so much on very tag-dense XML. Without pretty=true, genbigxml.d outputs very tag-dense XML: about 100%, I guess. So the processor has to do a helluva lot more work and make many, many more virtual function calls (per MB) to the handler. I bet the parser does pretty well on text-rich XML, that is, XML with low tag density. It never allocates memory, and it uses Intel string scan instructions to fly by a lot of the text.
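The "more handler calls per MB" effect can be seen with any SAX-style parser. The sketch below uses Python's standard `xml.sax` rather than the Mango parser, and `EventCounter` and `events_per_kb` are illustrative names, so treat it as a rough demonstration of the idea, not a measurement of this parser.

```python
import xml.sax


class EventCounter(xml.sax.ContentHandler):
    """Counts every callback the parser makes into the handler."""

    def __init__(self):
        super().__init__()
        self.events = 0

    def startElement(self, name, attrs):
        self.events += 1

    def endElement(self, name):
        self.events += 1

    def characters(self, content):
        self.events += 1


def events_per_kb(data):
    """Handler callbacks per KB of input for an in-memory XML document."""
    counter = EventCounter()
    xml.sax.parseString(data, counter)
    return counter.events / (len(data) / 1024)


if __name__ == "__main__":
    # Same number of elements, very different amounts of text between tags.
    tag_dense = b"<r>" + b"<a>x</a>" * 1000 + b"</r>"
    text_rich = b"<r>" + (b"<a>" + b"x" * 200 + b"</a>") * 1000 + b"</r>"
    print(events_per_kb(tag_dense), events_per_kb(text_rich))
```

Both documents contain the same elements, so the tag-dense one triggers far more callbacks per KB of input.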

I'm a bit surprised the buffer helps so much. There's an internal buffer in the parser, so I wouldn't think that double-buffering would help that much. Maybe I'm not interfacing with the conduit as efficiently as I should be. I'll take a look at it next time I go wading through the parser, which I've been meaning to do ever since the last Tango IO update.
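One possible explanation, offered purely as an analogy, is that a consumer issuing many small reads against an unbuffered source pays a per-call cost that a buffering layer absorbs. The Python sketch below counts calls to the underlying source with and without a buffer. It does not model Tango's conduits or the parser's internal buffer, and `CountingRaw` and `drain` are hypothetical names.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical unbuffered source that counts how often it is asked for data."""

    def __init__(self, data):
        super().__init__()
        self._src = io.BytesIO(data)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, buf):
        self.calls += 1
        return self._src.readinto(buf)


def drain(stream, chunk):
    """Read the stream to the end in fixed-size requests; return bytes read."""
    total = 0
    while True:
        block = stream.read(chunk)
        if not block:
            break
        total += len(block)
    return total


if __name__ == "__main__":
    data = bytes(64 * 1024)

    direct = CountingRaw(data)
    drain(direct, 64)  # every 64-byte request hits the source

    wrapped = CountingRaw(data)
    drain(io.BufferedReader(wrapped, buffer_size=8192), 64)  # source sees large reads

    print(direct.calls, wrapped.calls)
```

If the parser were requesting data from the conduit in small pieces, an outer buffer could help in a similar way, but that is speculation until the conduit interaction is actually profiled.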


As for your numbers in an absolute sense: do you think these are good numbers? You mentioned a crappy HDD, but if the file's not too big and you've got some RAM, the file is probably cached anyway, the second time around at least. Have you tried any other parsers with this data, by any chance?

Thanks again,
John
All times are GMT - 6 Hours