Binary serializer?

 
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

Posted: Mon Jun 23, 2008 2:10 am    Post subject: Binary serializer?

Is there any fundamental reason why there couldn't be a doost.util.serializer.archive.BinaryArchive?

When dealing with images and things like that you don't really want to encode them in utf-8.

Or is it possible to stick binary blobs of data into the existing serializers already?
aarti_pl



Joined: 25 Jul 2006
Posts: 28

Posted: Mon Jun 23, 2008 4:36 pm

There is no such reason. I designed the serializer with binary archives in mind, and it should work for any type of storage that can be presented as an array of elements of the same type.

The binary archive is just not written yet...

Adding binary support should be quite easy once a binary format is defined, using e.g. TextArchive as a starting point. If someone (you? :) ) implements such an archive, it can certainly be committed to the doost repository.

Unfortunately I am quite busy now, so I cannot promise anything...
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

Posted: Mon Jun 23, 2008 10:55 pm

That sounds encouraging. I'll probably end up giving it a try since your serializer definitely seems to be the most mature solution out there.
aarti_pl



Joined: 25 Jul 2006
Posts: 28

Posted: Tue Jun 24, 2008 3:34 pm

Great! A binary serializer will be a valuable addition...
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

Posted: Wed Jun 25, 2008 3:21 am

I'm making good progress on this.

One question: should I put delimiters in the binary stream? At first I thought they weren't necessary, but then I realized that putting things like arraybegin/arrayend codes in the binary stream would be good for catching improperly formatted data streams.

Another question: I need a set of routines for turning various types into properly byte-swapped ubyte[]s. Do you have any recommendation about where to get such routines? Or should I just put some in doost? So far I just put the ones I wrote into a doost.util.serializer.Binary module, but I don't think that's really the ideal place for them.


Last edited by baxissimo on Wed Jun 25, 2008 4:53 am; edited 1 time in total
aarti_pl



Joined: 25 Jul 2006
Posts: 28

Posted: Wed Jun 25, 2008 4:51 am

Well, good question.

I was thinking about the goal of such a "security" extension.

Obvious drawbacks of adding extra markers:
- overhead on serialized data size (a proper solution should mark every serialized type)
- overhead on execution time

I think there might be the following use cases for such a feature:

1. Detecting corruption of data, e.g. coming from disk, network etc.
- the probability of such corruption is (IMHO) rather small nowadays
- I am not sure the other parts of the solution are reliable enough to allow NASA spaceships to use the Doost serializer :)
- corruption can probably be better detected with some kind of hash. It could be calculated for the whole blob of data and then compared to a checksum stored at the beginning of the file.

2. Ensuring that the user deserializes proper data, not just a random blob of bytes
- should not happen that often :)
- the serializer will most probably fail anyway on wrong input, as the data length will not match
- it should be enough to mark the serialized stream of bytes just at the beginning, e.g. with doost.serializer.
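
To make the two ideas above concrete (a marker at the start of the stream, plus a checksum over the whole blob), here is a minimal sketch in Python rather than D. It is not Doost code: the header layout, the use of CRC-32, and the names wrap/unwrap are assumptions made only for illustration.

```python
import struct
import zlib

# Marker text taken from the post; using it as raw header bytes is an assumption.
MAGIC = b"doost.serializer"


def wrap(payload: bytes) -> bytes:
    """Prefix the payload with the marker, a CRC-32 of the payload, and its length."""
    header = MAGIC + struct.pack(">II", zlib.crc32(payload), len(payload))
    return header + payload


def unwrap(blob: bytes) -> bytes:
    """Return the payload, or raise ValueError if the blob is not intact."""
    if not blob.startswith(MAGIC):
        raise ValueError("not a serializer stream")
    offset = len(MAGIC)
    if len(blob) < offset + 8:
        raise ValueError("truncated header")
    crc, length = struct.unpack_from(">II", blob, offset)
    payload = blob[offset + 8:]
    if len(payload) != length:
        raise ValueError("unexpected payload length")
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    return payload
```

Note that with this layout the whole payload has to be read before the checksum can be verified, so a random blob is rejected right away by the marker check, but corruption in a large file is only noticed at the end.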

What do you think?

-----------------------

Second question:

It depends on whether they can be reused. If so, they will probably need a more prominent place.

Maybe it will be easier for me to decide when I see what is inside. You can e-mail me the current file and I will think about the package name in the evening.

Personally I rather like flat hierarchies, with packages named after the 'area' of programming.
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

Posted: Wed Jun 25, 2008 5:52 am

About inserting redundancy checks in the datastream, I think it's a good idea because if you're loading a huge file, you really don't want to have to load the whole thing just to find out you don't have any more data. Just marking the sequence types and checking that they're nested properly shouldn't add that much overhead (just an extra 2 bytes per sequence). I suppose I could just make it an option of the BinarySerializer whether to use that.
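
As a rough illustration of the sequence markers described above, here is a small Python sketch (not the D code in question). The marker values, the big-endian layout, and the function names are all made up for the example; the only point is that a reader can fail as soon as a marker is missing, at a cost of two extra bytes per sequence.

```python
import io
import struct

# Hypothetical one-byte marker codes; the real format may use different values.
ARRAY_BEGIN = 0xA1
ARRAY_END = 0xA2


def write_int_array(out, values):
    """Write begin marker, element count, 32-bit ints, end marker."""
    out.write(bytes([ARRAY_BEGIN]))
    out.write(struct.pack(">I", len(values)))
    for v in values:
        out.write(struct.pack(">i", v))
    out.write(bytes([ARRAY_END]))


def read_exact(src, n):
    """Read exactly n bytes or raise, so truncated input is reported early."""
    data = src.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of stream")
    return data


def read_int_array(src):
    """Read an array written by write_int_array, checking both markers."""
    if read_exact(src, 1)[0] != ARRAY_BEGIN:
        raise ValueError("expected array-begin marker")
    (count,) = struct.unpack(">I", read_exact(src, 4))
    values = [struct.unpack(">i", read_exact(src, 4))[0] for _ in range(count)]
    if read_exact(src, 1)[0] != ARRAY_END:
        raise ValueError("expected array-end marker")
    return values
```

Usage would be something like writing into an io.BytesIO, rewinding it, and calling read_int_array on it. Making the markers optional would just mean skipping the two marker writes and the two checks.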


About the binary conversions module: right now it basically consists of various versions of the functions "toBinary()" and "fromBinary()", plus a swapBytes routine. But I think it will be more efficient to make a BinaryConverter class with a built-in buffer for small conversions up to, say, creal.sizeof. Plus, in a class, the decision to swap bytes or not can be cached for the duration of a serialize/deserialize session (based, say, on an initial byte order marker and the endianness of the current platform).

Seems to me to be most similar to Storage, among the various parts of doost.
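
A sketch of the converter idea, in Python instead of D and purely illustrative: the 2-byte marker value and the method names are assumptions, and the small built-in buffer is left out because Python's struct module allocates the result itself. What it does show is the byte order being decided once per session and then reused for every conversion.

```python
import struct
import sys

# Hypothetical 2-byte byte-order marker written at the start of a stream.
BOM = 0xFEFF


class BinaryConverter:
    """Caches the byte order for one serialize/deserialize session."""

    def __init__(self, byte_order=sys.byteorder):
        # struct format prefixes: "<" is little-endian, ">" is big-endian.
        self.prefix = "<" if byte_order == "little" else ">"

    @classmethod
    def from_marker(cls, marker: bytes):
        """Choose the byte order from the marker the producer wrote."""
        if struct.unpack("<H", marker)[0] == BOM:
            return cls("little")
        if struct.unpack(">H", marker)[0] == BOM:
            return cls("big")
        raise ValueError("unrecognized byte-order marker")

    def marker(self) -> bytes:
        """Marker bytes in this converter's byte order."""
        return struct.pack(self.prefix + "H", BOM)

    def to_binary(self, fmt: str, value) -> bytes:
        """Convert one value; fmt is a struct code such as "i" or "d"."""
        return struct.pack(self.prefix + fmt, value)

    def from_binary(self, fmt: str, data: bytes):
        """Inverse of to_binary for a single value."""
        return struct.unpack(self.prefix + fmt, data)[0]
```

A writer would emit marker() first and then the converted values; a reader would build its converter with from_marker() on the first two bytes, so no per-value endianness test is needed afterwards.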
aarti_pl



Joined: 25 Jul 2006
Posts: 28

Posted: Wed Jun 25, 2008 6:38 am

Ok. Partial checks might indeed be good for big binary files. I think they should be added to all complex types: arrays, associative arrays, classes, structs etc. That would be a good compromise between safety and overhead.
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

Posted: Thu Jun 26, 2008 2:39 pm    Post subject: Check in.

The BinaryArchive is now checked into trunk, and tests have been added to FunctionTests.
aarti_pl



Joined: 25 Jul 2006
Posts: 28

Posted: Thu Jun 26, 2008 3:09 pm

Nice!

I will announce the changes after adding some more functionality to the library. It won't be missed. :)
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

Posted: Thu Jun 26, 2008 3:23 pm

Cool. I think I'm going to deal with the trouble over reals for now by just truncating them to 64 bits (double). It's not ideal, but I don't like the idea of a binary format that isn't compatible between architectures.

The ultimate fix I suppose would be to save reals in some architecture-independent variable-precision floating point format that is guaranteed to be able to represent a real exactly no matter what the native format of it is.

But that's more than I have the time or inclination to work on right now. In any event reals are defined to be architecture dependent, so it doesn't make much sense for someone to try to save them.

I guess another approach could be just to leave it as is and say "Use at your own risk". You shouldn't save reals because they're architecture dependent, but if you know you're saving and loading on the same architecture you can. Otherwise it will just fail in horrible ways.

So three approaches overall in order of complexity:
    1. Save and load reals as-is, expecting them to be native, and if not, well too bad
    2. Save and load reals as doubles, and accept the inevitable truncation of precision.
    3. Implement a multi-precision architecture independent float, and load/save that.


[edit]
For now I've gone ahead and implemented #2.
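
To show what approach #2 costs in precision, here is a Python sketch (not the checked-in D code). It assumes the native real is the 80-bit x87 extended format stored as 10 little-endian bytes, which is only one possible layout, and it writes the result as an 8-byte big-endian double; the function name is made up.

```python
import math
import struct


def x87_to_double_bytes(raw: bytes) -> bytes:
    """Convert a 10-byte little-endian x87 extended value to an 8-byte big-endian double.

    Sketch only: handles zero and finite values; infinities and NaNs are
    rejected, and values beyond double range raise OverflowError from ldexp.
    """
    if len(raw) != 10:
        raise ValueError("expected 10 bytes")
    mantissa, sign_exp = struct.unpack("<QH", raw)
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:
        raise ValueError("infinity/NaN not handled in this sketch")
    # The 64-bit mantissa has an explicit integer bit, hence the extra 2**-63.
    # float(mantissa) rounds 64 bits down to 53: this is the precision loss.
    value = sign * math.ldexp(float(mantissa), exponent - 16383 - 63)
    return struct.pack(">d", value)
```

For example, an 80-bit value of 1 + 2^-63 comes out as exactly 1.0 after the conversion, which is the kind of truncation accepted by option 2.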
aarti_pl



Joined: 25 Jul 2006
Posts: 28

Posted: Thu Jun 26, 2008 5:20 pm

I also think that this is the best solution for now...
All times are GMT - 6 Hours