VM Serialization

JarrettBillingsley · Joined: 20 Jun 2006 Posts: 457 Location: Pennsylvania!

wm4 on the IRC channel gave me the idea of making it possible to serialize the entire state of the VM - all objects and threads - to a file, and then, of course, being able to deserialize it back to the way it was.

This is simple for many types. I already have it working for null, bool, int, float, char, string, table, array, and namespace. Weak references are somewhat tricky but possible. Threads are also a lot of work but not that complex.

The difficulties really arise with functions, classes, instances, and native objects. The problem is that these types can be created by the host app. You can't very well serialize a native D function, or a class with a custom allocator/finalizer, or any instance of that class, or an arbitrary D object. Furthermore it wouldn't really make sense to serialize stuff from the standard libraries, except maybe the state of the modules lib, but there is no distinction made between "built-in" code and objects and user-generated stuff.

wm4 suggested, for native functions at least, to build a mapping from function objects to names, and when serializing, simply serialize a named reference to the function instead of the function itself. There is a major problem with this: functions do not have names. Even if you were to exhaustively iterate through all module namespaces and map each native function to its fully-qualified name, you could still very easily have functions in places where there is no obvious name (e.g. as an element of an array). The name given to functions when created by newFunction does not have to be unique, is not really even required, and is only meant for debugging purposes. Function references can also be copied around, so even if it's living in foo.bar.f in one module, it could be a class's member function in an entirely separate module.
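To make that concrete, here is a rough sketch in Python rather than MiniD. Everything in it (build_name_map, the namespaces layout, native_f, native_g) is made up for illustration and is not part of the MiniD API. It walks the module namespaces to name each function, then shows the two failure cases: a function that only lives inside an array gets no name, and a function reachable under two names only gets one of them.

```python
def build_name_map(namespaces):
    """Map id(function) -> fully-qualified name by walking module namespaces."""
    names = {}
    for mod_name, ns in namespaces.items():
        for key, value in ns.items():
            if callable(value):
                # First name found wins; later aliases of the same function are ignored.
                names.setdefault(id(value), f"{mod_name}.{key}")
    return names


# Stand-ins for native functions registered by the host (hypothetical).
def native_f():
    return "f"


def native_g():
    return "g"


# Hypothetical VM state: two module namespaces.
namespaces = {
    # native_g is only reachable as an element of an array, so it has no name.
    "foo.bar": {"f": native_f, "handlers": [native_g]},
    # The same native_f also lives under a second name in another module.
    "other": {"alias": native_f},
}

name_map = build_name_map(namespaces)


def name_of(func):
    """Return the name the walk assigned to func, or None if it never saw it."""
    return name_map.get(id(func))
```

Here name_of(native_f) gives "foo.bar.f" and the "other.alias" name is silently lost, while name_of(native_g) gives None, so the serializer would have nothing to write for it.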

The problem boils down to the fact that the VM's state is not self-contained. It is, by definition, intertwined with the host application's state. There is no distinction between what is part of the host state and what is part of the MiniD VM state, which makes it impossible to tell what should be serialized and what shouldn't.

The only thing I've come up with so far is to have the serialization framework accept some kind of "decider" function from the user which would decide whether to output the given object or whether to simply output some kind of unique name, placing the onus of hooking those names up during deserialization on the user. The annoying part about this is that you wouldn't really want it to bug you about all the standard libraries, so there'd have to be some second mechanism to "block out" entire modules/packages, forcing them to always be name-referenced. Taken to an extreme, this could also be used to simply output a small graph of objects.
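For comparison, Python's pickle module has this kind of hook built in: Pickler.persistent_id plays the "decider" and Unpickler.persistent_load does the hooking-up on the other side. The sketch below is only an analogy, not how MiniD does or would do it; REGISTRY, host_print, HostHandle and the save/load helpers are all invented for the example.

```python
import io
import pickle


# Stand-ins for things owned by the host app (hypothetical).
def host_print(msg):
    return f"host:{msg}"


class HostHandle:
    """Pretend native resource that must never be written out."""

    def __reduce__(self):
        raise TypeError("native handle cannot be serialized")


screen = HostHandle()

# name -> object table that the host supplies on both save and load.
REGISTRY = {"host.print": host_print, "host.screen": screen}
NAME_OF = {id(obj): name for name, obj in REGISTRY.items()}


class VMPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # The "decider": return a name to write a named reference,
        # or None to let the object be serialized normally.
        return NAME_OF.get(id(obj))


class VMUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # The user's half of the deal: turn a name back into a live object.
        try:
            return REGISTRY[pid]
        except KeyError:
            raise pickle.UnpicklingError(f"unknown host object: {pid!r}")


def save(state):
    buf = io.BytesIO()
    VMPickler(buf).dump(state)
    return buf.getvalue()


def load(data):
    return VMUnpickler(io.BytesIO(data)).load()


state = {"score": 42, "out": host_print, "items": [screen, "sword"]}
restored = load(save(state))
```

After the round trip, restored["out"] is the very same host_print and restored["items"][0] is the same screen object, while the plain data is copied. A "block out this whole module" mechanism would amount to filling NAME_OF automatically for everything in that module.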

Finally, to be honest, VM serialization sounds interesting, but I just can't shake the feeling that it's killing a mosquito with a bazooka. Blah, dump all state, and bam, reload it and your app is just as it was before - it's tempting, but as with any problem specified at an extremely high level, there are far more snags than you'd think. What I wonder is if there are better options. If you were to keep all your "nonvolatile" state in a certain place, you could serialize just what you need to, which would not only be simpler, but faster and smaller too.
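A minimal sketch of that alternative, again in Python with invented names (persistent, volatile, save, load): everything that has to survive a restart lives under one table of plain data, and only that table is ever written out. The host rebuilds the rest at startup.

```python
import json

# Everything that must survive a restart lives under this one table
# (hypothetical layout, plain data only).
persistent = {
    "player": {"name": "hero", "hp": 17},
    "level": 3,
    "flags": ["met_guard"],
}

# Everything else is recreated by the host on startup and is never saved.
volatile = {"renderer": object(), "callbacks": [print]}


def save(state):
    """Serialize only the nonvolatile table."""
    return json.dumps(state, sort_keys=True)


def load(text):
    """Rebuild the nonvolatile table from its saved form."""
    return json.loads(text)


saved = save(persistent)
reloaded = load(saved)
```

No decider, no name registry, and the saved form only contains what was deliberately put in the table.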

I don't know; what are other people's thoughts?

csauls · Joined: 27 Mar 2004 Posts: 278

I think, as you suggest at the end, keeping any state desirable for serialization in one lump, or at least in a handful of easily reached places, is ideal. A full dump/load capability is tempting, would be awesome in some ways, but is really just overkill...

That said, standardized ways of serializing common things are nice. An interface for serializable objects would also be nice (a la PHP's __sleep() and __wakeup() magic methods, for example). You can provide the fundamentals, and leave the details to the user, which makes for better things all around.
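As a rough illustration of that kind of interface, here it is sketched in Python, whose __getstate__ and __setstate__ hooks fill roughly the role of __sleep() and __wakeup(). The Connection class and the freeze/thaw helpers are invented for the example, and the hooks are called by hand here rather than through any real serialization library.

```python
import json


class Connection:
    def __init__(self, host):
        self.host = host
        self.socket = self._connect()

    def _connect(self):
        # Stand-in for opening a live resource that cannot be saved.
        return object()

    def __getstate__(self):
        # Like __sleep(): choose what gets saved, drop the live resource.
        state = self.__dict__.copy()
        del state["socket"]
        return state

    def __setstate__(self, state):
        # Like __wakeup(): restore the saved fields, then reconnect.
        self.__dict__.update(state)
        self.socket = self._connect()


def freeze(obj):
    """The framework's half: ask the object for its state and write it out."""
    return json.dumps(obj.__getstate__())


def thaw(cls, text):
    """Create a blank instance (skipping __init__) and hand it its state."""
    obj = cls.__new__(cls)
    obj.__setstate__(json.loads(text))
    return obj


original = Connection("example.org")
frozen = freeze(original)
copy = thaw(Connection, frozen)
```

The framework only knows how to call the two hooks; what "saved" and "woken up" mean is left to each class.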
_________________
Chris Nicholson-Sauls

JarrettBillingsley · Joined: 20 Jun 2006 Posts: 457 Location: Pennsylvania!

I'm back on this. wm4 made me aware of Pluto, a very similar library for Lua, which has given me much more motivation after seeing (1) how relatively simple it is, (2) how similar my approach was to theirs, making me think I'm doin it rite, and (3) how they dealt with these complex issues.

I've just got upvalues serializing and deserializing correctly, and it's pretty sweet. The harder parts are still to come, but so far it's going well.

JarrettBillingsley · Joined: 20 Jun 2006 Posts: 457 Location: Pennsylvania!

Woooo basic threads

JarrettBillingsley · Joined: 20 Jun 2006 Posts: 457 Location: Pennsylvania!

I never posted an update on this, I just made comments in the commit log instead. But yeah, serialization.. basically works now. So. I need to write docs for it, and threads can't be serialized in Extended coroutine mode.. hm.. might have to bring back the three-level scheme for coroutine support after all :S