Status Log
pragma



Joined: 28 May 2004
Posts: 607
Location: Washington, DC

Posted: Thu Aug 11, 2005 11:23 am    Post subject: Status Log

There's a connection between binutils, the functionality that is needed, and a total lack of an open* OMF loader. On the spectrum of possible solutions, between "just use GDC" and "don't bother", we have multiple combinations of porting binutils/BFD to D, adding an OMF backend to binutils, composing an omf to coff converter, composing an omf to elf converter, and composing a BFD-style solution from scratch. To that end, there's a fair amount of information to digest.

It's also important to note that the Intel OMF format is related to the Microsoft OMF format, but it is the latter that DMD/DMC produces. It just so happens that finding documentation for this variant was a little trickier than for the former; thankfully I was able to find a few documents online.



There are countless other webpages, and digitalmars newsgroup posts that didn't make the list. This is the cream of the crop thus far.

* - No, "open" watcom doesn't cut it - which is a shame since optlink is likely running on some of the same ancestral code. Their license is utterly incompatible with BSD or GPL license schemes, and is embarrassingly restrictive.

** - ELF is a far superior format to COFF or OMF simply because the same file is used for executables, libraries and pre-link objects with no extra headers (unlike COFF). The only exception is that in order to create a shared object, the code *must* be relocatable, which GCC doesn't do by default. Libtool provides the means to change an ELF's relocatable status. There's probably some other voodoo that libtool must be doing as well, such as adding a bootstrap or somesuch; I'm still not clear what it does.

*** - As a loyal servant, it pledges its allegiance and promises to recruit other object files to toil endlessly as slaves in BFD's dynamically loadable caverns.
_________________
-- !Eric.t.Anderton at gmail


Last edited by pragma on Sat Jan 14, 2006 7:34 am; edited 4 times in total

pragma

Posted: Thu Aug 11, 2005 7:45 pm

It looks like binutils/bfd is jam-packed with all kinds of cross-platform and legacy code: easily in excess of 400 files. It has a set of headers to steer gcc down the right path for nearly every combination of platform and distribution imaginable. It's amazing.

It's pretty straightforward what the minimal file set is for windows. Just ignore everything without '386', 'bfd', 'binutil' or 'coff' in the filename and you're good to go. EDLL is composed of *two* files, so I just add those to the heap for my reading list (BFD indeed).

So at least I have a working model for runtime linking, however obscure.

The way I see it, there are two potential development paths, each with their own consequences.

(+) good
(-) bad
(*) neutral

Path 1: Build a loader from scratch in D, using binutils as a guide.

    - Longest possible development path
    - Minimal cross-platform implementation must load OMF and ELF.
    - Need to build an OMF loader from scratch, using COFF loader as guide
    + Frees GDC compiles from the tyranny of legacy (C) libs
    + Frees DMD compiles from the tyranny of legacy (C) libs
    + Guaranteed GC interaction with runtime loaded components.


Path 2: Back binutils, and augment with an OMF loader. Distribute as binary with D headers for DMD and an OMF loader C patch for binutils on *nix.

    - Need to build an OMF loader from scratch in C
    - Need to integrate custom OMF loader into binutils, and possibly submit as a patch to the binutils project.
    - Need to write D headers/wrapper for BFD under D.
    - System-specific deployment - could become a headache if integrated into standard D distribution or Ares.
    - No guaranteed interaction between BFD and the D garbage collector.
    + shortest possible development path
    * Project adopts GPL license by default, due to backing binutils


The reason why the GC is important is so that libraries can be dropped from memory when they fall into disuse, just like all other resources.
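
To illustrate the point about the collector, here is a minimal sketch in Python (chosen for brevity; it is not DDL code) of a library handle whose image is released once nothing references it. `LoadedLibrary` and the `unloaded` list are hypothetical names, and Python's `weakref.finalize` merely stands in for whatever hook a D loader would register with the garbage collector.

```python
import gc
import weakref

unloaded = []  # demo only: records which library images were released


class LoadedLibrary:
    """Hypothetical handle for a runtime-loaded object file."""

    def __init__(self, name):
        self.name = name
        # The callback must not reference `self`, otherwise the handle
        # would be kept alive by its own finalizer.
        weakref.finalize(self, unloaded.append, name)


def demo():
    lib = LoadedLibrary("plugin.obj")
    still_loaded = list(unloaded)  # nothing released while `lib` is in use
    del lib                        # drop the last reference
    gc.collect()                   # make collection explicit for the demo
    return still_loaded, list(unloaded)
```

The handle is never unloaded by hand; it simply goes away like any other resource once the program stops holding on to it.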

As for the need for both linux and windows to have the same binary loading capability? If the OMF or ELF object in question is free from any OS-dependent code or libraries, then the binary is automatically 100% platform independent within the same processor architecture. This means that Windows, Linux, BSD and OSX (once Apple switches to Intel) can all use the same objects on i386. I don't think I need to spell out why this is a good thing. :)

kris

Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Thu Aug 11, 2005 8:00 pm

Option 3:

Assume Path1, but begin with ELF support only. Install Cygwin on Win32 to get ELF object files from the D compiler (is that what the Unix version of DMD produces?). There are notable limitations here; for example, linking to Win32 DLLs would likely be a big problem.

pragma

Posted: Fri Aug 12, 2005 5:33 am

kris wrote:
Option 3:

Assume Path1, but begin with ELF support only. Install Cygwin on Win32 to get ELF object files from the D compiler (is that what the Unix version of DMD produces?). There are notable limitations here; for example, linking to Win32 DLLs would likely be a big problem.


GCC produces ELF objects, which have to be 'ld'ed to become 'relocatable' (like a COFF or OMF file).

Working with both compiler environments would bring the cross-platform nature of this thing to the foreground. I should've installed Cygwin a long time ago, so I'll probably do that and throw in MinGW for good measure. There's no excuse for any of this to not cross-compile, so why do things half-assed?

Thanks for the feedback Kris, this is likely what I'll do.

pragma

Posted: Sat Aug 13, 2005 7:33 pm    Post subject: I couldn't resist

First off, Kris, I didn't answer your question. ELF is in fact the binary of choice for linux and is what gcc produces.

ELF is really nifty as it is one file format that covers the same territory under *nix systems as .sys .com .exe .dll and .obj do under windows. The real kicker is that it's a decade old already. Before that, everyone was using COFF.

Against Kris' suggestion, I spent the day hacking away at the OMF spec instead. I now have a parser that can digest Digital Mars' rendition of Microsoft OMF files. It correctly identifies the segment descriptions, name tables, public symbols, and externs.

What it can't do is all the important stuff: coalesce data chunks into segments, align segment data, apply fixups, and resolve symbols to actual memory locations within segment data. I figure that if I can get this loader to do that much, it'll be on par with an in-memory image of an ELF file. A common interface for both types should be straightforward to hammer out once I hit that point.

OMF is so jam-packed with legacy cruft (the format dates back to the 1970's) that figuring out what support to drop has been tough. I've developed iteratively against the program's own .obj files, and so far so good. DMD emits very boring and straightforward .obj files that are digestible while using only about 30% of the OMF specification. I've also identified some record types that aren't even needed for our purposes.

There were some gems I unearthed in the spec. The biggest was that .lib files follow a variant of the same format, so it may be possible to package things as .lib files as well as just .obj files. Also, there is an explicit end record for OMF .obj files, so there's a no-man's land between this record and EOF that could be used to attach additional info (at the very least, there's plenty of elbow room in the comment records).
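
For readers who want to see the shape of the format, here is a rough sketch (Python for brevity, not the actual omfloader code) of walking OMF records up to the end record and returning whatever trails it. It relies on the general record layout: a type byte, a 16-bit little-endian length that counts the checksum, the payload, then a checksum byte. The helper names `make_record` and `parse_object` and the sample bytes are made up for illustration.

```python
import struct

THEADR, COMENT = 0x80, 0x88
MODEND = (0x8A, 0x8B)  # 16-bit and 32-bit module-end record types


def make_record(rec_type, payload):
    """Build one record: type, 16-bit LE length, payload, checksum."""
    length = len(payload) + 1  # the length field counts the checksum byte
    body = bytes([rec_type]) + struct.pack("<H", length) + payload
    checksum = (-sum(body)) & 0xFF  # all record bytes sum to 0 mod 256
    return body + bytes([checksum])


def parse_object(data):
    """Return ([(type, payload), ...], bytes found after the end record)."""
    records, pos = [], 0
    while pos < len(data):
        rec_type, length = struct.unpack_from("<BH", data, pos)
        end = pos + 3 + length
        record = data[pos:end]
        if len(record) != 3 + length:
            raise ValueError("truncated record")
        # Some tools write a zero checksum instead of computing one.
        if record[-1] != 0 and sum(record) & 0xFF != 0:
            raise ValueError("bad checksum")
        records.append((rec_type, record[3:-1]))
        pos = end
        if rec_type in MODEND:
            break  # explicit end record: the rest is free space
    return records, data[pos:]


# Synthetic object: header, one comment record, end record, extra bytes.
sample = (make_record(THEADR, b"\x06demo.d")
          + make_record(COMENT, b"\x00\x00note")
          + make_record(MODEND[0], b"\x00")
          + b"EXTRA")
records, trailing = parse_object(sample)
```

Here `trailing` is the region between the end record and EOF mentioned above, which a loader is free to ignore or to use for extra information.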

The ugly part? Not that it matters for DDL, but the EXPDEF comment record is divergent from both the Intel and Microsoft versions of the spec. I found this quite odd, and kept my code commented for all to see.

Anyway, what I have now would probably help anyone who just wants to see what the heck is in any of their .obj files. It's all in the SVN repos:

http://svn.dsource.org/projects/dsp/trunk/ddl/

FYI: The following works:
Code:
omfloader omfloader.obj

pragma

Posted: Sat Aug 13, 2005 8:08 pm

(2nd post in a row for tonight)

There's one lingering problem that I'm starting to see now. Presently, there is no way to dig symbol names out of the host .exe. These would be needed to resolve any externs on a loaded .obj file.

What is needed is a "void* getSymbolAddress(char[] name)" for the current app. All it would have to be is a simple lookup table, but unfortunately, the linker throws all that out in the name of efficiency.

As I see it, there are three possible ways to fix this without causing a change to DMD or the D language itself.

    - One, mandate use of Ben's dfend; with some tweaks, it would work nicely.
    - Two, write a loader that consumes the .map file generated from a build, so the main program can know its internals. An *old* hack I read in the DNG did just this well before dfend came along.
    - Three, write DDL capable applications just like java, and reduce the main executable to a mere bootstrap. Phobos.lib would be loaded first, followed by the 'main' .obj. It would be 100% dynamic from there on.


All of these also require the ability to load phobos.lib just like the other .obj files, should a symbol not exist within the main exe. That's at least something that can be handled in the scope of this project, one way or another. The actual in-exe lookup problem looks fixable too, I'm just not sure which avenue to take.
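
A sketch of what the lookup could look like, written in Python rather than D to keep it short. It assumes a simplified, hypothetical listing of "hex-address name" lines standing in for a real .map file (the actual .map layout is not reproduced here), and it falls back to other symbol tables, such as one built from phobos.lib, when the host doesn't have the symbol. `parse_map_lines` and `SymbolResolver` are illustrative names only.

```python
def parse_map_lines(text):
    """Parse lines of the hypothetical form '<hex address> <symbol name>'."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        address, name = parts
        try:
            table[name] = int(address, 16)
        except ValueError:
            continue  # a heading or other non-symbol line
    return table


class SymbolResolver:
    """Look a symbol up in the host first, then in fallback tables."""

    def __init__(self, host_symbols, fallbacks=()):
        self.host = host_symbols
        self.fallbacks = list(fallbacks)

    def get_symbol_address(self, name):
        if name in self.host:
            return self.host[name]
        for table in self.fallbacks:
            if name in table:
                return table[name]
        return None  # unresolved extern


host = parse_map_lines("Address Publics\n00402010 _main\n00403000 _app_init\n")
phobos = {"_printf": 0x00501000}  # made-up address for the demo
resolver = SymbolResolver(host, [phobos])
```

With this in place, `resolver.get_symbol_address("_main")` answers from the host table, and `"_printf"` is only found because the fallback table was supplied.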

kris

Posted: Mon Aug 15, 2005 10:08 am

pragma wrote:

- Three, write DDL capable applications just like java, and reduce the main executable to a mere bootstrap. Phobos.lib would be loaded first, followed by the 'main' .obj. It would be 100% dynamic from there on.

My 2 cents:

At face value, it would appear the related issues would basically evaporate if you went down the path of #3. Would be simpler for the developer too, and Build could be tweaked to produce the bootstrap executable (on the fly).

I suspect providing a trivial framework (somewhat akin to WinMain) would exhibit beneficial traits in all of robustness, simplicity, concept, comprehension, maintenance, etc. The approach would also lend itself to providing something similar to the ClassLoader concept ~ very powerful for certain application types.

pragma

Posted: Mon Aug 15, 2005 10:47 am

kris wrote:

My 2 cents:

At face value, it would appear the related issues would basically evaporate if you went down the path of #3. Would be simpler for the developer too, and Build could be tweaked to produce the bootstrap executable (on the fly).

I suspect providing a trivial framework (somewhat akin to WinMain) would exhibit beneficial traits in all of robustness, simplicity, concept, comprehension, maintenance, etc. The approach would also lend itself to providing something similar to the ClassLoader concept ~ very powerful for certain application types.


Exactly, although it'll probably get its start as more of an "object loader" rather than a java-style classloader. It'll have to resolve object file dependencies on its own (would be hard to use otherwise).
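
As a rough illustration of an object loader resolving dependencies on its own, here is a small Python sketch that turns a dependency map into a load order. The file names and the `dependencies` mapping are invented for the example; a real loader would derive them from each object's extern records.

```python
def load_order(root, dependencies):
    """Return names so that each object's dependencies come before it.

    Mutually dependent objects are tolerated: each name is scheduled
    once, and a cycle is simply cut where it closes.
    """
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in dependencies.get(name, ()):
            visit(dep)
        order.append(name)

    visit(root)
    return order


dependencies = {
    "main.obj": ["gui.obj", "phobos.lib"],
    "gui.obj": ["phobos.lib"],
    "phobos.lib": [],
}
plan = load_order("main.obj", dependencies)
```

Each object appears once in `plan`, which matters later when the expensive fixup stage runs per object.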

Some concerns, further down the line, regarding the technology are all performance-based. Symbol lookups, object path management and package management aren't impossible to solve, but they can cost developer and program time if done poorly. But as you note, the advantages are huge, so damn the torpedoes.

BTW, if there are any "lessons learned from java" that you can point me to, that would be fantastic. I'm sure there are some rants out there that might help decide what is the correct path from here.

The next set of hurdles beyond that is to manage versioning somehow. I can see this being done either in module metadata (strong-naming like .NET assemblies) or just by including a version number into the symbol namespace directly (eg: mylibV10.Foobar... aliases in D code can make this manageable). The idea is to avoid situations like "DLL hell" by encouraging side-by-side installations of different versions of code, based on the understanding that old code should run against old dependencies.

Way, way down the line, we can fold this into whatever introspection/reflection that D offers, as again we're opening up the underbelly of the platform by doing this. Perhaps runtime generated proxies and code generation isn't that unrealistic anymore?

pragma

Posted: Mon Aug 15, 2005 2:15 pm

Status

Yesterday saw the incremental improvement of what is essentially an OMF viewer. I keep coming up a byte short when handling fixup records, so the parser needs some improvement there. Luckily, I managed to scrounge up some code samples online (three to be exact) that may have some clues as to how to properly handle this record type.

One thing that concerned me was how code and data segments might come into play when interfacing with running code. I did some experiments, and examined some .map file output to come to the following conclusions:


    - D uses a flat memory model
    - The Code and Data segments overlap, creating a huge contiguous read-write-execute space
    - Programs use the standard fixup address of 0x0400000 for the start of the code/data group.
    - One can call data and read/write code.


I was a bit concerned that I'd have to use VirtualProtect() in order to crack open any segment-level protections in place. It looks like the way ahead is paved to the horizon for this kind of work. It's a very good thing (although a bit crummy for security reasons).

Concepts

I think it's very possible to write a runtime code emit library that could work hand-in-hand with reflection. The environment fostered by D is certainly very friendly to this concept.

I wrote about this earlier, but I would really like to see run-time generated proxies for RMI/CORBA style programming. It would be a matter of creating a new object that stems from the source objects' typeinfo, and has generated stub methods that handle all the marshalling to the remote host. Again, understanding the types passed to the stub methods would also use reflection to determine how to perform the marshalling.

pragma

Posted: Tue Aug 16, 2005 11:30 am

Status

Last night, I finally worked the kinks out of the prototype OMF parser. It now completely digests .obj files and can make some sense of their contents. It looks like now I get to make successive passes over the code and specification to converge on the next task: building an in-memory image with correctly assigned fixups.

After diving deeper into the .lib format addendum to the OMF spec, it looks like .libs should be relatively easy to implement on top of a functional OMF loader. Basically a lib is just a pile of OMF obj files stored back-to-back, aligned to 512 byte boundaries, plus a hashed dictionary at the end.
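
The layout just described can be sketched as follows (Python for brevity). It is a simplification under stated assumptions: the library header is taken to be a 0xF0 record whose length plus 3 gives the page size, modules start on page boundaries, and a 0xF1 record marks the end of the modules. The dictionary is not modelled, checksums are left as zero, and `build_library` and `split_modules` are hypothetical helpers.

```python
import struct

LIB_HEADER, LIB_END = 0xF0, 0xF1
MODEND = (0x8A, 0x8B)


def record(rec_type, payload):
    """One OMF-style record; the checksum is left as zero for brevity."""
    return (bytes([rec_type]) + struct.pack("<H", len(payload) + 1)
            + payload + b"\x00")


def build_library(modules, page=512):
    """Pack modules back-to-back, each padded out to a page boundary."""
    # The header record fills exactly one page: 3 bytes + (page - 3).
    lib = bytes([LIB_HEADER]) + struct.pack("<H", page - 3)
    lib += b"\x00" * (page - 3)
    for module in modules:
        lib += module
        lib += b"\x00" * (-len(lib) % page)
    return lib + record(LIB_END, b"")


def split_modules(lib):
    """Return (page size, list of raw object modules found in the lib)."""
    rec_type, length = struct.unpack_from("<BH", lib, 0)
    if rec_type != LIB_HEADER:
        raise ValueError("not a library")
    page = length + 3
    pos, modules = page, []
    while lib[pos] != LIB_END:
        start = pos
        while True:  # walk records until this module's end record
            rec_type, length = struct.unpack_from("<BH", lib, pos)
            pos += 3 + length
            if rec_type in MODEND:
                break
        modules.append(lib[start:pos])
        pos += -pos % page  # skip padding up to the next page boundary
    return page, modules


mod_a = record(0x80, b"\x01a") + record(0x8A, b"\x00")
mod_b = record(0x80, b"\x01b") + record(0x8A, b"\x00")
page, modules = split_modules(build_library([mod_a, mod_b]))
```

Each entry in `modules` is a complete object image, so the same parser used for a lone .obj can be reused on it.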

This means that our potential object loader will yield N modules/namespaces per load, rather than just one (one per object file). This will work nicely since the ELF format will also allow for multiple namespaces to be bundled into a single binary.

Concepts

It is apparent to me that there is another problem with respect to searching for a symbol in a set of object files strewn across a search path. It's not a feature that's explicitly needed, but it's a problem that can be served well at the binary level.

For instance, we really don't have to completely load an object-file if we can just tell it to abort the load if it looks like it's the wrong module. This can happen during the initial parse, well ahead of the (computationally expensive) fixup stage of an object load. One could easily start the OMFLoader with a namespace/module requirement that will cause the operation to throw should it confirm that the object does indeed belong to the wrong namespace.
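
A minimal sketch of that early abort, in Python for brevity. It assumes the name stored in the object's first record (the THEADR header, which carries a length-prefixed name) is enough to identify the module; that may not hold for every compiler, and `WrongModule`, `load_object` and `try_load` are illustrative names, not the OMFLoader API.

```python
import struct

THEADR = 0x80  # translator header: the first record of an OMF object


class WrongModule(Exception):
    """Raised when the object turns out to hold a different module."""


def header_name(data):
    rec_type, _length = struct.unpack_from("<BH", data, 0)
    if rec_type != THEADR:
        raise ValueError("object does not start with a THEADR record")
    name_len = data[3]  # the name is stored with a one-byte length prefix
    return data[4:4 + name_len].decode("ascii")


def load_object(data, expected):
    name = header_name(data)
    if name != expected:
        raise WrongModule(name)  # bail out before any fixup work
    return name  # a real loader would carry on with segments and fixups


def try_load(data, expected):
    try:
        load_object(data, expected)
        return True
    except WrongModule:
        return False


payload = b"\x05foo.d"
sample = (bytes([THEADR]) + struct.pack("<H", len(payload) + 1)
          + payload + b"\x00")
```

Only the first few bytes of the file are examined before the decision is made, so a search across many objects stays cheap.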

It's not the best solution, and frankly, smells like a hack. But it got me thinking that maybe the problem area is bigger than just 'do I have symbols for module x'.

Another route would be to 'bless' an object-file with a wrapper of our own design. This would require an additional tool to be run against objects that will turn them into official DDL files. I'm leaning toward this approach only because it leaves the door open for a lot of neat stuff:

    - Module name(s) supported (see above) for enhancing searches.
    - Module unique identifier support: helps with versioning and searching.
    - Anything else your mind fancies: using an XML blob* for the header format would allow for a nearly infinite amount of variety for DDL metadata. Alternatively, a "name=value" list could also accommodate plenty of data.
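
To make the "name=value" variant concrete, here is one possible wrapper sketched in Python. Everything about the layout is an assumption made for illustration: the `DDL0` signature, the 32-bit length field and the newline-separated pairs are not a defined DDL format.

```python
import struct

MAGIC = b"DDL0"  # hypothetical signature for a blessed file


def bless(payload, metadata):
    """Prefix an object image with a length-tagged name=value header."""
    text = "\n".join(f"{key}={value}" for key, value in metadata.items())
    header = text.encode("utf-8")
    return MAGIC + struct.pack("<I", len(header)) + header + payload


def read_metadata(blob):
    """Return (metadata dict, offset where the wrapped object begins)."""
    if blob[:4] != MAGIC:
        raise ValueError("not a blessed file")
    (size,) = struct.unpack_from("<I", blob, 4)
    text = blob[8:8 + size].decode("utf-8")
    metadata = dict(line.split("=", 1) for line in text.splitlines())
    return metadata, 8 + size


original = b"\x80\x02\x00\x00\x00"  # stand-in bytes for an object image
blob = bless(original, {"module": "mylib.core", "version": "1.0"})
metadata, offset = read_metadata(blob)
```

The metadata can be read without touching the wrapped object at all, which is the point of putting it in front.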


Any other way to provide the above would have to be done explicitly using D code for a given module. The drawback to that approach is that you need the module completely loaded, fixed-up and linked into the runtime before you can extract that data; that's a lot of mallocs** and milliseconds into the load process.

However, putting this data into the code, so it may be consumed by the 'bless' tool could give us the best of both worlds. There's a lot of different ways this can go.


(*- Don't worry, I'm actually leaning more toward a D-array style binary format. XML is grand, but you really need a sizable library to get the job done right. Plus it's too generalized for targeted applications, which is why it can be much slower than a good binary format.)

(** - Yes I know D doesn't use malloc... explicitly, but it sounds good. If you can come up with a better "lots of time and space" expression, I'm all ears. :) )

kris

Posted: Tue Aug 16, 2005 12:40 pm

Sounds good, Eric.

I like the 'blessing' pass too, which could optionally be done by Build also. However, rather than digging through a sea of obj and lib files to locate a symbol, can't this somehow be limited to a known minimal set? I mean, can the D 'import' statements somehow be utilized to identify all required external links, and the 'blessing' tool perhaps mark lib/obj files appropriately? Could this even work for C obj/libs?

A variation on this theme would be to treat the 'blessing' as an optimization only ~ if the meta-data is missing from the obj/lib, revert to a less optimal strategy. This would, for example, take care of (D linking to) C linking into other C libraries; yes?

- Kris

pragma

Posted: Tue Aug 16, 2005 5:04 pm

kris wrote:
I like the 'blessing' pass too, which could optionally be done by Build also. However, rather than digging through a sea of obj and lib files to locate a symbol, can't this somehow be limited to a known minimal set? I mean, can the D 'import' statements somehow be utilized to identify all required external links, and the 'blessing' tool perhaps mark lib/obj files appropriately?


I see what you mean. Even with sifting through all the headers to find the right namespace for further linking, it still creates a pretty deep search.

Perhaps this is the consequence for using a loose file system approach. A container, like a zip/jar style format could contain a dictionary that correlates namespaces to binaries; that could speed things up.

Also, a smart loader could perform some bookkeeping on the files within its object-path, so as to optimize searches.

But as to identifying external links, *before* a binary is loaded, that could go in the header too. Yea, that would really optimize things.

Quote:
Could this even work for C obj/libs?


The only problem I have with legacy libs is that they add to the support that is needed in the OMF loader. It's kind of a big unknown whether their output is going to compare to what DMD generates.

So far, I haven't done any testing with IMPLIB'ed binaries or *old* OMF formatted bits. I don't even know if running DMC against old .c source will generate tricky binaries. We'll have to wait and see until I can work that into my unittest for the parser.

Quote:

A variation on this theme would be to treat the 'blessing' as an optimization only ~ if the meta-data is missing from the obj/lib, revert to a less optimal strategy. This would, for example, take care of (D linking to) C linking into other C libraries; yes?


Yes, exactly. I like the idea of layering these 'protocols' so that they can fail back to a minimal working set. That minimal set is where I am right now and where Beta 1 is going to go. That way, everything on top of that is an optimization for specific situations and strategies.
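
The layering idea might look something like this sketch (Python for brevity). The `files` mapping, with its `metadata` and `parsed_modules` keys, is a made-up stand-in: `metadata` plays the role of a blessed header, and reading `parsed_modules` represents the slower full parse of an unblessed file.

```python
def find_module(module, files, scanned):
    """Return the file providing `module`, preferring blessed metadata.

    `scanned` collects the names of files that had to be parsed the
    slow way, so the cost of the fallback is visible to the caller.
    """
    for name, info in files.items():
        declared = info.get("metadata")
        if declared is not None:
            # Fast path: trust the header and skip the parse entirely.
            if module in declared:
                return name
            continue
        # Minimal working set: no metadata, so parse the file itself.
        scanned.append(name)
        if module in info["parsed_modules"]:
            return name
    return None


files = {
    "blessed.ddl": {"metadata": ["app.gui"], "parsed_modules": ["app.gui"]},
    "legacy.obj": {"metadata": None, "parsed_modules": ["clib.zlib"]},
}
scans = []
hit_fast = find_module("app.gui", files, scans)
scans_after_fast = list(scans)
hit_slow = find_module("clib.zlib", files, scans)
```

Unblessed C objects still resolve; they just cost a parse that the blessed file avoids.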

(Thanks for the feedback Kris, and happy 800th post!)

pragma

Posted: Wed Aug 17, 2005 9:06 am

Nomenclature

Perhaps it's a bit pedantic, but I want to nail down what I'm calling everything so that my posts and documentation make some sense.

    Module - A D namespace that corresponds to a source file, via the 'module' statement (eg. "std.stdio"). When a module is compiled, the result is called a Binary File.
    Binary File - A compiler product that contains compiled code, data, fixups, dependencies and symbolic information. A Binary file is considered to correspond to exactly one Module.
    Symbol - A part of a Binary File's metadata. It corresponds to a named entity in code such as a function, class, struct or variable. A symbol is understood to exist within exactly one Module.
    Dependency - A part of a Binary File's metadata. It references a symbol that a given Module needs in order to run. These are analogous to the import statement in D, but actually cover many additional symbol dependencies as determined by the compiler.
    Fixup - A part of a Binary File's metadata. It is a reference to a place within the runtime representation of a Module (via loading a Binary File), that needs to be modified to reflect the module's base address in memory.
    Base Address - When a Binary File is loaded, its code and data are placed into a flat memory space beginning at the base address; this location is determined at runtime. It is used to apply Fixup information as a part of the Runtime Linking process.
    Library File - A file that contains multiple Binary Files embedded within it. As a result, a Library File corresponds to one or more Modules.
    D Dynamic Library (DDL) - A file that is a Library File, that contains additional information to expedite Module name resolution, version resolution, and symbol name resolution. It may also contain additional metadata for custom applications.
    Runtime Linking - The process by which a Binary File is loaded and processed to rebase all the fixups in the module, and resolve all the module's dependencies.
    Runtime Module - The end-result of the Linking process, defined by the module's code, data stored at the runtime Base Address for that module. It also encompasses the symbol, dependency and fixup metadata, again all appropriate for their runtime locations.
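
To tie the Fixup and Base Address definitions together, here is a deliberately simplified Python sketch of rebasing. It treats every fixup as "add the base address to the 32-bit little-endian value at this offset", which ignores the segment-relative and self-relative variants a real OMF loader has to handle; `apply_fixups` and the sample image are invented for illustration.

```python
import struct


def apply_fixups(image, fixup_offsets, base_address):
    """Return a copy of `image` with each fixup site rebased."""
    out = bytearray(image)
    for offset in fixup_offsets:
        (value,) = struct.unpack_from("<I", out, offset)
        rebased = (value + base_address) & 0xFFFFFFFF  # stay in 32 bits
        struct.pack_into("<I", out, offset, rebased)
    return bytes(out)


# One opcode byte followed by a 32-bit operand holding the
# module-relative offset 0x10, then a trailing byte.
image = b"\xA1" + struct.pack("<I", 0x10) + b"\xC3"
loaded = apply_fixups(image, [1], 0x00400000)
(operand,) = struct.unpack_from("<I", loaded, 1)
```

After the call the operand reads 0x00400010: the original module-relative offset plus the Base Address chosen at runtime.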


Notes:

D enjoys a flat, 32-bit memory model with overlapping segments (at least under windows anyway). Because of this, there is only ever really one Base Address for the whole Runtime Module.

DMD creates OMF formatted Binary Files (.obj) that must be compiled into a library (.lib) before they can be considered a Library File. This differs from ELF, in which the library and binary format are one-and-the-same. Whenever the notion of a Binary File is used here, the reader should assume that this would apply only to an ELF of exactly *one* module.

Since one can create a Binary File via any compiled language (C or C++, not just D), the definition of a Binary File or library is by no means bound strictly to D. Tools like IMPLIB, which provide wrapping support for DLL files, also create valid Binary Files.

Status

Worked around *two* huge mistakes in the OMF documentation. A word to the wise: The illustrations do not always match the text nearby. Please use both hemispheres of your brain when reading this document.

I finally saw correct fixup and symbol information from the parser for the first time last night. So far, so good.

Last edited by pragma on Wed Aug 17, 2005 2:25 pm; edited 1 time in total

kris

Posted: Wed Aug 17, 2005 11:24 am

For the sake of the pedantic, perhaps it's worth identifying where non-D binaries fit into the picture (such as C libs/objs) ?

Also, you discuss OMF which is a Win32 only format? You've made it clear that ELF can handle all requirements too, but is there an intent to support ELF on linux down the road?

Cheers;

pragma

Posted: Wed Aug 17, 2005 2:22 pm

kris wrote:
For the sake of the pedantic, perhaps it's worth identifying where non-D binaries fit into the picture (such as C libs/objs) ?


Good catch. I'll loosen the language a bit and be more inclusive than just D in terms of binaries and libraries.

Quote:

Also, you discuss OMF which is a Win32 only format? You've made it clear that ELF can handle all requirements too, but is there an intent to support ELF on linux down the road?


Oh, heck yea. ;)

By providing a straight-up D implementation of an ELF loader, the result should be cross-platform ELF support. Not that Linux needed the leg up, but this way there's no question as to how the memory management is going to go. Plus we get to dump a lot of legacy baggage on the Linux side of the house.

All times are GMT - 6 Hours