
Call for Discussion: dbi.Row
jpelcis



Posted: Sat Jul 01, 2006 3:15 pm    Post subject: Call for Discussion: dbi.Row

The quote below comes directly from the SQL:2003 Specification, 5th Working Draft. To see what led to this, go to http://www.dsource.org/forums/viewtopic.php?t=1637.

Chapter 2, Page 11 wrote:

SQL defines predefined data types named by the following <key word>s: CHARACTER, CHARACTER VARYING, CHARACTER LARGE OBJECT, BINARY LARGE OBJECT, NUMERIC, DECIMAL, SMALLINT, INTEGER, BIGINT, FLOAT, REAL, DOUBLE PRECISION, BOOLEAN, DATE, TIME, TIMESTAMP, and INTERVAL.


There are also XML and other types in previous versions of the specification, but these are the critical ones. Most of these can be converted to either a D basic type or something in the standard library.

Code:

CHARACTER                 dchar
CHARACTER VARYING         char[]
CHARACTER LARGE OBJECT    char[]
BINARY LARGE OBJECT       void[]
NUMERIC                   char[] (no equivalent)
DECIMAL                   char[] (no equivalent)
SMALLINT                  short
INTEGER                   int
BIGINT                    long
FLOAT                     float
REAL                      double
DOUBLE PRECISION          double
BOOLEAN                   enum {false, true, unknown} (bool doesn't have unknown)
DATE                      std.date.Date
TIME                      std.date.Date
TIMESTAMP                 std.date.Date
INTERVAL                  std.date.Date


Luckily, many of these end up as the same D type. Row can be changed to:

Code:

enum Boolean {
   False = false,
   True = true,
   Unknown
}

final class Row {
   this (char[][] names, char[][] stringValues, ...);

   alias getString opIndex; // Note: size_t instead of int.
   deprecated alias getString get; // Close enough.

   deprecated int getFieldType (int index); // Need a standardized type.
   deprecated char[] getFieldDecl (int index); // Most DBDs can't get this info.

   deprecated void addField (char[] name, char[] value, char[] decl, int type); // See above two.

   void addField (char[] name, char[] stringValue, char[] value, TypeInfo type = typeid(char[]));

   char[] getFieldName (size_t index);
   size_t getFieldIndex (char[] name); // Note slight change.

   TypeInfo getFieldType (size_t index);
   TypeInfo getFieldType (char[] name);

   dchar getChar (size_t index);
   dchar getChar (char[] name);

   char[] getString (size_t index);
   char[] getString (char[] name);

   void[] getObject (size_t index);
   void[] getObject (char[] name);

   short getShort (size_t index);
   short getShort (char[] name);

   int getInt (size_t index);
   int getInt (char[] name);

   long getLong (size_t index);
   long getLong (char[] name);

   float getFloat (size_t index);
   float getFloat (char[] name);

   double getDouble (size_t index);
   double getDouble (char[] name);

   Boolean getBoolean (size_t index);
   Boolean getBoolean (char[] name);

   std.date.Date getDate (size_t index);
   std.date.Date getDate (char[] name);

   private:
   char[][] fieldNames;
   char[][] stringValues;
   TypeInfo[] types;
   long longValue;
   double doubleValue; // Not real for consistency.
}
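
For reference, here is a rough sketch of how this would look from the caller's side. The db.query() and fetchRow() names are only illustrative, not part of this proposal; the point is the typed getters on Row:

Code:

// Illustrative usage sketch; db.query() and fetchRow() are stand-ins.
Result res = db.query("SELECT id, name, balance, active FROM account");
for (Row row = res.fetchRow(); row !is null; row = res.fetchRow()) {
   long    id      = row.getLong("id");        // BIGINT
   char[]  name    = row.getString("name");    // same as row["name"] via opIndex
   double  balance = row.getDouble("balance"); // DOUBLE PRECISION
   Boolean active  = row.getBoolean(3);        // by column index (size_t)
}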


There are several things that could be done to improve this, but I wanted to get someone else’s opinion first.

My primary reservation is how big Row would be. One possible solution is to get rid of the getChar, getShort, getInt, and getFloat functions. The values could be cast up to the bigger type and remain that way for the function return.
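
For example (just a sketch of the idea), the narrow getters could be reduced to one-line wrappers around the wide ones, so only the long and double conversions actually exist internally:

Code:

// Sketch only: the wide getters do the real work; the narrow ones are
// one-line casts (or could be dropped entirely).
short getShort (size_t index) { return cast(short) getLong(index); }
int   getInt   (size_t index) { return cast(int)   getLong(index); }
float getFloat (size_t index) { return cast(float) getDouble(index); }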

Another is the use of TypeInfo, which could be avoided in addField by using a template, but what about getFieldType?
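
For illustration, a templated addField might look something like this (sketch only; the TypeInfo is still recorded internally, so getFieldType is unaffected):

Code:

// Sketch: the caller never mentions TypeInfo; typeid(T) is recorded for them.
void addField (T) (char[] name, char[] stringValue, T value) {
   fieldNames   ~= name;
   stringValues ~= stringValue;
   types        ~= typeid(T);
   // ... store the typed value as in the proposal above ...
}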

Also, it is debatable whether the conversion should be done in Row or in Result. My opinion is to do it in Row, converting to whatever type is specified when the value is added. The problem is what to do with, for example, PostgreSQL, which doesn't seem to have a method to get the type of something. Should the use of a string be forced, should the user be allowed to attempt to get the value in any format (with a DBIException or ConvException thrown on failure), or should dbi.pg.PgResult attempt to decipher the type from the string?

Anyway, I just thought I would throw this out before I started coding. Whatever is decided on here will be in 0.3.0, with anything deprecated removed in 0.4.0.


Last edited by jpelcis on Thu Jul 13, 2006 10:11 am; edited 3 times in total
brad
Site Admin


Posted: Sat Jul 01, 2006 3:53 pm

This plan looks fine to me. I just wanted feedback on doing more than just get() as was the case in 0.2.0.

One question: are you thinking ahead to the platforms available to D with size_t?

BA
jpelcis



Posted: Sat Jul 01, 2006 4:07 pm

brad wrote:

This plan looks fine to me. I just wanted feedback on doing more than just get() as was the case in 0.2.0.

Cool. That's good to hear.

brad wrote:

One question, are you thinking ahead on platforms available to D with size_t?

That's mostly it, although I believe it can also be more efficient to use size_t on x86 platforms since it matches the CPU's calculations and register size, though I might be wrong.
brad
Site Admin


Posted: Sat Jul 01, 2006 4:38 pm

Sounds good.

I wouldn't worry about the size of the class; I would leave those methods in for now and not cast to the larger types. We'll see what a profiler says about this at a much later date.

Along those lines, I was looking at getFieldNames()

http://www.dsource.org/projects/ddbi/browser/trunk/dbi/row.d#L83

and in talking to larsivi, we decided to leave it as is. I was considering keeping a separate associative array with field names as keys and the field indices into the other array (fieldValues, I think) as values. We would incur more storage, but maybe get faster index lookups than the current method. But because it's only a problem on huge rows, and given the youth of this lib, he cautioned me not to prematurely optimize...
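
Roughly what I had in mind, as a sketch only (this is not the current row.d code):

Code:

// Sketch: keep a name -> index map next to the existing arrays.
private size_t[char[]] indexOf;      // field name -> position

void addField (char[] name /* , ... */) {
   indexOf[name] = fieldNames.length;
   fieldNames ~= name;
   // ...
}

size_t getFieldIndex (char[] name) {
   return indexOf[name];             // hash lookup instead of a linear scan
}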

BA
kris



Posted: Wed Jul 12, 2006 10:58 pm

brad wrote:
I wouldn't worry about size of the class and would leave those methods in for now, and not cast to the larger types. We'll see what a profiler says about this at a much later date.

Hrm ~ I'd be careful about that, since removing functionality is considerably harder than adding it (deprecation can be tough for some users).

The getShort() and getInt() could easily be bound together (assuming meta info is present in the row description, identifying the original type), and there's no performance penalty in doing so since a return value in D is 32bit minimum anyway ~ the language is not targeted for 16bit devices. The getFloat() should perhaps be combined with getDouble(), if only because nobody really uses float anymore? Maybe not.

The getChar() is a bit different, since it may be advantageous to have it return dchar instead? Utf support can become a tad messy, and array types are often treated quite differently than single instances. To illustrate, while it might be perfectly acceptable to support only utf8 strings, you can't stuff a single utf8 sequence into a char ... it has to be dchar instead (dependent upon what the underlying DB actually exposes).

Thus, you might have:
Code:

dchar getChar();
char[] getString();

It's a shame these D types were not called utf8, utf16 and utf32 instead of char, wchar and dchar ;)

On the other hand, if you were to support returning arrays in general, then type-folding becomes a royal pain for the user. But since you're using the Row approach, this latter aspect really shouldn't become an issue.
jpelcis



Posted: Wed Jul 12, 2006 11:20 pm

kris wrote:
brad wrote:
I wouldn't worry about size of the class and would leave those methods in for now, and not cast to the larger types. We'll see what a profiler says about this at a much later date.

Hrm ~ I'd be careful about that, since removing functionality is considerably harder than adding it (deprecation can be tough for some users).


True. That's a large part of why I am announcing this so far ahead of time (the other, of course, is for opinions).

kris wrote:

The getShort() and getInt() could easily be bound together (assuming meta info is present in the row description, identifying the original type), and there's no performance penalty in doing so since a return value in D is 32bit minimum anyway ~ the language is not targeted for 16bit devices.


Unfortunately, that's where the trouble comes. PostgreSQL doesn't give any of that information to the application. The question is whether the user should be allowed to call the functions (and attempt conversion), whether D DBI should guess, or whether functionality should be limited to string only as it is now.

I'm not sure exactly what you mean by bound together, but my guess is that you are saying there should only be a getInt(), with a getShort defined as cast(short)getInt(). Is that correct?

kris wrote:
The getFloat() should perhaps be combined with getDouble(), if only because nobody really uses float anymore? Maybe not.


Float is still used for some purposes (in fact, video cards often use "half" instead of "double"). Float support could be done as described above for the integers.

kris wrote:

The getChar() is a bit different, since it may be advantageous to have it return dchar instead? Utf support can become a tad messy, and array types are often treated quite differently that single instances. To illustrate, while it might be perfectly acceptable to support only utf8 strings, you can't stuff a single utf8 sequence into a char ... has to be dchar instead.


Good catch. I've changed the proposal to return dchar instead of char. Would it be worth it to return dchar[] while we're at it, too (assuming we get implicit casting between character types)?
kris



Posted: Thu Jul 13, 2006 12:50 am

I was suggesting getShort() and getFloat() be removed :)

Regarding dchar[], that's a tricky one. It's very much dependent on the application using the data. For example, a word-processor might use utf32[], whereas the actual text-names managed by a file-system might use utf8[] instead.

There's a considerable (relative) cost related to utf conversions. In many scenarios it just doesn't matter. For a DB front-end, it perhaps does? If you can support all three on every DB (no conversion required) then that may be an option. Otherwise, I'd be very tempted to stick with utf8[] only.

Why? It's difficult to find a yardstick everyone will agree with, but experience with Mango has indicated the "lower" the level of functionality, the more likely it is to use utf8[]. Another way to express that is, utf8[] is generic in nature, whereas utf32[] tends to be quite application-specific.

Poor old utf16[] is in a bit of a no-man's-land: in comparison to the other two, it offers something very useful to only a truly small segment of applications. That is, utf16 doesn't offer the attractive direct-indexing ability that utf32 has, and it typically takes more space than utf8. For BMP-only characters, utf16 can be a good tradeoff on specific devices, such as "credit-card" language translators.
kris



Posted: Thu Jul 13, 2006 1:03 am

jpelcis wrote:
Unfortunately, that's where the trouble comes. PostgreSQL doesn't give any of that information to the application. The question is whether the user should be allowed to call the functions (and attempt conversion), whether D DBI should guess, or whether functionality should be limited to string only as it is now.


That sucks :-/

One solution might be to extract the DB schema out into some form that you can utilize at runtime? This would provide the missing meta data, but seems like a lot of effort.

String-only is cool for many applications, but others will pay a major performance price when indexing, for example.

Another solution might be to make part of the DB interface pluggable. Postgres gets string only, whereas others get full support? It's not ideal, but better than limiting to string only? A pluggable approach might be used to handle the meta-data noted above?
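
Very roughly, and purely as a sketch, that could be no more than a capability query on each DBD's Result:

Code:

// Sketch only: each DBD states up front whether it can report column types.
interface Result {
   bool hasTypeInfo ();                   // Postgres DBD: false; others: true
   TypeInfo getFieldType (size_t index);  // only meaningful when hasTypeInfo()
   // ... existing fetch methods ...
}
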
pragma



Posted: Thu Jul 13, 2006 9:55 am

jpelcis wrote:
kris wrote:

The getShort() and getInt() could easily be bound together (assuming meta info is present in the row description, identifying the original type), and there's no performance penalty in doing so since a return value in D is 32bit minimum anyway ~ the language is not targeted for 16bit devices.


Unfortunately, that's where the trouble comes. PostgreSQL doesn't give any of that information to the application. The question is whether the user should be allowed to call the functions (and attempt conversion), whether D DBI should guess, or whether functionality should be limited to string only as it is now.


As kris said: that sucks. :(

Usually when you're talking to a database, the schema is known. The typical exception to this is when you're working with a database suite like TOAD, in which case it understands that it needs to dig deep in order to get metadata. IMO, one could write a rather nice extension (a layer on top rather than a plugin, as Kris mentioned) to DDBI that handles vendor-specific metadata for other tasks... but that's clearly beyond the present scope of DDBI as far as I know.

So why not err on the side of trusting the programmer, just as D itself does? Just allow the various row accessors (getInt, getString, etc.) to convert directly from binary data present in the recordset. I gather that there may be some promotion or other *small* conversion steps needed for some types (vendor-specific floating-point types, zero-terminated strings, etc), but mostly, it could be just static casts from binary data.

This can also help the string debacle: IMO UTF transcoding shouldn't be the responsibility of a database access layer. Just allow getString()/getUtf8(), getUtf16(), and getUtf32() to cast the raw data over to the type at hand and return it to the user. To cover all the other cases, you could even provide a getRaw() that returns a void[] - just to play it safe.
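
Something like this, assuming the DBD can hand over the raw column bytes (rawValues below is hypothetical):

Code:

// Sketch, assuming the DBD exposes the raw, untouched column bytes.
void[] getRaw (size_t index) {
   return rawValues[index];               // no copy, no interpretation
}

char[] getUtf8 (size_t index) {
   return cast(char[]) rawValues[index];  // reinterpret only, no transcoding
}
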
_________________
-- !Eric.t.Anderton at gmail
jpelcis



Posted: Thu Jul 13, 2006 9:58 am

kris wrote:
I was suggesting getShort() and getFloat() be removed.


With that simple function doing the explicit cast, user code could end up with fewer warnings. If the function body for each is one line, there's not really much of a reason NOT to throw it in.

kris wrote:

There's a considerable (relative) cost related to utf conversions. In many scenarios it just doesn't matter. For a DB front-end, it perhaps does? If you can support all three on every DB (no conversion required) then that may be an option. Otherwise, I'd be very tempted to stick with utf8[] only.


And that's where the fun starts. With some DBs, you can have your strings stored as UTF-16. If you do that, you end up doing a conversion in order to avoid a conversion :D. That said, there could be a getter for each of UTF-8, UTF-16, and UTF-32. Whether a conversion is required would depend on what the server returns.
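
Roughly, with Phobos' std.utf doing the transcoding only when the stored encoding doesn't match the requested one (utf16Values below is hypothetical storage):

Code:

import std.utf;

// Sketch: only transcode when the DBD didn't already hand us UTF-16.
wchar[] getString16 (size_t index) {
   if (types[index] == typeid(wchar[]))
      return utf16Values[index];        // already UTF-16, no conversion
   return toUTF16(getString(index));    // otherwise convert from UTF-8
}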

kris wrote:
One solution might be to extract the DB schema out into some form that you can utilize at runtime? This would provide the missing meta data, but seems like a lot of effort.


:o I'm not sure it's worth the time to do that. That said, if someone else does it, I'll put it in.

kris wrote:
String-only is cool for many applications, but others will pay a major performance price when indexing, for example.

Another solution might be to make part of the DB interface pluggable. Postgres gets string only, whereas others get full support? It's not ideal, but better than limiting to string only? A pluggable approach might be used to handle the meta-data noted above?


I meant string-only just for PostgreSQL, since that's the only thing the application can know works for sure. Personally, I still favor allowing the application to request, for example, an int, but having it throw an exception if that doesn't work.
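
In other words, something along these lines, with Phobos' std.conv doing the parsing (and assuming DBIException takes a plain message):

Code:

import std.conv;

// Sketch: PostgreSQL only hands us strings, so attempt the conversion and
// let a failed parse come back as an exception.
int getInt (size_t index) {
   try {
      return toInt(stringValues[index]);
   } catch (ConvError e) {
      throw new DBIException("field " ~ fieldNames[index] ~ " is not an int");
   }
}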

Or we could deprecate the entire PostgreSQL library and have its connections go through the ODBC library, which somehow supposedly gets the type information. Or not.
jpelcis



Posted: Thu Jul 13, 2006 10:11 am

pragma wrote:
As kris said: that sucks. :(


Sorry about that. My clarity failed me for a moment. See my other post for what I meant.

pragma wrote:
Usually when you're talking to a database, the schema is known. The typical exception to this is when you're working with a database suite like TOAD, in which case it understands that it needs to dig deep in order to get metadata. IMO, one could write a rather nice extension (a layer on top rather than a plugin, as Kris mentioned) to DDBI that handles vendor-specific metadata for other tasks... but that's clearly beyond the present scope of DDBI as far as I know.


Most databases will happily tell you what type a piece of data is. That's the whole concept behind this proposal. The problem is what to do with databases that don't say what something is.

pragma wrote:
So why not err on the side of trusting the programmer, just as D itself does? Just allow the various row accessors (getInt, getString, etc.) to convert directly from binary data present in the recordset. I gather that there may be some promotion or other *small* conversion steps needed for some types (vendor-specific floating-point types, zero-terminated strings, etc), but mostly, it could be just static casts from binary data.


The problem is that not only do they not return the type, but they return the data in string format (at least PostgreSQL does that). If it's simply cast to the desired type, the result would be pure garbage.

pragma wrote:
This can also help the string debacle: IMO UTF transcoding shouldn't be the responsibility of a database access layer. Just allow getString()/getUtf8(), getUtf16(), and getUtf32() to cast the raw data over to the type at hand and return it to the user. To cover all the other cases, you could even provide a getRaw() that returns a void[] - just to play it safe.


See above for why I think that form of getUtf#() wouldn't work. If the pure binary format were returned, though, that would be perfect.

As for getRaw(), that's essentially what I was looking at with getObject, but I think void[] is a better type for it. My proposal has been changed to reflect that.


Last edited by jpelcis on Thu Jul 13, 2006 10:35 am; edited 1 time in total
kris



Posted: Thu Jul 13, 2006 10:30 am

If Postgres is the oddball, I'd simply leave it out

You mentioned the ODBC driver may be able to provide the missing metadata? That would work too :D
larsivi
Site Admin


Posted: Thu Jul 13, 2006 4:18 pm

kris wrote:
If Postgres is the oddball, I'd simply leave it out


Too bad it might be the best free SQL DB out there ;)
Kashia



Posted: Sun Jul 16, 2006 2:28 pm

Bah! I will cry if you leave PostgreSQL out ;) You just don't know how to handle the power :P

To get the table oid (needed for getting other info):

Code:

SELECT c.oid
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'pg_toast')
  AND pg_catalog.pg_table_is_visible(c.oid)
  AND c.relname = '<<<<<table name>>>>>>'


Returns the column names and types:

Code:

SELECT a.attname, pg_catalog.format_type(a.atttypid, a.atttypmod)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = '<<<<table oid>>>>'
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum


You may want to combine those two, or not. Flexibility is everything :)
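
For what it's worth, here's a rough sketch of how a DBD (or a layer on top of it) could use the second query to build a column-to-type map; the db and queryFetchAll names are just illustrative:

Code:

// Sketch: build a "column name" -> "type name" map for one table, given its
// oid from the first query. The db/queryFetchAll API names are illustrative.
char[][char[]] columnTypes (char[] tableOid) {
   Row[] rows = db.queryFetchAll(
      "SELECT a.attname, pg_catalog.format_type(a.atttypid, a.atttypmod) "
      ~ "FROM pg_catalog.pg_attribute a WHERE a.attrelid = '" ~ tableOid ~ "' "
      ~ "AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum");

   char[][char[]] map;
   foreach (Row row; rows)
      map[row.getString(0)] = row.getString(1);   // e.g. "id" -> "integer"
   return map;
}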

Kashia :)
jeremy_c



Posted: Wed Nov 08, 2006 4:08 pm

Hello! I'll post an "I'm Back" message somewhere else, but, well, "I'm Back!"

Anyway, I'm going to use DDBI again, and I really like what has been done! Good work on everyone's part. I had an idea, maybe a silly one, but I'll just say it and you guys can hash it over.

What if we made the query functions accept an optional argument of what Row class to use, defaulting to a simple Row class constructor? I.e., DDBI can have a simple Row class giving the basic elements, trusting the programmer for whatever. But in other scenarios where you want more power, you could inherit from an abstract Row class and do cool things like this:

Code:

class PersonRow : BaseRow {
  char[] getFullName() { return data["fname"] ~ ' ' ~ data["lname"]; }
  AddressRow getAddress() { return queryForOne("SELECT ...", &AddressRow.Create); }
  bool update() { return db.execute("..."); }
}

PersonRow CreatePersonRow() { return new PersonRow(); }

PersonRow[] rows = db.queryFetchAll("SELECT * FROM person", &CreatePersonRow);


DDBI could even provide some advanced Row classes if you would like to use them, like an UpdatableRow which can track changes and issue UPDATE statements itself when attributes are changed, etc...
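
A very rough sketch of what I mean by UpdatableRow (table and key handling hand-waved, and I'm assuming db.execute reports success):

Code:

// Rough sketch: remember which fields changed, then build the UPDATE itself.
class UpdatableRow : BaseRow {
  private char[][char[]] dirty;             // field name -> new value

  void set(char[] fieldName, char[] value) {
    dirty[fieldName] = value;
    super.set(fieldName, value);
  }

  bool update() {
    char[] assignments;
    foreach (name, value; dirty)
      assignments ~= (assignments.length ? ", " : "") ~ name ~ "='" ~ value ~ "'";
    return db.execute("UPDATE person SET " ~ assignments
                      ~ " WHERE id='" ~ data["id"] ~ "'");
  }
}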

Oh, and the people who need advanced type casting, data handling, etc. could inherit from, say, BaseRow or any *Row class and go to town with all their own requirements. You could even override a function like:

Code:

class BaseRow {
  void set(char[] fieldName, char[] data) {
    switch (fieldName) {
      case "id": id = std.string.atoi(data); break;
      // ... other fields ...
      default: break;
    }
  }
}


Not sure how effective that would be, but we trade performance for programmer convenience all the time and do it successfully; look at Ruby on Rails! :D

Oh, one more note... I agree w/Kashia... Do not throw out PostgreSQL, please!!!!!!!!

Jeremy
Page 1 of 2

 