FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Statement...the pain.
Goto page Previous  1, 2, 3  Next
 
Post new topic   Reply to topic     Forum Index -> DDBI
View previous topic :: View next topic  
Author Message
dhasenan



Joined: 03 Feb 2005
Posts: 73
Location: New York

PostPosted: Wed Jan 28, 2009 6:55 pm    Post subject: Reply with quote

Quote:

After giving this more thought, I agree with dhasenan (if I understand him correctly) and do not think the merge helps. I agree that the current version in trunk probably over allocates a bit, but a small test I did suggests that it doesn't really make a dent in 'top'. I believe (for the non-statement case) that the majority of the allocation happens in the dbms library in any case.


You understood me correctly. Optimizing a molehill beside a mountain :>

Quote:

I can see that dhasenan also seems to be coming over usability issues with both current approaches.


ddbi has a strange way of binding parameters, and a strange way of getting results.

It isn't readily apparent that I can cache a Statement and just reset it once I'm finished with the results, and it's not clear how to get a MultiResult from a Statement. (For the former, if there were a separate Result object, it would be more clear. That would probably involve more hidden costs, but so be it.

But adbi is a fair bit stranger. I'm happy that it has shortcuts for CRUD, but that's about it.

Looking through the code, for instance, it's doing string catenation on MysqlDatabase.setXXX methods for binding parameters to the database's current query. Eugh. If you're worried about efficiency, that allocates, and it sends more stuff over IPC. Plus it's dependent on consistent formatting that doesn't need escaping in mysql (which is a reasonable assumption, but still).

In terms of code quality, I think that adbi has fewer DRY violations. But that can be cleaned up without too much difficulty with a small codebase like ddbi.

Quote:

And to end that Razz : I've been very busy for the last couple of months, and although the situation is improving somewhat, I cannot throw in many hours per week just yet, so please show a bit patience Smile One way to ease the process for me, would be to create actual patches against DDBI svn.


I think I shall do that. I've been adopting a Git-R-Done attitude with this project, so I've been hacking at ddbi a bit, but I did the unthinkable! I imported the project in descent and reformatted it. I am sob.
Back to top
View user's profile Send private message AIM Address
aaronc542



Joined: 22 May 2007
Posts: 35
Location: NJ, USA

PostPosted: Wed Jan 28, 2009 10:41 pm    Post subject: Reply with quote

Quote:

Looking through the code, for instance, it's doing string catenation on MysqlDatabase.setXXX methods for binding parameters to the database's current query. Eugh. If you're worried about efficiency, that allocates, and it sends more stuff over IPC. Plus it's dependent on consistent formatting that doesn't need escaping in mysql (which is a reasonable assumption, but still).


No it doesn't! I appreciate the comments but please try to read the code carefully. Where is it doing string concatenation? You'll see that all of the "writer_ ~=" statements are just calling an opCatAssign which copies the data into the buffer - there should be no reallocation - unless the array needs to grow. Those Integer.format methods which are passing the string to opCatAssign are also using memory that's allocated on the stack so there should be no allocation. I can't say I've taken care of array concatentation in every instance, but I've done my best and it's a work in progress...


Last edited by aaronc542 on Wed Jan 28, 2009 10:50 pm; edited 1 time in total
Back to top
View user's profile Send private message
aaronc542



Joined: 22 May 2007
Posts: 35
Location: NJ, USA

PostPosted: Wed Jan 28, 2009 10:49 pm    Post subject: Reply with quote

Quote:

If you had Tango-style variadic functions (that is, a variadic function, as well as a function that takes _arguments and _argptr), I could work with that quite well. While it's hard to support both variadics and your template approach at the same time, there is no real benefit to the template approach; all you really need to do is change a couple loops and change getBindType's `static if (is (T == sometype))` to `if (type == typeid(sometype))`.


This is a good point. I had been thinking about reimplementing this with Tango-style variadics but for some reason didn't... I guess I was wondering what the feedback was on codegen overhead - but you're probably right - it's not work it. Probably just better to use _arguments and _argptr.

With regards to setting a single parameter at a time in Statement, I had been planning on adding a sendLongData, getLongData API for doing streaming to a statement. Would that help?

Ok, with regards to prepared statements and multi-result sets, can you please mention a database that supports this feature?
Back to top
View user's profile Send private message
dhasenan



Joined: 03 Feb 2005
Posts: 73
Location: New York

PostPosted: Thu Jan 29, 2009 8:18 am    Post subject: Reply with quote

aaronc542 wrote:

Ok, with regards to prepared statements and multi-result sets, can you please mention a database that supports this feature?


You didn't quote me, so I assume you're talking about this:

dhasenan wrote:
It isn't readily apparent that I can cache a Statement and just reset it once I'm finished with the results, and it's not clear how to get a MultiResult from a Statement.


ddbi has Statement.reset() that seems to free at least some of the bindings to the statement.

I believe mysql's C API docs say you can do something like this:
Code:

mysql_stmt_execute(query);
mysql_stmt_fetch(query); // loads first row
mysql_stmt_fetch(query); // loads second row
mysql_stmt_fetch(query); // loads third row
Back to top
View user's profile Send private message AIM Address
aaronc542



Joined: 22 May 2007
Posts: 35
Location: NJ, USA

PostPosted: Thu Jan 29, 2009 8:49 am    Post subject: Reply with quote

dhasenan wrote:
aaronc542 wrote:

Ok, with regards to prepared statements and multi-result sets, can you please mention a database that supports this feature?


You didn't quote me, so I assume you're talking about this:

dhasenan wrote:
It isn't readily apparent that I can cache a Statement and just reset it once I'm finished with the results, and it's not clear how to get a MultiResult from a Statement.


ddbi has Statement.reset() that seems to free at least some of the bindings to the statement.

I believe mysql's C API docs say you can do something like this:
Code:

mysql_stmt_execute(query);
mysql_stmt_fetch(query); // loads first row
mysql_stmt_fetch(query); // loads second row
mysql_stmt_fetch(query); // loads third row


That's not a "MultiResult" - this is just row by row fetching. MultiResult is multiple statement execution which Mysql doesn't support in the prepared statement API - I actually don't know of any DB that does support this. Do you know of one?

Btw, in this impl, prepared statements should be automatically cached and yes, you should be able to just reset it, and then call prepare again to get the cached statement. You can get around the automatic caching if you like by using doPrepare - ok, I need better documentation there, and possibly this needs to be fine tuned.

I do agree that I should be using ... variadics instead of template variadics and I will change this as soon as possible - this way it should be easy to extract IDatabase and IResult interfaces.
Back to top
View user's profile Send private message
aaronc542



Joined: 22 May 2007
Posts: 35
Location: NJ, USA

PostPosted: Thu Jan 29, 2009 10:26 am    Post subject: Reply with quote

dhasenan wrote:


Additionally, you don't allow anyone to bind anything without using strong typing, and you don't allow people to bind one argument to a Statement at a time. This means that I wouldn't be able to use Statement without incurring a huge overhead in terms of instantiated templates.


Agree too that this should be added - maybe something like setParam(size_t idx, BindType type, void*). What are you looking for in terms of not using strong typing? With a ... variadic you'd just pass the params and the function checks their types.

In database in should actually be called pushParam and not setParam. At this point don't see any way getting around needing to call these functions in the correct param order if you're doing dynamic sql gen.

Ok, the problem with fetch(...) instead of template variadics is that you have to pass param references - fetch(&id,&aField) - you can't write fetch(ref ...). That's why I used template variadics. Any problem with the fetch(&id,& aField) method instead?
Back to top
View user's profile Send private message
aaronc542



Joined: 22 May 2007
Posts: 35
Location: NJ, USA

PostPosted: Thu Jan 29, 2009 11:14 am    Post subject: Reply with quote

Ok, I still need to be convinced that Database and Result should be decoupled.

It seems that decoupling:
1) introduces a cyclic import dependency - this is how Mysql trunk is - unless you get around it with an interface
2) necessitates the repetition of roles - affectedRows and lastInsertID should be in Result and Database as a multi-result can return either a result or the num of rows affected (ddbi trunk fails to implement this correctly)
3) introduces cyclic references - Results probably will reference the Database that created them. Databases will probably also reference those results because it will either need to clean them up (you have to do this in Sqlite) or make sure they were closed (with Mysql esp when using mulit-results). This means a) more complicated resource tracking, b) fault tolerance that is more difficult to implement, c) and possibly memory leaks (if you have to keep track of all open results, you either need a list or a map that has reference to all of them which have references to you - so very few destructors will get called until the connection is closed or you call them explicitly - in the end even though each alloc is small, there are a lot of instances, and prob more work for the GC).

I also, don't see a good use case for having multiple results at a time. Could someone please present one.

Having multiple results sets open also functions differently from db to db, so you're exposing an API with inconsistent functionality. I'm in favor of graceful failures...

We could call the interface DatabaseConduit or something like that to indicate that it implements a few roles. Why not just have a DatabaseConduit detachResult() method that returns null when it's not allowed. This is a more graceful failure than throwing an exception when you can't do it.

IMHO having a single class that manages both queries and results would seem more fault tolerant in the majority of use cases. The reason to not do it this way is if you had some db's which return 2-3 different types of result structures internally - do we know of any that do?

I actually think that decoupling these two interfaces is something that needs to have strong enough use cases to justify introducing the cyclic dependency and reference requirements I've statement above. Anyway, if someone can present a db where there is not a 1-1 relationship between database connection structure and database result structure, maybe I'd be more convinced. It seems that there'd be a 1-many rel between connection and result instances in some cases, but not all and that in general it's probably a bad idea. So, if you want to convince me that these two classes should be decoupled, convince me that:
1) there are some db's with 1-many db-result structure rel's
-or-
2) that supporting 1-many rel's between db connections and results is an important enough API requirement to a) justify introducing the above complications and b) that requirement can't be adequately or better supported with the proposed detachResult method.
Back to top
View user's profile Send private message
dhasenan



Joined: 03 Feb 2005
Posts: 73
Location: New York

PostPosted: Thu Jan 29, 2009 5:41 pm    Post subject: Reply with quote

aaronc542 wrote:
Ok, I still need to be convinced that Database and Result should be decoupled.


A technical reason? I don't have one.

It's a matter of expectations. As an end user, I see a class named Database and think of it as the database. I want to write a query and get the results, so I expect to be able to call some method on the database:
Results fetch(Query);

If you renamed Database to DatabaseConnection or QueryContext or something like that, then the behavior isn't strongly associated with existing concepts, so you can make an API that better fits reality and doesn't cause needless confusion.
Back to top
View user's profile Send private message AIM Address
dhasenan



Joined: 03 Feb 2005
Posts: 73
Location: New York

PostPosted: Thu Jan 29, 2009 5:57 pm    Post subject: Reply with quote

aaronc542 wrote:
That's not a "MultiResult" - this is just row by row fetching.


Okay. The fact remains that I don't see how to accomplish this in ddbi or in your fork.
Back to top
View user's profile Send private message AIM Address
dhasenan



Joined: 03 Feb 2005
Posts: 73
Location: New York

PostPosted: Thu Jan 29, 2009 8:02 pm    Post subject: Reply with quote

Oh! I finally understand Statement now!

Statement.execute executes a query. This query can return multiple rows, but you don't actually retrieve them in this step.

Statement.fetch retrieves a row of the result.

You call Statement.fetch multiple times with different bind parameters in order to retrieve multiple rows.

Examples would be nice.
Back to top
View user's profile Send private message AIM Address
aaronc542



Joined: 22 May 2007
Posts: 35
Location: NJ, USA

PostPosted: Thu Jan 29, 2009 8:22 pm    Post subject: Reply with quote

dhasenan wrote:
Oh! I finally understand Statement now!

Statement.execute executes a query. This query can return multiple rows, but you don't actually retrieve them in this step.

Statement.fetch retrieves a row of the result.

You call Statement.fetch multiple times with different bind parameters in order to retrieve multiple rows.

Examples would be nice.


Examples would be great! And better documentation... Hopefully soon.

For the time being the unittests in dbi.model.Database are the best I have... Should give some basic ideas. But briefly, you write:


Code:
auto st = db.prepare(`SELECT "name","email", FROM "user" WHERE id = ?`, 5);
char[] name, email;
while(st.fetch(name,email)) {
  Stdout.formatln("Name:{}, Email:{}", name, email);
}


Now, changing to the bool fetch(...) variadic , that would change to:

Code:
st.fetch(&name, &email).


Slightly more ugly - but possibly more accurate - it highlights that these are reference paramenters.
Back to top
View user's profile Send private message
aaronc542



Joined: 22 May 2007
Posts: 35
Location: NJ, USA

PostPosted: Thu Jan 29, 2009 8:28 pm    Post subject: Reply with quote

dhasenan wrote:
aaronc542 wrote:
That's not a "MultiResult" - this is just row by row fetching.


Okay. The fact remains that I don't see how to accomplish this in ddbi or in your fork.


Hmm... well Mysql doesn't allow this at this point in prepared statements. Neither Sqlite or PostgreSql allow this at all AFAIK. Other db's, probably, but I don't really have experience.

But, if we wanted to add it to the API for Statement for future usage we could add bool moreResults(), bool nextResult(), and possibly bool() enabled(DBIFeature) and just always return false.
Back to top
View user's profile Send private message
aaronc542



Joined: 22 May 2007
Posts: 35
Location: NJ, USA

PostPosted: Thu Jan 29, 2009 8:45 pm    Post subject: Reply with quote

dhasenan wrote:
aaronc542 wrote:
Ok, I still need to be convinced that Database and Result should be decoupled.


A technical reason? I don't have one.

It's a matter of expectations. As an end user, I see a class named Database and think of it as the database. I want to write a query and get the results, so I expect to be able to call some method on the database:
Results fetch(Query);

If you renamed Database to DatabaseConnection or QueryContext or something like that, then the behavior isn't strongly associated with existing concepts, so you can make an API that better fits reality and doesn't cause needless confusion.


Ok, good, that's what I'm going for - something like QueryContext or DatabaseConduit, DatabaseInterface. It is a user-expectation I understand, to do Results fetch(Query), but that doesn't mean it's a good design-decision. Why did I not decouple Result and Database my implementation? - simply because I didn't see that I would gain anything and I saw that I'd have a harder time implementing it correctly with a more fault-prone API in the long run.

Alright, I admit my decision is non-conventional, but it comes from me trying to really read the DB client docs and see what is the best way to do it. I'd be open to separating Database and Result if you guys could present a really valid technically sound design argument + use cases. And I was initially planning on sticking with this Database/Result duality, but it just seemed like it made things too complicated - I would have had to add too many hidden resource management classes/interfaces to do it correctly IMHO. So, I said, hey, this is a waste of time - why put all this time in trying to have two classes to manage conn and result (because they both - as I have described have to manage both - and not just in mysql) - and what do I get for it?

Anyway, really, if there's a good reason to do it, let's do it - but if it's just convention without good design arguments, then why??? It's more work for the implementer and I don't think the user gets anything...

My proposal - stick with the one class model - but let's come up with a different name - QueryContext, DatabaseConduit, something like that...
Back to top
View user's profile Send private message
larsivi
Site Admin


Joined: 27 Mar 2004
Posts: 453
Location: Trondheim, Norway

PostPosted: Sat Jan 31, 2009 4:34 am    Post subject: Reply with quote

No matter how you try to make new abstractions, it won't change the fact that RDBMS consists of exactly 3 main concepts. A database, queries and results. Calling Database for something else (such as DatabaseConnection) changes nothing as that is exactly what we talk about on the client side. Calling it something else entirely will only confuse matters since it will be an abstraction without roots in reality
.
Aaron, I understand the issues with mysql and the coupling between the database handle and the results sets, but you are not alone in implementing this! Further, that the mysql client library requires this is a testimony to the bad design of that particular library (or just the fact that it is implemented in C), as I believe there is no such restriction in mysql server-client protocol.

If it turns out to be entirely impossible to implement Database and Result classes for any respective database system, then I may possibly accept that a single class implements both interfaces (IResult, IDatabase), but I do not accept that there is a single interface that inherits both such that such a contraption needs to be the default.

One could possibly make an argument where one says that Statement is similar to Database, and since Statement inherits Result, Database should too, but that would be flawed because Statement is a special case of Result, not the other way around.

And as with normal result sets, there is no good reason to restrict the number of prepared statements, although there are resource related reasons to reuse those objects.

As for the specifics of the available methods in the various interfaces; yes, I'm sure they need further improvement - this thread is really the first to more widely discuss the new API, and I don't expect to get it right on first try.
Back to top
View user's profile Send private message
larsivi
Site Admin


Joined: 27 Mar 2004
Posts: 453
Location: Trondheim, Norway

PostPosted: Sat Jan 31, 2009 5:09 am    Post subject: Reply with quote

aaronc542 wrote:
Ok, I still need to be convinced that Database and Result should be decoupled.


The question is really whether there is a good reason to couple them.

Quote:

It seems that decoupling:
1) introduces a cyclic import dependency - this is how Mysql trunk is - unless you get around it with an interface


Please point me to the cyclic dependencies.

Quote:

2) necessitates the repetition of roles - affectedRows and lastInsertID should be in Result and Database as a multi-result can return either a result or the num of rows affected (ddbi trunk fails to implement this correctly)


Indeed this is at least an issue with the mysql C client libraries.

Quote:

3) introduces cyclic references


How is this different from point 1? And no, there is nothing inherent in a decoupled nature that leads to cyclic references, only bad design does that.

Quote:

I also, don't see a good use case for having multiple results at a time. Could someone please present one.


Complex client side joins.

Quote:

Having multiple results sets open also functions differently from db to db, so you're exposing an API with inconsistent functionality. I'm in favor of graceful failures...


The databases do have different functionality, and so having a 100% unified API is a pipe dream - that doesn't mean that we can't have graceful failures.

Quote:

We could call the interface DatabaseConduit or something like that to indicate that it implements a few roles. Why not just have a DatabaseConduit detachResult() method that returns null when it's not allowed. This is a more graceful failure than throwing an exception when you can't do it.


Why? Exceptions is the most graceful error handling feature there is, the main issue being that it is slow. null references on the other hand easily lead to non-recoverable crashes.

Quote:

I actually think that decoupling these two interfaces is something that needs to have strong enough use cases to justify introducing the cyclic dependency and reference requirements I've statement above.


And I think that a good reason to couple them has yet to materialize.

Quote:

Anyway, if someone can present a db where there is not a 1-1 relationship between database connection structure and database result structure, maybe I'd be more convinced. It seems that there'd be a 1-many rel between connection and result instances in some cases, but not all and that in general it's probably a bad idea. So, if you want to convince me that these two classes should be decoupled, convince me that:
1) there are some db's with 1-many db-result structure rel's
-or-
2) that supporting 1-many rel's between db connections and results is an important enough API requirement to a) justify introducing the above complications and b) that requirement can't be adequately or better supported with the proposed detachResult method.


detachResult is not a good compromise because it means you need to have the decoupled version there in any case, but only invoking in on request. In any case, going away from decoupled interfaces needs good reasons, not the opposite.
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic     Forum Index -> DDBI All times are GMT - 6 Hours
Goto page Previous  1, 2, 3  Next
Page 2 of 3

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group