FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Some potential changes

 
Post new topic   Reply to topic     Forum Index -> MultiArray
View previous topic :: View next topic  
Author Message
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

PostPosted: Wed Oct 31, 2007 9:44 pm    Post subject: Some potential changes Reply with quote

I'm thinking of making two biggish changes to the library that will break existing code. I'm thinking that I'm probably still the only one who has any pre-existing code at this point, so I don't think this will really affect anyone, but I thought I'd check here first.

Struct instead of class
I find myself constantly getting bitten by forgetting to do the new NDArray() thing. The zeros() and ones() functions are supposed to help there a little bit, but it's trying too hard to be NumPy, which it's not. Using a struct instead of class in the implementation would make ndarray feel more like builtin D dynamic arrays, which also act more or less like structs.

The down side would be that they might be kind of heavy-weight to pass around to functions as structs. data dyn array, shape dyn array, strides dyn array and flags is a lot. Dynamic arrays are 8 bytes on the stack? So 28 bytes total (assuming a 4-byte alignment)? So you'll probably want to pass these babys around by reference, unlike a simple D dynamic array.

[Edit - I went ahead and did this. It turned out to be relatively easy.]

Compile-time fixed dimensions
The current design allows you to modify the number of dimensions of the current array on the fly. This was the only sensible option for NumPy in Python, but in a statically compiled language like D it doesn't make as much sense. One place where this hurts in in the opIindex type routines. In Python the same indexing routine can return another array or a scalar as needed. In D, opIndex can't vary its return type unless the arguments are also varying. You'd like to have your opIndex return a scalar when handed N indices and an array when handed less than N. You can't do that without knowing N at compile time.

It shouldn't make much performance difference. If you want to treat the data as 2 dimensional instead of 3 temporarily you just have to create a new view. You still don't have to actually duplicate the data.

[Edit - I started trying to do this, but it turned into a nightmare. It pretty much requires rethinking and rewriting every bit of code. And the question of where to draw the line between static and dynamic is hard to decide. Flags like Fortran vs Contiguous might be good to make compile-time too (And that would put it more in line with other C++ matrix libs like MTL). You could maybe get rid of strides member then. But then things like transpose require creating a new objects with the flag changed. And without strides you can't do some nice things, like a cheap view of a column. So maybe you make two types, one with strides and one without, but then if you're going to go through all the effort to do that stuff, it's probably worth considering handling storage abstractions more generally even, including sparse storage schemes etc. I just don't have time for that now.]


Summary
So these are two relatively conservative but major changes that I plan on making fairly soon. More outlandish things like overhauling the design to have more flexible storage concepts would be cool, but are not on the agenda for now. I just don't have time for it.
Back to top
View user's profile Send private message
schani



Joined: 18 Sep 2007
Posts: 25
Location: Vienna, Austria

PostPosted: Mon Nov 05, 2007 2:43 am    Post subject: Reply with quote

Well, I actually did some coding using multiarray... Smile I didn't use the new version yet, so I can't give feedback on the changes.

What I programmed using multiarray was a solver for a coupled set of two differential equations. It turned out, that the problem crucially depended on the floating point resolution, so I implemented an iterative improvement.
The main issue was that linsolve() always generates a new object, which resulted in unnecessary allocation and GC activity. Further, the repeated calling of linsolve, that always creates an LU-Factorization of the matrix, for the iterative improvement was quite a waste of computation time.

So my suggestions are:

  • The Lapack routines should become translated to a multiarray version. I am talking about calling getrf and the other commands with only the matrix or the respective objects as parameter. In fact, this is what I dreamed of after the first use of Lapack.

  • Maybe it would be a good idea to create some easier to understand functions or aliases for the people not familiar with Lapack. (like linsolve is, but for the other functions)

  • linsolve could then be reimplemented using these methods and equipped with an either complie time or runtime controlled improvement. I don't know, which would be the better version... Determining at compile time, that n improvement steps have to be taken, or give an error computation method that controls improvement at runtime.


That's it for now... I could help with implementing these suggestions, if you are interested.

Multiarray (at least the version I am familiar with Smile ) is a great easy to use library! Keep up the good work!
Back to top
View user's profile Send private message
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

PostPosted: Mon Nov 05, 2007 2:50 am    Post subject: Reply with quote

Your suggestions sound pretty good. If you could write the sort of lightweight lapack wrapping routine you're interested in having then I'd be happy to include it.

I tried to make it possible to avoid some reallocation, in that you can provide a pre-allocated 'ret' parameter for the result. But yeh, it does make a copy of A and b internally IIRC so that they don't get messed up. The full set of options makes for kind of a messy interface.
Back to top
View user's profile Send private message
schani



Joined: 18 Sep 2007
Posts: 25
Location: Vienna, Austria

PostPosted: Mon Nov 05, 2007 4:48 am    Post subject: Reply with quote

Just for the sake of completeness... My code was quite easy to transfer to the new version... although the a.get(ii,jj) kind of hurts my eyes.

Is it really necessary to sacrifice a good looking element acces for a good looking slice access?

I myself do not use slice assignment that often, so I would prefer the opIndex and opIndexAssign to return an element and that the slice things are moved to methods, like a.sub, or a.view or things like that.
Back to top
View user's profile Send private message
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

PostPosted: Mon Nov 05, 2007 2:42 pm    Post subject: Reply with quote

schani wrote:
Just for the sake of completeness... My code was quite easy to transfer to the new version... although the a.get(ii,jj) kind of hurts my eyes.

Is it really necessary to sacrifice a good looking element acces for a good looking slice access?

I myself do not use slice assignment that often, so I would prefer the opIndex and opIndexAssign to return an element and that the slice things are moved to methods, like a.sub, or a.view or things like that.


That's a good suggestion. Slicing is quite common in numpy, but I also was surprised by the number of places in my code where I had to change to .get. I don't like it either. I was hoping someone might respond to my post in the newsgroup about the opCall/static opCall problem with a good workaround, but no one did.

For slicing yeh, .view sounds nice.
Back to top
View user's profile Send private message
schani



Joined: 18 Sep 2007
Posts: 25
Location: Vienna, Austria

PostPosted: Mon Nov 05, 2007 4:41 pm    Post subject: Reply with quote

baxissimo wrote:
Slicing is quite common in numpy, but I also was surprised by the number of places in my code where I had to change to .get. I don't like it either.

I think changing it would make the code more beautiful... But only until D supports an appropriate slice overloading.
baxissimo wrote:
I was hoping someone might respond to my post in the newsgroup about the opCall/static opCall problem with a good workaround, but no one did.

For slicing yeh, .view sounds nice.

Well, that issue is even described in the language specification. An external function could be used to initialize the struct. The problem with that is the unnecessary copies in this method. Maybe the use of some mixed in initializer-template could solve that problem?
What about a template mixin? I still don't have enough knowledge of D, I assume.
Back to top
View user's profile Send private message
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

PostPosted: Mon Nov 05, 2007 5:03 pm    Post subject: Reply with quote

I'll try to make the change today (in the next few hours, actually)

schani wrote:

Well, that issue is even described in the language specification. An external function could be used to initialize the struct. The problem with that is the unnecessary copies in this method. Maybe the use of some mixed in initializer-template could solve that problem?
What about a template mixin? I still don't have enough knowledge of D, I assume.


The copy shouldn't be a problem for DMD at least since it implements the "NRVO" (named return value optimization). Namely, if you write
Code:
S x = make_an_S();
then DMD will construct the result directly in x's memory rather than copying. Supposedly anyway. Smile
Back to top
View user's profile Send private message
schani



Joined: 18 Sep 2007
Posts: 25
Location: Vienna, Austria

PostPosted: Wed Nov 07, 2007 2:00 am    Post subject: Reply with quote

Well, so what stands against an external initializer function?
The only thing is to give it a name that doesn't hurt code beauty too much.
Code:
ndarray x = new_ndarray!(double)(3,50);

or
Code:
ndarray x = Ndarray!(double)(3,50);

or
Code:
ndarray x = _ndarray!(double)(3,50);

Another variant that comes into my mind is using a static init method like
Code:
ndarray x = ndarray!(double).init(3,50);


Anyway, I think sacrificing the dynamic call operator to a standardized initialization is better than that.
Back to top
View user's profile Send private message
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

PostPosted: Wed Nov 07, 2007 2:36 am    Post subject: Reply with quote

schani wrote:
Well, so what stands against an external initializer function?
The only thing is to give it a name that doesn't hurt code beauty too much.
Code:
ndarray x = new_ndarray!(double)(3,50);

or
Code:
ndarray x = Ndarray!(double)(3,50);

or
Code:
ndarray x = _ndarray!(double)(3,50);

Another variant that comes into my mind is using a static init method like
Code:
ndarray x = ndarray!(double).init(3,50);


Anyway, I think sacrificing the dynamic call operator to a standardized initialization is better than that.


The only down side is that it is rather an entrenched convention that if you have a struct called Foo that you'll be able to construct it using Foo(). Calling the "constructors" something different just leads to more confusion. But it seem like it might be the best option in this case.

Actually there are already such external constructors: ones([3,4]) zeros([3,4]) and empty([3,4]). These construct arrays initialized to all 1's all 0's, or uninitalized memory.

--bb
Back to top
View user's profile Send private message
schani



Joined: 18 Sep 2007
Posts: 25
Location: Vienna, Austria

PostPosted: Wed Nov 07, 2007 2:52 am    Post subject: Reply with quote

baxissimo wrote:

Actually there are already such external constructors: ones([3,4]) zeros([3,4]) and empty([3,4]). These construct arrays initialized to all 1's all 0's, or uninitalized memory.

--bb


Oh, I didn't know of them. Looking at them, I think they're the best option, as you actually know what initialization happens to your array when doing it this way.
How about making them static members of ndarray and drop the static opCall?

Code:
ndarray x = ndarray!(double).empty([3,50]);

looks nice.
In addition, a template initializer could be given, that takes a given function to initialize the array, if someone needs that.

Code:
double my_sine(x,y){
    return sin(x/100+y/10);
}
ndarray x = ndarray!(double).init!(my_sine)(3,50);


You could do a poll in this forum on that topic Smile
Back to top
View user's profile Send private message
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

PostPosted: Sat Nov 10, 2007 3:18 pm    Post subject: Reply with quote

schani wrote:
Just for the sake of completeness... My code was quite easy to transfer to the new version... although the a.get(ii,jj) kind of hurts my eyes.

Is it really necessary to sacrifice a good looking element acces for a good looking slice access?

I myself do not use slice assignment that often, so I would prefer the opIndex and opIndexAssign to return an element and that the slice things are moved to methods, like a.sub, or a.view or things like that.


There is a problem with this actually. The problem is that opIndexAssign and opSliceAssign are the only way in D to achieve efficient setting of elements with indexing-like syntax. You can't make an opCall like foo(2,3) = 5 be that efficient. You can try to make it work by returning a proxy struct of some sort but then
Code:
float x = foo(2,3)
will no longer work because of no opImplicitCast. So if you don't use opIndex then the syntax has to be something ugly like set_elem(value, 2,3), or same for simple slicing if you use opIndex/Assign for element-wise manipulation.

Ok, but maybe it's still a good trade off. Actually slicing always returns a struct that references the actual data indirectly, so that means you can treat the return as an Lvalue, and do assignment to it etc. So the down sides of using a regular member function are less.

I guess I'm convinced. foo[1,2,3] should return an element for a 3d array and using any other number of indices should be a runtime error. To get a slice, you can use foo(1) or the horrifying foo([1,0,0]..[2,N,M]). And to construct you use static construction functions like ones,zeros, and empty. I like the standalone versions, so I'm not going to get rid of those, but I'll add the static members too.

It may take me a while to make those changes. It's going to break a lot of code again.
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic     Forum Index -> MultiArray All times are GMT - 6 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group