std.math
sean



Joined: 24 Jun 2004
Posts: 609
Location: Bay Area, CA

Posted: Thu Feb 02, 2006 11:13 pm

kris wrote:
sean wrote:
Done and done. The intrinsics now have inline asm implementations and I've altered the rounding functions as described above.

Interesting! Does the compiler actually inline them optimally?

Not really :) I think the presence of an asm block prevents inlining, though it would be nice if inlining were allowed so long as the asm code didn't do any explicit register manipulation. Also, the functions don't use naked asm, so there's a bit of extra wrapper code in the function itself that could be done away with. That said, here's a quick comparison. Using this source file:
Code:
import std.math;

void main()
{
    real x = 10.0;
    x = sqrt( x );
    printf( "%Lf\n", x );
}

Building against Phobos with -release -inline set yields this:
Code:
__Dmain comdat
        assume  CS:__Dmain
L0:             enter   0Ch,0
                fld     tbyte ptr FLAT:_DATA[00h]
                fstp    tbyte ptr -0Ch[EBP]
                fld     tbyte ptr -0Ch[EBP]
                fsqrt
                fstp    tbyte ptr -0Ch[EBP]
                push    dword ptr -4[EBP]
                push    dword ptr -8[EBP]
                push    dword ptr -0Ch[EBP]
                push    offset FLAT:_DATA[0Ch]
                call    near ptr _printf
                add     ESP,010h
                leave
                ret
__Dmain ends

However, building against Ares (and importing std.math.core) with the same options yields this:
Code:
__Dmain comdat
        assume  CS:__Dmain
L0:             enter   0Ch,0
                fld     tbyte ptr FLAT:_DATA[00h]
                fstp    tbyte ptr -0Ch[EBP]
                push    dword ptr -4[EBP]
                push    dword ptr -8[EBP]
                push    dword ptr -0Ch[EBP]
                call    near ptr _D3std4math4core4sqrtFeZe
                fstp    tbyte ptr -0Ch[EBP]
                push    dword ptr -4[EBP]
                push    dword ptr -8[EBP]
                push    dword ptr -0Ch[EBP]
                push    offset FLAT:_DATA[0Ch]
                call    near ptr _printf
                add     ESP,010h
                leave
                ret
__Dmain ends

With this as the approximate code generated for sqrt:
Code:
_D4test4sqrtFeZe        comdat
        assume  CS:_D4test4sqrtFeZe
                push    EBP
                mov     EBP,ESP
                fld     tbyte ptr 8[EBP]
                fsqrt
                fstp    tbyte ptr 8[EBP]
                fld     tbyte ptr 8[EBP]
                pop     EBP
                ret     0Ch
_D4test4sqrtFeZe        ends

So the intrinsic version is basically just straight inlined assembler, while the other requires a call and at least a bit of stack manipulation to deal with the parameter passing. I imagine that this is still better than calling a C routine, but the intrinsic is obviously still the better choice if performance is critical. I may still go back and make all those asm blocks naked, but I'm trying to avoid confusing the casual reader any more than necessary :p

[edit]

I got the sqrt function down to this by trimming out a few lines. I didn't realize the float stack is used for passing the return value :)
Code:
_D4test4sqrtFeZe        comdat
        assume  CS:_D4test4sqrtFeZe
                push    EBP
                mov     EBP,ESP
                fld     tbyte ptr 8[EBP]
                fsqrt
                pop     EBP
                ret     0Ch
_D4test4sqrtFeZe        ends
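
For reference, a D source along the following lines would presumably compile to something like the trimmed listing above. This is only a sketch, not the actual Ares source: the name fastSqrt is hypothetical, and the fallback assumes the C sqrtl binding from core.stdc.math (current D; at the time of this thread it would have been std.c.math).

```d
import core.stdc.math : sqrtl; // assumption: current D binding for C's sqrtl

// Hypothetical name; not the actual Ares function.
real fastSqrt(real x)
{
    version (D_InlineAsm_X86)
    {
        // Non-naked asm: the compiler still emits the frame setup and the
        // final "ret". The result is simply left in ST(0), which is where
        // a real is returned, so there is no store back to x and no
        // return statement.
        asm
        {
            fld x;
            fsqrt;
        }
    }
    else
    {
        // Portable fallback for targets without 32-bit x86 inline asm.
        return sqrtl(x);
    }
}

void main()
{
    // 16 has an exact square root, so an exact comparison is safe here.
    assert(fastSqrt(16.0L) == 4.0L);
}
```

Leaving the value on the float stack is what removes the fstp/fld pair from the earlier listing.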
Don Clugston



Joined: 05 Oct 2005
Posts: 91
Location: Germany (expat Australian)

Posted: Mon Feb 06, 2006 2:10 am    Post subject: FYI: Naked floating point

Code:
// An example of a naked asm floating-point function.
real sin(real x)
{
    asm {
        naked;
        fld real ptr [ESP+4];
        fsin;
        ret x.sizeof + x.alignof;
    }
}


This works for DMD-Windows, but I'm not sure if it's correct for Linux.
Would be better to get Walter to make it intrinsic, of course.

(I'd also like to see a few more intrinsics, such as rot and fsincos).
sean

Posted: Mon Feb 06, 2006 1:09 pm

This is the current implementation:
Code:
real sin(real x) /* intrinsic */
{
    version(D_InlineAsm_X86)
    {
        asm
        {
            fld x;
            fsin;
        }
    }
    else
    {
        return std.c.math.sinl(x);
    }
}

Letting the compiler sort out stack issues should avoid alignment problems, no?

However, I'll test-build the naked version and see what the difference in code generation is. If it allows the call to be inlined then it's definitely better.

[edit]

Looks like neither version is inlined with -inline specified, but your version obviously compiles to fewer instructions within the call itself. The naked version would need to be modified for 64-bit machines, but perhaps it's worthwhile to use anyway?
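
One way to keep the naked version without breaking other targets might be to guard it with version(D_InlineAsm_X86) and fall back to the C library everywhere else. The sketch below makes that assumption explicit: the name nakedSin is hypothetical, the fallback uses sinl from core.stdc.math (current D), and the hard-coded "ret 12" assumes the real argument occupies a single 12-byte stack slot, as in the "ret 0Ch" listings earlier in the thread.

```d
import core.stdc.math : sinl; // assumption: current D binding for C's sinl

// Hypothetical name; a sketch, not the Ares implementation.
real nakedSin(real x)
{
    version (D_InlineAsm_X86)
    {
        asm
        {
            naked;                 // no compiler-generated prologue or epilogue
            fld real ptr [ESP+4];  // x sits just above the 4-byte return address
            fsin;                  // result stays in ST(0), the return register
            ret 12;                // assumption: callee pops one 12-byte stack slot
        }
    }
    else
    {
        // 64-bit and non-x86 targets never see the hard-coded stack offsets.
        return sinl(x);
    }
}

void main()
{
    // sin(0) is exactly 0 on both code paths.
    assert(nakedSin(0.0L) == 0.0L);
}
```

With the guard in place, only the 32-bit x86 build depends on the stack layout.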
Don Clugston

Posted: Tue Feb 07, 2006 1:45 am

I just noticed that the Phobos docs for std.intrinsic have been updated with the latest release to include the functions from std.math (e.g., fabs, sin, etc.). But the file itself is unchanged. Perhaps Walter is going to put them in there in the next release.

Here's a potentially-intrinsic function for std.math.ieee.
Many functions in std.math.core will use it for calculations involving complex numbers. (But, for non-x86, there may be a problem with the use of sin() and cos() in std.math.ieee -- aargh).

Code:
/*************************************
 * Calculate cos(y) + i sin(y).
 *
 * On x86 CPUs, this is a very efficient operation;
 * almost twice as fast as calculating sin(y) and cos(y)
 * separately, and is the preferred method when both are required.
 */
creal fcis(ireal y)
{
    version(D_InlineAsm_X86) {
        asm {
            naked;
            fld real ptr [ESP+4];
            fsincos;
            fxch ST(1), ST(0);
            ret y.sizeof + y.alignof;
        }
    } else {
        return cos(y.im) + sin(y.im)*1i;
    }
}

unittest {
    assert(fcis(1.3e5Li)==cos(1.3e5L)+sin(1.3e5L)*1i);
    assert(fcis(0.0Li)==1L+0.0Li);
}
sean

Posted: Thu Feb 16, 2006 4:32 pm

This will be fine so long as the math modules don't all have static ctors (which seems unlikely). Will there be any alignment issues on non-Windows platforms? I tried "fld y" instead of "naked; fld real ptr [esp+4]" and got an access violation so I'm leaving it as provided for now.
Don Clugston

Posted: Tue Feb 21, 2006 1:58 am

Quote:
Will there be any alignment issues on non-Windows platforms?


It should work fine on Linux (on 32 bits, anyway -- the [esp+4] might need to be [esp+8] on x86-64).

Quote:
I tried "fld y" instead of "naked; fld real ptr [esp+4]" and got an access violation so I'm leaving it as provided for now.


You'd also need to remove the ret instruction.
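
To spell that out: in a non-naked function the compiler sets up its own stack frame and emits its own return, so a hand-written ret would run before that frame is torn down and would pop the wrong value as the return address. A non-naked variant might look like the sketch below. It is not the Ares source, and it assumes a compiler that still accepts the built-in complex types (ireal, creal), which current D deprecates.

```d
import std.math : cos, sin;

// Non-naked variant of fcis from this thread (a sketch only).
creal fcis(ireal y)
{
    version (D_InlineAsm_X86)
    {
        asm
        {
            fld y;              // the compiler resolves y against its own frame
            fsincos;            // ST(0) = cos(y), ST(1) = sin(y)
            fxch ST(1), ST(0);  // swap, with operands as written in the original post
            // No "ret" here: the compiler-generated epilogue restores the
            // frame and returns with the result still on the float stack.
        }
    }
    else
    {
        return cos(y.im) + sin(y.im) * 1i;
    }
}

void main()
{
    // Same check as the second assert in the original unittest.
    assert(fcis(0.0Li) == 1.0L + 0.0Li);
}
```

The trade-off is the one discussed earlier in the thread: the non-naked form carries a little wrapper code, but leaves stack offsets and cleanup to the compiler.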
sean

Posted: Tue Feb 21, 2006 10:28 am

Don Clugston wrote:
You'd also need to remove the ret instruction.
Doh! Must have been a long day that day.