Download Reference Manual
The Developer's Library for D
About Wiki Forums Source Search Contact

Ticket #981 (closed enhancement: wontfix)

Opened 7 years ago

Last modified 6 years ago

Optimization for array access [Patch]

Reported by: CyberShadow Assigned to: sean
Priority: major Milestone: 0.99.7
Component: Core Functionality Version: 0.99.5 Jascha
Keywords: Cc: thecybershadow@gmail.com

Description

Sample program 1:

void main()
{
	uint[] arr;
	for(int i=0;i<5000000;i++)
		arr ~= i;
}

Sample program 2:

void main()
{
	uint[] arr;
	for(int i=0;i<5000000;i++)
		arr.length = i;
}

The following two programs are much slower when compiled with Tango than with Phobos. The reason for this is the implementation of the array append / set length routines (lifetime.d). Phobos uses gc.capacity() to get the capacity, which is cached, while Tango uses gc_query, which not only doesn't cache the result (this can be idea of another improvement), but does three lookups. This is justified by the change of the GC behavior in Tango (maintain bits when reallocating), however this does cause a considerate slowdown.

I'm attaching a patch which considerably speeds up the testcase programs, and even makes them run faster than Phobos (also patched with the respective optimization). The change will only make the abovementioned routines query the GC if the expanded array is larger than a page, and the change does not require allocating a new page.

Attachments

tango.patch (3.2 kB) - added by CyberShadow on 04/19/08 16:11:03.
Proposed patch (v2)

Change History

03/14/08 03:09:01 changed by CyberShadow

I tested the patch with a personal >10000-line (not counting libraries) project, and haven't noticed any ill side-effects.

Hm, I somewhat misnamed the issue - it should be "Optimization for array growth" or something similar.

03/14/08 13:43:16 changed by sean

  • status changed from new to assigned.

Nice work. I'll have to give this one some thought though, because the speed advantage is coming from knowledge about how the GC works internally, and I'm hesitant to do that in the compiler runtime. But perhaps there's a way to streamline things in a similar way... maybe query the GC once on startup for relevant info rather than use hard-coded constants or something like that. I know that there is a bit of other code in the runtime that uses PAGESIZSE as well, but I'm planning on looking at that at some point as well.

04/19/08 16:11:03 changed by CyberShadow

  • attachment tango.patch added.

Proposed patch (v2)

04/19/08 16:11:44 changed by CyberShadow

There was an off-by-one error in the original patch. I've uploaded the fixed version. Sorry about that.

04/27/08 05:10:29 changed by larsivi

  • milestone changed from 0.99.6 to 0.99.7.

05/03/08 19:24:24 changed by sean

  • status changed from assigned to closed.
  • resolution set to wontfix.

Hm... another issue with this patch is that the current implementation detects whether the pointer being resized is at the head of a memory block or is an internal pointer. The issue being that pointers to the interior of a block are not eligible for in-place resizing. However, he suggested patch sidesteps this checking and always resizes in place for large blocks. So while I'd love to apply the patch, I can't do so because it would cause incorrect behavior in certain cases.