The Developer's Library for D

Ticket #371 (closed wishlist: fixed)

Opened 17 years ago

Last modified 14 years ago

get the number of cores/CPUs

Reported by: CyberShadow Assigned to: kris
Priority: minor Milestone: 0.99.9
Component: Tango Version:
Keywords: triage Cc: daniel.keep+tango@gmail.com

Description

Currently there isn't a standard way to get the number of cores/CPUs the system has. CPU-intensive applications often use this number as a hint for how many worker threads to create.

On Windows, this is done using the GetSystemInfo function (the dwNumberOfProcessors field of the SYSTEM_INFO structure).
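A minimal sketch of the Windows approach above, written in present-day D (druntime/Phobos, not the Tango API this ticket is about). It assumes `std.parallelism.totalCPUs` as the portable fallback on other systems; `logicalCpuCount` is a hypothetical name.

```d
import std.stdio : writeln;

/// Hypothetical helper: how many logical processors the OS reports.
uint logicalCpuCount()
{
    version (Windows)
    {
        // Windows: GetSystemInfo fills SYSTEM_INFO.dwNumberOfProcessors.
        import core.sys.windows.windows : GetSystemInfo, SYSTEM_INFO;
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        return si.dwNumberOfProcessors;
    }
    else
    {
        // Other systems: defer to Phobos, which queries the OS itself.
        import std.parallelism : totalCPUs;
        return totalCPUs;
    }
}

void main()
{
    // Typical use: one worker thread per logical processor.
    writeln("worker threads to create: ", logicalCpuCount());
}
```

The count is only a hint: it says nothing about how many of those processors the process is actually allowed to run on.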

Attachments

Cpu.d (11.6 kB) - added by DRK on 06/13/07 11:27:54.
Rough multi-core cpuid implementation (uses Phobos)
cpuid.d (10.0 kB) - added by mandel on 07/12/08 14:57:33.
cpuid.d from DMD 1.032 release, made Tango compatible by Jarrett
cache.d (18.4 kB) - added by Don Clugston on 09/18/08 08:15:34.
Cpuid.d (29.0 kB) - added by fawzi on 01/30/09 15:34:31.
first attempt at a Cpuid file

Change History

04/04/07 09:08:48 changed by larsivi

  • type changed from enhancement to wishlist.

I suppose the cpuid module in Phobos may do what you want and more? I'm not at all sure how to expose it, though; I'm not too happy with the Phobos approach.

04/04/07 13:20:17 changed by Don Clugston

There's also a problem with Phobos' method for determining whether SSE is supported: you cannot rely on cpuid alone, because the OS has to specifically enable the XMM registers. To get around this, use FXSAVE and FXRSTOR to try to change the value of a register. If the register does not take the new value, it is not safe to use SSE, even though the CPU supports it.

    // Something like this (not tested)!
    byte[512] savebuf; // IMPORTANT: the FXSAVE area must be 16-byte aligned!
                       // Probably better to use an array of longs.
    asm {
        fxsave savebuf;
    }
    byte old = ++savebuf[0xA0]; // try to change XMM0 (byte 160 of the FXSAVE image)
    asm {
        fxrstor savebuf;
        fxsave savebuf;
    }
    if (savebuf[0xA0] != old) {
        // XMM0 did not take the new value: the OS doesn't support SSE.
    } else {
        // Restore the value of XMM0.
        --savebuf[0xA0];
        asm {
            fxrstor savebuf;
        }
    }

04/05/07 01:24:51 changed by kris

  • owner changed from kris to sean.
  • milestone set to 0.98 RC 2.

05/05/07 06:01:27 changed by sean

  • milestone changed from 0.98 RC 2 to 0.99 RC3.

06/05/07 16:10:18 changed by DRK

  • cc set to daniel.keep+tango@gmail.com.

I've been looking at this, and I think I've worked out what needs to be done (at least under Win32). I've been looking at creating a tango.sys.Cpu module that contains two things: a "LogicalCpus" array (or something similar) of Cpu objects and a "CommonFeatures" Cpu object.

However the tests are done (most likely using cpuid, since the Win32 API seems horribly fragmented and semi-unreliable), a thread needs to be created for each logical processor on the system and then forced to run on it. The number of threads we create tells us how many logical processors (aka hardware threads) are on the system. We might also be able to work out the number of physical processors (although I haven't quite figured out how yet).

The "CommonFeatures" object is the intersection of the feature sets of the individual logical processors. This means that if you check CommonFeatures.sse2, you can assume that every processor supports SSE2 without having to check each one individually.

Each Cpu object itself contains a bunch of read-only properties that basically map to what std.cpuid provides. I can't imagine why anyone would really care about whether an individual processor supports SSE2 or not, but since we have to figure that out anyway, we may as well make it available...

So that's the number of hardware threads, and a list of what features are supported across the system. I have no idea how you would do this on a POSIX system.

One other thing: having this information is all well and good, but being able to create threads that run on a particular CPU would be really cool. I was told that thread pools were being worked on; maybe there's some overlap that needs to be thought about (e.g. Cpu providing a method to create a thread that runs on that particular CPU).

Anyway, I'm going to continue playing around with getting this working, and I'm CCing myself to this ticket. I'll add a comment once I get it working.
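The "CommonFeatures" idea can be sketched by storing each logical CPU's feature set as bit flags, so the common set is a bitwise AND across all of them. This is present-day D; the flag constants, `commonFeatures` and `has` are hypothetical, and the per-CPU inputs are made-up data rather than real CPUID results.

```d
import std.stdio : writeln;

// Hypothetical feature bits; a real module would fill them from CPUID
// while running on each logical processor in turn.
enum : uint
{
    MMX  = 1u << 0,
    SSE  = 1u << 1,
    SSE2 = 1u << 2,
    SSE3 = 1u << 3,
}

/// Features supported by every logical CPU: the bitwise AND of all sets.
uint commonFeatures(const uint[] perCpu)
{
    if (perCpu.length == 0)
        return 0; // nothing inspected, so promise nothing
    uint common = uint.max;
    foreach (flags; perCpu)
        common &= flags;
    return common;
}

bool has(uint set, uint feature)
{
    return (set & feature) != 0;
}

void main()
{
    // Illustrative data: both CPUs have SSE2, only the second has SSE3.
    uint[] cpus = [MMX | SSE | SSE2, MMX | SSE | SSE2 | SSE3];
    uint common = commonFeatures(cpus);
    writeln("all CPUs have SSE2: ", has(common, SSE2)); // true
    writeln("all CPUs have SSE3: ", has(common, SSE3)); // false
}
```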

06/13/07 11:26:00 changed by DRK

Here's a very rough program that does CPUID on each of the available logical processors on the system (yes, it uses Phobos. It was just quicker that way :P). I think what needs to be worked out is what the API for all of this will actually look like. I can see a few things that should be possible:

  1. Get number of logical processors,
  2. set which logical processor(s) a thread can run on and
  3. get feature flags for the available logical processors.

06/13/07 11:27:54 changed by DRK

  • attachment Cpu.d added.

Rough multi-core cpuid implementation (uses Phobos)

07/03/07 02:44:08 changed by kris

  • milestone changed from 0.99 RC3 to 1.0.

This has to be delayed again.

05/30/08 17:28:50 changed by larsivi

  • keywords set to triage.

07/12/08 14:57:33 changed by mandel

  • attachment cpuid.d added.

cpuid.d from DMD 1.032 release, made Tango compatible by Jarrett

07/12/08 15:34:42 changed by JarrettBillingsley

toString should look like this:

    char[] toString()
    {
        return Format(
            "Vendor string:    {}\n"
            "Processor string: {}\n"
            "Signature:        Family={} Model={} Stepping={}\n"
            "Features:         {}\n"
            "Multithreading:   {} threads / {} cores\n",
            vendor, processor, family, model, stepping,
            feats, threadsPerCPU, coresPerCPU);
    }

07/21/08 12:14:33 changed by mandel

We should add static functions (similar to class Thread):

class Cpu
{
   static Cpu getThis();
   static Cpu[] getAll();
}

and also add an UnknownCpu class.
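A rough sketch, in present-day D, of what `getAll` plus an `UnknownCpu` fallback could look like. All class names are hypothetical, the CPU count comes from Phobos' `std.parallelism.totalCPUs` (an assumption, not Tango), and `getThis` is left out because reporting which CPU the calling thread is on needs an OS call this sketch does not assume.

```d
import std.parallelism : totalCPUs;
import std.stdio : writeln;

/// Hypothetical base class: one object per logical CPU.
class Cpu
{
    immutable uint index; // 0-based logical CPU number

    this(uint index) { this.index = index; }

    /// One Cpu object per logical processor the OS reports.
    static Cpu[] getAll()
    {
        auto all = new Cpu[](totalCPUs);
        foreach (i, ref c; all)
            c = create(cast(uint) i);
        return all;
    }

    /// Chooses the subclass that knows how to query this architecture.
    private static Cpu create(uint index)
    {
        version (X86)         return new X86Cpu(index);
        else version (X86_64) return new X86Cpu(index);
        else                  return new UnknownCpu(index);
    }
}

/// Hypothetical subclass that would read its flags via CPUID.
class X86Cpu : Cpu
{
    this(uint index) { super(index); }
}

/// Fallback for architectures with no specific support.
class UnknownCpu : Cpu
{
    this(uint index) { super(index); }
}

void main()
{
    foreach (cpu; Cpu.getAll())
        writeln("CPU ", cpu.index, ": ", typeid(cpu).name);
}
```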

07/21/08 13:54:33 changed by mandel

Do we need a class based approach at all?

Are inhomogeneous processor environments in the scope of a single binary realistic?

If not, then we can pretty much copy&paste from Phobos (free standing functions, static this). The only function that might be lacking is a way to get the total number of cpus.

Are there use cases which rely on the class based approach?

Another issue is that the implementation doesn't support any architecture except x86. I don't think that should be a blocker. It probably won't support all the processors Tango is used on in the foreseeable future. If that's the blocker, this will probably never be included.
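The free-standing-functions-plus-static-this style can be sketched as module-level values computed once in a module constructor. This is present-day D, which uses `shared static this` (a D1/Tango module would use a plain `static this`); `cpuCount` and `hasSse2` are hypothetical names, and druntime's `core.cpuid.sse2` and Phobos' `std.parallelism.totalCPUs` stand in as the data sources.

```d
import core.cpuid : sse2;
import std.parallelism : totalCPUs;
import std.stdio : writeln;

// Hypothetical free-standing values, computed once and read-only afterwards.
immutable uint cpuCount;
immutable bool hasSse2;

// Module constructor: runs once at start-up, before main.
shared static this()
{
    cpuCount = totalCPUs;
    hasSse2 = sse2;
}

void main()
{
    writeln("CPUs: ", cpuCount, ", SSE2: ", hasSse2);
}
```

Callers then read plain values with no object to look up, which fits a homogeneous machine; it has no place to describe a second kind of CPU.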

09/18/08 07:49:24 changed by Don Clugston

I recently posted an enhanced cpuid module which includes cache info. I suggest that:

  • the low-level stuff belongs in the run-time. Although the source file is fairly big, it compiles away to almost nothing and has no dependencies (it does not use the GC, for example). This should stay as free-standing functions and static this.
  • the high-level bit (basically the 'toString' function) should exist only in user space, if it exists at all.
  • all other functions are only of interest to low-level programmers. For example, I want to use them for the BigInt asm functions; the cache info would be useful for D code for matrix operations.

For machines other than X86 and Itanium, it will almost certainly be necessary to make system calls to determine cache sizes etc; I don't know if the runtime (for PPC, say) will actually need that information or not. So there is some argument for a user-space module in addition to the runtime one.

09/18/08 08:15:34 changed by Don Clugston

  • attachment cache.d added.

10/10/08 14:38:22 changed by Don Clugston

Since I needed this functionality for Bigint, I've included my code in tango.math.internal.Cache. It should probably be renamed/moved to tango.core.Cpuid.

By the way, the docs for CPUID state that the BIOS is responsible for making sure that all of the processors report the same number for total CPUs. So this should work on a multi-CPU system. It's pretty hard to believe you'd have different CPU models in the one PC, though.

I also suggest a separate module (SystemInfo.d?) which uses this code to create an identification string; the same module could also determine RAM, OS version, etc.

Here's sample code (using printf) which is typical of what you'd use in SystemInfo.

    import tango.stdc.stdio : printf;
    import tango.math.internal.Cache;

    void main()
    {
        char[] feats;
        char[] feats2;
        if (x87onChip)      feats ~= "X87 ";
        if (mmx)            feats ~= "MMX ";
        if (sse)            feats ~= "SSE ";
        if (sse2)           feats ~= "SSE2 ";
        if (sse3)           feats ~= "SSE3 ";
        if (ssse3)          feats ~= "SSSE3 ";
        if (sse41)          feats ~= "SSE4.1 ";
        if (sse42)          feats ~= "SSE4.2 ";
        if (amd3dnow)       feats ~= "3DNow! ";
        if (amd3dnowExt)    feats ~= "3DNow!+ ";
        if (amdMmx)         feats ~= "MMX+ ";
        if (hyperThreading) feats ~= "HTT ";
        if (isX86_64)       feats ~= "X86-64 ";
        if (hasCmov)        feats2 ~= "CMOV ";
        if (hasRdtsc)       feats2 ~= "RDTSC ";
        if (hasFxsr)        feats2 ~= "FXSR ";
        if (hasCmpxchg8b)   feats2 ~= "CMPXCHG8B ";
        if (hasCmpxchg16b)  feats2 ~= "CMPXCHG16B ";
        if (hasPopcnt)      feats2 ~= "POPCNT ";

        // NOTE: Family, model, and stepping should always be displayed in hex.
        printf("Vendor string:    %.*s\nProcessor string: %.*s\n"
               "Signature:        Family = %X Model = %X Stepping = %X\n"
               "Features:         %.*s\n"
               "                  %.*s\n"
               "Multithreading:   %d cores / %d threads\n",
               vendor, processor,
               family, model, stepping,
               feats, feats2, coresPerCPU, threadsPerCPU);
        printf("Data caches per CPU:\n");
        for (int i = 0; i < numCacheLevels; ++i) {
            printf("L%d ways = %d linesize = %d size = %dK\n", i + 1,
                   datacache[i].associativity, datacache[i].lineSize, datacache[i].size);
        }
    }
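Building the feature list with hand-written trailing spaces is easy to get wrong (one missing space runs two names together). A sketch in present-day D that collects the names and joins them instead; `Feature` and `featureString` are hypothetical, and the flags are read from druntime's `core.cpuid`, which exposes similarly named properties, rather than from tango.math.internal.Cache.

```d
import std.array : join;
import std.stdio : writeln;

/// A named CPU feature flag (hypothetical helper type).
struct Feature
{
    string name;
    bool present;
}

/// Space-separated names of the present features; no separator to forget.
string featureString(const Feature[] features)
{
    string[] names;
    foreach (f; features)
        if (f.present)
            names ~= f.name;
    return names.join(" ");
}

void main()
{
    import core.cpuid : mmx, sse, sse2, hyperThreading;

    auto feats = featureString([
        Feature("MMX", mmx), Feature("SSE", sse),
        Feature("SSE2", sse2), Feature("HTT", hyperThreading),
    ]);
    writeln("Features: ", feats);
}
```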

10/10/08 15:05:21 changed by kris

Wonderful!

I agree it should be in tango.core

11/20/08 13:21:17 changed by mandel

*bump*

01/17/09 16:41:39 changed by larsivi

  • owner changed from sean to fawzi.

01/20/09 16:23:59 changed by fawzi

I find a class-based approach better. To keep it simple, one could have just a single CPU type and get the object for it through

CPU.main

Having different processors as different classes seems just much cleaner.

One could add .x86, .sparc, ... properties to the base class to make the cast plus the check for non-null more concise.

Then mmx would become

CPU.main.x86.mmx

which I think is still reasonable

Should hardware with multiple kinds of CPU come along that Tango runs on, one can then think about how to cope with it (and a class approach can probably be extended to it more cleanly, in a similar way to the Cpu.d example).

I will try to make something in that direction; comments and naming suggestions are welcome.
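A minimal sketch, in present-day D, of the `.x86` accessor proposed above: the base class returns itself downcast to the x86 subclass, and since D's class cast yields null when the dynamic type does not match, one `if` does both the cast and the non-null check. `Cpu`, `X86Cpu`, `PpcCpu` and the `mmx` field are hypothetical.

```d
import std.stdio : writeln;

/// Hypothetical base class: what every CPU description shares.
class Cpu
{
    /// This CPU as an X86Cpu, or null if it is some other kind.
    /// D's class cast returns null when the dynamic type does not match.
    final X86Cpu x86() { return cast(X86Cpu) this; }
}

/// Hypothetical x86-specific description.
class X86Cpu : Cpu
{
    bool mmx;
}

/// Hypothetical description of a non-x86 CPU.
class PpcCpu : Cpu
{
}

void main()
{
    Cpu mainCpu = new X86Cpu;
    if (auto x = mainCpu.x86())   // cast and non-null check in one step
        writeln("MMX: ", x.mmx);

    Cpu other = new PpcCpu;
    writeln("PPC description is x86: ", other.x86() !is null); // false
}
```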

01/30/09 15:34:31 changed by fawzi

  • attachment Cpuid.d added.

first attempt at a Cpuid file

01/30/09 15:47:21 changed by fawzi

I have attached my first attempt at a Cpuid for core.

I actually stopped working on it because I wanted more topology information, which needed system information, which in turn needed more constants and configuration, so I ended up working on auto-configuration using a precompiler...

Anyway, the module can be useful for seeing the approach I propose and commenting on it. One design choice that I am not too sure about is not allowing queries about CPU architectures that are not active, i.e. on PPC you cannot query x86.sse3; it is disallowed at compile time.

The NUMA module is coming along quite well, and I am thinking of getting more information about caches, etc. from the OS when possible. Expect an update soonish ;)

If you want to read some literature already, this is what I found useful:

/// - processor and api report:
///      http://www.halssoftware.com/reports/technical/procmem/ProcMemReport_download
///   a good overview of the various apis on the various OS (but not so much osX)
/// - Linux (novell) numa api:
///      http://www.novell.com/collateral/4621437/4621437.pdf
///   good overview of the modern linux NUMA api, available in the new distributions
/// - Portable Linux Processor Affinity (PLPA):
///      http://www.open-mpi.org/projects/plpa/
///   a nice library that I use to get affinity & more working on Linux distributions even without libnuma
/// - OSX 10.5 thread affinity
///      http://developer.apple.com/releasenotes/Performance/RN-AffinityAPI/
///   a good starting point for these issues on Mac OS X 10.5
/// - Windows numa resources:
///      http://www.microsoft.com/whdc/archive/numa_isv.mspx
///   an introductory article about it
///      http://msdn.microsoft.com/en-us/library/aa363804.aspx
///   Windows numa API
/// - Opensolaris topology representation
///      http://opensolaris.org/os/community/performance/mpo_overview.pdf
///   the topology representation that inspired the current interface

03/04/09 22:31:36 changed by fawzi

(In [4375]) first version of user visible Cpuid, refs #371

03/04/09 22:52:58 changed by DRK

Why do the 'x86', 'ppc', etc. members exist? You know at compile time what the architecture of the system is. Since only one is defined for a given architecture, you have to protect the using code via version or static if anyway.

I just can't see any reason for this extra level of indirection to exist, since it's always the SAME indirection.

03/05/09 09:08:48 changed by fawzi

Well, the idea was that you might want to describe foreign CPU types even if you are not running on them (for example, if you have a daughter board with a Cell processor, or something like that).

For this it is nice to use exactly the same structures.

Hence the indirection (which should be optimized away). But you are right that, since for now there are no mixed architectures, one could simply give mainCpu the correct type.

From the coding point of view, though, that would make it easier to "forget" that you are using something x86-specific and create an error that shows up only on other architectures.

So I like this extra distinction between "general to all cpus" and architecture specific, but if there is a strong bias against it I might reconsider it.
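For comparison, DRK's alternative, where mainCpu simply has the architecture's own type, can be sketched in present-day D with version blocks; x86-only members still need a compile-time guard at the use site either way. `X86Cpu`, `PpcCpu` and `MainCpu` are hypothetical, and the flags are read from druntime's `core.cpuid`.

```d
import std.stdio : writeln;

/// Hypothetical x86 description.
struct X86Cpu { bool mmx; bool sse2; }

/// Hypothetical PowerPC description.
struct PpcCpu { bool altivec; }

// Exactly one concrete type per target, chosen at compile time.
version (X86)
    alias MainCpu = X86Cpu;
else version (X86_64)
    alias MainCpu = X86Cpu;
else version (PPC)
    alias MainCpu = PpcCpu;
else
    struct MainCpu {} // unknown target: no feature fields at all

__gshared MainCpu mainCpu;

shared static this()
{
    // Fill in the x86 flags once at start-up (other targets left empty here).
    static if (is(MainCpu == X86Cpu))
    {
        import core.cpuid : mmx, sse2;
        mainCpu = X86Cpu(mmx, sse2);
    }
}

void main()
{
    // Using an x86-only member must itself be guarded at compile time.
    static if (is(MainCpu == X86Cpu))
        writeln("MMX: ", mainCpu.mmx, ", SSE2: ", mainCpu.sse2);
    else
        writeln("no x86 feature flags on this target");
}
```

With this layout, misusing an x86 field on another target fails to compile only when building for that target, which is the risk described above.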

11/29/09 13:26:47 changed by fawzi

  • owner changed from fawzi to kris.

01/14/10 05:52:08 changed by kris

  • status changed from new to assigned.

[5296] moved Don & lindquists math.internal.Cache to core.tools.Cpuid

01/14/10 05:52:28 changed by kris

  • status changed from assigned to closed.
  • resolution set to fixed.
  • milestone changed from 1.0 to 0.99.9.