The Developer's Library for D

Ticket #1906 (new defect)

Opened 14 years ago

Last modified 14 years ago

Process always blocks on 64-bit linux

Reported by: DRK Assigned to: larsivi
Priority: major Milestone: 1.0
Component: Tango Version: 0.99.9 Kai
Keywords: Cc:

Description

Given the following test program:

module subproc;

import tango.io.Stdout;
import tango.sys.Process;
import tango.util.Convert : to;

void main()
{
    {
        Stderr.formatln("++ create Process");
        scope p = new Process(true, "echo TEST");
        p.copyEnv = true;
        Stderr.formatln("++ set redirect");
        p.redirect = Redirect.None;
        Stderr.formatln("++ execute");
        p.execute;
    
        Stderr.formatln("++ wait");
        auto pr = p.wait;

        Stderr.formatln("++ checking result");
    
        if( pr.reason != Process.Result.Exit )
            throw new Exception("subprocess did not exit normally");
    
        else if( pr.status )
            throw new Exception("subprocess failed with exit code "~
                    to!(char[])(pr.status));
        
        Stderr.formatln("++ finished");
    }
}

The subprocess starts correctly, produces output, but never actually terminates. Examining the system monitor reveals that the child has become a Zombie with the 'waiting channel' listed as 'do_exit'. The parent, meanwhile, is consuming 0% CPU and is waiting on 'wait_pipe'. The child cannot be directly killed; the parent program has to be killed or interrupted.

Running the unit test in Process.d also fails to ever terminate for, apparently, the same reason.

Note that this doesn't happen with my Windows 7 64-bit install.

This was tested on Ubuntu 9.10, 2.6.31-14-generic kernel, x86_64, running inside VirtualBox 3.1.2 r56127.

Change History

04/15/10 05:47:51 changed by DRK

Additional clarification: code was compiled with LDC 0.9.2.

04/15/10 10:09:51 changed by schveiguy

  • status changed from new to assigned.

I don't know how much I can work on this. I don't think I have any 64-bit linux installs at my disposal, and my current machine has a 32-bit processor. I'm assuming you are using a 64-bit compiler?

04/15/10 11:10:47 changed by schveiguy

Given that the parent process is in wait_pipe, it is probably waiting on a special pipe Tango creates to aid in diagnosing problems.

Essentially, Tango's Process will create an extra pipe so if exec fails, the child can tell the parent the error code. The child sets the "close on exec" flag, so if the exec succeeds (which appears to be happening), the pipe should be closed. The parent will then get 0 bytes back on its read of the pipe, and continue. However, it sounds like that's not happening. Maybe there is a fundamental difference in 64-bit linux in one of the syscalls. There are a couple options you can try.

1) Try to remove that whole handshake from the parent process (comment out the Process.d line that starts with pexec.source.input.read) and see if this fixes the problem. This is just to diagnose whether my theory of where the problem lies is correct.

2) Assuming option 1 shows that the exec pipe is the problem, try to build a similar program in C or C++ to rule out some kind of weirdness in the compiler. Success would suggest it's something to do with the compiler; failure would suggest it's something to do with the syscalls.

If option 1 shows that the pipe is not the problem, I'm going to have to defer to someone who can work with a 64-bit linux system. I just don't have one. If you need help writing the C program, I can help with that.
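
For reference, the handshake described above could be sketched in C roughly as follows. This is not Tango's code, only a minimal illustration of the same idea; the helper name spawn_with_handshake, its return codes, and the exit code 127 are made up for this sketch. The child marks the write end of an extra pipe close-on-exec, so a successful exec closes it and the parent's read returns 0 bytes, while a failed exec lets the child send errno back.

```c
/* Sketch of a fork/exec error-reporting handshake (illustrative only,
 * not Tango's implementation). */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: runs argv[0] via execvp and waits for the child.
 * Returns  0 if exec succeeded (wait status stored in *status),
 *         -1 if exec failed (child's errno stored in *exec_errno),
 *         -2 on a setup error in the parent.
 */
int spawn_with_handshake(char *const argv[], int *exec_errno, int *status)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -2;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -2;
    }

    if (pid == 0) {
        /* Child: keep only the write end and mark it close-on-exec. */
        close(fds[0]);
        if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) == 0)
            execvp(argv[0], argv);
        /* Only reached if fcntl or exec failed: report errno to the parent. */
        int child_err = errno;
        ssize_t written = write(fds[1], &child_err, sizeof child_err);
        (void)written;
        _exit(127);
    }

    /* Parent: close our own write end, otherwise read() never sees EOF. */
    close(fds[1]);

    int err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &err, sizeof err);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    /* Reap the child so it does not linger as a zombie. */
    int st = 0;
    while (waitpid(pid, &st, 0) < 0) {
        if (errno != EINTR)
            return -2;
    }
    *status = st;

    if (n == (ssize_t)sizeof err) {   /* child reported a failed exec */
        *exec_errno = err;
        return -1;
    }
    if (n != 0)                       /* read error or short read */
        return -2;

    *exec_errno = 0;                  /* 0 bytes: pipe closed by exec */
    return 0;
}
```

Calling it with an argv such as {"echo", "TEST", NULL} from a small main should return 0 once the child has exited. If the read in the parent blocks forever instead, some process still holds the write end of the pipe open, which is the condition worth looking for on the affected systems.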

04/15/10 15:04:50 changed by DRK

I'll see about giving that a shot in the next few days.

04/30/10 11:11:07 changed by schveiguy

  • status changed from assigned to new.
  • owner changed from schveiguy to larsivi.

I am no longer contributing to Tango; please assign this to someone else.

05/23/10 06:20:49 changed by wilsonk

A quick note that I tested the above code sample with the following setup and things ran fine (tested about 20 times without problems):

ldc rev.1643 (0.9.2 release, I think!??), tango-0.99.9 (-O3 -release), Ubuntu 8.04 x86_64

Thanks, K.Wilson

05/24/10 21:12:47 changed by mrmonday

I can reproduce this using ldc (various versions), tango 0.99.9 (with patch from ldc repo, -O -release), Arch Linux x86-64. A binary built on someone else's working system runs fine; one I compile myself does not. Commenting out the line schveiguy mentioned fixes it, but I don't know how to create an equivalent C/C++ test case for it. Where should I go from here?

06/09/10 21:14:58 changed by mrmonday

For what it's worth, using a pre-compiled version of libtango.a from someone else's working setup works.