The Developer's Library for D

Ticket #873 (closed defect: invalid)

Opened 8 months ago

Last modified 1 month ago

GC corrupts initializer data

Reported by: jascha
Assigned to: sean
Priority: critical
Milestone: 0.99.8
Component: Core Functionality
Version: 0.99.4 Frank
Keywords: triage
Cc: jascha

Description

Attached is a program that produces the error. Phobos has the same problem. To reproduce it, do the following:

  • you need ddoc.d and ddoc.txt in the same directory
  • dmd -g -debug -debug=parser -J. ddoc.d
  • run ddoc.exe
  • it will abort with an "Array index out of bounds"

If the GC is disabled (uncomment line 41 of ddoc.d), the problem is gone. The program then does not terminate, but that's "correct".
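For readers without the attachment, here is a minimal sketch of what disabling the GC around a piece of work can look like. It is written against current D (druntime's `core.memory`), not the 2008 Tango runtime this ticket targets, and the helper name `runWithGcDisabled` is hypothetical. What line 41 of ddoc.d actually contains is not shown in this ticket, so treat this only as an illustration of the idea.

```d
import core.memory : GC;

// Hypothetical helper: runs `work` while automatic collections are suppressed,
// then re-enables the collector even if `work` throws.
size_t runWithGcDisabled(scope size_t delegate() work)
{
    GC.disable();
    scope (exit) GC.enable();
    return work();
}

void main()
{
    // Allocate a little GC memory inside the protected region.
    auto count = runWithGcDisabled(delegate size_t() {
        int[] data;
        foreach (i; 0 .. 1000)
            data ~= i;
        return data.length;
    });
    assert(count == 1000);
}
```

If a corruption disappears under a setup like this, it only shows that a collection cycle is involved in making it visible. It does not show which code performs the bad write.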

Here is some commented info about the state of the program after the exception from ddbg:

#0 ?? () at ddoc.d:2226 from KERNEL32.dll
#1 0x00411f24 in __d_throw@4 () at ddoc.d:2226 from deh
#2 0x00407081 in  ddoc.GLRParser.ruleToString () at ddoc.d:2226
#3 0x0040a94c in  ddoc.MainGrammar.parse () at ddoc.d:2663
#4 0x00406d0b in  ddoc.GLRParser.parse () at ddoc.d:2153
#5 0x00405bb9 in  ddoc.parse () at ddoc.d:1971
#6 0x00402083 in _Dmain (args = {
  [0] = "ddoc.exe"
}) at ddoc.d:1966
#7 0x00410fac in extern (C) int dmain2.main(int, char**) . void runMain(void*) () from dmain2
#8 0x00410fe3 in extern (C) int dmain2.main(int, char**) . void runAll(void*) () from dmain2
#9 0x00410db4 in _main () from dmain2
#10 0x0041893d in _mainCRTStartup () from constart
#11 0x767419f1 in ?? () from KERNEL32.dll
#12 0x77e4d109 in ?? () from ntdll.dll

line 2226 is:
name = nt_names[s-FIRST_NT];

FIRST_NT is constant 0x1000

the variable values are:
->= this.nt_names.length
0x00000016
->= s
0x00465be0
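Plugging these values into the index expression shows how far out of range the access is. The sketch below is modern D and only reuses the numbers from the ddbg output. The helper `indexInBounds` is hypothetical and not part of ddoc.d.

```d
import std.stdio;

enum uint FIRST_NT = 0x1000;   // constant quoted in the ticket

// Hypothetical helper: true if symbol `s` maps to a valid slot
// in an nt_names array of the given length.
bool indexInBounds(uint s, size_t ntNamesLength)
{
    return s >= FIRST_NT && (s - FIRST_NT) < ntNamesLength;
}

void main()
{
    enum size_t ntNamesLength = 0x16;   // this.nt_names.length from ddbg
    uint s = 0x00465be0;                // value of s from ddbg

    // 0x00465be0 - 0x1000 = 0x00464be0, far beyond a length of 0x16.
    writefln("index = 0x%x, length = 0x%x", s - FIRST_NT, ntNamesLength);
    assert(!indexInBounds(s, ntNamesLength));
}
```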

obviously this crashes, but the s comes from the foreach loop in line 2219. relevant values:
->= ri.symbols
{
  [0] = 0x00465be0
}
ri is passed by the caller in line 2664 as &rule_infos[action]

->f 3
Current frame level is 3
->= action
0x00000015
->= this.rule_infos[0x15].symbols
{
  [0] = 0x00465be0
}

obviously this is where the value comes from, *but* rule_infos is only initialized in line 2448 as an array of RuleInfo structs (declared in line 1920) and never written to. as one can see from the code (or by setting a breakpoint just after the initialization), the correct value is
->= this.rule_infos[0x15].symbols
{
  [0] = 0x0000000a
}

that memory gets corrupted somewhere along the way...

Attachments

ddoc.d (93.8 kB) - added by jascha on 02/01/08 21:23:07.
ddoc.txt (1.4 kB) - added by jascha on 02/01/08 21:24:39.

Change History

02/01/08 21:23:07 changed by jascha

  • attachment ddoc.d added.

02/01/08 21:24:39 changed by jascha

  • attachment ddoc.txt added.

02/01/08 21:25:13 changed by jascha

  • cc set to jascha.

02/17/08 11:15:38 changed by jascha

  • priority changed from major to critical.

narrowed it down with a hardware breakpoint. it's tango/lib/gc/basic/gcx.d line 2434; apparently a free list is built in that memory.

02/23/08 00:05:20 changed by kris

  • milestone set to 0.99.6.

04/17/08 01:32:00 changed by sean

  • status changed from new to assigned.

gcx.d has changed a bit since this bug was filed. I don't suppose you could point me at the proper line for the current revision?

04/27/08 05:08:30 changed by larsivi

  • milestone changed from 0.99.6 to 0.99.7.

05/11/08 17:32:28 changed by larsivi

  • keywords set to triage.

06/21/08 21:53:17 changed by sean

I don't suppose you can trim the sample down a bit? It's immense.

06/23/08 15:39:30 changed by jascha

No, sorry. This is actually the smallest test case I've come across so far. I haven't yet been able to narrow down the problem with reasonable effort.

07/10/08 07:05:04 changed by larsivi

  • milestone changed from 0.99.7 to 0.99.8.

09/09/08 15:16:09 changed by sean

My first guess would be that your app is overwriting memory somewhere, possibly by iterating past the end of an array or using an uninitialized pointer. It's possible that including ddoc.txt, etc, simply changes the memory layout of the program in such a way that the overwriting occurs somewhere visible. I've never tried it, but the GC has a "sentinel" mode that's intended to trap such errors. I'll give it a shot and see what happens.
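To illustrate the kind of bug being described, here is a small, self-contained sketch in modern D. It shows a write through a raw pointer running one element past an array, and a guard word placed after the data catching it. This is a hand-rolled illustration of the guard-value idea only. It is not the GC's sentinel implementation, and the names `Guarded`, `guardIntact` and `fillUnchecked` are hypothetical.

```d
import std.stdio;

// Hypothetical layout: a payload immediately followed by a guard word.
struct Guarded
{
    int[4] payload;
    uint guard = 0xDEADBEEF;
}

// True if the guard word still holds its initial pattern.
bool guardIntact(ref const Guarded g)
{
    return g.guard == 0xDEADBEEF;
}

// Writes `count` ints through a raw pointer, which bypasses
// D's array bounds checking.
void fillUnchecked(int* p, size_t count, int value)
{
    foreach (i; 0 .. count)
        p[i] = value;
}

void main()
{
    Guarded g;

    fillUnchecked(g.payload.ptr, 4, 7);   // stays inside the payload
    assert(guardIntact(g));

    fillUnchecked(g.payload.ptr, 5, 7);   // one element too far: clobbers the guard
    assert(!guardIntact(g));

    writeln("overrun detected");
}
```

An overrun like this stays silent until something reads the clobbered memory, which is why a change in memory layout can make the same bug appear or disappear.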

09/09/08 16:29:33 changed by sean

  • status changed from assigned to closed.
  • resolution set to invalid.

When I turned on debug mode in the GC the program ran for a lot longer before crashing, which indicates to me that this is an overrun bug somewhere in the program rather than a GC issue. I'm going to mark this as "not a bug" with Tango. If you can reduce the code and demonstrate otherwise, please reopen.