
newbie baffled by compress+uncompress misbehavior

 
Lynn



Joined: 27 Aug 2004
Posts: 89

Posted: Sat Aug 28, 2004 2:39 pm    Post subject: newbie baffled by compress+uncompress misbehavior

<alert comment="newbie">

I'm trying to learn how to use zlib. So far, I'm baffled why I can't seem to get a compressed buffer to uncompress consistently. The code below works ok with small buffers, but throws an exception with larger buffers.

The code is similar to Walter B.'s sample code for using zip, except using larger buffers. Eventually, I want to read in a 1.1 meg file that has been zlib compressed from about 4.0 meg. The application will uncompress and proceed. The original uncompressed buffer will be read in from a file, but this sample code just uses arrays to check what happens when a plain text buffer is compressed, and then uncompressed.

Oddly, the same CompressThenUncompress code that works for a buffer of 30 ubytes may fail with 80 ubytes. I suspect that I'm confused about declaring arrays of ubytes.

Am I doing something wrong or leaving out a step or three? The output from running the program is shown at the bottom.

Code:

import std.zlib;
import std.stdio;

void CompressThenUncompress (ubyte[] src)
{
  try {
    ubyte[] dst = cast(ubyte[])std.zlib.compress(cast(void[])src);
    writef("src.length:  ", src.length, " dst: ", dst.length);
    ubyte[] uncompressedBuf;
    uncompressedBuf = cast(ubyte[])std.zlib.uncompress(cast(void[])dst);
    writefln(" ... Got past std.zlib.uncompress. dst.length: ", dst.length);
    assert(src.length == uncompressedBuf.length);
    assert(src == uncompressedBuf);
  }
  catch {
    writefln(" ... Exception thrown when src.length = ", src.length, ". Keep going");
  }
}

char[] outerBuf30 =  "000000000011111111112222222222";
char[] outerBuf40 =  "0000000000111111111122222222223333333333";
char[] outerBuf50 =  "00000000001111111111222222222233333333334444444444";
char[] outerBuf100 = "00000000001111111111222222222233333333334444444444"
                     "01234567890123456789012345678901234567890123456789";

void main (char[][] args)
{
  char[] buf32 = "0123456789 0123456789 0123456789";
  CompressThenUncompress(cast(ubyte[])buf32);  // Works ok

  char[] buf40 = "0123456789 0123456789 0123456789 0123456";
  CompressThenUncompress(cast(ubyte[])buf40);  // Works ok

  char[] buf60 = "0123456789 0123456789 0123456789 0123456790 123456789 123456";
  CompressThenUncompress(cast(ubyte[])buf60);  // Throws exception

  ubyte[] ubuf60 = cast(ubyte[])"0123456789 0123456789 0123456789 "
                                "0123456790 123456789 123456";
  CompressThenUncompress(ubuf60);              // Throws exception

  char[] buf80 = "0123456789012345678901234567890123456789"
                 "0123456789012345678901234567890123456789";
  CompressThenUncompress(cast(ubyte[])buf80);  // Throws exception

  CompressThenUncompress(cast(ubyte[])"This string is 28 chars long");    //ok
  CompressThenUncompress(cast(ubyte[])"This string is 42 chars long "
                                      "0123456789012");                   //ok
  CompressThenUncompress(cast(ubyte[])"This string is 46 chars long "
                                      "01234567890123456");               //ok
  CompressThenUncompress(cast(ubyte[])"This string is 60 chars long "
                                      "0123456789012345678901234567890"); //ok
  CompressThenUncompress(cast(ubyte[])"This string is 80 chars long "
                                      "0123456789012345678901234567890"
                                      "12345678901234567890");            //ok

  CompressThenUncompress(cast(ubyte[])outerBuf30);      // ok
  CompressThenUncompress(cast(ubyte[])outerBuf40);      // Throws exception
  CompressThenUncompress(cast(ubyte[])outerBuf50);      // Throws exception
  CompressThenUncompress(cast(ubyte[])outerBuf100);     // Throws exception
}

// Results from running above code for different array declarations
src.length:  32 dst: 22 ... Got past std.zlib.uncompress. dst.length: 22
src.length:  40 dst: 22 ... Got past std.zlib.uncompress. dst.length: 22
src.length:  60 dst: 28 ... Exception thrown when src.length = 60. Keep going
src.length:  60 dst: 28 ... Exception thrown when src.length = 60. Keep going
src.length:  80 dst: 21 ... Exception thrown when src.length = 80. Keep going
src.length:  28 dst: 34 ... Got past std.zlib.uncompress. dst.length: 34
src.length:  42 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  46 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  60 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  80 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  30 dst: 16 ... Got past std.zlib.uncompress. dst.length: 16
src.length:  40 dst: 19 ... Exception thrown when src.length = 40. Keep going
src.length:  50 dst: 21 ... Exception thrown when src.length = 50. Keep going
src.length: 100 dst: 33 ... Exception thrown when src.length = 100. Keep going


</alert>
jcc7



Joined: 22 Feb 2004
Posts: 657
Location: Muskogee, OK, USA

Posted: Sat Aug 28, 2004 5:07 pm

I haven't really worked with zlib before, but I did notice that the exception is a ZlibException (defined in std.zlib) which is thrown if one of the zlib functions returns an error code. At least one of them was a "buf error" (Z_BUF_ERROR), but others may be different.
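To see which zlib error code is behind a ZlibException, the exception message can be printed. The following is only a minimal sketch in current D syntax, which differs from the 2004-era code in this thread. `describeFailure` is a hypothetical helper name, the input bytes are made up, and the exact message text depends on the Phobos version in use.

```d
import std.stdio : writeln;
import std.zlib : uncompress, ZlibException;

// Hypothetical helper: returns the exception message if uncompress fails,
// or null if the data decoded without error.
string describeFailure(const(void)[] data)
{
    try
    {
        uncompress(data);
        return null;
    }
    catch (ZlibException e)
    {
        return e.msg;
    }
}

void main()
{
    // Eight arbitrary bytes that are not a valid zlib stream.
    ubyte[] garbage = [1, 2, 3, 4, 5, 6, 7, 8];
    writeln("ZlibException message: ", describeFailure(garbage));
}
```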

This seems like a really odd error to me. I wonder if there's something wrong with the std.zlib module or perhaps it's even a subtle compiler bug.

I played around with it by removing the chars and changing the way that the arrays are created. I think whether problems occur may depend at least partly on what the array contains. I don't understand why, but one 30-item array works fine and another fails.
Code:
/* based on digitalmars.D:9899 */

import std.zlib;
import std.stdio;


void CompressThenUncompress (ubyte[] src)
{
  try {
    writef("src.len: ", src.length);
    ubyte[] dst = cast(ubyte[])std.zlib.compress(cast(void[])src);
    writef(" dst: ", dst.length);
    ubyte[] uncompressedBuf;
    uncompressedBuf = cast(ubyte[])std.zlib.uncompress(cast(void[])dst);
    writefln(" ... Got past std.zlib.uncompress. dst.length: ", dst.length);
    assert(src.length == uncompressedBuf.length);
    assert(src == uncompressedBuf);
  }
  catch(ZlibException) /* from std.zlib */ {
    writefln(" ... ZlibException thrown when src.length = ", src.length, "... ");
  }     
  catch {
    writefln(" ... Exception thrown when src.length = ", src.length, ". Keep going");
  }
}


void main (char[][] args)
{
  static ubyte[] buf10   = [48, 48, 48, 48, 48, 48, 48, 48, 48, 48];
  static ubyte[] buf30ok = [48, 48, 48, 48, 48, 48, 48, 48, 48, 48,
    49, 49, 49, 49, 49, 49, 49, 49, 49, 49,
    50, 50, 50, 50, 50, 50, 50, 50, 50, 50];

  ubyte[] buf30ng;

  buf30ng ~= buf10 ~ buf10 ~ buf10;

  CompressThenUncompress(cast(ubyte[])buf30ok);  /* ok */
  CompressThenUncompress(cast(ubyte[])buf30ng);  /* exception */
}
Code:
src.len: 30 dst: 16 ... Got past std.zlib.uncompress. dst.length: 16
src.len: 30 dst: 11 ... ZlibException thrown when src.length = 30...
sean



Joined: 24 Jun 2004
Posts: 609
Location: Bay Area, CA

Posted: Sat Aug 28, 2004 10:06 pm

Weird... I just gave std.zlib a quick look and it's doing something I didn't think was possible--uncompressing memory buffers entirely using API functions. I've done that myself but I had to write special code to handle the zip header, which I don't see here. I'll try and dig into it further, but this has me wondering if std.zlib is doing everything it should be doing. Assuming I'm right, compress should work fine but uncompress may be broken.
jcc7



Joined: 22 Feb 2004
Posts: 657
Location: Muskogee, OK, USA

Posted: Sat Aug 28, 2004 10:59 pm

sean wrote:
Weird... I just gave std.zlib a quick look and it's doing something I didn't think was possible--uncompressing memory buffers entirely using API functions.
I don't think they're all system API functions -- it uses zlib (http://www.gzip.org/zlib/) functions.

sean wrote:
I've done that myself but I had to write special code to handle the zip header, which I don't see here. I'll try and dig into it further, but this has me wondering if std.zlib is doing everything it should be doing. Assuming I'm right, compress should work fine but uncompress may be broken.

Right. I think Ben Hinkle found the problem in the newsgroup (lines 139-140 of std.zlib):
Code:
    if (!destlen)
        destlen = srcbuf.length * 2 + 1;
If I'm reading this right, the output buffer is only twice the compressed length plus one, so uncompress runs out of buffer space whenever the compression is better than about 50%. Or something like that.
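The buffer rule quoted here can be checked against the numbers already posted in this thread. This is a small sketch in current D syntax: `assumedCapacity` and `fits` are hypothetical helper names, and the only thing taken from std.zlib is the `srcbuf.length * 2 + 1` expression. The length pairs are copied from the two program outputs above.

```d
import std.stdio : writefln;

// Capacity rule quoted from std.zlib: destlen = srcbuf.length * 2 + 1,
// where srcbuf is the compressed data handed to uncompress.
size_t assumedCapacity(size_t compressedLen)
{
    return compressedLen * 2 + 1;
}

// True when that one-shot buffer is large enough for the original data.
bool fits(size_t compressedLen, size_t originalLen)
{
    return assumedCapacity(compressedLen) >= originalLen;
}

void main()
{
    // Original and compressed lengths taken from the outputs in this thread.
    immutable size_t[] srcLens = [32, 40, 60, 80, 28, 80, 30, 40, 50, 100, 30];
    immutable size_t[] dstLens = [22, 22, 28, 21, 34, 46, 16, 19, 21, 33, 11];

    foreach (i, srcLen; srcLens)
    {
        immutable dstLen = dstLens[i];
        writefln("src %3s  dst %2s  capacity %2s  -> %s",
                 srcLen, dstLen, assumedCapacity(dstLen),
                 fits(dstLen, srcLen) ? "fits" : "too small");
    }
}
```

For the pairs listed, the rule says "too small" exactly for the runs that threw an exception (for example 28 compressed bytes give a capacity of 57, which is less than the 60 bytes needed) and "fits" for the runs that succeeded.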
jcc7



Joined: 22 Feb 2004
Posts: 657
Location: Muskogee, OK, USA

Posted: Sun Aug 29, 2004 1:10 am

Since I managed to rebuild Phobos, I can confirm Ben's results. When the buffer length is increased, the tests do pass. I think two lines are affected: line 140 and line 397. I still think it should be something like
Code:
destlen = srcbuf.length * compression_factor + fudge_factor;
but it's a hunch.

My guess is that the fudge_factor is only zero or one, but Ben could be right and it might need to be 100 or more. When I put
Code:
const int fudge_factor = 1;
const int compression_factor = 10;
near the top of the file, the zlib tests passed for me.
sean



Joined: 24 Jun 2004
Posts: 609
Location: Bay Area, CA

Posted: Sun Aug 29, 2004 1:43 pm

jcc7 wrote:
sean wrote:
Weird... I just gave std.zlib a quick look and it's doing something I didn't think was possible--uncompressing memory buffers entirely using API functions.
I don't think they're all system API functions -- it uses zlib (http://www.gzip.org/zlib/) functions.

That's what I meant. For some odd reason, the API zlib ships with has two sets of functions: the core API functions and then some printf-like functions for file i/o. The printf-like functions will generate/parse a zip header, but the core API functions do not. The header really only matters if you plan to generate or read zip files that will be read by another program, but that's not an uncommon case.

jcc7 wrote:

sean wrote:
I've done that myself but I had to write special code to handle the zip header, which I don't see here. I'll try and dig into it further, but this has me wondering if std.zlib is doing everything it should be doing. Assuming I'm right, compress should work fine but uncompress may be broken.

Right. I think Ben Hinkle found the problem in the newsgroup (lines 139-140 of std.zlib):
Code:
    if (!destlen)
        destlen = srcbuf.length * 2 + 1;
If I'm reading this right, the output buffer is only twice the compressed length plus one, so uncompress runs out of buffer space whenever the compression is better than about 50%. Or something like that.

Yup. Typical behavior is to loop on inflate until all the data has been extracted, but it looks like std.zlib isn't doing that. This should probably be fixed. I posted a response to Ben's post about this.
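A loop of that kind might look roughly like the sketch below. It is written in current D syntax against the C bindings in `etc.c.zlib`; it is not the code in std.zlib and not the eventual fix. `inflateAll` and its `chunkSize` parameter are hypothetical names chosen for illustration, and the deliberately tiny chunk size is only there to force several passes through the loop.

```d
import etc.c.zlib;
import std.stdio : writeln;
import std.zlib : compress;

// Hypothetical helper: calls inflate repeatedly, collecting output one
// chunk at a time, until zlib reports the end of the stream.
ubyte[] inflateAll(const(ubyte)[] src, size_t chunkSize = 16)
{
    z_stream zs; // all fields start out zero/null in D
    if (inflateInit(&zs) != Z_OK)
        throw new Exception("inflateInit failed");
    scope (exit) inflateEnd(&zs);

    zs.next_in = cast(typeof(zs.next_in)) src.ptr;
    zs.avail_in = cast(uint) src.length;

    ubyte[] result;
    auto chunk = new ubyte[chunkSize];
    int status;
    do
    {
        // Hand zlib a fresh output window on every pass.
        zs.next_out = chunk.ptr;
        zs.avail_out = cast(uint) chunk.length;
        status = inflate(&zs, Z_NO_FLUSH);
        if (status != Z_OK && status != Z_STREAM_END)
            throw new Exception("inflate failed");
        // Keep only the bytes zlib actually wrote into the window.
        result ~= chunk[0 .. chunk.length - zs.avail_out];
    } while (status != Z_STREAM_END);
    return result;
}

void main()
{
    // Highly compressible input: the output is far more than twice
    // the size of the compressed data.
    auto original = new ubyte[4096];
    original[] = 48;
    auto packed = compress(original);
    auto unpacked = inflateAll(packed);
    assert(unpacked == original);
    writeln(packed.length, " compressed bytes -> ", unpacked.length, " bytes");
}
```

Because the output is collected chunk by chunk, the result does not depend on guessing the uncompressed size up front; a truncated or corrupt stream ends the loop with an exception instead of spinning.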
jcc7



Joined: 22 Feb 2004
Posts: 657
Location: Muskogee, OK, USA

Posted: Sun Aug 29, 2004 3:34 pm

sean wrote:
Yup. Typical behavior is to loop on inflate until all the data has been extracted, but it looks like std.zlib isn't doing that. This should probably be fixed. I posted a response to Ben's post about this.
The iterative process that you describe makes sense. I wonder what the optimum buffer size would be (Walter seemed to think it was twice the compressed size).
Lynn



Joined: 27 Aug 2004
Posts: 89

Posted: Mon Aug 30, 2004 5:26 am

I've posted this as a std.zlib.decompress bug, and I appreciate Sean K's offer to fix it.
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/1677
sean



Joined: 24 Jun 2004
Posts: 609
Location: Bay Area, CA

Posted: Mon Aug 30, 2004 3:06 pm

jcc7 wrote:
The iterative process that you describe makes sense. I wonder what the optimum buffer size would be (Walter seemed to think it was twice the compressed size).

That's a good starting number, though my experience with zlib is that it returns data in bursts, even with Z_SYNC_FLUSH set. My filter class grows the buffer as needed to ensure that all the read data is processed. For the performance test I did, the read buffer was 1024 bytes and the output buffer began at 1024 bytes but had grown to 8192 bytes by the end of the run.
Dave



Joined: 31 Aug 2004
Posts: 1

Posted: Tue Aug 31, 2004 8:23 pm

I think I found the problem. Please refer to:

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/1717

- Dave
All times are GMT - 6 Hours

 