#3: Infinite loop/stack overflow decompressing input that ends prematurely

libbz2 bindings for Haskell

decompress overflows the stack when given a ByteString whose compressed stream ends prematurely. The cause is infinite recursion in bzDecompressChunks.

The offending content is a result of compress with the last 6 bytes deleted.

Three possible fixes I can think of are:

1. Pretend there is no premature input termination and just return whatever has been decoded.
2. Change the type signature to return Maybe ByteString, returning Nothing on failure. This is similar to what bunzip2 would do.
3. Change the type signature to (ByteString, ByteString), returning the decompressed ByteString and the unconsumed input ByteString.

I can't say which is the most appropriate way, but Fix 1 can be done easily by changing Unpack.chs, replacing the line "pure (ret, szOut, newBSAp, bs'', keepAlive)" with this code snippet, which also illustrates the nature of the problem:

    -- Processing of current chunk was BzOk (instead of BzStreamEnd) and there is no more
    -- input. This can happen when the input ends prematurely.
    let ret' = if ret == BzOk && null bs'' then BzStreamEnd else ret
    pure (ret', szOut, newBSAp, bs'', keepAlive)
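
To make the termination argument concrete, here is a minimal, self-contained model of the loop. It does not use libbz2 at all: the "compressed" format is a toy one (plain bytes terminated by '!'), and `Status`, `step` and `decodeLoop` are hypothetical names invented for this sketch, not functions from Unpack.chs. Without the `ret'` line, an input lacking the terminator would keep recursing on an empty remainder.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

-- Stand-ins for BzOk / BzStreamEnd (hypothetical, for this sketch only).
data Status = Ok | StreamEnd deriving (Eq, Show)

-- Toy replacement for one decompression call: the stream ends at '!'.
step :: BS.ByteString -> (Status, BS.ByteString)
step chunk = case BC.elemIndex '!' chunk of
  Just i  -> (StreamEnd, BS.take i chunk)
  Nothing -> (Ok, chunk)

-- Feed the input 4 bytes at a time, collecting the output chunks.
decodeLoop :: BS.ByteString -> [BS.ByteString]
decodeLoop input =
  let (chunk, rest) = BS.splitAt 4 input
      (ret, out)    = step chunk
      -- Same idea as the proposed fix: Ok with no input left means the
      -- stream was truncated, so stop instead of recursing forever.
      ret' = if ret == Ok && BS.null rest then StreamEnd else ret
  in case ret' of
       StreamEnd -> [out]
       Ok        -> out : decodeLoop rest
```

In this model a truncated input such as `BC.pack "hello wor"` terminates and yields everything decoded so far, which is the behaviour of Fix 1.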

Incidentally, I notice that the result from bZ2BzDecompress is not checked. E.g. there is no special handling for ret of BZ_MEM_ERROR.
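
A rough sketch of what checking the return code could look like. The numeric values are the constants defined in bzlib.h; the Haskell names (`BzError`, `StepResult`, `classifyRet`) are made up for illustration and are not part of the bz2 package or its generated bindings.

```haskell
import Foreign.C.Types (CInt)

-- Hypothetical error type for this sketch.
data BzError
  = ParamError       -- BZ_PARAM_ERROR      (-2)
  | MemError         -- BZ_MEM_ERROR        (-3)
  | DataError        -- BZ_DATA_ERROR       (-4)
  | DataErrorMagic   -- BZ_DATA_ERROR_MAGIC (-5)
  | OtherError CInt  -- any other unexpected code
  deriving (Eq, Show)

data StepResult = KeepGoing | Finished | Failed BzError
  deriving (Eq, Show)

-- Classify the raw return value of a decompression step.
classifyRet :: CInt -> StepResult
classifyRet 0    = KeepGoing              -- BZ_OK
classifyRet 4    = Finished               -- BZ_STREAM_END
classifyRet (-2) = Failed ParamError
classifyRet (-3) = Failed MemError
classifyRet (-4) = Failed DataError
classifyRet (-5) = Failed DataErrorMagic
classifyRet n    = Failed (OtherError n)
```

With something like this, the chunk loop could stop on `Failed` instead of treating every non-end result as a reason to continue.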

The offending content can be constructed as follows:

$ echo QlpoOTFBWSZTWU/rxkYAAAFAAHAAIAAhmBmEYXckU4U= | base64 -d > /tmp/bad.bin

jchia Fri May 8 17:56:19 UTC 2020
- description updated
jchia Fri May 8 18:01:04 UTC 2020
- description updated
jchia Fri May 8 18:03:11 UTC 2020
- description updated
jchia Fri May 8 18:04:38 UTC 2020
- description updated
Vanessa McHalde Fri May 8 18:52:06 UTC 2020
Thanks! I've pushed to hackage, hopefully it'll have a test case soon as well.
jchia Sat May 9 04:16:18 UTC 2020
Thanks. Would you mind changing decompress or making a variant thereof that does not call error, along the lines of Approach 2 or Approach 3 that I mentioned? I can try to contribute code for this but know nothing about the darcs counterpart of a github PR.
jchia Tue May 12 11:13:39 UTC 2020
Hi, any updates? I don't think it's right for a pure function to just call error for a situation that can be expected and detected.
Vanessa McHalde Tue May 12 15:06:14 UTC 2020
Something like Maybe BS.ByteString? I can do that!
jchia Wed May 13 00:25:22 UTC 2020
Yes, "Maybe BS.ByteString" would be much better than error. "Either x BS.ByteString", with x being the type for an error code, would help tell the caller why the failure occurred, but would probably cost more development effort than "Maybe BS.ByteString".

Thanks.
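
For illustration only, this is how the two result types relate, using a toy decoder rather than libbz2 (the stream is plain bytes terminated by '!'). `DecodeError`, `decodeEither` and `decodeMaybe` are hypothetical names, not the bz2 API. The point is that the Maybe variant can be derived from the Either one, so implementing Either first would not rule out the simpler interface.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

-- Hypothetical error type; a real one would carry more cases.
data DecodeError = TruncatedInput deriving (Eq, Show)

-- Toy decoder: succeeds only if the '!' terminator is present.
decodeEither :: BS.ByteString -> Either DecodeError BS.ByteString
decodeEither input = case BC.elemIndex '!' input of
  Just i  -> Right (BS.take i input)
  Nothing -> Left TruncatedInput

-- The Maybe variant just forgets the reason for the failure.
decodeMaybe :: BS.ByteString -> Maybe BS.ByteString
decodeMaybe = either (const Nothing) Just . decodeEither
```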
