libbz2 bindings for Haskell

#3 Infinite loop/stack overflow decompressing input that ends prematurely

decompress overflows the stack when given a truncated ByteString. The cause is unbounded recursion in bzDecompressChunks.

The offending content is the output of compress with the last 6 bytes deleted.

Three possible fixes I can think of are:

  1. Pretend there is no premature input termination and just return whatever has been decoded.

  2. Change the type signature to return Maybe ByteString, returning Nothing on failure. This is similar to what bunzip2 would do.

  3. Change the type signature to (ByteString, ByteString), returning the decompressed ByteString and the unconsumed input ByteString.
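To compare the three options side by side, here is a minimal sketch of what each one would hand back to the caller. It does not touch the library: Outcome, fix1, fix2 and fix3 are hypothetical names, and Outcome is only an assumed summary of what a low-level decoder knows at the moment it stops.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BSL

-- Hypothetical summary of the decoder state when it stops (not a library type).
data Outcome = Outcome
  { decoded  :: BSL.ByteString  -- output produced so far
  , leftover :: BSL.ByteString  -- input that was not consumed
  , complete :: Bool            -- True if the end-of-stream marker was seen
  }

-- Fix 1: ignore truncation and return whatever was decoded.
fix1 :: Outcome -> BSL.ByteString
fix1 = decoded

-- Fix 2: signal failure with Nothing when the stream did not end cleanly.
fix2 :: Outcome -> Maybe BSL.ByteString
fix2 o
  | complete o = Just (decoded o)
  | otherwise  = Nothing

-- Fix 3: return the decoded output together with the unconsumed input.
fix3 :: Outcome -> (BSL.ByteString, BSL.ByteString)
fix3 o = (decoded o, leftover o)

main :: IO ()
main = do
  -- A truncated stream: some output, no input left, no end marker seen.
  let truncated = Outcome (BSL.pack "partial") BSL.empty False
  print (fix1 truncated)
  print (fix2 truncated)
  print (fix3 truncated)
```

In this model only fix2 tells the caller that the input was truncated; fix1 and fix3 return the partial output without any indication that something went wrong.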

I can't say which is the most appropriate, but Fix 1 is easy to implement: in Unpack.chs, replace the line "pure (ret, szOut, newBSAp, bs'', keepAlive)" with the following snippet, which also illustrates the nature of the problem:

    -- Processing of current chunk was BzOk (instead of BzStreamEnd) and there is no more
    -- input. This can happen when the input ends prematurely.
    let ret' = if ret == BzOk && null bs'' then BzStreamEnd else ret
    pure (ret', szOut, newBSAp, bs'', keepAlive)
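The effect of that patch can be seen in a toy model of the chunk loop. This is not the code in Unpack.chs: step, patchedStep and loop are hypothetical stand-ins, chunks are plain Strings, and the chunk "END" plays the role of the end-of-stream marker. The sketch only assumes that the real loop keeps recursing until it sees BzStreamEnd.

```haskell
data Status = BzOk | BzStreamEnd deriving (Eq, Show)

-- Hypothetical stand-in for one decompression step: consumes one chunk and
-- returns (status, output, remaining input). It reports BzStreamEnd only
-- when it sees the "END" marker; with no input left it still answers BzOk.
step :: [String] -> (Status, String, [String])
step []             = (BzOk, "", [])
step ("END" : rest) = (BzStreamEnd, "", rest)
step (c : rest)     = (BzOk, c, rest)

-- The patch from above: BzOk with no input left is treated as end of stream.
patchedStep :: [String] -> (Status, String, [String])
patchedStep chunks =
  let (ret, out, rest) = step chunks
      ret' = if ret == BzOk && null rest then BzStreamEnd else ret
  in (ret', out, rest)

-- Recurse until the step function reports BzStreamEnd.
loop :: ([String] -> (Status, String, [String])) -> [String] -> [String]
loop f chunks = case f chunks of
  (BzStreamEnd, out, _) -> [out]
  (BzOk, out, rest)     -> out : loop f rest

main :: IO ()
main = do
  -- Truncated input (no "END"): the patched loop terminates.
  print (concat (loop patchedStep ["a", "b"]))
  -- Complete input: the patched loop behaves as before.
  print (concat (loop patchedStep ["a", "b", "END"]))
  -- Unpatched loop on truncated input never ends; take 5 shows it spinning
  -- on empty output once the input is exhausted.
  print (take 5 (loop step ["a", "b"]))
```

Because the model is lazy, the unpatched loop shows up as an endless list of empty outputs rather than a stack overflow, but the cause is the same: once the input is exhausted, nothing ever turns BzOk into BzStreamEnd.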

Incidentally, I notice that the result of bZ2BzDecompress is not checked. For example, a ret of BZ_MEM_ERROR gets no special handling.

The offending content can be constructed as follows:

$ echo QlpoOTFBWSZTWU/rxkYAAAFAAHAAIAAhmBmEYXckU4U= | base64 -d > /tmp/bad.bin