libbz2 bindings for Haskell
#3 Infinite loop/stack overflow decompressing input that ends prematurely
decompress overflows the stack when given a bad ByteString. The cause is that bzDecompressChunks recurses without ever terminating.
The offending input is the output of compress with the last 6 bytes deleted.
I can think of three possible fixes:
1. Pretend there is no premature input termination and just return whatever has been decoded.
2. Change the type signature to return Maybe ByteString, returning Nothing on failure. This is similar to what bunzip2 would do.
3. Change the type signature to (ByteString, ByteString), returning the decompressed ByteString and the unconsumed input ByteString.
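To make the three result shapes concrete, here is a minimal sketch on a toy format rather than bzip2: one length byte followed by that many payload bytes. The names decodeLenient, decodeMaybe and decodeWithRest are hypothetical and are not part of the bz2 package; the sketch only compares what a caller would see in each case.

```haskell
import qualified Data.ByteString as BS

-- Toy stand-in for a compressed stream (NOT bzip2): the first byte is the
-- payload length n, followed by n payload bytes. All names are hypothetical.

-- Fix 1: ignore premature termination and return whatever was decoded.
decodeLenient :: BS.ByteString -> BS.ByteString
decodeLenient input = case BS.uncons input of
  Nothing        -> BS.empty
  Just (n, rest) -> BS.take (fromIntegral n) rest

-- Fix 2: report a truncated stream as Nothing.
decodeMaybe :: BS.ByteString -> Maybe BS.ByteString
decodeMaybe input = do
  (n, rest) <- BS.uncons input
  let len = fromIntegral n
  if BS.length rest < len then Nothing else Just (BS.take len rest)

-- Fix 3: return the decoded output together with the unconsumed input.
decodeWithRest :: BS.ByteString -> (BS.ByteString, BS.ByteString)
decodeWithRest input = case BS.uncons input of
  Nothing        -> (BS.empty, BS.empty)
  Just (n, rest) -> BS.splitAt (fromIntegral n) rest

main :: IO ()
main = do
  let truncated = BS.pack [3, 65, 66]  -- claims 3 payload bytes, carries 2
  print (decodeLenient truncated)
  print (decodeMaybe truncated)
  print (decodeWithRest truncated)
```

In this toy, only the Maybe variant tells the caller that the input was truncated; the other two return the partial payload without signalling a failure.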
I can't say which is the most appropriate way, but Fix 1 can be done easily by changing Unpack.chs, replacing the line "pure (ret, szOut, newBSAp, bs'', keepAlive)" with this code snippet, which also illustrates the nature of the problem:
-- Processing of current chunk was BzOk (instead of BzStreamEnd) and there is no more
-- input. This can happen when the input ends prematurely.
let ret' = if ret == BzOk && null bs'' then BzStreamEnd else ret
pure (ret', szOut, newBSAp, bs'', keepAlive)
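The effect of that guard can be seen in a small model of the chunk loop. This is a sketch under simplifying assumptions, not the code in Unpack.chs: chunks are plain strings, the end of the stream is a literal "END" chunk, and step and run are hypothetical names.

```haskell
-- Hypothetical model of the chunk loop; not the code in Unpack.chs.
data Status = BzOk | BzStreamEnd deriving (Eq, Show)

-- One step consumes at most one input chunk. The stream only ends when the
-- "END" marker is seen. With no input left, the step makes no progress and
-- still reports BzOk, which is the situation described above.
step :: [String] -> (Status, [String], [String])
step []             = (BzOk, [], [])
step ("END" : rest) = (BzStreamEnd, [], rest)
step (c : rest)     = (BzOk, [c], rest)

-- Driver with the guard: BzOk with no remaining input is treated as the end
-- of the stream. Without the guard, run would keep calling itself on [].
run :: [String] -> [String]
run chunks =
  let (ret, out, rest) = step chunks
      ret' = if ret == BzOk && null rest then BzStreamEnd else ret
  in case ret' of
       BzStreamEnd -> out
       BzOk        -> out ++ run rest

main :: IO ()
main = do
  print (run ["a", "b", "END"])  -- well-formed stream
  print (run ["a", "b"])         -- truncated stream: terminates
```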
Incidentally, I notice that the result from bZ2BzDecompress is not checked. E.g. there is no special handling for ret of BZ_MEM_ERROR.
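As an illustration of what checking the result could look like, the sketch below classifies the return codes that bzlib.h documents for BZ2_bzDecompress. The numeric values are the ones defined in bzlib.h; the DecompressResult type and the classify function are hypothetical and do not exist in the bindings.

```haskell
-- Hypothetical classification of BZ2_bzDecompress return codes.
data DecompressResult
  = Continue       -- BZ_OK (0): more input or output space is needed
  | Finished       -- BZ_STREAM_END (4): logical end of stream reached
  | Failed String  -- any error code
  deriving (Eq, Show)

classify :: Int -> DecompressResult
classify 0    = Continue
classify 4    = Finished
classify (-2) = Failed "BZ_PARAM_ERROR"
classify (-3) = Failed "BZ_MEM_ERROR"
classify (-4) = Failed "BZ_DATA_ERROR"
classify (-5) = Failed "BZ_DATA_ERROR_MAGIC"
classify n    = Failed ("unexpected return code " ++ show n)

main :: IO ()
main = mapM_ (print . classify) [0, 4, -3, -4]
```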
The offending content can be constructed as follows:
$ echo QlpoOTFBWSZTWU/rxkYAAAFAAHAAIAAhmBmEYXckU4U= | base64 -d > /tmp/bad.bin
Thanks! I've pushed to hackage, hopefully it'll have a test case soon as well.
Thanks. Would you mind changing decompress, or adding a variant of it, so that it does not call error, along the lines of Approach 2 or Approach 3 that I mentioned? I can try to contribute code for this, but I know nothing about the darcs counterpart of a GitHub PR.
Hi, any updates? I don't think it's right for a pure function to just call error for a situation that can be expected and detected.
Something like Maybe BS.ByteString? I can do that!
Yes, "Maybe BS.ByteString" would be much better than error. "Either x BS.ByteString", with x being the type of an error code, would help tell the caller why the failure occurred, but would probably cost more development effort than "Maybe BS.ByteString".
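Until such a variant exists, a caller could approximate the Either shape in IO by forcing the whole result and catching the exception. The sketch below assumes a decompressor over lazy ByteStrings that calls error on bad input; fakeDecompress is a hypothetical stand-in for it, and tryDecompress is a hypothetical helper, not part of the library.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.ByteString.Lazy as BSL

-- Hypothetical stand-in for a decompressor that calls error on bad input.
fakeDecompress :: BSL.ByteString -> BSL.ByteString
fakeDecompress input
  | BSL.null input = error "unexpected end of input"
  | otherwise      = input

-- Force every chunk of the lazy result inside IO, so that an exception
-- hidden in a later chunk surfaces here and not at some later use site.
tryDecompress
  :: (BSL.ByteString -> BSL.ByteString)
  -> BSL.ByteString
  -> IO (Either String BSL.ByteString)
tryDecompress f input = do
  r <- try (evaluate (BSL.length out `seq` out))
  pure (either (\e -> Left (show (e :: SomeException))) Right r)
  where
    out = f input

main :: IO ()
main = do
  tryDecompress fakeDecompress BSL.empty >>= print
  tryDecompress fakeDecompress (BSL.pack [1, 2, 3]) >>= print
```

Catching SomeException is deliberately broad here and forces the full output into memory, so this is a stopgap for callers, not a substitute for a pure Maybe or Either result from the library.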
Thanks.