concrete functor and monad transformers
#4Applicative instances for StateT (possibly others) don't optimize well
We were looking at replacing the definition of mapM
on lists with the default traverse
. This caused a significant regression in the cryptarithm2 benchmark in nofib, however. I finally tracked down the culprit to the Applicative
instance for StateT
, which is used in the benchmark. Specifically, it has:
(<*>) = ap
For some reason, this does not properly interact with list fusion. I suspect it's because ap
inlines first, and then (<*>)
is too large to inline. So traverse
ends up doing far more allocation than mapM
, which happens to optimize better. Marking (<*>)
inline equalizes the performance.
This likely affects other transformers over lists as well.
I'm uploading a test case that shows mapM
being about an order of magnitude better on allocation than traverse
.
Here's the test case:
module Main (main) where import Control.Monad.Trans.State import Control.Monad.Trans.Class import System.Environment consume [] = () consume (_:xs) = consume xs f _ = lift [1..8] main = getArgs >>= \(m:_) -> print . consume . flip evalStateT "" $ case read m of 1 -> mapM f [1..8] 2 -> traverse f [1..8]
I can reproduce the difference, but not the fix. Looks like some fun reading Core is required.
Strange. I just set everything up with cabal and a sandbox, and got the reported results. Are you sure you didn't accidentally modify the
.Strict
file instead of.Lazy
or something?If you want an hint when looking at the core (if you do so), the culprit in another test was an unfused call to
map
for thetraverse
version. I haven't looked at the core in my above test case, but I presume it's there as well.(Sorry about the delay responding. I forgot that hub doesn't e-mail you about this stuff.)
Never mind. I've reproduced your inability to reproduce the fix. It only works on a HEAD build I installed most recently, not with 7.10.1. My guess would be an inlining difference between the two, but I haven't figured out what the culprit is yet. I'll let you know if I do.
Well, I've tried about everything I can think of, and only one thing works: a custom implementation of
(<*>)
marked inline. Something like:mf <*> mx = StateT $ \s -> runStateT mf s >>= \ ~(f, s') -> fmap (\ ~(x, s'') -> (f x, s'')) $ runStateT mx s' {-# INLINE (<*>) #-}
I've tried eta expanding the call to
ap
, importingGHC.Exts
and using(<*>) = inline ap
, tweakingfmap
(which shouldn't be relevant since the custom(<*>)
works), and none of those mattered.Maybe there's something that will work, but it might not be worth the effort of figuring it out. I don't know.
FYI, I tested, and
.Strict
has the same performance differences as.Lazy
. So I think it's quite likely that other transformers are affected as well.- status set to closed
All Applicative and Alternative methods are now expanded, but only only (<*>) in the two StateT variants is explicitly INLINEd at present.