concrete functor and monad transformers

#4Applicative instances for StateT (possibly others) don't optimize well

We were looking at replacing the definition of mapM on lists with the default traverse. This caused a significant regression in the cryptarithm2 benchmark in nofib, however. I finally tracked down the culprit to the Applicative instance for StateT, which is used in the benchmark. Specifically, it has:

(<*>) = ap

For some reason, this does not properly interact with list fusion. I suspect it's because ap inlines first, and then (<*>) is too large to inline. So traverse ends up doing far more allocation than mapM, which happens to optimize better. Marking (<*>) inline equalizes the performance.

This likely affects other transformers over lists as well.

I'm uploading a test case that shows mapM being about an order of magnitude better on allocation than traverse.

  • Here's the test case:

    module Main (main) where
    
    import Control.Monad.Trans.State
    import Control.Monad.Trans.Class
    import System.Environment
    
    consume [] = ()
    consume (_:xs) = consume xs
    
    f _ = lift [1..8]
    
    main = getArgs >>= \(m:_) ->
      print . consume . flip evalStateT "" $
        case read m of
          1 -> mapM f [1..8]
          2 -> traverse f [1..8]
  • I can reproduce the difference, but not the fix. Looks like some fun reading Core is required.

  • Strange. I just set everything up with cabal and a sandbox, and got the reported results. Are you sure you didn't accidentally modify the .Strict file instead of .Lazy or something?

    If you want an hint when looking at the core (if you do so), the culprit in another test was an unfused call to map for the traverse version. I haven't looked at the core in my above test case, but I presume it's there as well.

    (Sorry about the delay responding. I forgot that hub doesn't e-mail you about this stuff.)

  • Never mind. I've reproduced your inability to reproduce the fix. It only works on a HEAD build I installed most recently, not with 7.10.1. My guess would be an inlining difference between the two, but I haven't figured out what the culprit is yet. I'll let you know if I do.

  • Well, I've tried about everything I can think of, and only one thing works: a custom implementation of (<*>) marked inline. Something like:

    mf <*> mx = StateT $ \s ->
      runStateT mf s >>= \ ~(f, s&#39;) ->
        fmap (\ ~(x, s&#39;&#39;) -> (f x, s&#39;&#39;)) $ runStateT mx s&#39;
    {-# INLINE (<*>) #-}

    I've tried eta expanding the call to ap, importing GHC.Exts and using (<*>) = inline ap, tweaking fmap (which shouldn't be relevant since the custom (<*>) works), and none of those mattered.

    Maybe there's something that will work, but it might not be worth the effort of figuring it out. I don't know.

    FYI, I tested, and .Strict has the same performance differences as .Lazy. So I think it's quite likely that other transformers are affected as well.

    • status set to closed

    All Applicative and Alternative methods are now expanded, but only only (<*>) in the two StateT variants is explicitly INLINEd at present.