#4: Applicative instances for StateT (possibly others) don't optimize well

concrete functor and monad transformers

#4Applicative instances for StateT (possibly others) don't optimize well

We were looking at replacing the definition of mapM on lists with the default traverse. This caused a significant regression in the cryptarithm2 benchmark in nofib, however. I finally tracked down the culprit to the Applicative instance for StateT, which is used in the benchmark. Specifically, it has:

(<*>) = ap

For some reason, this does not properly interact with list fusion. I suspect it's because ap inlines first, and then (<*>) is too large to inline. So traverse ends up doing far more allocation than mapM, which happens to optimize better. Marking (<*>) inline equalizes the performance.

This likely affects other transformers over lists as well.

I'm uploading a test case that shows mapM being about an order of magnitude better on allocation than traverse.

Dan Doel Mon Jun 8 04:05:07 UTC 2015

Here's the test case:

module Main (main) where

import Control.Monad.Trans.State
import Control.Monad.Trans.Class
import System.Environment

consume [] = ()
consume (_:xs) = consume xs

f _ = lift [1..8]

main = getArgs >>= \(m:_) ->
  print . consume . flip evalStateT "" $
    case read m of
      1 -> mapM f [1..8]
      2 -> traverse f [1..8]

Ross Paterson Tue Jun 9 19:12:45 UTC 2015
I can reproduce the difference, but not the fix. Looks like some fun reading Core is required.
Dan Doel Mon Jun 15 00:08:35 UTC 2015
Strange. I just set everything up with cabal and a sandbox, and got the reported results. Are you sure you didn't accidentally modify the .Strict file instead of .Lazy or something?

If you want an hint when looking at the core (if you do so), the culprit in another test was an unfused call to map for the traverse version. I haven't looked at the core in my above test case, but I presume it's there as well.

(Sorry about the delay responding. I forgot that hub doesn't e-mail you about this stuff.)
Dan Doel Mon Jun 15 00:53:37 UTC 2015
Never mind. I've reproduced your inability to reproduce the fix. It only works on a HEAD build I installed most recently, not with 7.10.1. My guess would be an inlining difference between the two, but I haven't figured out what the culprit is yet. I'll let you know if I do.
Dan Doel Mon Jun 15 01:20:41 UTC 2015
Well, I've tried about everything I can think of, and only one thing works: a custom implementation of (<*>) marked inline. Something like:
```
mf <*> mx = StateT $ \s ->
  runStateT mf s >>= \ ~(f, s&#39;) ->
    fmap (\ ~(x, s&#39;&#39;) -> (f x, s&#39;&#39;)) $ runStateT mx s&#39;
{-# INLINE (<*>) #-}
```
I've tried eta expanding the call to ap, importing GHC.Exts and using (<*>) = inline ap, tweaking fmap (which shouldn't be relevant since the custom (<*>) works), and none of those mattered.

Maybe there's something that will work, but it might not be worth the effort of figuring it out. I don't know.

FYI, I tested, and .Strict has the same performance differences as .Lazy. So I think it's quite likely that other transformers are affected as well.
Ross Paterson Sun Jun 21 14:20:51 UTC 2015
- status set to closed
All Applicative and Alternative methods are now expanded, but only only (<*>) in the two StateT variants is explicitly INLINEd at present.

hub.darcs.net :: ross -> transformers -> issue

#4Applicative instances for StateT (possibly others) don't optimize well