As anyone who's been following The Nats Blog recently knows, there's been a lot of talk about who will play this year. But people, apparently unsatisfied with so simple a puzzle, have also been arguing about what the optimal lineup should be. I know because I was one of them.
My simulator relies on Markov chains, which are not as complicated in practice as Wikipedia would lead you to believe.
The general idea is this: assume each player has some inherent ability to hit singles, doubles, triples, home runs, to make outs, to walk, and so on. With this assumption, when a player comes to the plate, we can guess the probabilities with which each of these events (hitting a single, double, etc.) will occur. We also know for a fact how each of these events will change the current state (you know what happens when someone walks). This means we can assign probabilities to the state that the game will be in when the current player finishes his plate appearance.
To make that a little less abstract, let's say a guy gets up with a man on first and no outs. Since we're simplifying things, let's also say that the batter has a 25% chance of hitting a short single and a 75% chance of striking out. Therefore, after his plate appearance is over, there is a 25% chance there will be men on first and second with no outs, and a 75% chance there will be only a man on first with one out, when the next player arrives at the plate. Using just these simple ideas and adding more events, we can simulate entire seasons of baseball in seconds!
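To make that example concrete, here is a minimal Python sketch of a single plate appearance. It is only an illustration, not the code behind my simulator: the state representation (a tuple of base occupancy plus the out count) and the function names are assumptions made up for this sketch, and the "short single" is assumed to move every runner up exactly one base.

```python
from collections import defaultdict

# A state is (bases, outs), where bases = (first, second, third) occupancy.
# This representation is an assumption made for illustration.

def short_single(state):
    """Batter reaches first; every runner moves up exactly one base."""
    (first, second, third), outs = state
    runs = 1 if third else 0                      # a runner on third scores
    return ((True, first, second), outs), runs

def strikeout(state):
    """One more out; no runner moves."""
    bases, outs = state
    return (bases, outs + 1), 0

# The toy batter from the example: 25% short single, 75% strikeout.
TOY_BATTER = [(0.25, short_single), (0.75, strikeout)]

def next_state_distribution(state, events=TOY_BATTER):
    """Probability of each state after one plate appearance."""
    dist = defaultdict(float)
    for prob, apply_event in events:
        new_state, _runs = apply_event(state)
        dist[new_state] += prob
    return dict(dist)

start = ((True, False, False), 0)                 # man on first, no outs
dist = next_state_distribution(start)
for (bases, outs), prob in dist.items():
    print(bases, outs, prob)
```

Chaining this step from one batter to the next, and keeping track of the runs each event produces, is what turns the idea into a run estimate for a whole lineup.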
But even with a method to simulate how many runs a given lineup will score, the task of finding the optimal lineup is still beyond my reach. Since I'm not especially skilled with this type of problem, the only method I can think of is brute force, i.e. checking every single lineup, which would take my computer over 100 days (with nine men there are 9! = 362,880 possible lineups to check).
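For a sense of scale, the arithmetic can be checked in a few lines of Python. The per-lineup time below is only what the "over 100 days" figure implies; it is an inference, not something I measured, and the variable names are made up for this sketch.

```python
import math
from itertools import permutations

# Number of ways to order nine batters.
n_lineups = math.factorial(9)

# Brute force means visiting every ordering; counting them confirms the total.
n_enumerated = sum(1 for _ in permutations(range(9)))

# If checking all of them takes 100 days, each lineup costs about this long.
seconds_in_100_days = 100 * 24 * 60 * 60
implied_seconds_per_lineup = seconds_in_100_days / n_lineups

print(n_lineups, n_enumerated, round(implied_seconds_per_lineup, 1))
# prints: 362880 362880 23.8
```

In other words, "over 100 days" corresponds to a little more than 24 seconds of simulation per lineup.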
So, instead, I gathered some potential lineups by asking people for them, by looking them up on blogs, and by concocting a couple myself. Here are the results:
Random: 755 Runs (Z, M, W, L, D, E, Catch, COF, P)
By wOBA: 752 Runs (Z, W, L, COF, E, D, Catch, M, P)
Ted: 707 Runs (M, COF, W, L, Z, D, Catch, E, P)
rotochamp/Will: 705 Runs (M, D, Z, W, L, COF, E, Catch, P)
Andrew: 702 Runs (M, COF, Z, W, L, D, E, Catch, P)
Examiner: 691 Runs (M, D, Z, L, W, COF, Catch, E, P)
Bryce: 681 Runs (COF, E, Z, W, L, D, Catch, M, P)
Reverse wOBA: 671 Runs (P, M, Catch, D, E, COF, L, W, Z)
A few notes:
- The only possible choices were Nyjer Morgan, Ryan Zimmerman, Jayson Werth, Adam LaRoche, Ian Desmond, Danny Espinosa, Catcher (a combination of Jesus Flores, Ivan Rodriguez, and Wilson Ramos), Corner OF (a combination of Mike Morse and Roger Bernadina), and a Pitcher (league-average rates for pitchers). In the list above, Z stands for Zimmerman, M for Morgan, and so on.
- The probabilities of each play were estimated from past performance and Bill James projections.
- There were only six possible plays: singles, doubles, triples, home runs, walks, and single outs which advanced no runners. This likely led to some optimistic 162-game run totals (no double plays), as did the assumption that guys would not get injured and other unrealistic assumptions I'm sure you can imagine. I was, however, pretty happy with how reasonable the run estimates were.
- There is an “error rate” of about +/- nine runs for these totals, so the “random” order is about the same as the high-wOBA-first lineup, but both are vastly superior to the other arrangements. Ordering by lowest wOBA first led to the worst lineup.
- The more conventional lineups lead to about the same number of runs (700), while the highest-wOBA-first lineup and the random lineup (which is awfully similar to the highest-wOBA lineup) lead to significantly more runs (750). A difference of 50 runs is often thought to be worth about five wins (10 runs per win), meaning that attempting to optimize the lineup is indeed worthwhile.
- I'm not really surprised that ordering by wOBA led to an optimal lineup, since wOBA is itself derived from a Markov approach. In fact, I'm pretty confident you can prove this is the optimal lineup (and that the worst lineup is the reverse-wOBA lineup) if wOBA works like I think it does (which causes me to revise my earlier statement that I don't know how to find the optimal lineup). What I am somewhat surprised by is the large difference between the more traditional lineups and the wOBA lineup, because I have seen several studies suggesting that different arrangements of lineups don't seem to matter that much. Perhaps this difference would be mitigated by a more accurate simulator.
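The six-play model described in the notes above can be sketched roughly as follows. This is not my actual simulator: it samples plate appearances at random (a Monte Carlo walk through the Markov chain), and several things are assumptions made purely for illustration. Every batter shares one made-up set of probabilities (`HYPOTHETICAL_RATES`, not real player data), runners advance exactly as many bases as the hit is worth, walks move only forced runners, and every game is nine full innings.

```python
import random

EVENTS = ["single", "double", "triple", "home_run", "walk", "out"]

# HYPOTHETICAL per-plate-appearance probabilities, in EVENTS order.
# Made up for illustration; not real player data or projections.
HYPOTHETICAL_RATES = [0.15, 0.05, 0.005, 0.03, 0.085, 0.68]

def advance(bases, event):
    """Return (new_bases, runs) for a non-out event.

    bases is [first, second, third] as booleans. Assumption: on a hit,
    every runner advances exactly as many bases as the batter.
    """
    first, second, third = bases
    if event == "walk":                       # only forced runners move
        if first and second and third:
            return [True, True, True], 1
        if first and second:
            return [True, True, True], 0
        if first:
            return [True, True, third], 0
        return [True, second, third], 0
    n = {"single": 1, "double": 2, "triple": 3, "home_run": 4}[event]
    positions = [i + 1 for i, occupied in enumerate(bases) if occupied] + [0]
    new_bases, runs = [False, False, False], 0
    for pos in positions:                     # 0 is the batter at home plate
        pos += n
        if pos >= 4:
            runs += 1
        else:
            new_bases[pos - 1] = True
    return new_bases, runs

def simulate_season(lineup, games=162, seed=0):
    """Total runs over a season; lineup is a list of nine rate lists."""
    rng = random.Random(seed)
    runs = 0
    for _ in range(games):
        batter = 0                            # leadoff man starts each game
        for _ in range(9):                    # nine innings, no extras
            outs, bases = 0, [False, False, False]
            while outs < 3:
                event = rng.choices(EVENTS, weights=lineup[batter])[0]
                batter = (batter + 1) % 9
                if event == "out":            # outs advance no runners
                    outs += 1
                else:
                    bases, scored = advance(bases, event)
                    runs += scored
    return runs

lineup = [HYPOTHETICAL_RATES] * 9
season_runs = simulate_season(lineup)
print(season_runs)
```

Comparing two batting orders then comes down to giving each slot its own rates, calling `simulate_season` on each order with several different seeds, and checking whether the gap between the totals is larger than the seed-to-seed spread.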
So … the takeaway: Different lineups can have a major impact on how many runs are scored! And for a team like the Nationals, five wins can make a major difference.