Originally written on Fangraphs  |  Last updated 11/15/14
In my first article, I wrote about the limitations of the linear weights system that wOBA is based on when it is applied to unusual team offenses. In my second, I explained how Tom Tango, wOBA’s creator, came up with a way of addressing some of these limitations by using BaseRuns to derive a new set of linear weights for different run environments. Today, I will tell you about the next step in the evolution of run estimators: the Markov model. Tom Tango created such a model, which can be accessed through his website, and I’ve turned that model into a spreadsheet that I’ll share with you here.

I’ve told you that the problem with the standard run estimator formulas is that they make assumptions about what a hit is going to be worth, run-wise, based on what it was worth to an average team. That means they’re not going to apply very well to an unusual team. What’s so great about the Markov is that it makes no such assumptions; it figures all of that out itself, specific to each team. And when I say it figures it out, I mean it basically calculates out a typical game for that team, given the proportion of singles, walks, home runs, etc. the team gets in its plate appearances. It estimates the run-scoring of typical teams better than just about anything, but it also theoretically should apply much, much better to very unusual or even made-up teams.

Will this spreadsheet thing make my life complete? Well, not really. But it is fun to explore. The thing I think it’s most useful for is to guess how many runs a team would score with or without certain players.
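To make the "calculates out a typical game" idea a bit more concrete, here is a minimal sketch of a Markov-style run estimator in Python. It is not Tango's model or the spreadsheet's logic: the base-running rules in `advance` are deliberately crude assumptions (no extra bases taken, no steals, no double plays), and the function names and the sample rates are made up for illustration. It tracks the 24 base-out states and iterates until the expected runs for each state stop changing.

```python
from itertools import product


def advance(bases, event):
    """Return (new_bases, runs) for one event.

    bases is a tuple (first, second, third) of 0/1 flags. The advancement
    rules are simplifying assumptions, not Tango's extra-base-taking rates.
    """
    b1, b2, b3 = bases
    if event == "BB":
        # Walk: runners move only when forced.
        if b1 and b2 and b3:
            return (1, 1, 1), 1
        if b1 and b2:
            return (1, 1, 1), 0
        if b1:
            return (1, 1, b3), 0
        return (1, b2, b3), 0
    if event == "1B":
        # Single: runners on second and third score, runner on first stops at second.
        return (1, b1, 0), b2 + b3
    if event == "2B":
        # Double: runners on second and third score, runner on first stops at third.
        return (0, 1, b1), b2 + b3
    if event == "3B":
        return (0, 0, 1), b1 + b2 + b3
    if event == "HR":
        return (0, 0, 0), b1 + b2 + b3 + 1
    raise ValueError(f"unknown event: {event}")


def expected_runs_per_inning(probs, tol=1e-12):
    """Expected runs from bases empty, nobody out.

    probs maps "BB", "1B", "2B", "3B", "HR" to per-PA probabilities;
    whatever probability is left over is treated as a plain out.
    """
    p_out = 1.0 - sum(probs.values())
    if p_out <= 0:
        raise ValueError("event probabilities must leave room for outs")
    states = [(b, o) for b in product((0, 1), repeat=3) for o in range(3)]
    value = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for bases, outs in states:
            # An out leaves runners in place; the third out ends the inning.
            total = p_out * (value[(bases, outs + 1)] if outs < 2 else 0.0)
            for event, p in probs.items():
                new_bases, runs = advance(bases, event)
                total += p * (runs + value[(new_bases, outs)])
            delta = max(delta, abs(total - value[(bases, outs)]))
            value[(bases, outs)] = total
        if delta < tol:
            return value[((0, 0, 0), 0)]


# Hypothetical per-PA rates for a made-up team (not real data).
rates = {"BB": 0.08, "1B": 0.15, "2B": 0.05, "3B": 0.005, "HR": 0.03}
print(round(9 * expected_runs_per_inning(rates), 3), "runs per nine innings")
```

Because the value of each event comes out of the team's own mix of events rather than from fixed weights, changing the walk or single rate changes what a home run is worth, which is the property the article leans on.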
To demonstrate why this may be eye-opening for you, I’m going to show you how even two players with identical wOBA and wRC+ ratings could have significantly different offensive values to different teams.

Markov: I must break you…r perceptions of player values

In 2011, Mark Trumbo and Alberto Callaspo had identical wOBAs (0.328) and therefore, seeing as how they both played for the Angels, identical wRC+ as well (108). However, they achieved these above-average wOBAs in very different ways: Callaspo with a 0.366 OBP and 0.375 SLG, and Trumbo with a 0.291 OBP and 0.477 SLG. So, let’s place these two onto various teams to see what happens. To keep things simple, let’s just pretend there’s no such thing as park effects.

Now, before I get into this, let me remind you that teams don’t have a fixed number of plate appearances per season, but their number of outs in a season is close to fixed: 162 games/season * 9 innings/game * 3 outs/inning = 4374 outs. Of course, it’s not exactly that, mainly because of extra innings and the fact that the home team won’t have a full 9 innings of offense in games it wins. Anyway, I’m going to try to equalize Trumbo and Callaspo for playing time by giving them the same number of outs, defined as:

Outs = PA – H – BB – HBP + CS + GDP

Ideally, that would include other outs on the bases as well, but FanGraphs doesn’t provide that as of yet.

Another thing: I really ought to be removing a player from each of these teams to make room for Trumbo or Callaspo, but so as not to add the additional variable of different players being removed from different teams, we’ll just reduce each team’s outs (and the rest of their numbers proportionally) to make room. This means we’re basically just pretending that all the original players on that team had their playing time reduced a bit to make room.
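The outs bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the spreadsheet's actual logic: the function names, the dictionary layout, and the stat lines are all made up, and a real version would carry the full batting line (doubles, home runs, and so on) through the same scaling.

```python
# Nominal outs available to a team in a season, ignoring extra innings
# and the home halves of the ninth that never get played.
SEASON_OUTS = 162 * 9 * 3  # 4374


def batting_outs(line):
    """Outs as defined in the article: PA - H - BB - HBP + CS + GDP."""
    return (line["pa"] - line["h"] - line["bb"] - line["hbp"]
            + line["cs"] + line["gdp"])


def insert_player(team, player):
    """Blend a player's counting stats into a team's.

    Every team stat is scaled down by the same factor so that the combined
    line uses exactly as many outs as the original team did.
    """
    team_outs = batting_outs(team)
    scale = (team_outs - batting_outs(player)) / team_outs
    return {key: team[key] * scale + player[key] for key in team}


# Hypothetical stat lines, invented for illustration only.
team = {"pa": 6000, "h": 1400, "bb": 500, "hbp": 50, "cs": 40, "gdp": 110}
player = {"pa": 600, "h": 150, "bb": 30, "hbp": 5, "cs": 3, "gdp": 12}

blended = insert_player(team, player)
print(batting_outs(team), round(batting_outs(blended), 6))
```

Since the outs formula is linear in the counting stats, scaling every team stat by the same factor is enough to keep the blended team's outs equal to the original total.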
So, without further ado, here’s what happens when 2011 Trumbo’s (T) or Callaspo’s (C) numbers are inserted into various especially good or bad offenses:

| Season | Team or Player | OBP | SLG | Aggro | Actual | Markov (tweaked) | Markov (default) | BaseRuns | Runs Created |
|---|---|---|---|---|---|---|---|---|---|
| 2011 | Mark Trumbo | 0.291 | 0.477 | -0.193 | ? | 4.440 | 4.765 | 4.828 | 5.066 |
| 2011 | Alberto Callaspo | 0.366 | 0.375 | -0.043 | ? | 4.988 | 5.211 | 5.125 | 5.219 |
| 1963 | Colt .45s | 0.283 | 0.301 | 0.190 | 2.864 | 2.837 | 2.774 | 2.921 | 2.959 |
| 1963 | Colt .45s+T | 0.284 | 0.318 | 0.154 | ? | 2.997 | 2.975 | 3.115 | 3.156 |
| 1963 | Colt .45s+C | 0.292 | 0.308 | 0.165 | ? | 3.023 | 2.978 | 3.114 | 3.162 |
| 1965 | Mets | 0.277 | 0.327 | 0.119 | 3.018 | 2.956 | 2.968 | 3.121 | 3.153 |
| 1965 | Mets+T | 0.278 | 0.342 | 0.089 | ? | 3.187 | 3.144 | 3.289 | 3.327 |
| 1965 | Mets+C | 0.286 | 0.332 | 0.105 | ? | 3.215 | 3.145 | 3.292 | 3.343 |
| 1968 | Mets | 0.281 | 0.315 | 0.238 | 2.902 | 2.945 | 2.850 | 3.035 | 3.110 |
| 1968 | Mets+T | 0.282 | 0.331 | 0.199 | ? | 3.094 | 3.040 | 3.214 | 3.289 |
| 1968 | Mets+C | 0.290 | 0.321 | 0.208 | ? | 3.120 | 3.042 | 3.216 | 3.300 |
| 2011 | Mariners | 0.292 | 0.348 | 0.195 | 3.432 | 3.454 | 3.385 | 3.538 | 3.608 |
| 2011 | Mariners+T | 0.292 | 0.361 | 0.159 | ? | 3.554 | 3.525 | 3.670 | 3.749 |
| 2011 | Mariners+C | 0.300 | 0.351 | 0.171 | ? | 3.590 | 3.537 | 3.681 | 3.763 |
| 1994 | Yankees | 0.374 | 0.462 | -0.283 | 5.929 | 5.904 | 6.516 | 6.404 | 6.630 |
| 1994 | Yankees+T | 0.364 | 0.464 | -0.271 | ? | 5.663 | 6.227 | 6.163 | 6.427 |
| 1994 | Yankees+C | 0.373 | 0.450 | -0.246 | ? | 5.774 | 6.331 | 6.223 | 6.423 |
| 1996 | Mariners | 0.366 | 0.484 | -0.197 | 6.168 | 6.098 | 6.526 | 6.452 | 6.765 |
| 1996 | Mariners+T | 0.360 | 0.483 | -0.196 | ? | 5.911 | 6.328 | 6.279 | 6.602 |
| 1996 | Mariners+C | 0.366 | 0.473 | -0.178 | ? | 5.989 | 6.397 | 6.323 | 6.607 |
| 1999 | Indians | 0.373 | 0.467 | -0.161 | 6.228 | 6.119 | 6.547 | 6.454 | 6.688 |
| 1999 | Indians+T | 0.366 | 0.468 | -0.162 | ? | 5.925 | 6.340 | 6.279 | 6.538 |
| 1999 | Indians+C | 0.373 | 0.457 | -0.148 | ? | 6.006 | 6.414 | 6.321 | 6.535 |

A bit more explanation: besides the default version of the Markov that Tango has on his site, and the simple versions of BaseRuns and Bill James’ Runs Created that the webpage also produces, I’ve listed the results for a slightly altered version of the Markov that I came up with, which attempts to account for certain factors that are missing from the default Markov (I’ll talk more about this later). The “Aggro” factor is my stab at measuring base running aggression and effectiveness, which I use in the tweaked Markov.

So, in the top two spots on the list, we have the theoretical runs scored by teams full of clones of either Trumbo or Callaspo. This is basically the same idea as the RC27 you can find amongst ESPN.com’s sabermetric stats (which place Trumbo at 4.47 and Callaspo at 5.22, by the way). You can see right away that the Markovs favor Callaspo over Trumbo more than you might expect from their wOBAs and wRC+. Do you remember seeing the exponential growth curve of runs depending on team OBP in my last article? That explains why this is the case: it’s an important team effect that wOBA doesn’t try to account for.

You’ll also notice that relative to Trumbo, Callaspo is worth a lot more to the good offenses than to the bad ones. In particular, he’s worth more to the high-OBP teams because, besides the exponential impact his better OBP has on runs, his relative lack of power hurts less. That’s because the value of a single to a high-OBP team is greater than it is to a low-OBP team, especially relative to a HR (see the graphs in my second article if that confuses you). There is a threshold of team suckitude at which 2011 Trumbo’s offense would become more valuable to a team than 2011 Callaspo’s, but it appears that even a bad team in the “second deadball era” of the 60s is still a little bit short of that.

Play along at home or work

I took a page out of Bradley Woodrum’s book and I’m giving you a peek via the Excel Web App.
Just click on the green Excel icon in the bottom right area of the app to download the spreadsheet (about 1 MB in size). Once you’ve downloaded it, you’ll be able to paste data from the Standard section of team batting numbers from FanGraphs (link) into the “Enter Data Here” tab of my spreadsheet, or enter whatever you want manually. You’ll then be able to see the results of the calculations on the “Results” tab (surprise), which you should be able to find near the bottom of the spreadsheet. Here ya go:

The Perfect Run Modeler? Almost.

Tom Tango says his model is “mathematically perfect,” but readily acknowledges that it’s a bit simplistic, ignoring not only steals (SB) and caught stealing (CS), but also grounded into double plays (GIDP) and other outs on bases (OOB). To properly account for these factors would require a much more complicated model, but I’ve come up with some modifications that attempt to account for them without fundamentally changing Tango’s model.

The first thing I did was to reduce each team’s expected plate appearances per game by their expected GIDP and CS per game, along with an empirically derived OOB constant tied to their on-base rates. It’s not a perfect solution, because, for one, OOB rates aren’t so constant, as James Gentile recently pointed out at THT. You can, however, get OOB data from Baseball-Reference.com, if you have the patience and the desire. Another issue (I think) is that GIDP rates depend on how likely it is for a batter to have men on base, which would mean, for example, that I shouldn’t be penalizing a team full of 9 Trumbos so much for GIDP, because that team would be less likely to be able to hit into one. That could be worked out better, but it’s tricky.

The other main thing I did was to create the aforementioned base running aggressiveness modifier to the extra-base-taking rates that are essential to the model (they’re really the main assumptions in the model that are a bit tricky to estimate).
It’s based on things like steals and caught stealing per runner on 1B, as well as 3B/2B. It’s probably not so proper that I’ve also included GIDP/PA as a major factor here, but the previous adjustment didn’t fully account for the negative impact of GIDPs. I also included team OBP and SLG as factors, as one can expect weaker teams to be more aggressive on the base paths due to their low odds of scoring without taking extra bases. Finally, I changed the default extra-base-taking rates to be more in line with Tango’s empirical findings. Of course, those rates aren’t entirely stable. Feel free to change anything in the “Results” tab that is bordered in red, as you see fit. You can even mess around with the “Calculations” tab if you know your stuff.

Well, that’s my time. Hope you’ve enjoyed it. There’s plenty more I can say about this subject, if you’re interested. Let me hear your questions and comments, and tell me if you’d like to see me apply this to something else or make changes.