Originally written on Fangraphs  |  Last updated 2/20/13
In my Monday post about the White Sox’s recent success beating preseason projections, I included a statement that I’ve made a few times over the last few years:

“But also, please just keep in mind that projections are not predictions. They are a snapshot of what we think a team’s median true talent level might be, and it should be understood that there’s a pretty sizable margin for error based on things that projection systems simply can’t forecast, and also the errors that come from having imperfect information or imperfect calculations.”

I wrote about this distinction a couple of years ago, but I think it’s worth delving into the differences again. For one, FanGraphs has gotten a lot larger over the last few years, so many of you might not have read that piece. I also think there are a few things that I could have stated better in that article, and I want to give more context for why I see the distinction as meaningful rather than a semantic argument with no practical use.

Let’s start out by acknowledging that predictions are a subset of projections. Or, to put it another way, every prediction is a projection, but a projection isn’t necessarily a prediction. I know that’s a bit of a tongue twister, and it seems like a semantic difference, but think of it like this: Mothers are women, but not all women are mothers. No one would suggest that it is simply semantics to clarify whether a woman is or is not a mother. There’s a meaningful difference there.

So it is with predictions and projections. A prediction is essentially a projection where there is a high degree of confidence in a specific outcome. Not all projections lead to that kind of confidence in one result, however. In fact, in many cases, an accurate projection will result in a range of outcomes where there is no single result that is likely to occur.

Let’s take the NBA’s Draft Lottery, for instance.
The 14 non-playoff teams get various combinations of numbers assigned to them, and those numbers correspond to 14 ping pong balls that are placed into a lottery machine. The team with the worst record gets 250 of the 1,000 possible combinations, the second worst team gets 199, and each successive team gets fewer than the one before it, down to the 14th worst team getting just five of the 1,000 possible combinations. Because the NBA doesn’t want teams to drop too far by random chance, they only draw for the first three selections, and then the remaining teams are slotted in from #4 to #14 based on win-loss record in the previous year.

The Wikipedia entry on the draft lottery has a pretty nifty chart showing the various odds of each outcome, which we’ll reproduce here (a dash marks a pick that seed cannot end up with):

Seed  Chances   1st   2nd   3rd   4th   5th   6th   7th   8th   9th  10th  11th  12th  13th  14th
   1      250  .250  .215  .178  .357     -     -     -     -     -     -     -     -     -     -
   2      199  .199  .188  .171  .319  .123     -     -     -     -     -     -     -     -     -
   3      156  .156  .157  .156  .226  .265  .040     -     -     -     -     -     -     -     -
   4      119  .119  .126  .133  .099  .351  .160  .012     -     -     -     -     -     -     -
   5       88  .088  .097  .107     -  .261  .360  .084  .004     -     -     -     -     -     -
   6       63  .063  .071  .081     -     -  .439  .305  .040  .001     -     -     -     -     -
   7       43  .043  .049  .058     -     -     -  .599  .232  .018  .000     -     -     -     -
   8       28  .028  .033  .039     -     -     -     -  .724  .168  .008  .000     -     -     -
   9       17  .017  .020  .024     -     -     -     -     -  .813  .122  .004  .000     -     -
  10       11  .011  .013  .016     -     -     -     -     -     -  .870  .089  .002  .000     -
  11        8  .008  .009  .012     -     -     -     -     -     -     -  .907  .063  .001  .000
  12        7  .007  .008  .010     -     -     -     -     -     -     -     -  .935  .039  .000
  13        6  .006  .007  .009     -     -     -     -     -     -     -     -     -  .960  .018
  14        5  .005  .006  .007     -     -     -     -     -     -     -     -     -     -  .982

If you were the only person on the planet who knew those odds, what kind of predictions would you be willing to make? Would you predict that the team with the #1 seed would win the first overall pick? I’d hope not, because you’d be wrong three times out of four, even though you were selecting the most likely outcome every single time.
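To make that concrete, here is a minimal sketch that reads the odds from the chart and asks two questions: which seed is most likely to win a given pick, and which pick is a given seed most likely to end up with. The function names are made up for illustration, and the numbers are simply copied from the chart.

```python
# Lottery odds copied from the chart: ODDS[seed][pick] = probability.
# Picks a seed cannot land on are simply left out of its row.
ODDS = {
    1: {1: .250, 2: .215, 3: .178, 4: .357},
    2: {1: .199, 2: .188, 3: .171, 4: .319, 5: .123},
    3: {1: .156, 2: .157, 3: .156, 4: .226, 5: .265, 6: .040},
    4: {1: .119, 2: .126, 3: .133, 4: .099, 5: .351, 6: .160, 7: .012},
    5: {1: .088, 2: .097, 3: .107, 5: .261, 6: .360, 7: .084, 8: .004},
    6: {1: .063, 2: .071, 3: .081, 6: .439, 7: .305, 8: .040, 9: .001},
    7: {1: .043, 2: .049, 3: .058, 7: .599, 8: .232, 9: .018, 10: .000},
    8: {1: .028, 2: .033, 3: .039, 8: .724, 9: .168, 10: .008, 11: .000},
    9: {1: .017, 2: .020, 3: .024, 9: .813, 10: .122, 11: .004, 12: .000},
    10: {1: .011, 2: .013, 3: .016, 10: .870, 11: .089, 12: .002, 13: .000},
    11: {1: .008, 2: .009, 3: .012, 11: .907, 12: .063, 13: .001, 14: .000},
    12: {1: .007, 2: .008, 3: .010, 12: .935, 13: .039, 14: .000},
    13: {1: .006, 2: .007, 3: .009, 13: .960, 14: .018},
    14: {1: .005, 2: .006, 3: .007, 14: .982},
}


def most_likely_winner(pick):
    """Return (seed, probability) for the seed most likely to get this pick."""
    seed = max(ODDS, key=lambda s: ODDS[s].get(pick, 0.0))
    return seed, ODDS[seed][pick]


def most_likely_slot(seed):
    """Return (pick, probability) for the pick this seed most often ends up with."""
    pick = max(ODDS[seed], key=ODDS[seed].get)
    return pick, ODDS[seed][pick]


seed, p = most_likely_winner(1)
print(f"Best guess for the #1 pick: seed {seed}, right {p:.1%} of the time")

for s in sorted(ODDS):
    pick, p = most_likely_slot(s)
    print(f"seed {s:2d}: most likely pick {pick:2d}, right {p:.1%} of the time")
```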
On the other hand, you probably would be willing to predict that the #14 seed would pick 14th, because a 98.2% chance of being right is pretty darn good. Depending on how much you value your own credibility, you might even be willing to predict the outcome of picks #8 through #13, since the likelihood of being right on each is greater than 72%. You wouldn’t always be right, but you’d be right often enough that your overall record would come out looking pretty good.

But, hopefully, you’d be wise enough to steer away from predicting any kind of specific result for anything in the top 7, where your odds of being right would be between 25% and 60%, meaning you’d be taking the side of something close to (or worse than) a coin flip in each case. If someone asks you to predict who is going to win the #1 overall pick in the NBA Draft Lottery, a correct interpretation of the data is simply “I don’t know.”

Preseason win-loss projections for Major League teams are much like the NBA draft lottery, just with the caveats that we’re not dealing with perfectly known variables and there’s no artificial floor placed below each team to keep them from crashing due to random variation. With all of the unknowns that are simply outside of the realm of forecasting, every possible win-loss record you could dream up for any team is unlikely. It doesn’t matter how good or how bad the team is; the spread of talent across the league is simply not large enough to let us predict any single win-loss record with confidence, given all of the variables that we know we can’t forecast with any kind of certainty.

It doesn’t mean that these forecasts are useless, of course. Even though every individual outcome is unlikely, we can still come up with a projection that is likely enough to occur for us to make a prediction, but that projection has to be a range of numbers, not a single outcome.
Since even the best projection systems tend to have standard deviations from actual win-loss results of 6-10 wins, we can say with something like 95% confidence (two standard deviations, taking eight wins as the midpoint of that range) that a team will finish within +/- 16 games of its mean projection. So, you could confidently predict that a team that has a projected 81-81 record would win between 65 and 97 games.

The problem, of course, is that’s not very helpful. Anyone could predict that any team will be somewhere between “terrible” and “excellent”, and you certainly don’t need any fancy algorithms to say that a team could finish somewhere between first and last. This is why making preseason predictions is kind of silly. We simply don’t know enough in advance to be confident enough in our forecasts to make declarative statements about small ranges of outcomes.

We don’t have to get to the 95% confidence level that two standard deviations brings about, of course. Knowing that 68% of teams fall within +/- 8 wins of their projected record is still useful, as long as the results aren’t overstated. Knowing that, we can look at a team with a projected 75-87 record as an unlikely contender, but more importantly, we can look at a group of six teams projected for mid-70s records and realize that one of them will probably make a playoff run, since we’d expect two of the six teams to fall outside of the one-standard-deviation range, with one on the high side and one on the low side.

In other words, if we look at all the teams that are projected to win between 75 and 80 games, we might find a list that includes the Orioles, White Sox, Brewers, Pirates, Padres, and Royals. None of them are likely to make the playoffs, but as an aggregate group, this is a pretty good place to start if you’re looking for a “surprise team” in 2013.
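The arithmetic behind those ranges can be sketched in a few lines. This assumes projection errors follow a bell curve with a standard deviation of eight wins, which is an illustrative simplification (the midpoint of the 6-10 win range), not something any particular projection system publishes.

```python
from statistics import NormalDist

SD_WINS = 8.0  # assumed standard deviation of projection error, in wins


def prob_within(margin, sd=SD_WINS):
    """Chance that actual wins land within +/- margin of the projection."""
    error = NormalDist(mu=0.0, sigma=sd)
    return error.cdf(margin) - error.cdf(-margin)


one_sd = prob_within(8)    # the "+/- 8 wins" band
two_sd = prob_within(16)   # the "+/- 16 wins" band

# Out of six similar teams, how many should miss the +/- 8 band?
teams = 6
expected_outside = teams * (1.0 - one_sd)

print(f"within  8 wins: {one_sd:.1%}")
print(f"within 16 wins: {two_sd:.1%}")
print(f"expected misses among {teams} teams: {expected_outside:.1f}")
```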
The surprise team won’t necessarily come from that group of mid-70s teams (the Orioles weren’t forecast as a mid-70s win team last year, for instance), but starting with the preseason forecasts and knowing the standard deviation can help guide decisions about which teams should be making more aggressive efforts to improve in the short term versus focusing on the bigger picture.

Where one can start to get into trouble is in treating all projections as if they’re predictions. Every preseason win-loss forecast that comes out over the next six weeks is going to put a single number on each team as the most likely outcome, but it’s important to remember that every single one of those numbers is likely to be wrong, and that the spread in expected wins around that number is pretty large. When a team like the Indians starts upgrading its roster, the hope is not simply that they can push their forecast mean up to 81 wins from 75 wins, which can be viewed as a meaningless difference if one is solely focused on a binary playoffs/no playoffs outcome. The hope is that they can raise the number of opportunities they have to have things break right and end up with 90+ wins, sneaking their way into October baseball in the process.

The conflation of projections and predictions lies partly with the public’s fascination with “making a pick” and then defending it (those kinds of stories are extremely popular and drive a lot of traffic), but it is also born out of the way forecasters have chosen to display their results. If we want to really get across the meaningful difference between projections and predictions, maybe we’d be better off displaying the results of preseason projections as overlapping bell curves rather than a simple standings table with the weighted mean representing the entire projection. Or maybe something like the way the guys at RLYW do it, with pie charts showing how often each team wins its division in their simulations.
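The Indians example is easier to see with numbers attached. As a rough sketch, again assuming a bell curve with a standard deviation of eight wins (an illustrative assumption, not a published figure), we can compare the chance of reaching 90 wins from a 75-win projection against the chance from an 81-win projection:

```python
from statistics import NormalDist

SD_WINS = 8.0  # assumed standard deviation of projection error, in wins


def chance_of_at_least(target, projected, sd=SD_WINS):
    """Chance that a team projected for `projected` wins reaches `target`."""
    return 1.0 - NormalDist(mu=projected, sigma=sd).cdf(target)


before = chance_of_at_least(90, projected=75)
after = chance_of_at_least(90, projected=81)

print(f"75-win projection: {before:.1%} chance of 90+")
print(f"81-win projection: {after:.1%} chance of 90+")
```

Under these assumptions the six-win upgrade moves the chance of a 90-win season from about 3% to about 13%. Neither version of the team is likely to get there, but the upgraded one gets several times as many chances.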
So, forecasters, here’s my request: Show us more than the single weighted mean outcome when doing win-loss records. Give us the estimated likelihood of each win total between 60 and 100. That’s interesting data, and it’s helpful in pointing out that the projections you’re making are not predictions that you’re attempting to stake your reputation on.

And, writers quoting those projections, let’s do a better job of calling them what they are. Or, more specifically, what they aren’t. The forecasters are doing a real service by publishing their results. Let’s not pretend that all that work is simply a prediction, no different than a random number pulled out of thin air by a television talking head. There is a difference, and we should try to shine a spotlight on it whenever possible.
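For what it’s worth, here is a rough sketch of what that kind of display could look like for a single team. It assumes a bell curve with a standard deviation of eight wins around an 81-win projection; both numbers and the function name are illustrative, and a real projection system would presumably get its distribution from simulations instead.

```python
from statistics import NormalDist


def win_total_probabilities(projected, sd=8.0, low=60, high=100):
    """Map each win total from low to high to its estimated likelihood."""
    dist = NormalDist(mu=projected, sigma=sd)
    # Give each whole-number total the slice of the curve from
    # half a win below it to half a win above it.
    return {w: dist.cdf(w + 0.5) - dist.cdf(w - 0.5)
            for w in range(low, high + 1)}


probs = win_total_probabilities(81)

# Print every fifth total with a crude text bar chart.
for wins in range(60, 101, 5):
    bar = "#" * round(probs[wins] * 400)
    print(f"{wins:3d} wins: {probs[wins]:5.1%} {bar}")

print(f"exactly 81 wins: {probs[81]:.1%}")
```

Even the single most likely total, 81 wins, comes up only about 5% of the time under these assumptions, which is the whole point.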