Sunday, August 9, 2015

Are There Any Statistics on the Misuse or Misinterpretation of Statistics?

The other day, Duque bemoaned the fact that Drew was hitting away into the shift with a man on first, when a bunt may obviously have been called for (Drew being Drew).

Walter, an astute commenter, replied thusly:

duque--read the relevant literature on this. It's well established by now that even when a sacrifice bunt succeeds, the likelihood of scoring a run is REDUCED. Yes--the chances of scoring a run become SMALLER WHEN THE SACRIFICE SUCCEEDS AND YOU HAVE ADVANCED THE RUNNER because the additional out decreases scoring chances more than the advancing of the runners increases it. This may seem counterintuitive, but the result is based on a computer analysis of gazillions of baseball games since the beginning of recorded time. The first such run expectancy chart appeared in The Hidden Game of Baseball in 1984 and has been reconfirmed by every statistical analysis since then. It might be useful for the presiding minds of this blog to acquaint themselves with the rudiments of advanced analysis to avoid sounding just like the daily sportswriting dolts they so acridly condescend to.
August 9, 2015 at 1:15 AM

And this brings up an amazingly overlooked point about the increased use of statistics in the analysis of baseball, and how they can easily be misused, misconstrued and otherwise mishandled if not in the hands of somebody mathematical who really knows what they're doing. (Not a comment on Walter, but on his sources.)

There is a reason that there is an old saying that there are three kinds of lies--lies, damned lies, and statistics. Statistics are easily misinterpreted and manipulated (I've been in advertising and marketing for almost 40 years, trust me on that last one). Statistics also are not always right because they are not always measuring the specifics of a particular situation. 

Consider the most basic stat: the batting average. If Drew is hitting .190 and is facing a shift because he consistently pulls the ball even with outside pitches, statistically he likely has a less than 19 out of 100 chance of getting a hit. This means that he could not only make an out that leaves the runner at first, but that he could hit into a double play and erase the runner. Whereas, if he can successfully bunt in the opposite direction of the shift, he not only moves the runner into scoring position but has a better than 19/100 chance of reaching base himself. Statistics show that he should bunt in that situation.

With runners on first and second, or even with a runner on second and the out recorded, the batter becomes Ellsbury, who has a 28/100 chance of getting a hit overall (although his more recent performance statistics have been worse), which is better than Drew's chances if he does not bunt.

To say that, statistically, the numbers prove that bunting the runner over actually decreases the chance of scoring is not necessarily competent analysis of the statistics, because all situations involving all batters who have sacrifice bunted since the dawn of time contain a variety of irrelevant situations -- including a number of cases where competent hitters bunted and were followed by less competent hitters, a situation that would happen in the old days when bunting the runner over was a chore that fell to some players we would never dream of using for that purpose today. There are other situational considerations that a mathematician baseball fan might be able to point out.

This is the problem with the blanket use of statistics in baseball as "proof" of many things. The best use of statistics is when they are winnowed down to very specific situational subsets that take into account as many details and variables as possible, which, correct me if I'm mistaken, is not the case in the stats Walter is pointing to.

If there is a subset of the bunting stats that's a breakout of sub-.200 pull hitters facing a shift who bunt the other way, and the resulting chances of scoring a run if that bunt is successful when the following hitter(s) is averaging a statistically meaningful number of hits more per at-bats (preferably against the type of pitcher being faced or even against the particular pitcher if there are enough occurrences to be statistically meaningful)...then I think we've really got something.


joe de pastry said...

I agree that a successful sac bunt would not help. But I think the real point is that if Drew had a good chance to bunt for a hit he should have tried, partly because even a foul bunt would have forced the defense to put a fielder on the left side, increasing the chances of Drew getting a hit.

Local Bargain Jerk said...

I thought this was a well done post. It has inspired me to do a statistical analysis of today's game.

I crunched the numbers using data downloaded from and, if you score zero runs, as the Yankees did today, you will tend to lose nearly 100% of the time.

Numbers don't lie.

Leinstery said...

Here's a statistic: 26 scoreless innings, in their most important series of the season no less. Fuck these cocksuckers.

Walter said...

John M. confuses several different issues. The run expectancy tables concerning sacrifice bunts are not the same issue as a pull hitter bunting for a hit against the shift.

Moreover, all the variables of sacrifice bunting--the prowess of the hitter, the likelihood of scoring one run, the likelihood of scoring more than one run, etc., etc., have been parsed to a fare-thee-well. I suggest that John M. begin his education on this topic with the relevant chapter in the Baseball Prospectus book Baseball Between the Numbers, available on Amazon. There are people ten times as smart as he or I who have been beating their heads against this for decades. The bottom line is that ON AVERAGE--unless John M. himself or Jerry Lewis is at the plate--a successful sacrifice bunt marginally reduces the likelihood of scoring a run. That's not somebody's opinion--it's a fact culled from the entire history of baseball, and holds true for all periods of the game--but even truer since 1920.

jdrny said...

The shift applied to a .190 hitter is something new to baseball.
John M is correct. Drew has to bunt. Zero runs is the proof.

Dutchfan said...

This post is one of those that is not only fun to read, it is intelligent as well. It states 2 clear perspectives and leaves us readers with a choice. Wonderful. I guess that is what statistics are for. Because, as pointed out, they can be manipulated at will, so in the end it is more a matter of belief than 100% certainty.
Having said that, I lean toward the sacrifice bunt in this particular case. The case of Drew. The case of a sub-.200 hitter.

Are there any stats on having a runner at first with above average speed combined with a sacrifice bunt by someone of the Drew persuasion?

Walter said...

I found John M.'s post completely obtuse because it conflated several disparate issues, as I've already made clear. If you guys want to know more about this, there's plenty of stuff on it on-line--just Google the topic. Or read the relevant chapter in Baseball Between the Numbers. In the meantime, as Dylan once said, don't criticize what you can't understand.

el duque said...

Hey, I thought both posts were cool. You raise an interesting point, and the truth is, I've never even heard of that statistic - and I'd thought I'd heard everything. My beef with Drew is his inability to bunt for a hit. You'd think he was Babe Ruth, the way he clings to his power stroke. He hits liners to right, and they are fielded, and unless he adjusts, he is killing us at bat.

I wanted Drew to scratch out an infield hit. If all he did was sacrifice, well, it wouldn't have been TOO bad.

John M said...

The thing is, ON AVERAGE is a pretty worthless statistical analysis. ON AVERAGE the Yankees have the second most fearsome offense in the league, but we know the truth of that. They can score 50 or 60 runs one week and 3 the next. But ON AVERAGE, they're killing the ball.

This is the problem with statistics used in such a general way. My post covered a couple of different aspects to the same problem: the problem is, statistic-happy fans have somewhat bought into some sham science. It's not that stats can't be analyzed in ways that actually mean something, it's that stats are analyzed all kinds of ways that don't necessarily mean anything, but are presented as something we should pay attention to.

The ON AVERAGE results of sacrifice bunting just served itself up on a pinstriped platter as a prime example of the latter. Just remember, if Donald Trump lived on an island with 20 servants that he paid $20,000 apiece per year, ON AVERAGE everyone on the island is a multimillionaire.
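The island example can be checked with a few lines of arithmetic. This is only a rough sketch: the servants' pay and head count come from the comment above, but the owner's yearly figure (OWNER_INCOME) is a made-up number chosen purely for illustration. It compares the mean, which one large value drags upward, with the median, which describes the typical resident.

```python
from statistics import mean, median

# Figures taken from the comment above.
SERVANT_INCOME = 20_000      # dollars per servant per year
NUM_SERVANTS = 20

# Assumption: a hypothetical yearly figure for the owner, not a real number.
OWNER_INCOME = 100_000_000

# One owner plus twenty servants: 21 residents in total.
incomes = [OWNER_INCOME] + [SERVANT_INCOME] * NUM_SERVANTS

island_mean = mean(incomes)      # pulled far upward by the single large value
island_median = median(incomes)  # the middle resident, i.e. a servant

print(f"mean:   ${island_mean:,.0f}")
print(f"median: ${island_median:,.0f}")
```

Under that assumed figure the mean lands in the millions while the median stays at the servants' $20,000, which is the gap the island joke is pointing at.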

KD said...

John, you nailed this argument. The choice was not to sacrifice or hit away. The bunt would have been an attempt for a hit. A hit that they were practically giving to us! Fucking Drew is terrible. Bring back Refs.

liveamovielife said...

Cannot wait for Walter's blog, which will undoubtedly be as humorous and informative as IIHIIFIIC but will be 100% accurate and will require the latest edition of Webster's to digest the content!

Walter, I strongly suggest you avoid Duque's book. It's the perfect dialogue of a baseball fan... quirky, impassioned and nonsensical. Just like his posts and the game itself. I'm sure you'd hate it.

liveamovielife said...

You're a good, respectful man Duque.

June Cleaver said...

And you play well with others.

Walter said...

Just for the record--John M.'s last comment is completely incoherent and utterly misses the point. He was advocating A BUNT FOR A HIT AGAINST A RADICAL INFIELD SHIFT. I was talking about the proven value of sacrifice bunts. Then he meanders onto a point about the worthlessness of averages, having just cited Drew's .190 AVERAGE in support of his point. So Joe Girardi and all other managers are henceforth instructed by the Mad Oracle John M. to IGNORE, say, batting averages in making up their batting order since averages are now revealed to be meaningless and useless. Bat Drew first--no, make that third--in the order, and Teixeira eighth and Ellsbury last. Better yet, put Tanaka first in the batting order and have him catch, since fielding averages are likewise deemed to be sheer nonsense, like all other averages. Do I have this about right, John M.? (Oh--and sacrifice bunt at every opportunity despite the fact that the averages show it to be a counterproductive strategy in most cases.) (Minor question--why does duque allow this guy to post features on his blog--Oh, I forgot--HUMOR--in this case, the humor being unintentional and the joke being on John M. GOTCHA!)

el duque said...

Hey, I get it that, statistically, there is a cool point here. In fact, I had no idea that the numbers skew so negatively against sacrifice bunts. In that sense, Walter, you're right! But so is John M - (my friend, so don't dis him!) All the numbers in the world don't mean crap in a one at-bat situation. I say Drew should have bunted, shortened his swing, gone to left, tried SOMETHING. He not only sucks at bat, but he's not evolving, not adjusting - in my mind, he's failing the Yankees out of sheer ego: How in hell does a guy hitting .190 - who is being killed by the over-shifts - continue to just hack away, hack away, hack away - pull the ball, trying to hit home runs... and not try something different. Obviously, the Yankees like Drew. He must be a nice guy. But when he comes to bat, it's time to take a piss. And it's now been a year of mediocrity. This is not a bad week. This is a bad player. We dumped Brian Roberts last year, and he was hitting 50 points higher than Drew. How did we end up tethered to this guy? Here's a stat to check: IF ALL HE EVER DID WAS BUNT, COULD HE DO WORSE?

John M said...

Yeah, Duque, what you said. Walter seems a mite tetchy about this stuff, but I was goading him to the extent that my point was, statistics can be used improperly and that whole bunting argument was a classic case in point.

I thought the picture of Irwin Corey was a hint that my 'analysis' had a little bit of tongue in its cheek, but obviously not enough of a hint for everyone.

Too much statistical analysis, not enough baseball. Still think the number crunchers can go too far a lot of times because it's become a religion instead of an illumination.

And I can go too far also, it would seem.

Sorry for causing a kerfuffle and bearbaiting. Walter, don't be a true believer, it's a dead end. And see Professor Tanaka for some tips in logical jiujitsu, he's done wonders for me and the missus.

Rufus T. Firefly said...

John M:

I for one appreciate subtle humor. Maybe because it always seems beyond my reach.

Walter said...

PART I: duque: I already acknowledged your point several times--sure it's possible that Drew should have bunted FOR A HIT in that situation AGAINST A SHIFT, but that was not MY point, which had to do with the advisability of sacrifice bunts IN GENERAL. This is scarcely news to anyone with basic literacy in advanced statistical analysis--as I said, the insight was first documented in Palmer and Thorn's Hidden Game of Baseball in 1984 and has been confirmed, furrowed, and refined through numerous studies since then.

Of course, now that John M. has been shown to be a complete fool in his mash of self-contradictory, incoherent "arguments" about the untenability of averages, he wants to pretend that it was all a big joke. It's true, but the joke was on him. The depiction of Professor Irwin Corey was intended as a swipe at those pointy-headed SABR guys--the blogging equivalent of good ole boy George Wallace know-nothingism. And his latest sage counsel about not being a "true believer"--this is REALLY laughable. He has shown himself repeatedly to be a true believer in the limited and stultifying array of traditional kindergarten stats that have been misleading dumbass sportswriters, announcers, and average Joe fans like John M. for decades now--even the hapless John M. cites the old useless warhorse batting AVERAGE in his witless broadside against the validity of AVERAGES, thereby comically annulling his own grandiose "insight" in the succinct space of one sentence.

It is a trivial truism to state that any statistical analysis of probability cannot be a CERTAIN guide to the outcome of any single at-bat or play in baseball--precisely because these are statements merely of PROBABILITY, not clairvoyance. So we might decide that because there is a 34 percent chance that Ted Williams will get a hit and drive in the winning runs with runners on second and third in the ninth inning, we will pitch around him and prefer to take our chances with the guy batting behind him who has only a 23 percent likelihood of getting that hit. It's fatuous to the point of nullity to state that Williams MIGHT strike out in that situation and the next guy up drive one into the gap--but it would be just plain stupid to ignore those probabilities (otherwise you end up parroting Sterling's empty observation that you can't predict baseball--of course, not in every instance, but you CAN ANTICIPATE LIKELY OUTCOMES and should if you're not an imbecile like Sterling). So it's very possible that Drew, with his abysmal batting average, should have bunted for a hit against the shift to keep the Yankees' hopes alive. And, of course, if he had squared up and stroked one into the gap in that at-bat, we wouldn't have heard a peep from John M. or duque, because this is a classic second-guess based on the unfavorable result of swinging away in that one instance. BUT--keep in mind that John M. and duque are basing their argument on the very statistical probabilities that John M. claims to disdain--namely Drew's .190 batting average. The same reliance on probabilities governs the contemporary attitude toward bunting--many of the most successful contemporary managers are the ones who minimize their use of traditional low-probability strategies like sacrifice bunting, hit-and-run, and stealing, all of which have been shown (with suitable qualifiers and refinements) to lower rather than increase the probability of scoring.

This is why the owners and management of baseball teams who have hundreds of millions of dollars at stake in their franchises no longer pull John M.-style folk wisdom out of their asses but rely on the most advanced statistical analysis they can develop, much of it proprietary and not even available to the public.

Part II to come . . .

Walter said...


Finally, I would like to suggest to John M. that he favor us with a definitive refutation of what he evidently regards as the dogmatic, nonscientific "true believers'" follies of advanced statistical analysis: if you, John M. hurl yourself from the street overpass at 97th and Park Avenue onto the Metro North tracks into the path of an onrushing commuter train, there is a 99.999 percent chance that you will be dispatched to another world where your inane ramblings might finally be taken as the outpourings of misunderstood genius. Please do us all a favor--prove me wrong by jumping and then walking away intact from your leap in front of the train. If the statistical average happens to bear me out in this case, you will at least be taken to a better world where your eccentric genius will be lauded and appreciated rather than ridiculed as arrant stupidity masquerading after the fact as arch wit. If you do walk away intact from your encounter with the commuter train, I promise to never venture another harsh word about your incessant goofball dilations on this blog. Deal?

Dutchfan said...

Of course you are right. I think. Although there is a difference between being right and being acknowledged as right. It is a bit like speaking Dutch to an American. Statistics is a different language from baseball sense.

And that is all I have to say about that.