I had an idea today. I’m not sure it would work, and even if it works in theory, it may be astoundingly impractical. But I’ll spell it out here, and you all can decide.
So a big question among sabermetricians – no, really just all baseball analysts/fans – is how to evaluate pitching. That is, what metrics/stats should we employ when trying to determine how well a pitcher has performed? Traditionally, Cy Young and sometimes MVP voters have used wins and ERA, but as countless smart people have pointed out, there are major flaws with these metrics.
Wins are very strongly influenced by offense, defense, the bullpen, and luck. Over a very large sample, win totals are indicative of how talented a pitcher is, but even then the variance is huge. ERA is better, but it is still influenced by defense and luck. If you have only Mike Trouts in the outfield and only Ben Zobrists in the infield, A.J. Burnett becomes Roy Halladay (well, not really, but you get the idea). Not to mention the fact that a large sample size is needed just to remove the random noise from ERA.
Sabermetricians have come up with various solutions. One of them is FIP, or Fielding Independent Pitching, which uses only strikeouts, walks, and home runs as the contributing factors to pitching performance. This solution removes the defense and luck components from balls in play, but it ignores any skill that may be, and often is, involved in turning balls in play into outs.
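For anyone who hasn’t seen it written out, here is a rough sketch of FIP in its commonly published form. Note that the usual formula also lumps hit batters in with walks, and adds a league constant to put the result on the ERA scale. The function name, the 3.10 default (the constant changes every season), and the stat line are my own placeholders, not anything official.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching in its commonly published form.

    constant is a league-wide value that varies by season; 3.10 is only
    an assumed placeholder.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Made-up season line, purely for illustration.
example = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0)
print(f"FIP: {example:.3f}")
```

Nothing about balls in play appears anywhere in that formula, which is exactly the gap I’m trying to fill.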
Others prefer to use methods based on runs allowed to measure pitching, thus including a pitcher’s ability to affect the outcome of balls in play. Of course, in doing so, they include luck and defense, which aren’t under the pitcher’s control.
My solution: wisdom of the crowds.
Good idea, right? Oh, you want more? Fine, I’ll explain.
What if there were a website where baseball fans could, as they watched a game, answer questions about every batted ball? The answers would determine the likelihood of that ball turning into an out, or a single, double, or triple, given an average defense. With enough responses and data, we could determine a pitcher’s true BABIP (batting average on balls in play), based on the types of balls that batters put into play against him.
The hard part is determining which questions to include, and whether fans could accurately describe the plays they saw without being biased by the result. Here are some of the types of questions I’ve been thinking about:
- How hard was the ball hit, on a scale from 1-10?
- What type of batted ball was it (choose one)? Options: Weak grounder, Grounder, Weak liner, Liner, Fliner, Flyball, Popup
- Where was the ball hit (choose one)? Options: Infield (3rd base line, left side, middle, right side, 1st base line), Shallow outfield (left, center, right), Deep outfield (left, center, right)
- How much credit, or blame, do you think the pitcher deserves for the outcome of this batted ball, taking into account both defense and luck, from 0%-100%?
I’m certain that there are a multitude of issues with these questions. How well would fans really be able to determine how hard a ball was hit? The options in the second question overlap with the answer to the first question – is this a problem, or would it lead to more accuracy? Does the location of the ball really matter, or only how hard it was hit? Obviously a shallow fly ball is different from a deep fly ball, but can a pitcher really control whether a ground ball is hit to short or up the middle? I don’t know. These are all important issues that I don’t know the answers to.
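Whatever the final wording ends up being, each fan’s answers to the four questions would need to be stored and then boiled down to one description per batted ball. Here is a minimal sketch of how that might look. Everything in it is my own assumption: the `Response` fields, the choice of median for hardness and most common answer for type and location, and the three made-up fans.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class Response:
    hardness: int    # question 1, on the 1-10 scale
    ball_type: str   # question 2, e.g. "Weak grounder"
    location: str    # question 3, e.g. "Infield, left side"
    credit: float    # question 4, 0-100 percent


def summarize(responses):
    """Collapse many fans' answers about one batted ball into one record."""
    if not responses:
        raise ValueError("need at least one response")
    for r in responses:
        if not 1 <= r.hardness <= 10 or not 0 <= r.credit <= 100:
            raise ValueError(f"answer out of range: {r}")
    n = len(responses)
    top_type, type_votes = Counter(r.ball_type for r in responses).most_common(1)[0]
    top_spot, _ = Counter(r.location for r in responses).most_common(1)[0]
    return {
        "hardness": median([r.hardness for r in responses]),
        "ball_type": top_type,
        "type_agreement": type_votes / n,  # low values flag in-between balls
        "location": top_spot,
        "credit": mean([r.credit for r in responses]),
        "responses": n,
    }


# Three made-up fans describing the same ground ball.
fans = [
    Response(3, "Weak grounder", "Infield, left side", 10),
    Response(4, "Grounder", "Infield, left side", 20),
    Response(3, "Weak grounder", "Infield, left side", 0),
]
print(summarize(fans))
```

The `type_agreement` number is one possible way to handle the overlap worry: if fans split between two neighboring types, that disagreement is itself information about the ball.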
The fourth question would probably not be important once enough data had been collected, as the answers from the first three would be compared to the actual results of the batted balls in question. However, it may still matter, since two batted balls with identical answers to the first three questions could potentially be very different with regard to the fourth.
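To make the "compare the answers to the actual results" step concrete, here is a rough sketch. It assumes each batted ball has already been reduced to a bucket (type, location, and a hardness band) and that we know whether it became an out. The bucket labels, the `out_rates` name, and the four plays are all made up for illustration.

```python
from collections import defaultdict


def out_rates(plays, min_plays=1):
    """Share of plays turned into outs for each description bucket.

    plays is an iterable of (bucket, was_out) pairs.
    """
    tally = defaultdict(lambda: [0, 0])  # bucket -> [outs, total plays]
    for bucket, was_out in plays:
        tally[bucket][0] += 1 if was_out else 0
        tally[bucket][1] += 1
    return {b: outs / total for b, (outs, total) in tally.items() if total >= min_plays}


# Made-up plays: (ball type, location, hardness band) and whether it was an out.
plays = [
    (("Weak grounder", "Infield, left side", "soft"), True),
    (("Weak grounder", "Infield, left side", "soft"), True),
    (("Weak grounder", "Infield, left side", "soft"), False),
    (("Liner", "Shallow outfield, right", "hard"), False),
]
for bucket, rate in out_rates(plays).items():
    print(bucket, f"{rate:.3f}")
```

In practice `min_plays` would need to be much higher than 1, since a bucket with a handful of plays tells us almost nothing.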
Maybe some examples would help my thinking and clarify what I’m suggesting. Below is the first video highlight that came up on mlb.com (I have a feeling the embed won’t work. If it doesn’t, just click on the picture and it’ll bring you to the vid).
- How hard was the ball hit, on a scale from 1-10? 3. This is a pretty slowly hit ground ball, but it could be slower. It’s hard to place a number on it, but 3 sounds reasonable.
- What type of batted ball was it? Weak grounder. It might be a regular grounder, but the results would probably be mixed, and would show that it’s somewhere in between.
- Where was the ball hit? Infield – left side.
- How much credit, or blame, do you think the pitcher deserves for the actual outcome of this batted ball, taking into account both defense and luck, from 0%-100%? 10%. Ok, I’m realizing now that this is a very confusing question. Maybe it would be better if there were only two choices, credit or no credit. I’m not sure.
Let’s try another:
- How hard was the ball hit, on a scale from 1-10? 7. This is a fairly hard hit ball, but it’s not a screaming line drive and it’s not too deep. Maybe 6 or 8 would be better, but I’ll go with 7.
- What type of batted ball was it? Fliner. This is exactly what I think of when I think of a fliner.
- Where was the ball hit? Shallow outfield – right. I’m thinking maybe I should add left-center and right-center, but that implies that a pitcher can control the location of a batted ball that precisely, which I’m not confident he can.
- How much credit, or blame, do you think the pitcher deserves for the actual outcome of this batted ball, taking into account both defense and luck, from 0%-100%? 20%. The pitcher probably doesn’t deserve to get rewarded for an out here, but at least it wasn’t a deep, hard-hit fly ball.
So there you have it. Theoretically, if there were hundreds of responses for each play, we could estimate, with some degree of accuracy, how much credit a pitcher deserves for a given batted ball, and by combining all of the batted balls, determine the pitcher’s skill at turning batted balls into outs.
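Here is one way the "combining all of the batted balls" step might work, as a sketch only. It assumes we already have a league-wide hit rate for each kind of batted ball, and it simply averages those rates over everything a pitcher allowed. The rates below are invented, the bucket labels are simplified to ball type alone, and the 0.300 fallback is just a rough stand-in for league-average BABIP.

```python
def expected_babip(balls_in_play, hit_rate_by_bucket, default_rate=0.300):
    """Average the league hit rate of each ball in play a pitcher allowed.

    default_rate covers buckets with no league data; 0.300 is a rough,
    assumed stand-in for league-average BABIP.
    """
    if not balls_in_play:
        raise ValueError("no balls in play")
    rates = [hit_rate_by_bucket.get(b, default_rate) for b in balls_in_play]
    return sum(rates) / len(rates)


# Invented league hit rates by ball type, for illustration only.
league = {"Weak grounder": 0.15, "Liner": 0.70, "Fliner": 0.40}

# Four balls in play allowed by a hypothetical pitcher.
allowed = ["Weak grounder", "Weak grounder", "Liner", "Popup"]
print(f"expected BABIP: {expected_babip(allowed, league):.3f}")
```

Comparing that number to the pitcher’s actual BABIP would then hint at how much of his results came from defense and luck rather than from the contact he allowed.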
Now it’s your turn. Is my idea completely idiotic? Would it work theoretically? Would it work practically? How could I change the questions to improve accuracy and response frequency? Once I perfected it, how could I implement it? And what would I do with the results? I would love to hear any and all comments.