Good referees want to get the call and the game “right”. We’re also human, so we have to deal with decision overload, incomplete information, and players actively trying to game our judgement. So what do good referees do?
They go Bayesian.
Or at least that is what an interesting baseball sabermetrics article argues:
Researchers confirmed that the effective size of the strike zone at 0-2 is only about two-thirds as large as in a 3-0 count.
A careful review of all the relevant data reveals a valid—and much simpler—explanation for umpires’ shifting zone: They are just trying to make as many correct calls as possible…
at a 1-2 count umpires call 96 percent of OZ pitches right, but correctly call only 66 percent of IZ pitches… Umpires can pick their poison—reducing the risk of a mistake favoring the hitter or the chance of making a pro-pitcher call—but only at the price of a rise in the other error rate….
Umpires are changing their decision rule on close calls—“lean ball” vs. “lean strike”—based on the likelihood that the pitch will actually be a strike. … By adjusting their decision rule at each count to reflect their prior knowledge of the true distribution of pitches, they make better guesses and fewer mistakes. In short, umpires are Bayesian, not compassionate…..
Overall, umpires’ shifting decision rules appear to lift their accuracy rate by almost 2 percentage points (87.5 percent vs. 85.8 percent), preventing thousands of additional wrong calls each season. Moreover, the accuracy gain is particularly large in several key high-leverage counts.
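The umpires’ “lean ball” vs. “lean strike” rule can be sketched as a Bayes decision threshold. The model below is an assumption for illustration, not the article’s method or data. The umpire’s read on a close pitch is a noisy number x, drawn from one normal distribution for true strikes and another for true balls. The prior probability of a strike stands in for what the count tells the umpire. The names `MU`, `SIGMA`, `bayes_threshold`, and `accuracy` are hypothetical.

```python
import math

# Hypothetical signal model (an assumption, not from the article): the umpire's
# noisy read x of a close pitch is Normal(+MU, SIGMA) for a true strike and
# Normal(-MU, SIGMA) for a true ball.
MU = 1.0
SIGMA = 1.0


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_threshold(prior_strike: float) -> float:
    """Call a strike when the read x exceeds this cutoff.

    Posterior odds = likelihood ratio * prior odds. Calling a strike whenever
    the posterior exceeds 1/2 reduces, for this Gaussian model, to a cutoff
    on x that moves with the prior.
    """
    return (SIGMA ** 2) / (2.0 * MU) * math.log((1.0 - prior_strike) / prior_strike)


def accuracy(threshold: float, prior_strike: float) -> float:
    """Expected share of correct calls when strikes are called for x > threshold."""
    strikes_called_right = 1.0 - normal_cdf((threshold - MU) / SIGMA)
    balls_called_right = normal_cdf((threshold + MU) / SIGMA)
    return prior_strike * strikes_called_right + (1.0 - prior_strike) * balls_called_right


if __name__ == "__main__":
    # Hypothetical priors: a count where close pitches are usually balls,
    # a neutral count, and a count where they are usually strikes.
    for prior in (0.3, 0.5, 0.7):
        t = bayes_threshold(prior)
        print(f"prior={prior:.1f}  cutoff={t:+.3f}  "
              f"fixed-rule accuracy={accuracy(0.0, prior):.3f}  "
              f"shifting-rule accuracy={accuracy(t, prior):.3f}")
```

With a prior of 0.5 the cutoff sits at zero. A lower prior (which the article implies for counts like 0-2) pushes the cutoff up, so the umpire leans ball, and a higher prior pulls it down, so the umpire leans strike. In this toy model the shifting cutoff is never less accurate than the fixed one. That mirrors the article’s point in miniature without reproducing its numbers.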
This makes a lot of operational sense to me as a soccer referee. The first building block for a young soccer referee is recognizing the obvious fouls. A good referee will see the cleat to the knee, she will see the shirt pull, he will see the charge through the back of the scapula as the attacker is trying to turn the corner after a long flank run. That is just the first step.
And those are the easy calls to make. We’re in the open, we have a good angle, and there are not too many players in proximity to play.
Calls get more difficult in the mixers off of set piece plays (corner kicks, free kicks heading into the box) where there are fifteen to eighteen players within 150 square meters. People are jumping, leaping, holding, screening, whacking, kicking and creating space for themselves. What the hell is happening?
Reading the mixer is one of the first intermediate skills a referee needs to develop on the way up. Referees need to pick up the obvious fouls such as the drag-downs, but we’re also trying to play the odds by checking to see if there is an attacker posted in front of the keeper and if the keeper has his forearms prepped for a backside shove. We need to read which attackers are being used to set the pick to free the leaper and which defenders are going to try to break the pick line by running through the gaps versus running through the players. We need to read how the roll defenders will follow their man and whether they are going to avoid getting beat by grabbing a shirt.
As soccer referees advance, we are taught to “read” play. That simply means using our experience, our study, and our scouting to anticipate what the players are going to do next. If the ball should be played in a 15-yard triangle on a break-out through the central midfield, I’ll want to get wider. If the team is overwhelmingly right-footed, I might want to adjust my position. If the ball really should be sent long, I should be able to read the play and steal two or three steps on my sprint before the ball is actually sent long. Being able to read play makes reffing advanced games more straightforward, because the players are predictable: they do what they should be doing.
The best referees are the ones who can read play really well and then act upon their internal predictive model of the immediate future to make fast and correct decisions. The really good refs are seeing the game and predicting the game to produce a better game.
Good officials have to be Bayesians with frequently recalibrated and updated priors that feed into an evolving model of the game.
Baud
Replace them all with computers. We have the technology.
Weaselone
@Baud:
Yup. And if it’s an issue of tradition, we can still have the umpire call the balls and strikes so long as the determination is made by the computer.
Baud
@Weaselone:
Maybe have one of those Disney World animatronics behind the plate.
Tom Levenson
Lovely stuff, Richard. There’s an analogy here in teaching and grading, I think.
OzarkHillbilly
@Baud: Nobody could ever hack one of those.
Richard Mayhew
@Tom Levenson: Why would metrics that are hard to quantify but easy to verify by observation be needed for evaluation… we can throw everything into a complicated, error-prone formula that is fueled only by changes in student test scores to derive a useful and accurate evaluation of teachers. Nuance is not needed. Nuance is only needed for high-value activities like sports.
/snark off
FEMA Camp Counselors
First thing that came to mind when I saw the word “Bayesian” was that weird AI guy who wrote a Harry Potter fanfiction to explain his Bayesian philosophy.
Richard’s was the better explanation.
Tom Levenson
@Richard Mayhew: Yeah, well…
I was actually thinking about the problem of teaching and assessing writing at the grad level. (Enough about me. What do you think of my hair?)
Long and boring story, but in such settings there is a stated standard. (E.g. “A” work = ready for publication w. no more than copy editing.) And there are very different degrees of difficulty in assignments, in approaches taken, in timing in the semester or year, and so on. So this post touched a nerve: I find the notion of an unconscious Bayesianism affecting the way we grade to be very plausible.
And of course, high stakes testing in K-12 is bonkers on so many levels.
Central Planning
I think this kind of thing happens with just about anything in real life, and some people do it better than others.
For example, one of my kids has been on the robotics team for 4 years and he has done mechanical work on it every year. He is also the go-to kid in the house for any construction/repair for his siblings and he helps me with the bigger projects. Anyway, with the robotics stuff – he can see when kids are going off the rails, when designs will be too complex, not reliable, or a problem much sooner than the other kids. He realizes it too which makes it pretty funny when he recounts to me what’s been going on.
It’s the same with stuff I do at work. I can see the end goal for the networks my customers are building, and, knowing my customers, I can see where they will do what they think is right even when it ends up making more work for them in the future.
Maybe Bayesian is just a synonym for life experience.
Richard Mayhew
@Central Planning: “Maybe Bayesian is just a synonym for life experience.”
I think it is highly correlated with life experience tempered with expertise and introspection.
There are referees who have been reffing for 20 years, but they are effectively on their twentieth first year. There are other refs who have had the whistle for 5 years, and they are becoming Bayesian in their game management.
Your kid is incorporating his experience, and his knowledge to create a predictive model for when things will go pear shaped. There are other kids on his team who have been doing this for years and will still be surprised when things just don’t work.
I think the other way of looking at the umpire article and refereeing is that there is a decision about what type of mistake is acceptable, and that decision is extremely scenario-dependent. I’m willing to miss a handling at midfield where the ball is batted forward by an attacker, while I am not willing to miss that same action 3 yards from goal. It is a trade-off: I get better aggregate foul recognition and game management by looking hard for a handling in tight and not looking as hard for it at midfield, but it is a conscious decision to accept that trade-off to get the big things right more often.
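One hedged way to formalize that choice of acceptable mistake is a cost-weighted decision rule: whistle when the probability that a foul occurred exceeds a threshold set by the relative cost of the two errors. This captures only the “which mistake can I live with” part, not where the referee chooses to look. The cost numbers and the names `whistle_threshold`, `should_whistle`, `MIDFIELD`, and `NEAR_GOAL` are hypothetical illustrations, not anything from the post.

```python
def whistle_threshold(cost_false_call: float, cost_missed_foul: float) -> float:
    """Posterior P(foul) above which whistling has the lower expected cost.

    Whistling costs (1 - p) * cost_false_call in expectation; letting play
    go on costs p * cost_missed_foul. Setting the two equal and solving for p
    gives the break-even probability.
    """
    return cost_false_call / (cost_false_call + cost_missed_foul)


def should_whistle(p_foul: float, cost_false_call: float, cost_missed_foul: float) -> bool:
    """Whistle only when the foul is likely enough, given what a miss would cost."""
    return p_foul > whistle_threshold(cost_false_call, cost_missed_foul)


# Hypothetical costs: at midfield both mistakes hurt about equally; three
# yards from goal, a missed handling is treated as four times as costly.
MIDFIELD = {"cost_false_call": 1.0, "cost_missed_foul": 1.0}
NEAR_GOAL = {"cost_false_call": 1.0, "cost_missed_foul": 4.0}

if __name__ == "__main__":
    p = 0.35  # the same ambiguous look at a possible handling
    print("midfield:", should_whistle(p, **MIDFIELD))    # False (threshold 0.5)
    print("near goal:", should_whistle(p, **NEAR_GOAL))  # True  (threshold 0.2)
```

The same 35 percent read produces no call at midfield and a whistle near goal, purely because the assumed cost of missing it changed. As in the umpire article, lowering one error rate this way raises the other.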
Peter VE
In reading the game, I loved watching Dino Zoff, the great Italian keeper, during the ’82 World Cup. He never had to make the great leap to stop a goal, he just stepped to where the ball would be in another second or two, and caught it easily.
Punchy
@Baud: Why not have the entire team full of robots? No more concussions, CTE, players’ unions, or flopping. And robot cheerleaders with metal boobs and chainmail skirts. Mechanical engineers become the highest paid employees on the team.
mere mortal
Great post.
The bit about players gaming the referees’ judgement and expectation rings true, and leads to many calls where I’m yelling at an inanimate object because a player won his short con. Lord, I’m glad I’m not a die-hard soccer fan.
And as a basketball fan, I find the Bayesian decisions most irritating on breakaways, where a trailing referee will call a foul on any missed fast break based on how close the defender was.
I wonder whether the excellent baseball analysis could be replicated on basketball contact that might disqualify a player or interrupt a comeback scoring run.
paper
Have you watched Mark Geiger in MLS this year? He has that spooked look, like “what am I going to misread today?”
dollared
Good points, Richard. A couple of points from my experience.
1. Unfortunately, I think the Bayesian approach is more likely to reward established players. For example, great control pitchers are expected to paint the corners, even on strike 2, so they get more benefit of the doubt. That is a big challenge in pro sports.
2. The second is that there is nothing more hilarious than an entirely unexpected piece of play that confounds everybody on the field, including the ref. Sometimes there are fouls of such unbelievable clumsiness or plays of such unexpected deftness that there is this hushed silence while the ref tries to replay it in her head and then decide to do something. Or not. And often the result is fair, but there is that moment of complete uncertainty that just makes me want to laugh.
Richard Mayhew
@paper: I have not seen him this year; I’ve always liked how he calls the game. I’ll try to watch his next game or at least DVR it when he has the middle on ESPN.
@dollared: Most definitely, established players get stronger priors and thus a bit more wiggle room… the problem with Bayesian refereeing is that it may increase the % of good calls at the cost of some misses being complete WTF misses.
As for #2: hell yeah, there are plays that are just so odd that I can’t figure out what the hell I just saw.
I was reffing a low-level U-18 game yesterday afternoon, and Player A (on a dry field) somehow managed to go ice skating for 10 yards into a teammate, who went into an opponent. I had a trip on Player A, but it took about 4 seconds for me to get brain to hand and then hand to whistle and then whistle to mouth … nothing made sense as to how the hell Player A got over there.
Thankfully no one was hurt.
That is one of the reasons why I still ref a few very low-level rec games a year. I need to see WTF and process WTF.
smintheus
The best refs in hockey are former players, who understand implicitly what they’re watching on ice. There are not remotely enough good refs in the NHL, which becomes more obvious the more the League relies on technology for basics like offsides; NHL refs frequently watch video reviews showing they made the wrong call, but refuse to reverse themselves.
Another huge problem is that the NHL has long played favorites among teams, some of which it promotes heavily, and allows refs to play favorites among players as well. I’d bet that it would be extremely off-putting to work in a league where you know that your career advancement depends upon giving Team M every possible chance to get through to the next round.