The recent Nicolas Mahut / John Isner match at Wimbledon was a statistical record-breaker in many, many ways, but sporting records fall all the time.  It was the way that it demolished previous benchmarks that was really remarkable. It got me wondering about how likely it was that the previous records could be so comprehensively trashed, and what a statistical analysis of the match might look like.

John Isner and Nicolas Mahut at the end of their epic fifth set

Despite the number of records broken, they all stemmed from the length of the fifth set. Number of aces served by a player; total aces served in the match; length of the whole match: they could only happen – and indeed almost had to happen – because of the nature of the final set. That final set went to 138 games (Isner taking it 70-68), whereas the previous longest fifth set in a Grand Slam tournament only went to a measly 40 games when Andy Roddick beat Younes El Aynaoui 21-19 at the 2003 Australian Open. The longest set ever played in a Grand Slam tournament saw John Newcombe take a fourth set 25-23 against Marty Riessen at the US Open of 1969, well before tiebreakers were used to settle the first four sets.

I was bothered by just how much of an outlier Mahut and Isner’s fifth set was, and I wondered how a statistical model might be devised to simulate the set. I wanted the set, rather than the match, to be the subject of the statistical analysis because I felt that that’s where all the weirdness was. (Actually, during the course of investigating the match, I found out that the players had put together an extraordinary sequence of 167 games without a service break, spanning part of the third set, all of the fourth set, and most of the epic fifth.)

I started off by looking at how to figure out the probability that either player would win any given game. I went to atpworldtour.com to analyse the playing statistics of Mahut and Isner, and found the following data.

Player   Games            Proportion won
Isner    Service games    88%
Isner    Return games     14%
Mahut    Service games    79%
Mahut    Return games     23%

This data was year-to-date, up to 21st June, and therefore fortunately doesn’t include the Wimbledon match between the pair. The data is not ideal, because while Isner’s stats are based on 999 games played, Mahut’s are based on a much smaller sample of 114.

So, how to model games between the two players? I decided to split the difference. If, overall, Isner wins 88% of his service games, and by inference Mahut loses (100-23) = 77% of his return games, the simplest option, in the absence of any obvious suggestion that I should do otherwise, would seem to be to take an average. So let’s assume that between these opponents, Isner would normally be expected to win

     (88+77)/2 = 82.5%

of his service games.

In the same way, Mahut normally wins 79% of his service games, and Isner normally loses (100-14) = 86% of his return games. So by the same reasoning, our model will assume that the probability of Mahut winning his service games is

     (79+86)/2 = 82.5%
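For anyone who wants to reproduce the two averages, here is a minimal Python sketch. The function and variable names are my own inventions for illustration; the only inputs are the four percentages from the table above.

```python
# Year-to-date percentages quoted in the table above.
isner_service_won = 88
isner_return_won = 14
mahut_service_won = 79
mahut_return_won = 23


def blended_hold_pct(server_service_won, returner_return_won):
    """Average the server's hold rate with the returner's rate of losing return games."""
    returner_return_lost = 100 - returner_return_won
    return (server_service_won + returner_return_lost) / 2


# Isner serving against Mahut, then Mahut serving against Isner.
isner_hold = blended_hold_pct(isner_service_won, mahut_return_won)
mahut_hold = blended_hold_pct(mahut_service_won, isner_return_won)

print(isner_hold, mahut_hold)  # 82.5 82.5
```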

Coincidentally, each player has the same chance of winning any given service game. A few thoughts on this:

  • The equality of the two players’ probability of winning their service game implies that they are evenly matched, giving some reassurance, perhaps, that it’s not unreasonable that the fifth set would drag on.
  • The service-win historical data is only to two decimal places. It’s likely that, if more accurate data were to hand, the model might shift, however slightly, in favour of one player over the other, and hence towards shorter sets.
  • It seems like the equality of the two numbers might make the maths easier later on.

And a few comments on the usefulness of this model in general:

  • The service-win historical data could be more sophisticated: it would be interesting to weight the data to favour matches played on grass (as Wimbledon is), and/or in favour of big tournaments, and/or taking into account the handedness of the players.
  • I know nothing personally about Nicolas Mahut, but atpworldtour.com mentions his “inconsistent results”, which makes me think that a model that incorporates variance into service-win probabilities, instead of simply using means, might have merit.  But not here.
  • I’m assuming independence between games. This is an assumption I’m not particularly happy with in general, although it may not be too bad in the context in which I’m using the assumption. In tennis in general, I have a feeling that the assumption won’t hold in certain cases. For example, if a set goes to 5-0, and the winning player serves the next game, I have a suspicion that the next game is likelier to go to the server than it would normally be, as the losing player gives up on that set and conserves his energy for the next set. That situation doesn’t apply in the fifth set of Isner/Mahut, because it was the deciding set, but what does change from one game to the next is how tired the players are. My feeling, however, is that if tiredness had been a significant factor, the set would have been over earlier than it actually was, so perhaps in the context of this one particular set, independence can be assumed, in the absence of an argument to the contrary.

So, given some basic statistics to work with, how might we model the fifth set, and in particular get some sort of probability distribution for its overall length?

The way to do this is to break it down into two stages.

First, I’m going to define a “normal set” as one in which the winning player wins exactly six games, and a “long set” as being one where the winning player has to win at least seven games. What is the probability of a long set occurring?

Any long set has to, at some point, have a score of 5-5. Instinctively this sounds right, but let’s prove it. Suppose ten games have been played in a set. The only way in which this set has not already been won is if the score is 5-5. And if the score is 5-5, then no player can win the set unless they get to seven games. Therefore, any set that gets to 5-5 is a long set, and any long set must at some point have a score of 5-5.

So the probability of having a long set is the same as the probability of a score of 5-5 after ten games.

The only way this can happen is if Isner and Mahut both win the same number of service games. Again, it sounds instinctively right to say that, but let’s prove it.

Suppose Isner and Mahut play ten games, of which each wins five. By the rules of the game, each has five service games. If Isner wins exactly n of his five service games, then it follows that he loses 5 – n of his service games, which is equivalent to saying that Mahut wins 5 – n of his own return games. But since we know that Mahut wins 5 games altogether, then the number of his own service games that Mahut wins must be 5 – (5 – n). Which is n, so therefore Isner and Mahut must win the same number of service games.

From this, we can say that there are six mutually exclusive ways in which we can end up with a 5-5 score after ten games:

  • All Isner’s five service games go with serve, and all Mahut’s go with serve. The probability of this happening is:
         (0.825)^5 * (0.825)^5 = 0.14606
  • Exactly 4 of Isner’s five service games go with serve, and exactly 4 of Mahut’s go with serve. The probability of this happening is:
         (((0.825)^4 * (0.175)^1) * 5! / (4! * 1!)) ^ 2 = 0.16430
  • Exactly 3 of Isner’s five service games go with serve, and exactly 3 of Mahut’s go with serve. The probability of this happening is:
         (((0.825)^3 * (0.175)^2) * 5! / (3! * 2!)) ^ 2 = 0.02957
  • Exactly 2 of Isner’s five service games go with serve, and exactly 2 of Mahut’s go with serve.  The probability of this happening is:
         (((0.825)^2 * (0.175)^3) * 5! / (2! * 3!)) ^ 2 = 0.00133
  • Exactly 1 of Isner’s five service games go with serve, and exactly 1 of Mahut’s go with serve. The probability of this happening is:
         (((0.825)^1 * (0.175)^4) * 5! / (1! * 4!)) ^ 2 = 0.00001
  • None of Isner’s five service games go with serve, and none of Mahut’s go with serve. The probability of this happening is:
         (1-0.825)^5 * (1-0.825) ^5 = 0.00000

Adding the probabilities of each of these six possibilities, therefore, the probability of Isner and Mahut getting to 5-5 is about 0.3413. (I ran a simulation as a check, and got my computer to simulate 5000 sets with the same parameters. 1715 of them ended up 5-5, which is 34.30%, so I’m comfortable with the calculated probability above).
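The calculation and the simulation check can both be sketched in a few lines of Python. This is not the code I originally ran, just a minimal reconstruction under the same assumptions: independent games and a hold probability of 0.825 for both players. The function names and the random seed are arbitrary choices.

```python
import random
from math import comb

HOLD = 0.825  # assumed probability that the server wins a game (same for both players)


def prob_five_all(hold=HOLD):
    """Exact probability of 5-5: both players win the same number k of their five service games."""
    total = 0.0
    for k in range(6):
        # Binomial probability that one player holds exactly k of five service games.
        holds_k = comb(5, k) * hold**k * (1 - hold) ** (5 - k)
        # The two players' service games are assumed independent, so square it.
        total += holds_k**2
    return total


def reaches_five_all(rng, hold=HOLD):
    """Play one set game by game, alternating serve, and report whether it reaches 5-5."""
    games = [0, 0]
    server = 0
    while max(games) < 6:
        if games == [5, 5]:
            return True
        winner = server if rng.random() < hold else 1 - server
        games[winner] += 1
        server = 1 - server
    return False  # somebody reached six games before the score got to 5-5


def simulate(n_sets=5000, seed=2010):
    """Fraction of simulated sets that reach 5-5."""
    rng = random.Random(seed)
    hits = sum(reaches_five_all(rng, HOLD) for _ in range(n_sets))
    return hits / n_sets


print(f"exact:     {prob_five_all():.4f}")  # 0.3413
print(f"simulated: {simulate():.4f}")
```

The simulated figure moves around a little from seed to seed, which is why my own run of 5000 sets gave 34.30% rather than exactly 34.13%.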

So, I expect 34.13% of sets played between Isner and Mahut to get to a score of 5-5. From this point, the first player to establish a two-game lead wins the set. How can I derive a probability distribution for this?

At first I thought that this was going to be a simple random walk problem. In this scenario, the starting point would be a score of 5-5, with each game won by Isner represented by a move one position to the (let’s say) right and each game won by Mahut represented by a move one position to the left. The set would be over as soon as a position two steps from the origin was reached.

However, the problem here is that the probability of moving in a given direction changes according to who’s serving. I thought for a while that I would have to return to first principles and derive a more complex model for a random walk.

Then I realised that I could make the problem much simpler than that. Again starting at 5-5, note that the players have to play at least two games before the set can possibly be decided. The possible outcomes of those two games are:

  • Mahut wins both games, of which one is his service and the other is his return game. The probability of this is:
         0.825 * 0.175 = 0.144375
  • Isner wins both games, of which one is his service and the other is his return game. The probability of this is:
         0.825 * 0.175 = 0.144375
  • Both players hold their serve. The probability of this is:
         0.825 * 0.825 = 0.680625
  • Both players break their opponent’s serve. The probability of this is:
         0.175 * 0.175 = 0.030625

I’m using full precision on the numbers here, because these numbers are going to be raised to large powers later on, and even rounding to 4 decimal places makes a noticeable difference.

As a check, the total probability accounted for = 0.144375 + 0.144375 + 0.680625 + 0.030625 = 1.0000, which is as expected.

So after the two games, the score is either:

  • six games all, with probability
         0.680625 + 0.030625 = 0.71125

    or

  • one of the players has taken the set 7-5, with probability
         0.144375 + 0.144375 = 0.28875

But also, if the current score moves on to six games all, similar logic applies: after two further games, either the score moves to seven games all, with probability 0.71125, or one of the players takes the set 8-6, with probability 0.28875. And in fact the same logic can be applied whenever the two players have won the same number of games (as long as that number is greater than or equal to five).

So, for convenience, let’s assign the variable p as the probability that the set will end after the next two games:

     p = 0.28875
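The four two-game outcomes, and the value of p, can be enumerated mechanically. The sketch below is only an illustration: the names HOLD, BREAK, outcomes and stay_level are mine, and the 0.825 hold probability is the model's assumption from earlier.

```python
from itertools import product

HOLD = 0.825       # assumed probability that the server wins the game
BREAK = 1 - HOLD   # probability that the returner wins the game

# From any level score, the next two games contain one service game per player.
# True means the server held, False means the server was broken.
outcomes = {}
for first_held, second_held in product([True, False], repeat=2):
    prob = (HOLD if first_held else BREAK) * (HOLD if second_held else BREAK)
    outcomes[(first_held, second_held)] = prob

# Both hold, or both are broken: the score is level again.
stay_level = outcomes[(True, True)] + outcomes[(False, False)]

# One player holds and also breaks: that player wins the set.
p = outcomes[(True, False)] + outcomes[(False, True)]

print(f"stay level: {stay_level:.5f}")  # 0.71125
print(f"p:          {p:.5f}")           # 0.28875
```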

I can say from the earlier calculation that the probability of the set getting to 5-5 is 0.3413.

So the probability of the set ending 7-5 is:

     0.3413 * p

and the probability of the set getting to 6-6 is:

     0.3413 * (1–p)

Then I can say that the probability of the set ending 8-6 is:

     0.3413 * (1–p) * p

and the probability of the set getting to 7-7 is:

     0.3413 * (1–p) * (1–p)

And to take it one step further, the probability of the set ending 9-7 is:

     0.3413 * (1–p) * (1–p) * p

and the probability of the set getting to 8-8 is:

     0.3413 * (1–p) * (1–p) * (1–p)

A pattern is becoming clear now, and consequently it’s fairly easy to construct a general formula.

  • The probability of the set ending (n+2) games to n games is:
         0.3413 * (1–p)^(n–5) * p , n>=5
  • The probability of the set getting to n games all is:
         0.3413 * (1–p)^(n–5) , n>=5
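The two formulas translate directly into code. This is a minimal sketch with function names of my own choosing; it uses the rounded 0.3413 for the probability of reaching 5-5, exactly as the formulas do.

```python
P_FIVE_ALL = 0.3413  # probability of reaching 5-5, from the earlier calculation
P_END = 0.28875      # p: probability that a level set ends within the next two games


def prob_reaches_level(n):
    """Probability that the set gets to n games all (n >= 5)."""
    if n < 5:
        raise ValueError("the formula only applies from 5-5 onwards")
    return P_FIVE_ALL * (1 - P_END) ** (n - 5)


def prob_ends(n):
    """Probability that the set ends (n+2) games to n (n >= 5)."""
    return prob_reaches_level(n) * P_END


for n in range(5, 9):
    print(f"{n + 2}-{n}: {prob_ends(n):.5f}    {n}-{n}: {prob_reaches_level(n):.5f}")
```

As a sanity check, the probabilities of all the possible long-set scores (7-5, 8-6, 9-7 and so on) form a geometric series that sums back to 0.3413, the probability of having a long set at all.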

At this point, it might make sense to test the formula, using historical data.

First, I’m going to guesstimate how often long sets actually occur in Grand Slam men’s singles tournaments. (I’m going to exclude the ladies’ singles and all doubles tournaments, because the server advantage is higher in the men’s singles game. It’s a slightly arbitrary exclusion, but I’m just trying to get a feeling for whether the statistical model is at all reasonable.) There are three annual Grand Slam tournaments that can produce long sets (the US Open uses a tie-breaker in the fifth set) and each of the three has 127 tennis matches, so in a year there are 381 matches in the men’s tournaments.

However, a significant number of those 381 matches don’t get to a fifth set, and therefore can’t possibly produce a long set. In the last three applicable Grand Slam tournaments, the matches that did go to five sets totalled as follows:

Tournament              Matches going to 5 sets   Long 5th sets
Wimbledon 2009          28                        10
Australian Open 2010    25                        4
French Open 2010        22                        6

Assuming this is typical, there are maybe 75 matches a year that might produce a long set. These three tournaments actually produced 20 long sets out of 75 matches, which is 26.67% – reassuringly not far from the predicted 34.13%. In fact, a slightly lower proportion than the predicted value would make sense, because the predicted value is based on a model where each player has the same probability of winning their service game. In a more general model, one player would usually have an advantage, leading to a greater likelihood of that player getting the necessary three sets earlier.

Now, remember the previous record holder for the longest fifth set in a Grand Slam tournament? It had 40 games, and therefore must have reached 19 games all. Plugging that number into the formula above, just for the sake of a sense-check (and accepting that I’m making a big assumption with regard to the service-win probabilities, because I’ve used Isner and Mahut’s data), the probability of getting to at least 19-19 is

     0.3413 * (1 – 0.28875)^(19-5) = 0.00289

or about 1 in 345. Given the above guesstimate of 75 matches a year going to five sets, this would suggest that the expected frequency of sets going to at least 40 games is somewhere in the region of once every 345/75 = 4.6 years.

Given that the match in question took place in 2003 and we’ve just seen another set reach at least 40 games (by which I mean Isner/Mahut), the calculated result at least looks reasonable. Also, as above, the model is predicated on two evenly-matched players, and given that most matches will feature two players not so evenly-matched, we might expect such sets to occur a bit less often – which is indeed what happened in reality. The model looks pretty good – or at any rate, not obviously wrong!

Here comes the final stage. I can now calculate the probability of equalling or exceeding the fifth set of Isner/Mahut, the condition for which is that the players get to 68 games all (after which the least that can happen is that the set ends 70-68). The probability is:

     0.3413 * (1 – 0.28875)^(68-5) = 1.624 * 10^-10

Which is somewhere in the region of 1 in 6 billion.

Hmm. That makes the Isner/Mahut match seem not just unlikely, but freakishly unlikely, if the calculations and assumptions are correct. If there are only 75 Grand Slam matches a year in which this is a possibility, we might expect a set as long as that played by Isner and Mahut to occur in a Grand Slam men’s singles match once every 82 million years or so.
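Both sense checks, the 19-all one and the 68-all one, can be reproduced with the same few lines. This is a sketch under the assumptions already stated: the rounded 0.3413 and p = 0.28875 from the model, and my guesstimate of 75 five-set matches a year. The function names are made up for the example.

```python
P_FIVE_ALL = 0.3413             # probability of reaching 5-5
P_END = 0.28875                 # p: probability a level set ends within two games
FIVE_SET_MATCHES_PER_YEAR = 75  # guesstimate from the three tournaments above


def prob_reaches_level(n):
    """Probability that the set gets to n games all (n >= 5)."""
    return P_FIVE_ALL * (1 - P_END) ** (n - 5)


def years_between(n):
    """Expected number of years between sets that reach at least n games all."""
    return 1 / (prob_reaches_level(n) * FIVE_SET_MATCHES_PER_YEAR)


for n in (19, 68):
    prob = prob_reaches_level(n)
    print(
        f"{n}-{n}: probability {prob:.3e}, "
        f"about 1 in {1 / prob:,.0f}, "
        f"once every {years_between(n):,.1f} years"
    )
```

The 19-all line comes out at roughly once every 4.6 years and the 68-all line at roughly once every 82 million years, matching the hand calculations.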

Possible reasons for this include:

  • Pure chance: by definition, this explanation is extremely unlikely.
  • My model doesn’t hold up at the extremes, despite seeming reasonable under “normal” conditions. This is not unusual for probability models.
  • Wimbledon is played on grass, which tends to favour the server or, to put it another way, makes it harder to break service.
  • The longer a set goes on, the more tired the players get, which again favours the server. The 167-game sequence without service breaks would support this and the previous point.
  • The players may have colluded to extend the match, for whatever reason. I don’t instinctively like this as a possible explanatory factor, and hesitated over including it here at all. I have no evidence that it might be true, apart from the analysis above. In the end I included it because whether I “instinctively like” it as an explanation is irrelevant: it is, I believe, plausible.
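To get a feel for how much the server-advantage explanations could matter, the sketch below recomputes the 68-all probability for a few hold probabilities. Only 0.825 comes from the data above; 0.90 and 0.95 are purely hypothetical values that I picked to illustrate the sensitivity, not estimates for grass courts or for tired players. It also assumes, as before, that both players hold with the same probability and that games are independent.

```python
from math import comb


def prob_reaches_level(n, hold):
    """P(set reaches n games all), n >= 5, if both players hold serve with probability `hold`."""
    # Probability of 5-5: both players win the same number k of their five service games.
    five_all = sum(
        (comb(5, k) * hold**k * (1 - hold) ** (5 - k)) ** 2 for k in range(6)
    )
    # Probability that a level set ends within the next two games (one hold plus one break).
    p_end = 2 * hold * (1 - hold)
    return five_all * (1 - p_end) ** (n - 5)


# 0.825 is the model's estimate; the two higher values are hypothetical.
for hold in (0.825, 0.90, 0.95):
    print(f"hold = {hold:.3f}: P(68 games all) = {prob_reaches_level(68, hold):.3e}")
```

Because the formula raises (1–p) to the 63rd power, even a modest increase in the hold probability increases the result by several orders of magnitude, so the model is very sensitive to the one number I was least sure about.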

If you have comments, I’d love to hear them, particularly if you spot some faulty logic in the above, or if you think of a better way to build the probability model.