Thursday 30 January 2020

S9M25: Nerd Out

After a little FA Cup interlude, the Premier League returns to find the table even more uneven than before. Man City, the all-conquering heroes of yore, are now closer to Sheffield United in 8th than to Liverpool, just one place above them.

I've seen some interesting data visualisations over the years which give alternative takes on simple data. I'm a huge fan of all this stuff (check out r/dataisbeautiful if you like reddit things), and one of the simplest is a reworking of the PL table that spaces clubs out according to the size of the points gaps between them. I'm not explaining that well, so I'll post the table below. It was made without fancy stats packages, so it's a bit basic:



Ideally that would be stacked vertically, but you get the picture. Liverpool are one place above Man City in the classic table, but on another atmospheric level. The title is done. The clustering from Man Utd in 5th to Newcastle in 14th, or even at a push to Watford in 19th, is crazy. I'm not suggesting Watford will make a late run for the CL, but arguably anyone from Southampton downwards could get sucked into a relegation fight with a run of poor form at this stage, while European football could equally be on the table for anyone as far down as Newcastle if they hit their stride.
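The idea behind that table (one row for every points total between the top club and the bottom club, so empty rows show the size of each gap) can be sketched in a few lines of Python, stacked vertically this time. A rough sketch only: the function name `points_ladder` and the toy table are made up for illustration and are not the real standings.

```python
def points_ladder(table: dict[str, int]) -> list[str]:
    """Return one line per points total, from the leader's total down to the
    bottom club's. Clubs level on points share a line, and totals nobody is on
    become empty rows, so vertical spacing mirrors the size of the points gap."""
    top = max(table.values())
    bottom = min(table.values())
    rows = []
    for pts in range(top, bottom - 1, -1):
        clubs = sorted(club for club, p in table.items() if p == pts)
        rows.append(f"{pts:>3} | {', '.join(clubs)}".rstrip())
    return rows


if __name__ == "__main__":
    # Toy numbers for illustration only, not the real 2019/20 table.
    toy = {"Club A": 12, "Club B": 8, "Club C": 7, "Club D": 7}
    print("\n".join(points_ladder(toy)))
```

With the toy table, Club A sits alone at the top with three empty rows beneath it before Club B appears, which is exactly the "atmospheric level" effect described above.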

This got me thinking: how good are Liverpool? 23 wins from 24 games is outrageous, but is it deserved? Have Man City et al underperformed more than Liverpool have overperformed? To answer that, I thought I'd look at the not-at-all-new but still quite controversial "x" family of stats:

  • xG - Expected Goals - how many goals a team should have scored, statistically, based on the shots it has taken (number, position on pitch)
  • xGA - Expected Goals Against - how many goals a team should have conceded, statistically, based on the shots it has faced (number, position on pitch)
  • xPts - Expected Points - how many points a team should be on, given its performances so far
Smarter modellers also take into account other factors, but the database I've used (Understat) doesn't define their exact criteria.

Obviously this is all sample-size dependent, so overperforming your xG in a single game means nothing, but only the best clubs/players can outperform the stats over prolonged periods.

It makes for interesting reading...


So let's describe this first. Column 2 is goals scored, column 3 is expected goals (xG) and column 4 is the difference between them. Red = more goals scored than expected. Green = fewer goals scored than expected.
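The differential column is just a subtraction. Here's a rough Python sketch with hypothetical names (`TeamRecord`, `attacking_diff`, `defensive_diff`). The sign convention, where positive means over-performing for both attack and defence, is an assumption for this sketch and isn't tied to the red/green colouring in the tables. The xGA version that comes up next is the same subtraction with the sign flipped.

```python
from dataclasses import dataclass


@dataclass
class TeamRecord:
    name: str
    goals_for: int      # goals actually scored
    xg: float           # expected goals scored
    goals_against: int  # goals actually conceded
    xga: float          # expected goals conceded


def attacking_diff(t: TeamRecord) -> float:
    """Goals scored minus xG: positive means scoring more than expected."""
    return t.goals_for - t.xg


def defensive_diff(t: TeamRecord) -> float:
    """xGA minus goals conceded: positive means conceding fewer than expected."""
    return t.xga - t.goals_against


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    team = TeamRecord("Club A", goals_for=50, xg=44.5, goals_against=22, xga=30.0)
    print(attacking_diff(team), defensive_diff(team))
```

Flipping the defensive subtraction means a single "bigger is luckier (or better)" reading works for both columns.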

Liverpool have outscored their expected figures by 5 goals, Spurs by nearly 6 and Leicester by 12 (!!!). Chelsea, on the other hand, have left 7 goals behind. Everton, Man United, Sheffield United and Brighton are similarly guilty, whilst Watford should have 11 more goals as per the stats, which would be game-changing for them in terms of league position. Cojones eh, Mr Deeney?

What does this all mean? Well, not a lot. A hot streak over a few games may be followed by reversion to the mean, and Leicester look vulnerable to that, but there are lots of reasons for an xG differential, such as having a potato for a striker or an octopus for an opposing keeper. It would be reasonable, however, to suggest that Liverpool might be a few points ahead of where they should be, having nabbed a goal here or there in tight scorelines.

Now to xGA, which is the defensive version of the above:


Again, the 2nd column is goals conceded, the 3rd column is expected goals conceded and the 4th column is the differential between them. Green is over-performing and red has conceded more than expected.

Again, Liverpool are over-performing in this metric, along with Sheffield United, Newcastle and West Ham by ~8 goals and Palace by 9 goals. Chelsea, Wolves, Man United and most dramatically Southampton can feel slightly unlucky, and yet optimistic of improvement. Football being a low scoring sport means that a reversion to the mean here, combined with the point clustering seen above could have wild effects on the league table and subsequent European football, survival and prize money.

xGA can be affected by individual errors, having a bad keeper etc., as well as by coaching/individual brilliance. For example, Van Dijk & Alisson are credited with transforming the Liverpool defence, which is one explanation for their xGA being in the green. Maguire's effect at Old Trafford has not been statistically borne out, but that could be confounded by de Gea not having his best season. Small margins.

Putting these tables together, then: Liverpool are scoring more and conceding fewer than expected, whilst City are where they are, a few goals down, which their much-discussed centre-back issues could account for. Leicester, for the 2nd time in 5 years, are hugely outperforming their xG and xGA, at the expense of Chelsea, who are under on both counts. Managerial effect? Who knows? At the other end of the table, Watford should climb if they start scoring, but West Ham could be in trouble. Of course, a change of manager there could have a huge effect on their prospects too.

To expected points:


Now this is all a bit interesting. xPts comes from looking at each game and allocating the points according to how the result should have panned out, statistically speaking.
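Understat doesn't publish its exact method, so as an assumption, here's one common simplification sketched in Python: treat each side's goals in a match as an independent Poisson variable whose mean is that side's xG, work out the win/draw/loss probabilities, and score 3 x P(win) + P(draw). Summing that over every match gives a season xPts figure. The function names and example xG values are hypothetical.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """Probability of exactly k goals when the expected number is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def expected_points(xg_for: float, xg_against: float, max_goals: int = 10) -> float:
    """Expected points from one match, assuming each side's goals are
    independent Poisson draws with mean equal to its xG (a simplification)."""
    p_win = 0.0
    p_draw = 0.0
    for scored in range(max_goals + 1):
        for conceded in range(max_goals + 1):
            p = poisson_pmf(scored, xg_for) * poisson_pmf(conceded, xg_against)
            if scored > conceded:
                p_win += p
            elif scored == conceded:
                p_draw += p
    # Scorelines above max_goals are ignored; negligible for realistic xG values.
    return 3 * p_win + p_draw


def season_xpts(matches: list[tuple[float, float]]) -> float:
    """Sum of per-match xPts; each tuple is (xG for, xG against)."""
    return sum(expected_points(xf, xa) for xf, xa in matches)


if __name__ == "__main__":
    # Hypothetical xG figures for three matches, for illustration only.
    print(round(season_xpts([(2.1, 0.7), (1.2, 1.3), (0.8, 1.9)]), 2))
```

This is why a side that keeps winning tight games can sit well above its xPts: a 1-0 win where both teams had similar xG banks 3 real points but only around 1.3 expected ones.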

Liverpool are 20 points ahead of where they should be, statistically speaking. This was surprising to me. Plenty of their games must have been pretty close, meaning those xG & xGA over-performances have really delivered. Leicester, as expected, have gained more points, as have Palace and, worryingly for them, Newcastle.

Interestingly, Man Utd, who look an awful team when you watch them play, are 10 points down on their expected total. Hugely surprising, that.

So, a redo table based on expected points:


The last column shows places changed. Man City go top by 5 points, Chelsea and Man Utd make up the top 4 and it's all much closer. Everton and Watford are the big movers up the table. Newcastle and Palace are the big losers, closely followed by the respective North London dumpster fires.
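Re-sorting the table by xPts and counting places changed takes only a couple of lines. A sketch with a hypothetical function name and toy data; clubs level on xPts keep their actual-table order because Python's `sorted` is stable, which is an assumption about how such ties should break.

```python
def places_changed(actual_order: list[str], xpts: dict[str, float]) -> dict[str, int]:
    """Positive = the club climbs when the table is re-sorted by xPts."""
    x_order = sorted(actual_order, key=lambda club: xpts[club], reverse=True)
    x_pos = {club: i for i, club in enumerate(x_order, start=1)}
    return {club: pos - x_pos[club] for pos, club in enumerate(actual_order, start=1)}


if __name__ == "__main__":
    # Toy table for illustration only: actual order, then hypothetical xPts.
    print(places_changed(["A", "B", "C"], {"A": 30.0, "B": 40.0, "C": 20.0}))
```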

So, in summary, Liverpool's dominance will be rewarded with a shiny thing, but their title win should be viewed in a similar light to Leicester's a few years ago. We should stop laughing at Chelsea & Man United, who might one day click, and Nigel Pearson could well pull off the impossible dream.

And we've not even considered the VAR effect...


It seems a bit redundant to say this now, but let's get statty...

Matchday 23

This week, 22 people played
Most popular predicted result: Man City WIN (22/22)
Most disputed results: Watford vs Spurs & Norwich vs Bournemouth (6-5-11 & 11-6-5 split respectively)

Highest odds: Aron Kleiman (2197/1)
Lowest odds: Doron Salomon (776/1)
Average odds: 1189/1

Best predictor: AFM (5/10)
Worst predictor: Loads of us (1/10)
Average score: 2.55/10

Best predicted result: Liverpool WIN (21/22)
Worst predicted result: Newcastle WIN and Man City vs Crystal Palace DRAW (0/22)

Matchday 24

This week, 21 people played
Most popular predicted results: Leicester & Spurs WINS (21/21)
Most disputed result: Aston Villa vs Watford (7-7-7 split)

Highest odds: AFM (3601/1)
Lowest odds: Andrew Feneley (375/1)
Average odds: 1274/1

Best predictor: Loads of you (6/10)
Worst predictor: WhoScored.com & Joe Machta (3/10)
Average score: 4.76/10

Best predicted result: Leicester & Spurs WINS (21/21)
Worst predicted result: Burnley WIN & Everton vs Newcastle DRAW (0/21)

Everyone's scores:



Leaderboard (>2/3 weeks, 17/24):



To this week's predos:

Good luck guys

2 comments:

Josh said...

Depressing statistical reading for Arsenal fans. We're actually outperforming ourselves statistically.... should we feel happy?

ccdaniels65 said...

We're not really. We're basically exactly where we should be (within a margin of error). The worrying trend, if you go digging, was how bad we were from Jan to Nov. No mitigation for keeping Emery that long. The xPts under Arteta have been higher than under Emery, so hopefully we'll separate a little now.