Monday, January 21, 2013

The Ultimate Statistic for Predicting Match Outcomes

I've been reading a lot of posts recently comparing two statistics, shots (S) and shots-on-target (SoT), for predicting match outcomes.
Teams who win the battle for Shots on Target show a significantly increased probability of winning...
Neither of these statistics, however, can hold a candle to the ultimate statistic, G. Forget S and SoT. It turns out that teams with higher G win the game 100% of the time!

Of course, G is goals. And I'm not telling you anything new by saying that the team who scores more goals always wins. That comes straight from the rule book.

Now, I bring this up not to make a dumb joke (well, not just to make a dumb joke) but rather to make a complaint about the comparison of shots versus shots-on-target.

The root of my complaint is this fact: every goal is both a shot and a shot-on-target. We can think of shots-on-target as goals + blocks + saves. And we can think of shots as goals + blocks + saves + shots-off-target. Both of these statistics are goals + other-stuff.

If we compare these two statistics by checking which better correlates with winning, then we are essentially comparing which version of goals + other-stuff better correlates with just plain goals. (This is especially true if we compare ratios of these statistics to match outcomes, which seems to be the usual practice now.)

One way that we could see a better correlation is if one version of "other stuff" is better correlated with goals than the other. But another way to get a better correlation is for one version of "other stuff" to simply be smaller than the other. In particular, if both versions of "other stuff" are pure randomness, then whichever one is smaller will produce a better correlation. That is, goals + small randomness will correlate better with goals than goals + large randomness.
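To make the "smaller randomness wins" point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: goals are drawn uniformly from 0 to 3, and the "other stuff" is Gaussian noise that has nothing to do with goals. The names `pearson` and `simulate` are made up for this example.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def simulate(n_matches=20000, seed=0):
    """Correlate goals with goals + small noise and goals + large noise."""
    rng = random.Random(seed)
    # Assumed toy distribution of goals per team per match.
    goals = [rng.choice([0, 1, 2, 3]) for _ in range(n_matches)]
    # "Other stuff" that is pure randomness, unrelated to goals.
    small = [g + rng.gauss(0, 1.0) for g in goals]
    large = [g + rng.gauss(0, 3.0) for g in goals]
    return pearson(goals, small), pearson(goals, large)

r_small, r_large = simulate()
print(f"goals + small randomness: r = {r_small:.3f}")
print(f"goals + large randomness: r = {r_large:.3f}")
```

Neither version of "other stuff" knows anything about goals here, yet the smaller one should come out with the clearly higher correlation.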

Should we be worried about that happening here? Yes, we should, because shots-on-target is much smaller than shots (usually 2-3 times smaller).

We can eliminate this concern, however, by removing "goals" from these formulas. That is, we can compare shots that are not goals to shots on target that are not goals to see which better correlates with winning the match.

I tested this using the matches from this season of the Barclays Premier League (~450 matches so far). When we use the usual statistics, we find what others have claimed: shots-on-target has a higher correlation with winning. The correlation coefficient for shots-on-target is .476 versus .299 for shots, which is a huge difference. However, if we subtract out the goals from these, the correlation coefficient for shots-on-target is .117 versus .151 for shots. Now, shots becomes the better predictor. This is what we would expect to happen if the advantage for shots-on-target was simply due to being smaller.
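For anyone who wants to repeat this kind of check on their own data, here is a rough sketch of the comparison. The rows below are made up purely so the code runs; they are not the Premier League data, and the numbers printed say nothing about the real result. Coding the outcome as 1 for a win, 0.5 for a draw, and 0 for a loss is also an assumption, as are the names `TOY_ROWS` and `compare`.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical rows: (shots, shots_on_target, goals, result).
# result uses an assumed coding: 1.0 = win, 0.5 = draw, 0.0 = loss.
TOY_ROWS = [
    (15, 6, 2, 1.0),
    (8, 3, 0, 0.0),
    (12, 4, 1, 0.5),
    (20, 7, 3, 1.0),
    (9, 2, 1, 0.0),
    (11, 5, 1, 0.5),
]

def compare(rows):
    """Correlate each statistic with the result, with and without goals."""
    shots = [r[0] for r in rows]
    sot = [r[1] for r in rows]
    goals = [r[2] for r in rows]
    result = [r[3] for r in rows]
    return {
        "shots": pearson(shots, result),
        "sot": pearson(sot, result),
        # Remove the goals so that only the "other stuff" is left.
        "shots_minus_goals": pearson([s - g for s, g in zip(shots, goals)], result),
        "sot_minus_goals": pearson([s - g for s, g in zip(sot, goals)], result),
    }

for name, r in compare(TOY_ROWS).items():
    print(f"{name}: {r:.3f}")
```

To use it for real, replace `TOY_ROWS` with one row per team per match.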

More importantly, this shows that including shots-off-target in our statistic increases the correlation with winning, at least once we remove from our statistic the very thing (goals) we were trying to predict. This only seems fair since, if we are allowed to include goals as part of our statistic to predict match outcomes, then we should just use the ultimate statistic: goals itself.

Afterword

The above analysis is not meant to prove that shots rather than shots-on-target is the better statistic for analyzing matches. Instead, the above analysis is meant to show that comparing these two statistics by which better correlates with winning is not sensible since one statistic can correlate better simply by being smaller.

(There are more sensible ways to compare these statistics, for example, by looking at which has less regression to the mean, amongst other things...)

Friday, June 1, 2012

Football Vandalism

Wikipedia defines vandalism as "ruthless destruction or spoiling of anything beautiful or venerable".

There are many motivations for vandalism: simple mischievousness, drawing attention to oneself, or even drawing attention to a political cause. Regardless of the merits of a political cause, though, vandalism is wrong because it destroys something beautiful, which causes harm to those who would have seen it.

There are always methods to draw attention to a cause without such collateral damage. That is why, ultimately, vandalism is an act of selfishness: by choosing this method, the vandal decides unilaterally that what is most important to him (a cause) outweighs the considerations of others (to see something beautiful). And whatever merits his cause may have, his selfishness must be condemned.

What does any of this have to do with football? This is what:

In case you didn't watch the match, it was a friendly between the USA and Brazil. Jermaine Jones put in a tough but fair tackle on Neymar, getting the ball, but then went through the player with his trailing leg. It appeared to be deliberate and Neymar hit the ground hard.

This tackle upset me and many others. Simply put, it was a dirty tackle. But even more upsetting was that the tackle was applauded by many USA fans. One proclaimed that he "loved it" and that this was the way to "send a message". Presumably, that message was that team USA is here to play hard.

I suppose I could complain about the message. What sense does it make to try to show the world how tough you are when it's a friendly... against an under-23 side?

But that's not the real problem. The real problem is that this is not how you send a message. You don't get people to appreciate your team by possibly injuring one of the best players in the world, a player who at age 20 is already mentioned in the same breath as Messi and Ronaldo. You don't get to potentially wreck his season or even his career because you think it's important to draw attention to yourself. That is vandalism. And ultimately, that is selfishness.

Thursday, March 15, 2012

How Fantasy Football Explains the Stock Market

Managing a fantasy football team is surprisingly similar to managing a stock portfolio. In both, you have a fixed budget to spend on a fixed number of stocks / players (say, 15). Your job is to use all of your knowledge in order to predict which stocks / players are going to have the highest returns, be they in the form of dollars or fantasy points.

One fact that I think would surprise people to learn is that most managers of stock portfolios do worse than the market average. Of course, if every manager simply picked 15 stocks from the S&P 500 at random, then about half would do worse than average every year. But in fact, the vast majority of managers — some years, more than 80% — do worse than average.

A year ago, the idea that most professional portfolio managers, people whose full-time job is managing a stock portfolio, could be worse at picking stocks than choosing randomly was inconceivable. However, after another painful year of playing fantasy football, this now makes sense.

To explain, let's consider a hypothetical stock called "Vermaelen Industries". Here is a chart of the price of a share of this stock over the year:


If I were to show this price chart to an investor and tell him that I owned this stock for much of the year, that investor would most likely think I did well. Indeed, Vermaelen Industries had a good year. It went up in price. If I had bought and held it for pretty much any 10 week period, I would have made money.

Of course, Vermaelen Industries is fictitious, but this price chart is not made up. I've translated the points per week delivered by Arsenal centerback, Thomas Vermaelen, into changes in price. When Vermaelen delivered more than an average number of points, his price went up, and when he delivered fewer, his price went down.
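As a rough sketch of that translation: start from some price and, each week, move it by the difference between the player's points and an average. The starting price of 100, the one-point-equals-one-unit scale, and the `baseline` average are all assumptions for illustration, and `points_to_prices` is a made-up name.

```python
def points_to_prices(points, baseline, start=100.0):
    """Turn weekly points into a price series.

    points   -- weekly fantasy points for one player
    baseline -- assumed "average number of points" per week (hypothetical)
    start    -- assumed starting price (hypothetical)
    """
    prices = [start]
    for p in points:
        # Above-average weeks push the price up, below-average weeks push it down.
        prices.append(prices[-1] + (p - baseline))
    return prices

# Made-up example: three weeks of 2, 6 and 4 points against a baseline of 3.
print(points_to_prices([2, 6, 4], baseline=3.0))  # [100.0, 99.0, 102.0, 103.0]
```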

Here is a chart of Vermaelen's points per week (ignoring the periods when he was injured):


Show this chart to any fantasy football manager and they will instinctively know the truth: I bought in game week 16 and sold in game week 25. Or in stock market terms, I bought high and sold low.

All fantasy football managers are aware of the fact that players tend to do worse after you buy them and better after you sell them. The fancy name for this phenomenon is mean reversion, but terminology aside, it is an instinctive law of nature to fantasy football managers. Indeed, its effect is so strong that I feel I can make my favorite team win by selling all of their players and buying players from their rival.

Strangely, this law does not seem instinctive to portfolio managers. Nonetheless, mean reversion is just as real for stocks as it is for football players. And that, I think, is the root cause of why most portfolio managers do worse than the market.

One might think that the tendency to buy high and sell low is a result of irrational thinking. However, I think not. Rather, this behavior is the natural result of rational thinking combined with constant attention.

To explain, suppose that I am choosing, for one of my 15 stocks, between Vermaelen Industries and Kompany Manufacturing. Suppose that I currently own Kompany, but every week, I look at Vermaelen's results and decide if I want to trade.

In order to compare these two stocks, I need to predict the returns of each stock in the future. The simplest way to do that is to look at the average returns of each stock over the year so far. (Other, more sophisticated methods would also give the same result, but let's keep things simple.)

Each week, I will look at Vermaelen's average performance over the year so far and compare that to Kompany's. If Vermaelen's is sufficiently higher, then I will trade. Once I own Vermaelen, I will continue to compare, and if his average performance becomes sufficiently lower than Kompany's, then I will trade back.
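Here is a small sketch of that rule. The weekly points are made up (they are not Vermaelen's or Kompany's real numbers), and the `margin` that counts as "sufficiently" higher or lower is an assumed parameter. The names `running_means` and `trade_weeks` are made up for this example.

```python
def running_means(points):
    """Average points per week over the season so far, week by week."""
    means, total = [], 0.0
    for week, p in enumerate(points, start=1):
        total += p
        means.append(total / week)
    return means

def trade_weeks(held_points, other_points, margin):
    """Apply the compare-the-averages rule and return the trades it makes.

    I start out owning the 'held' player. I buy the 'other' player when his
    running average beats the held player's by at least `margin`, and sell
    him again when it falls behind by at least `margin`.
    """
    held_avg = running_means(held_points)
    other_avg = running_means(other_points)
    owns_other = False
    trades = []
    for week, (h, o) in enumerate(zip(held_avg, other_avg), start=1):
        if not owns_other and o - h >= margin:
            trades.append((week, "buy"))
            owns_other = True
        elif owns_other and h - o >= margin:
            trades.append((week, "sell"))
            owns_other = False
    return trades

# Made-up season: a steady player versus one with two huge weeks.
steady = [4] * 10
streaky = [2, 2, 9, 9, 1, 1, 1, 1, 1, 1]
print(trade_weeks(steady, streaky, margin=1.0))  # [(4, 'buy'), (9, 'sell')]
```

In this toy season, the rule buys right after the two big weeks and then sits through a run of one-point weeks before selling. Each decision looks sensible on the averages at the time.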

With this methodology in mind, it should be less surprising that I bought in week 16 because that was the exact moment at which Vermaelen's average performance so far was at its highest point. Likewise, it should be less surprising that I sold in week 25 because that was the exact moment when his average performance reached its lowest point. Indeed, both trades were perfectly rational given current information.

As I've described things, however, I would still have needed some bad luck in order to buy and sell as I did. In particular, to buy in week 16, it must have been the case that week 16 was the first week at which Vermaelen's average performance so far became sufficiently higher than Kompany's. If, on the other hand, it was sufficiently higher at week 15 or 14, then I would have got in on a couple of good weeks and the trade would not look so bad in retrospect.

Reality, however, is not this nice. In particular, I would not have normally had my eye on just Kompany and Vermaelen. More likely is that I owned Kompany and then, at some point, it came to my attention that Vermaelen might be a good buy. Then I crunched the numbers and realized I should trade.

Of course, then we must ask: why did Vermaelen come to my attention? Obviously, he came to my attention in week 16 because he had a couple of fantastic weeks just before then. In general, the most likely time that a player will come to my attention is just after they have peaked. That is the time at which the player's accomplishments are most likely to be talked about (in the media, on Twitter, or wherever) and thus come to my attention.

If I watch ESPN Press Pass every Monday, then each week I'm likely to hear about the players who are performing better than usual. Many of these are likely to be players that are peaking. So if this is how I find out about players for potential trades, then I am likely to see considerable mean reversion.

For portfolio managers, there is a simple way to avoid this: don't watch the news every week (i.e. "buy and hold"). However, fantasy football managers have little choice. We have to set our lineups every week, which requires current information about every player. So unfortunately, I can say with some confidence that our suffering will long continue.

Sunday, February 5, 2012

On Diving

Another weekend of football comes with another set of questionable penalty decisions. In this post, I'd like to focus on two in particular: (1) the penalty awarded to Man City's Adam Johnson and (2) the penalty awarded to Man United's Danny Welbeck.

Much talk about the rightness or wrongness of these penalties is confused because there are many different questions we can ask, and they have different answers. We can question the actions of the attacker or the decision of the referee or even the rules of the game themselves. Here, I'll give my opinion on all of the questions I can see.

Was the attacker's action "wrong" (unsportsmanlike)? Yes, in both cases.

Both players went out of their way to make contact rather than trying to play the ball.

In Johnson's case, he kicked out his leg behind him hoping it would make contact with the defender, which it did. He took this action consciously, hoping for a penalty, rather than trying to play the ball. He clearly could have simply continued forward, going after the ball.

In Welbeck's case, rather than running toward the ball, he ran into the defender. The defender was not attempting to impede him. Welbeck purposefully decided to run into the defender and trip over him, while he clearly could have gone after the ball instead.

As a fan, I prefer it when players try to stay up even when there is contact. I prefer it when they try to avoid contact by running around or jumping over a leg that gets in their way. In short, I prefer players who act like Leo Messi. Unfortunately, Messi is a rarity in today's game in this aspect just as much as he is in terms of scoring and creating goals.

Was the attacker diving? No, in both cases.

Neither player "dived" in the sense that neither player simulated contact. Both players fell over because of real contact with the defender.

Was the attacker cheating? No, in both cases.

Neither attacker did anything that was against the rules or attempted to subvert the rules. As far as I'm aware, the rules do not award a yellow card for allowing yourself to be tripped.

Was the penalty decision correct? Johnson: questionable. Welbeck: more questionable.

I should first say: I am not a referee. Yes, I have read the laws of the game, but I have not spent hours studying them like religious texts, nor have I carefully read all of the guidance given to referees.

In real terms, however, both instances were of the type that you've seen sometimes given and other times not given.

If one wants to draw a distinction between the two cases, there is this: the defender in Johnson's case had stuck a leg out. In Welbeck's case, the defender (Ivanovic) had both feet firmly planted. While I wouldn't have had a complaint if Johnson's penalty had not been given, Welbeck's feels bad because it's hard to imagine what Ivanovic could have done differently. He defended well, went for the ball cleanly, and even appeared to be trying to stop himself from getting into the path of Welbeck. At the point of contact, he was not even in the path between where Welbeck was and the ball. (He was in the path of Welbeck's motion, but since the ball had moved away, Welbeck could have and should have turned as well.)

Should penalties be given in these situations? No, in both cases.

As I said above, I wish more players acted like Leo Messi, jumping over defenders' legs and running around them instead of falling over at any contact. Unfortunately, the rules make this a bad decision for most players. Unlike Messi, most players are more likely to create a goal-scoring opportunity by falling over than they are by continuing to play.

This needs to change. One idea is to allow referees to award yellow cards in situations like those of Johnson and Welbeck, where the player did not "dive" but did "go looking for it", trying to get contact with a defender instead of trying to play the ball. If such decisions could be made correctly, the game would be better for it.

That said, if the last few weeks are any evidence, the current laws already ask too much of the referees. So perhaps these changes would just give them more calls to get wrong....

Saturday, December 3, 2011

Nasri and Silva

In today's match between Manchester City and Norwich City, one aspect that caught my attention was the influence of Samir Nasri. For only the third time this season, Nasri completed more passes than David Silva in a match when both started. One of those was the very first match that Nasri played for City in the league. Since then, Silva has been the dominant creative influence for City in every match save one, when Silva was man-marked for most of the match against Everton.

In Nasri's first four matches, both he and Silva played all over the midfield, in some sense playing the same position. Against Spurs, this seemed to work well, with both players completing over 60 passes. Against Wigan, though, it did not seem to work, as Nasri had little influence, completing half as many passes. It worked a bit better against Fulham, but over all of those four matches, the record is not great.

Starting with the match at Blackburn, the tactics changed somewhat: Nasri played more out wide. He was mostly left against Blackburn and Liverpool, and mostly right against Wolves and United, but all the while, he was playing mostly to one side or the other.

Over that same period, there were three matches (Aston Villa, QPR, and Newcastle) in which only one of the two started.

At the end of this period, I had the impression that the two did not play especially well together. If anything, it seemed as though Milner playing in a wide role was able to link up better with Silva.

However, today's match showed yet another approach. Nasri played centrally, rather than wide, but unlike those early matches, he played slightly deeper than Silva. These Guardian chalkboards show the subtle difference in their passing:



In some sense, it may be fairer to say that, rather than Nasri playing deeper, Silva played further forward. Most of his passing in previous matches looks much like Nasri's in this one, whereas in this match, he stayed closer to the strikers.

One should keep in mind that Silva plays further forward for the Spanish national team. And while he does drop deep to get the ball for City, he often rushes back, and I often get the impression that he wants to be closer to goal, able to link up with the strikers to create goals or score them himself.

Nasri playing deeper is also interesting for another reason. As Michael Cox of zonalmarking.net has suggested, the one thing that City may be lacking, compared to the top sides in Europe, is a deep-lying playmaker. Someone like a Xabi Alonso, say, who can start attacks and make exceptionally accurate long passes. Such players are hence able to score from free kicks.

Oh, did I mention that Nasri scored from a free kick today?

As I always try to remind myself, this was only one match. Nasri's deeper positioning in this match may have been a one-off. But it also may be a glimpse of the future. Time will tell.

Monday, November 28, 2011

Cheaty Divers

Based on last weekend's match between Liverpool and Man City, one would have to say that Luis Suarez's reputation for diving is well deserved. He appears to hit the ground in response to any attempted tackle by opponents, successful or not. At one point, during this match, he actually went studs up into the defender (Vincent Kompany) and then fell on the ground as if he had been fouled!

The individual efforts of Suarez probably explain much of the difference in fouls received in the match (11 v 15). However, there were other notable refereeing decisions that went Liverpool's way: in particular, goalkeeper Reina not receiving a card for deliberately handling outside his penalty box, and Mario Balotelli's second yellow card. A few in the media have agreed with the latter decision; however, on average, those in the media have suggested it was harsh. Altogether, there were plenty of reasons for Man City fans to feel aggrieved.

That said, it seems unlikely that any of this, aside from Reina receiving a card, would have changed the outcome of the match. However, deliberate attempts to draw cards either by diving (Suarez) or surrounding the referee (the rest of the Liverpool squad) are not what any fan wants to see. As Roberto Mancini said after the match, "this is not football."

I was left wondering whether this was a one-off for Liverpool or rather a general strategy. In other words, is diving and surrounding the referee now part of the Liverpool way?

To try to understand this, I analyzed fouls in matches using the same technique I would use to model goals. Each team gets an "attack" score, which measures how much they foul, and a "defense" score, which measures how much they draw fouls from the other team.*

What do the statistics tell us then?

Broadly, they show that most teams foul at about the same rate. There are a few teams that foul less than usual: Swansea, Norwich, and Man United. Swansea, in particular, foul very little, while the other two foul only a tiny bit less than usual. At the other end, Blackburn and Wigan foul a bit more than usual.

That describes how much each team fouls. The more interesting part is how much each team draws fouls from their opponent.

First off, the statistics do not show that Liverpool are cheaty divers. While they may have displayed some unsporting behavior in this match versus Man City, the model does not suggest they draw fouls any more than other teams in general.

However, some teams do appear to draw fouls. Chelsea, Wolves, and Newcastle draw a few. But the leader in this department, by a huge margin, is Queens Park Rangers. The number of fouls they draw is about the same as the difference in fouls between Swansea and an average team, which is quite big. In other words, when Swansea plays against QPR, we would expect them to receive as many fouls as an average team. An average team, on the other hand, should receive quite a few more fouls than usual when playing against QPR.

There is one more possibility afforded by the model, which we have not yet discussed. It's possible for a team not to draw fouls from their opponent but somehow to suppress them. In other words, are there teams whose opponents systematically receive fewer fouls than they should?

It turns out this does indeed happen to one team: Man City. Somehow, when teams play Man City, either they choose not to foul as much as usual or the referee chooses not to call them as often as usual. Furthermore, the effect is not small. It's the third largest effect in the model after Swansea's not fouling and QPR's drawing fouls.

So are teams especially timid when playing against Man City or are the referees biased against them? I'll leave that to you to decide.

(*) As usual, I "regularized" my model, which means I only allowed new variables if the improvement in fit outweighs the extra complexity they add to the model. I.e., Occam's razor was applied.
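For readers curious what a model of this shape can look like, here is a minimal sketch. To be clear about the assumptions: it swaps in a plain additive least-squares model (fouls = baseline + fouling team's score + opponent's score) with an L1 (lasso) penalty, fitted by coordinate descent. That is one common way to let a variable into a model only when it earns its place, not necessarily the exact model used above. The teams and foul counts are made up, and `fit_fouls`, `soft_threshold`, and `make_toy_records` are hypothetical names.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam, and to exactly zero if it is small."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def fit_fouls(records, lam=1.0, sweeps=500):
    """Fit fouls ~ mu + commit[fouling team] + draw[opponent] with an L1 penalty.

    records -- list of (fouling_team, opponent, fouls_committed)
    lam     -- assumed penalty strength; larger values zero out more scores
    """
    teams = sorted({t for t, _, _ in records} | {o for _, o, _ in records})
    commit = {t: 0.0 for t in teams}  # how much a team fouls, relative to baseline
    draw = {t: 0.0 for t in teams}    # how much a team's opponents foul
    mu = 0.0
    for _ in range(sweeps):
        # Baseline foul rate (not penalized).
        mu = sum(y - commit[t] - draw[o] for t, o, y in records) / len(records)
        for team in teams:
            rows = [(o, y) for t, o, y in records if t == team]
            if rows:
                rho = sum(y - mu - draw[o] for o, y in rows)
                commit[team] = soft_threshold(rho, lam) / len(rows)
        for team in teams:
            rows = [(t, y) for t, o, y in records if o == team]
            if rows:
                rho = sum(y - mu - commit[t] for t, y in rows)
                draw[team] = soft_threshold(rho, lam) / len(rows)
    return mu, commit, draw

def make_toy_records():
    """Made-up league: team A fouls less, and team D draws extra fouls."""
    teams = ["A", "B", "C", "D"]
    records = []
    for t in teams:
        for o in teams:
            if t != o:
                fouls = 12 + (-4 if t == "A" else 0) + (3 if o == "D" else 0)
                records += [(t, o, fouls)] * 2  # home and away
    return records

mu, commit, draw = fit_fouls(make_toy_records())
print("baseline:", round(mu, 2))
print("commit:", {t: round(v, 2) for t, v in commit.items()})
print("draw:", {t: round(v, 2) for t, v in draw.items()})
```

In this toy league, only team A's fouling score and team D's drawing score should end up clearly different from zero. The penalty pulls the rest to zero, or very close to it.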

Wednesday, November 23, 2011

4-4-2 versus 4-2-3-1

We have previously discussed the fact that no top team plays a traditional 4-4-2 today and that the current version of the 4-4-2 played by, say, Manchester United, is not especially different from a 4-2-3-1. Zonal Marking, in his review of Napoli v Man City, strikes the same note:
Of course, when you play with two attack-minded wide players plus one striker dropping off into the hole, 4-4-2 and 4-2-3-1 are, if not interchangeable, not significantly different.