Thursday, March 31, 2011

EPL Week 31 Fantasy Team

I had a disappointing middle-of-the-pack finish last week. However, it was a poor week for everyone, with few points to be had anywhere. The top 9 spots were separated by only 5 points, which means the final ordering was pretty much down to chance.

Logging into my account to set this week's lineup brought a smile back to my face though. Despite last week's poor performance, I'm still up about 75% over the last 10 weeks. So there's plenty of motivation going into this weekend.

According to my models, these are the teams to look at for this weekend:

  • Offense: Arsenal, Newcastle, Everton, Liverpool
  • Defense: Man City, Bolton, Tottenham, Chelsea

Of these, Arsenal is far and away the most likely to score. The others on offense and all of those on defense are clustered pretty close together. Arsenal also has a reasonably good chance to get a clean sheet; however, given how far they are ahead on likelihood to score, it makes the most sense to draw two attacking players from Arsenal.

The midfield looks to have the fewest choices, so we will start there. Arsenal and Newcastle both have good options, but neither of the others does. Everton does not have any high point earners in midfield. Liverpool's best choice is Gerrard, who is injured. Meireles had a few weeks of great form, but with Suarez and Carroll in the lineup now, he hasn't been earning many points.

This means we want to choose two midfielders from either Arsenal or Newcastle and one midfielder from the other. Given that we would never want to skip van Persie, it seems best to take two from Newcastle, and the obvious choices are Barton and Nolan. From Arsenal, we can take either Nasri or Fabregas. The latter is expected to be ready this weekend, but there is still some risk that he would start on the bench. So for that reason, I picked Nasri.

For strikers, we clearly want to pick van Persie. We've already taken two from Newcastle, so Best is not an option. From Everton, Cahill is the only option as Saha is again injured. We can either pick him and one from Liverpool or take two from Liverpool. I chose to do the latter because it's too hard to choose between Suarez and Carroll. The former has been in great form for Liverpool recently, while the latter just scored for England this week and should have high confidence.

For defense, the best options from Man City are Kompany and Hart. Injuries leave fewer choices at City at the moment. Zabaleta should be back, but we can't expect him to start in top form. Hence, taking Hart seems like a safe bet. The remaining three choices of defenders are not easy. Chelsea defenders are usually good choices, but they are playing away to Stoke, who can be very effective at home. I chose to hedge my bets by taking just one Chelsea defender, Luiz, who is also a scoring threat, along with one defender from each of the other teams with likely clean sheets.

This gives us the following 4-3-3:

  • Strikers: van Persie (ARS), Carroll (LIV), Suarez (LIV)
  • Midfield: Nasri (ARS), Barton (NEW), Nolan (NEW)
  • Defenders: Kompany (MAC), Cahill (BOL), Dawson (TOT), Luiz (CHE)
  • Keeper: Hart (MAC)
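As a quick sanity check, the constraints used in the reasoning above can be verified mechanically. This is only a sketch: it assumes the limit of two players per club that the picks above imply, plus the 4-3-3 shape, and the position labels and the `check_lineup` helper are names made up for illustration.

```python
from collections import Counter

# Week 31 lineup as listed above: (player, club, position).
LINEUP = [
    ("van Persie", "ARS", "FWD"),
    ("Carroll", "LIV", "FWD"),
    ("Suarez", "LIV", "FWD"),
    ("Nasri", "ARS", "MID"),
    ("Barton", "NEW", "MID"),
    ("Nolan", "NEW", "MID"),
    ("Kompany", "MAC", "DEF"),
    ("Cahill", "BOL", "DEF"),
    ("Dawson", "TOT", "DEF"),
    ("Luiz", "CHE", "DEF"),
    ("Hart", "MAC", "GK"),
]

# Assumption: at most two players per club, as implied by the picks above.
MAX_PER_CLUB = 2


def check_lineup(lineup, formation=(4, 3, 3), max_per_club=MAX_PER_CLUB):
    """Return a list of rule violations (empty if the lineup is valid)."""
    problems = []
    positions = Counter(pos for _, _, pos in lineup)
    clubs = Counter(club for _, club, _ in lineup)
    defenders, midfielders, forwards = formation
    expected = {"GK": 1, "DEF": defenders, "MID": midfielders, "FWD": forwards}
    for pos, want in expected.items():
        if positions[pos] != want:
            problems.append(f"{pos}: expected {want}, got {positions[pos]}")
    for club, count in clubs.items():
        if count > max_per_club:
            problems.append(f"{club}: {count} players exceeds limit of {max_per_club}")
    return problems


print(check_lineup(LINEUP))  # an empty list means no violations
```

Adding Best (NEW) as a fourth striker, for instance, would be flagged twice: once for the formation and once for the Newcastle limit.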

Finally, for captain this week, the choice is easy: van Persie. He is the most likely player on Arsenal to score and Arsenal are the most likely team to score.

Monday, March 21, 2011

Mancini's Project: An Arsenal With Muscle

Changes at City

If you listen to the journalists, nothing much is changing at Manchester City. An Italian manager, like Mancini, always means catenaccio, ultra-defensive football. It is always cautious, striking only on the counter, never venturing forward in numbers. It means grinding out boring, 1-0 victories.

However, if you listen to those who watch City day in and day out — the fans — you'll hear a very different story. Mancini is trying something new. There is less emphasis on counter-attack. Instead, it is slow build-up. There are fewer crosses into the box. Instead, there is more link-up play through the middle.

Indeed, fans don't argue about whether or not Mancini is changing Manchester City's tactics. They argue about whether he should be doing so. Some argue, instead, that we should stick with the familiar: good ol' 4-4-2. Or if not that formation, then at least counter-attacking football. And why not? This English brand of football has served well the likes of Manchester United, Chelsea, and in recent times Tottenham against the best international competition.

Something v Nothing

It's never a fair fight, though, between something and nothing. Even if that something has its problems, we prefer the devil we know to the devil we don't.

Thus, before we can fairly cast judgement on Mancini's project, we need to have a clear idea in our heads of what that project is. But since Mancini has never really spelled it out for us, we are left to guess. So in the remainder of this post, I'll try to spell out my (uneducated) guess of what the future holds.

A Point of Reference

As someone once said to me, "Mancini loves his 4-2-3-1". Despite the dominance of this formation internationally, there is only one other top 5 team in the English Premier League that uses a 4-2-3-1 as its standard formation. So it is perhaps easiest to describe Mancini's tactics by looking at how they compare to the other team that plays the same formation: Arsenal.

As a quick refresher, the 4-2-3-1 system has the following parts: 4 defenders (two fullbacks on the wings and two centerbacks in the middle), 2 defensive midfielders, 3 attacking midfielders (two on the wings and one central), and 1 striker.

In both Manchester City and Arsenal's systems, there are 5 or 6 attacking players when in possession. The "3-1" players are always attacking, while half of the "4-2" players, the two centerbacks and one defensive midfielder, are always defending. The differences lie mostly in how the fullbacks and the defensive midfielders are used during attacks.

In Arsenal's system, these four players take turns making attacking runs. In Manchester City's system, the defensive midfielders usually stay defending, but on the other hand, the fullbacks are usually more attacking. Micah Richards and Aleksandar Kolarov are more often marauding down the wings than they are sitting back.

Despite this minor difference, the two teams attack in roughly equal numbers: usually 5 or 6 players attacking and 4 or 5 players sitting back when in possession.

The similarities between the two systems do not stop at the formation though. Amongst the individual players, there are strong similarities.

Both teams are centered around a Spanish attacking midfielder known for his creative link-up play. Arsenal has Cesc Fabregas, while Manchester City has David Silva. While journalists have not yet begun to heap praise on Silva the way they do on Fabregas, his central importance to the team is common knowledge amongst fans. Silva was voted Etihad player of the month three times in a row this year. When City lost a match while Silva was injured, fans were quick to bemoan his absence. "There was no creativity. The team is lost without Silva," they said.

Both teams feature wide attacking players that attack by cutting inside or delivering passes rather than crosses into the box. For Arsenal, these are Nasri, Walcott, and Arshavin. For Manchester City, they are Tevez, Balotelli, and Adam Johnson. Both teams have pace (especially Walcott and Balotelli), but for neither team is that their primary attacking weapon.

Both teams play with a lone central striker who is tall and can head the ball well but is primarily known for having excellent technical ability, a silky touch, and scoring with both feet. These are Robin van Persie, for Arsenal, and Edin Dzeko, for Manchester City. Of course, Dzeko has not performed like van Persie so far in his eight starts in the English Premier League, but he demonstrated the same type of skill while at Wolfsburg.

Mancini's Defense

The differences between Arsenal and Manchester City are most apparent in defense. The core of Manchester City's defense — de Jong, Kompany, Lescott, and Hart — is clearly superior to Arsenal's — Song, Koscielny, Djourou, and Szczesny. And of course, it is certainly true that Mancini cares more about defense than Wenger. He expects more of his players to track back when not in possession in order to prevent the other side from scoring.

Of course, sending more players back does mean they have further to run when they gain possession, so it does in principle mean that the side is less attacking than otherwise. But note that this is less true when the team focuses on slow build-up rather than counter-attack, as Arsenal and Manchester City do. So in practice, it is not the case that an Arsenal-like offense cannot also have good defense.

An Arsenal With Muscle

An emphasis on defense is not the only difference, however. I would argue that Manchester City are a more muscular team in general. Wenger's players are called "butterflies", but clearly that description does not apply to Richards, Kompany, de Jong, Balotelli, and Dzeko.

As Gabriele Marcotti has reported, Mancini's training includes a strenuous strength training program designed to build muscle in his players. Clearly, his system will feature not only skillful link-up play but also players with the strength to fend off defenders, speed to break away, and fitness to relentlessly press for 90 minutes.

This "physicality" is the attribute most commonly used to describe English football. Indeed, the clear advantage in strength, speed, and fitness is my explanation for why English teams do so well in the UEFA Champions League. Mancini seems to recognize this. The best team would not shun physicality for the tactical Italian approach but would rather have both.

My guess is that this is what Mancini hopes to achieve. He wants to wed the most progressive attacking approach in the English Premier League with the rugged physicality that gives every English team an edge in Europe. In short, he wants an Arsenal with muscle.

Patience

Perhaps it is natural that journalists would see Mancini's approach as being mostly defensive since that is the area where improvement becomes visible most quickly. Good defense requires proper organization, which a coach can quickly teach and drill into his players. Indeed, we have seen the same defensive turn-around more recently at West Brom under new coach, Roy Hodgson.

Offense seems to take more time to improve. It requires creativity, which cannot be drilled into players. And it requires an understanding between players, an intuition for what the other players will do, that can only be gained over many matches.

Manchester City have had little time for that so far. Dzeko has only played eight Premier League matches. And very few of those have featured the same set of attacking players, due to injuries and suspensions.

Of course, Dzeko is also adjusting to a new league. And he is adjusting to new training methods. In particular, Mancini's strength training program is asking more of him than ever before. Silva had to go through a similar adjustment in coming to City this summer. As we saw, it took over 20 matches before he started to show the brilliance that fans are now familiar with.

Dzeko will need more time to adjust. And it will take time for Silva and Dzeko to develop a partnership anything like what Fabregas and van Persie have achieved.

Mancini's project is undoubtedly ambitious. But he is on track this season to finish in the top 4. That should buy him time to continue its development. The team will need an offseason to train together and many matches with the same starting eleven running onto the pitch before they can become a cohesive unit capable of challenging the best.

But ultimately, I think that is what Mancini is trying to create at City. He is thinking not just in terms of winning league titles. He has a vision of a team that wins trophies in Europe. If that vision is anything like what I am imagining, then it is a vision too splendid to give up on quickly.

Friday, March 18, 2011

EPL Week 30 Fantasy Team

Here are the teams to look at according to my models:

  • Offense: Arsenal, {Blackburn, Tottenham}
  • Defense: Birmingham, {Chelsea, Everton}, Stoke
  • Both: Man United, Aston Villa

The bookies' model suggests the same choices in terms of offense, but in terms of defense, they prefer Arsenal, Tottenham, Liverpool, and Blackburn to Everton, Stoke, and Birmingham. As usual I will stick with my model, as mine is more accurate at predicting scores (and hence, clean sheets) than theirs.

Looking at options for strikers from amongst these teams, we find Rooney, van Persie, Arshavin, and Bent. Neither Arshavin nor Bent is a particularly great choice, however. Furthermore, Arsenal have better choices in midfield than Arshavin at striker, and many teams offer better choices at midfield than Bent.

In fact, with just two good choices for strikers and a great many choices in midfield (as we shall see), it seems sensible to use a 3-5-2 formation, rather than the 4-3-3 that I have favored recently.

Choices in midfield include van der Vaart, Nasri (note that Fabregas is out), Stewart Downing, Ashley Young, Gareth Bale, and Morten Pedersen from Blackburn. The first four of those are easy picks, especially given that Aston Villa are at home facing Wolves, who are a poor away team. The last choice is between Bale and Pedersen. I went with Pedersen since he is the only consistent point source from Blackburn and they are sure to score against Blackpool.

In defense, we also have many choices: Terry, Ivanovic, Ashley Cole, Huth, Evra, Luiz, Baines, and Ridgewell. The first three of these are all from Chelsea, so we can take only two of them. I picked Cole over Ivanovic because Man City are pretty dangerous in the middle. Huth is the next best choice for the third and final spot.

Finally at keeper, the only good choices are Cech and van der Sar. Thankfully, we still have not picked a second from Man United, so we can take van der Sar.

Thus, we arrive at the following 3-5-2:

  • Strikers: Rooney (MAU), van Persie (ARS)
  • Midfield: van der Vaart (TOT), Nasri (ARS), Downing (AST), Young (AST), Pedersen (BLR)
  • Defense: Terry (CHE), Cole (CHE), Huth (STO)
  • Keeper: van der Sar (MAU)

The choice of captain is particularly difficult. Good choices include van Persie, who is facing leaky West Brom; van der Vaart, who is facing leaky West Ham; and Rooney, who is in good form at the moment. Of these, van Persie is probably the safest choice, particularly given that Hernandez is also a scoring threat for Man United, and they are facing Bolton, who are a strong team.

Sunday, March 13, 2011

Lucky and Unlucky Teams in the EPL

Not only can you use mathematical models to predict the outcome of matches, you can also use them to look backward and ask "who should have won?"

Of course, the team that "should win" doesn't always do so, and thank God for that. The game would be nothing without the chance for an upset.

Nonetheless, this sort of analysis can tell us useful things.

All sorts of randomness affect the outcome of a match. The decisions of the referee have been foremost amongst these in recent news. However, there are many other sources of randomness. As the season progresses, though, this sort of randomness tends to cancel out. After 30 matches, the average amount of luckiness or unluckiness that each team gets per game becomes close to zero.

Hence, if we look back and find that a team appears to have been consistently luckier or unluckier, then that is evidence that something else is going on. (Perhaps another variable that should be added to our model.)
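To make the idea of "should have won" concrete, here is a rough sketch of how a luck figure could be computed. It is not the model behind the rankings in this post: it assumes, purely for illustration, that each side's goals follow independent Poisson distributions with known expected values, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_points(lam_for, lam_against, max_goals=15):
    """Expected league points (3 for a win, 1 for a draw).

    Assumes independent Poisson scoring for both sides and ignores
    scorelines above max_goals, which are vanishingly rare.
    """
    p_win = p_draw = 0.0
    for goals_for in range(max_goals + 1):
        for goals_against in range(max_goals + 1):
            p = poisson_pmf(goals_for, lam_for) * poisson_pmf(goals_against, lam_against)
            if goals_for > goals_against:
                p_win += p
            elif goals_for == goals_against:
                p_draw += p
    return 3 * p_win + p_draw


def luck(actual_points, match_rates):
    """Actual points minus expected points; positive means 'lucky'.

    match_rates holds one (expected goals for, expected goals against)
    pair per match played.
    """
    return actual_points - sum(expected_points(f, a) for f, a in match_rates)
```

Summing `expected_points` over a team's matches gives the points it "should" have, and ranking teams by that total gives the position it "should" be in.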

With that in mind, here are the most lucky and unlucky teams so far this season:

Unlucky

  1. Newcastle United: currently in 10th place but should be in 5th.

  2. Wolverhampton Wanderers: currently in 19th place but should be in 14th.

  3. Birmingham City: currently in 17th place but should be in 13th.


Lucky

  1. Tottenham Hotspur: currently in 5th place but should be in 10th.


Finding an explanation for these is left as an exercise for the reader.

How Do You Stop Barcelona?

The short answer is: you can't. They almost always score, and they rarely ever let you score against them. That's a simple formula for winning.

The long answer is: you can affect how many they score. And if they score fewer, then you've got a better chance.

I started looking into this after Manchester City's match today against Reading. City had 17 shots (10 on target) but scored only one goal. Calls rang out that City "aren't good enough", but they rang hollow in my ears because I've seen the same thing happen to Barcelona many times, and Barcelona are the very definition of a good team. Indeed, it seems like many shots and few goals often go together.

The statistics bear out what I had suspected. Looking at Barcelona's last 10 games, we find that the number of shots taken is negatively correlated with the number of goals. That is, the more shots taken (in general), the fewer goals scored. Shots on target are also negatively correlated.

On its face, this seems odd. The more shots you get, the more goals you should score, right? This can fail to be the case, though, if the number of shots and the number of goals are both correlated with some other variable, but in different directions (one positively and one negatively).

What could that third variable be? The simple hypothesis is this variable is the tactics of the defense. In other words, if the defense takes one action, then shots go down but goals go up, while if they take a different action, then shots go up but goals go down. And as a result, it appears that taking more shots produces fewer goals.
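The effect is easy to reproduce with toy numbers. The figures below are invented for illustration and are not Barcelona's actual results: tactic "A" concedes many shots but few goals, tactic "B" concedes few shots but more goals. Within each tactic, more shots go with more goals, yet once the matches are pooled the correlation turns negative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented numbers, one entry per match.
SHOTS_A = [16, 18, 20, 17, 19]   # defense uses tactic A: many shots allowed
GOALS_A = [0, 1, 1, 0, 1]        # ...but few goals conceded
SHOTS_B = [6, 8, 10, 7, 9]       # defense uses tactic B: few shots allowed
GOALS_B = [2, 3, 4, 2, 3]        # ...but more goals conceded

print("within A:", pearson(SHOTS_A, GOALS_A))   # positive
print("within B:", pearson(SHOTS_B, GOALS_B))   # positive
print("pooled:  ", pearson(SHOTS_A + SHOTS_B, GOALS_A + GOALS_B))  # negative
```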

Although there are many different defensive strategies, we can do a simple analysis by focusing on just two of them:

  • Mourinho approach (a.k.a. "parking the bus"): a narrow, compact defense at the front of the penalty box.

  • Wenger approach: a wide, compact defense moved high up the pitch.

Both approaches put 10 or 11 men behind the ball at all times and attack only on the counter.

The key difference between the two is in the positioning of the back line. With Mourinho's approach, you "invite pressure", facing wave after wave of attacks. However, with so many men right in front of the box, it is very hard for even Barcelona to find a way through. With this approach, they are certain to get many shots though.

With Wenger's approach, you keep Barcelona far enough away from your goal that they cannot shoot. Now, Barcelona must attack differently, by trying to break your offside trap. This is even harder to do, but when the offside trap is broken, it gives Barcelona a 1-on-1 chance against the goalkeeper. In other words, they will get fewer shots, but these shots will be more dangerous.

To determine which of these approaches allows fewer goals, though, we must look at the data. To do this, I used a simple measure: the average position of the center backs, be it 0, 10, or 20 yards in front of the penalty box.

Even though this is a fairly noisy measure and not a lot of data, we can see that it looks sensible, qualitatively. Centerback position is negatively correlated with shots on goal. In other words, the higher the back line, the fewer the shots allowed.

The important statistic, though, is how this relates to goals. And here we find a positive correlation. In other words, the higher the back line the more goals scored by Barcelona. This shows that Mourinho's approach is generally more effective.

It also means that Manchester City fans need not fret about scoring only 1 goal with 17 shots. It doesn't necessarily mean that the team is "not clinical enough". It may simply mean that the opponent used a tactic that invites shots but makes goals harder.

Note: This does not say whether Mourinho's or Wenger's approach is the better tactic for winning because we did not analyze how the position of the back line affects your own chance of scoring. What this does say is that Mourinho's approach makes Barcelona less likely to score against you.

Thursday, March 3, 2011

The Trouble With Statistics

It is often said that statistics can be made to prove anything. In the words of Gregg Easterbrook: "torture numbers, and they'll confess to anything."

This highlights what is probably the most dangerous aspect of working with statistics. Suppose you want to know who will win an upcoming match, say, Blackburn v Chelsea. One person says Chelsea are the better team and will clearly win. Another person says, no, Blackburn are good at home and Chelsea are poor away, so Blackburn wins. Yet another person says that Zhirkov is back for Chelsea and their away record is much better when he's in the lineup. Then someone else points out that Chelsea just had a Champions League match, and their record after such midweek matches is... And on and on. Pick whichever outcome you want, and there are some statistics that will "prove" you are right.

This anecdotal problem is well known to statisticians as overfitting. If you look at too many different variables, you will find some that appear to predict the outcome purely by coincidence. Indeed, even in completely random data, some patterns are bound to appear.

What may be news to non-statisticians, however, is that there is also a well known solution to this problem: regularization. Rather than comparing possible explanations simply by how well they fit the data, you introduce a "cost" for each variable that is used in the explanation. The best explanation will strike the optimal balance between the quality of the fit and the simplicity of the explanation. In essence, regularization is just the application of Occam's razor.
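In code, the per-variable cost amounts to adding a penalty to the fit error before comparing explanations. The sketch below uses invented fit errors and hypothetical variable names; it only shows how the preferred explanation shrinks as the cost per variable rises.

```python
# Hypothetical candidate explanations: (variables used, fit error on past matches).
# The error numbers are invented purely for illustration.
CANDIDATES = [
    ((), 40.0),
    (("home_record",), 31.0),
    (("home_record", "league_record"), 27.0),
    (("home_record", "league_record", "zhirkov_playing"), 26.5),
    (("home_record", "league_record", "zhirkov_playing", "midweek_match"), 26.4),
]


def penalized_score(variables, fit_error, cost_per_variable):
    """Fit error plus a fixed charge for every variable the explanation uses."""
    return fit_error + cost_per_variable * len(variables)


def best_explanation(candidates, cost_per_variable):
    """Pick the candidate with the lowest penalized score."""
    return min(
        candidates,
        key=lambda c: penalized_score(c[0], c[1], cost_per_variable),
    )


for cost in (0.0, 1.0, 5.0, 20.0):
    variables, _ = best_explanation(CANDIDATES, cost)
    print(cost, variables)
```

With no cost, the explanation using all four variables wins because it fits marginally better. At a cost of 1 the last two variables no longer pay for themselves, and at a cost of 20 even the empty explanation is preferred.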

This raises another question, however: how much should each variable cost? We can determine the right cost by looking at last year's data. For example, before week 29 in the Premier League, we find the best explanations for the first 28 weeks of last year's data using various different costs per variable. Then we compare those explanations based on how well they predict the results of the remaining 10 weeks' matches. (For statistics geeks, this technique is called "cross-validation".)
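Here is a minimal sketch of that procedure on toy data. The model is deliberately crude and is only a stand-in (it "explains" a result by averaging past results that share the same values of the chosen variables), and the variable names and numbers are made up. The point is the split: explanations are fitted on the early weeks and each cost is judged on the held-out weeks.

```python
from itertools import combinations


def fit_group_means(rows, ys, variables):
    """Predict y as its training average within each combination of the variables."""
    overall = sum(ys) / len(ys)
    groups = {}
    for row, y in zip(rows, ys):
        groups.setdefault(tuple(row[v] for v in variables), []).append(y)
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    # Unseen combinations fall back to the overall average.
    return lambda row: means.get(tuple(row[v] for v in variables), overall)


def squared_error(predict, rows, ys):
    """Sum of squared prediction errors."""
    return sum((predict(row) - y) ** 2 for row, y in zip(rows, ys))


def best_subset(rows, ys, all_variables, cost):
    """Variable subset minimising training error + cost * (number of variables)."""
    subsets = [s for k in range(len(all_variables) + 1)
               for s in combinations(all_variables, k)]
    return min(
        subsets,
        key=lambda s: squared_error(fit_group_means(rows, ys, s), rows, ys)
        + cost * len(s),
    )


def choose_cost(train, holdout, all_variables, costs):
    """Return the cost whose chosen explanation predicts the held-out weeks best."""
    (train_rows, train_ys), (hold_rows, hold_ys) = train, holdout

    def holdout_error(cost):
        subset = best_subset(train_rows, train_ys, all_variables, cost)
        predict = fit_group_means(train_rows, train_ys, subset)
        return squared_error(predict, hold_rows, hold_ys)

    return min(costs, key=holdout_error)


# Toy data (invented): goals scored, with two yes/no variables per match.
def match(home, midweek):
    return {"home": home, "midweek": midweek}


TRAIN = (
    [match(1, 0), match(1, 1), match(1, 0), match(1, 1),
     match(0, 0), match(0, 1), match(0, 0), match(0, 1)],
    [2, 3, 2, 1, 1, 0, 0, 2],
)
HOLDOUT = (
    [match(1, 1), match(1, 0), match(0, 1), match(0, 0)],
    [2, 3, 0, 1],
)

print(choose_cost(TRAIN, HOLDOUT, ("home", "midweek"), costs=[0, 1, 10]))
```

On this toy data the middle cost wins: a cost of 0 keeps the noisy `midweek` variable, a cost of 10 throws away `home` as well, and a cost of 1 keeps only `home`, which predicts the held-out matches best.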

It stands to reason that the cost that worked best last year will also work best this year. The reason is that the ideal cost should simply be a function of how much data we have. It is much easier to get accidental correlations over 5 weeks of data than it is over 25 weeks (for the same reason that it is easier to get 5 heads in a row than 25 heads). By doing the experiment with last year's data, we can figure out what cost is best with 28 weeks of data and then use it again this year.
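The coin-flip comparison is worth putting numbers on. Assuming a fair coin, a quick calculation shows how fast a coincidental streak becomes implausible as the amount of data grows:

```python
def prob_all_heads(n):
    """Chance that n fair coin flips all come up heads."""
    return 0.5 ** n


print(prob_all_heads(5))    # 0.03125, i.e. 1 in 32
print(prob_all_heads(25))   # roughly 3e-08, i.e. 1 in 33,554,432
```

So a 25-flip streak is about a million times less likely than a 5-flip one, which is why the same spurious pattern needs a much smaller penalty to rule out when there is more data.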

Regularization makes playing with statistics much, much safer. We can introduce all of the variables that our friends tell us will predict the result: league record, home record, whether Zhirkov is playing, whether they just had a mid-week match, and anything else you can think of. We add a cost per variable, and the best explanation will include those variables that matter and exclude those that do not.

Tuesday, March 1, 2011

EPL Week 29 Fantasy Team

Here are the teams to focus on this week, according to my models:

  • Attack: {Chelsea, Bolton}, {Fulham, Arsenal}, Newcastle
  • Defense: Man City, {Fulham, Arsenal}, Birmingham, Chelsea

I've grouped those that are roughly equivalent, while the others are in order with best on the left.

In fact, Man City is by far the most likely clean sheet, so we can start there. Given the large number of injuries, Zabaleta may be the only safe choice amongst defenders. But we can choose another defensive player from City by picking the keeper, Joe Hart, who is also a safe choice.

Switching over to attack, we certainly want to take a Chelsea attacker given that they are facing Blackpool. I picked Torres because he is almost certain to start, and against this defense, I think he will finally score. However, Anelka would be another reasonable option. From Bolton, Sturridge is in incredible form, while Elmander has been the opposite, so I will pick the former. Fulham has no good choices. Newcastle's only choice is Best, but he has not delivered recently. Arsenal is also racked by injuries, which limits the choices to just Arshavin or Bendtner. I went with the latter as he's been in good form recently, although the former may be a safer choice.

In midfield, we have many teams to consider but surprisingly few good choices. From Chelsea, we could take Malouda or Lampard. Nothing from Bolton. Dempsey is the only good choice from Fulham. Arsenal has only Nasri. Newcastle has only Nolan since Barton is still struggling with injury. With only one more choice from Chelsea and Arsenal, it makes the most sense, I think, to take a Chelsea attacker given their opposition, and save the last Arsenal choice for a defender. Malouda's form has not been great recently, so I picked Lampard, along with Dempsey and Nolan.

Leaving aside Chelsea and Man City, both of whom we have used up already, we have only three teams to consider for defenders. Arsenal's only good choice is Djourou, but he has been as good as Chelsea's Ivanovic, in terms of points. Fulham has Baird and Hangeland. Birmingham has Ridgewell. The only decision is which Fulham defender to choose, and I picked Hangeland as he was performing very well not long ago.

Thus we arrive at the following 4-3-3:

  • Attackers: Torres (CHE), Sturridge (BOL), Bendtner (ARS)
  • Midfield: Lampard (CHE), Dempsey (FUL), Nolan (NEW)
  • Defenders: Zabaleta (MAC), Hangeland (FUL), Djourou (ARS), Ridgewell (BIR)
  • Keeper: Hart (MAC)

The final choice is whom to pick for captain. The most sensible plan is to take a Chelsea attacker, which in my case means Torres.