Tuesday, August 9, 2016

The Distribution of Olympic Talent

With the 2016 Summer Olympics in full swing, I was disappointed by how few athletes from my own country, Pakistan, I could support. With only seven athletes representing the sixth most populous nation in the world, I became curious about the disproportionate representation, whether under or over, of other countries in Rio. Do Chinese and Indian athletes make up over 35% of all Olympians this year? Similarly, is the Jamaican athletic delegation reflective of its roughly 0.04 percent population share? If not, then why? This post addresses these questions.

Using data from the CIA World Factbook and the IOC, I obtain measures of global population shares and athlete shares in Rio for each country [1]. Using these shares, I consider the log difference [2] between the athlete and population shares. So, for example, with around 3 million inhabitants, Mongolia accounts for about 0.04 percent of the global population. On the other hand, the 43 Mongolian athletes account for 0.38 percent of all Olympians this year. So Mongolians are over-represented at this year's Olympics, with a large and positive log difference between these two shares. In contrast, Nigerians are under-represented, with population and athlete shares of 2.5 and 0.7 percent respectively.
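
To make the measure concrete, here is a minimal sketch of the log-difference calculation using the Mongolia and Nigeria shares quoted above. The function name and the dictionary layout are my own choices for illustration, not part of the original analysis.

```python
import math

def log_difference(athlete_share, population_share):
    """Natural-log difference between athlete share and population share."""
    return math.log(athlete_share) - math.log(population_share)

# Shares in percent, as quoted in the text above.
shares = {
    "Mongolia": {"population": 0.04, "athletes": 0.38},
    "Nigeria": {"population": 2.5, "athletes": 0.7},
}

log_diffs = {
    country: log_difference(s["athletes"], s["population"])
    for country, s in shares.items()
}

for country, value in log_diffs.items():
    label = "over-represented" if value > 0 else "under-represented"
    print(f"{country}: {value:+.2f} ({label})")
```

A positive value means the athlete share exceeds the population share. The reading of a log difference as a percentage difference is only close when the two shares are similar, so for gaps this large the sign and ranking are more informative than the raw number.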



The above map illustrates the disproportionate representation of each country's Olympic delegation. Countries in green are over-represented and countries in purple are under-represented. Darker shades represent more uneven distributions of athletes and population. The map clearly shows that much of Asia and Africa is under-represented, while much of Europe, Oceania and the Americas are over-represented.

Eyeballing this map suggests that poorer nations tend to be under-represented relative to wealthier nations. This should not surprise most readers since, while it may be true that innate talent is independently and identically distributed (i.i.d.) across borders, the resources needed to develop this talent into Olympic athletes are certainly not. The most obvious, and most easily measured, of these resources is monetary input. After all, it is expensive to build athletic facilities, pay coaches and travel to international competitions. However, this is far from the only input required to create athletes. Nations need to encourage innate talent and create an environment where sports and athleticism are rewarded, either socially or financially. I think of this residual resource as the cultural input necessary for the development of Olympic talent.

Although culture is hard to quantify, it is possible to obtain an indirect estimate by controlling for a measure of monetary input. Indeed, by regressing our measure of over/under-representation on GDP per capita, we can interpret the residual (unexplained) component of representation as reflecting cultural differences in the perception of athletics. Countries with the highest (lowest) residuals can then be interpreted as having high (low) levels of cultural input.

The figure below plots the log differences in athlete and population shares against log GDP per capita for each competing nation. I also include a fitted line obtained from regressing the log differences on log GDP per capita. It is clear that the relationship between these two measures is positive. This simply reinforces the fact that spending on athletics is a normal good.






In addition, the figure identifies the countries whose residuals from this regression fall in the top and bottom 5 percent. These are the nations furthest from the trend line, either above or below. Consider the example of China: Chinese athletes are under-represented at this year's Olympics. After controlling for GDP per capita, they are still under-represented relative to economies with similar levels of GDP per capita. This unexplained deviation from the trend can be thought of as a low cultural input in China [3]. Conversely, athletes from the Central African Republic are over-represented even after controlling for GDP per capita, which indicates a high cultural input. These countries can be thought of as outliers.
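
The regression-and-residual step can be sketched as follows. This is a hedged illustration, not the code behind the figure: the six countries and their numbers are made up, the helper names (`fit_line`, `residuals`, `tails`) are hypothetical, and I assume a simple one-variable least squares fit.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Vertical distance of each point from the fitted line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def tails(names, resid, share=0.05):
    """Names in the lowest and highest `share` of residuals (at least one each)."""
    k = max(1, int(len(names) * share))
    ranked = sorted(zip(resid, names))
    lowest = [name for _, name in ranked[:k]]
    highest = [name for _, name in ranked[-k:]]
    return lowest, highest

# Hypothetical data: (country, GDP per capita in USD, log difference in shares).
data = [
    ("A", 1000, -2.0),
    ("B", 2000, -1.8),
    ("C", 5000, -0.2),
    ("D", 10000, 0.1),
    ("E", 20000, 1.4),
    ("F", 40000, 1.2),
]
names = [d[0] for d in data]
log_gdp = [math.log(d[1]) for d in data]
log_diff = [d[2] for d in data]

intercept, slope = fit_line(log_gdp, log_diff)
resid = residuals(log_gdp, log_diff, intercept, slope)
lowest, highest = tails(names, resid)

print(f"slope: {slope:.2f}")
print("lowest residuals:", lowest)
print("highest residuals:", highest)
```

In this framing, a country's residual is what I call its cultural input: the part of its representation that its income level does not account for.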

The table details the 5 countries with the lowest and the 5 with the highest cultural input into developing Olympic athletes. The bottom four countries share the common trait that they are all majority Muslim. While much progress is being made on this front, female athletics has traditionally not always been actively pursued in Muslim countries, closing the door to a large pool of potential Olympians, which could explain their low ranking. Furthermore, India, the fifth lowest country, is culturally very similar to both Pakistan and Bangladesh, suggesting that these three former British colonies share a common attitude towards Olympic sports.


At the other end, there is no such common trait among the top five countries. Instead, each appears to be a specialist in a given set of Olympic sports, with a long tradition of producing Olympians. Jamaica dominates track events. New Zealand is included in this list since it competes in team sports (i.e. rugby and soccer), which increases its athlete share. A quick Google search suggests that Armenia and Estonia are specialists in weightlifting, wrestling and athletics.

I began looking into this data to understand whether Pakistan was exceptional in its under-representation at this year's Olympics. While Pakistan does rank as having the lowest level of cultural resources, it has the company of a few other Muslim nations. We should not aim to be the next Jamaica or Armenia, but we should aim to move closer to the trend line, and not by lowering our GDP per capita.



1. I only include those countries that have an Olympic delegation. I exclude those countries that have fewer than 1 million inhabitants. The Independent and Refugee Olympic Athletes are also excluded from this analysis.
2. I use this as an approximation for percentage differences between athlete and population shares.
3. While there are alternative measures of the monetary resources available to develop athletic talent, I think GDP per capita suffices for this blog post.

Saturday, August 6, 2016

Have Pakistani Test Openers Always Struggled?

Pakistan's recent loss in Manchester highlighted a number of the team's shortcomings. From dropped catches by the Pakistani side to a high English run rate, the differences between the two teams were clear. For me, the most striking difference was in the performance of the opening batsmen. With England's opening pair, Cook and Hales, scoring a combined total of 316 runs compared with Hafeez and Masood's 171 in the first two tests of the Investec series, our openers have struggled to produce runs. Frustrated by this trait in many recent Pakistani test squads, in this article I analyze the long-term trend in the performance of Pakistani openers and find that, since the combination of Aamir Sohail and Saeed Anwar, they have experienced a steady decline in their run-scoring ability while most of their opponents have significantly improved theirs [1].

Figure 1: Average Runs Scored by First or Second Batsman in Test Cricket
Using data from Cricinfo's Statsguru, figure 1 plots the average runs scored by either the first or second batsman for Pakistan and six other test playing nations that have test records at least as long as Pakistan's [2]. The trend decline in the performance of Pakistani openers is immediately clear. Indeed, Pakistani openers in 2016 score around 7 fewer runs than they did in 2000. While this may seem small, the analogous measure for the Australian, Indian, English and South African squads displays the opposite trend. For example, Indian openers have steadily improved their average from around 33 in 2000 to 43 in 2016. These trends have resulted in Pakistani openers being ranked near the bottom, a stark contrast to their ranking in the late 90s and early 2000s. Currently, Pakistani test openers comfortably out-perform only the New Zealand top order, who have historically been low scorers and also feature a declining trend in average runs scored. More recently, the retirements of legendary openers such as Graeme Smith (South Africa), Matthew Hayden (Australia) and Andrew Strauss (England) have made Pakistani openers more competitive.
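
Note 2 mentions that figure 1 is smoothed with 10-year moving averages. The sketch below shows one way such smoothing could be done. It assumes a trailing window over yearly averages, with shorter windows at the start of the series; the note does not say whether the window is trailing or centered, and the yearly numbers here are made up.

```python
def trailing_moving_average(values, window=10):
    """Mean of each value and up to `window - 1` values before it."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical yearly average runs for a team's openers.
yearly = [30, 50, 20, 40, 35, 25, 45, 30, 20, 40, 35, 15]
smoothed = trailing_moving_average(yearly, window=10)

for raw, avg in zip(yearly, smoothed):
    print(f"raw {raw:>3}  smoothed {avg:.1f}")
```

A long window like this trades responsiveness for stability: a single exceptional season barely moves the line, which is why the trends in figure 1 look so gradual.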

While it is useful to consider the aggregate average, it is also instructive to study these same figures by opponent. Figure 2 shows the average runs scored by either opening batsman from 2000 to 2016 against each one of the eight other test playing nations. Since 2000, the Pakistani top order has performed best against Bangladesh and worst against England. The figure also shows the analogous runs scored by the opponents' top order against Pakistan. With the exception of Bangladesh, Zimbabwe and the West Indies, our opponents' openers easily out-perform our own top order, with the largest relative performance gap occurring when playing against India and England. So, while Pakistan has long prided itself on its crack bowling attack, it appears as though our opponents' bowlers are much better at managing our openers than we are at managing theirs.

Figure 2: Avg. Runs Scored by First or Second Opener by Opponent


Finally, I consider the performance of individual players. The left panel of figure 3 shows the average runs scored by those openers who have opened in at least 12 innings for Pakistan since 2000. The restricted sample includes 11 players, with the highest and lowest average scores belonging to Saeed Anwar and Saleem Elahi respectively. The current openers, Shan Masood and Mohammad Hafeez, sit at opposite ends of the runs distribution, with Masood scoring around 10 fewer runs than Hafeez on average.

Given the tendency of openers to score both very high and very low scores, I also report a measure of consistency [3] for each player in the right panel of figure 3. Somewhat surprisingly, Shahid Afridi has been the most consistent test opener for Pakistan, with a score of at least 30 (around his average) in 8 of his 16 turns as a test opener. The current opening pair feature a similar, moderate level of consistency, although Hafeez has many more test openings under his belt. The least consistent opener since 2000 has been Khurram Manzoor, who failed to reach double digits half of the time while scoring at least a half century 20 percent of the time he opened.
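
Note 3 defines the consistency measure as the ratio of a player's median to mean runs. Here is a minimal sketch of that ratio on two made-up batting records; the function name and the scores are hypothetical and only meant to show how the measure behaves.

```python
from statistics import mean, median

def consistency(scores):
    """Ratio of median to mean runs scored across a player's innings."""
    average = mean(scores)
    if average == 0:
        raise ValueError("consistency is undefined when the mean is zero")
    return median(scores) / average

# Hypothetical innings scores with the same mean (30) but different spreads.
steady = [25, 30, 28, 35, 32, 30]
streaky = [0, 4, 2, 150, 8, 16]

print(f"steady:  {consistency(steady):.2f}")
print(f"streaky: {consistency(streaky):.2f}")
```

Both records average 30, but the streaky one relies on a single big innings, so its median sits far below its mean and the ratio drops well under 1. A ratio near 1 suggests the typical innings looks like the average one.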

Figure 3: Individual Openers in Test Cricket since 2000

Taken together, these data give me cause to be hopeful. While there is no doubt that our openers are currently struggling to perform, the data also remind me that this was not always the case. As Hafeez's average inches closer to Anwar's, one can only hope Masood's recent replacement can follow suit. Of course, openers do not play the game alone; fielders and bowlers have an equal responsibility to limit the runs scored by our opponents' openers, but that's a whole 'nother story.


1. While the focus here is on test cricket, performing an analogous analysis on One Day International (ODI) batting records yields qualitatively similar results.
2. Data was extracted on July 29th 2016. Given the inherent volatility of the underlying data, figure 1 is smoothed using 10-year moving averages. Note that these figures report only those runs scored when batting at either number 1 or 2 in the order.
3. This measure is the ratio of the median to the mean runs scored by each player.