Monday, April 2, 2018

Restaurant Openings/Closings in St. Louis


I recently learnt that the Riverfront Times posts a monthly summary of restaurant openings and closings in the greater St. Louis area. I'd been meaning to get more familiar with GIS mapping and thought it would be interesting to map these summaries.

I restrict attention to the 156 openings and 71 closures of restaurants that took place in St. Louis between January 2017 and March 2018. After mapping these, two streets, Gravois and Cherokee, stand out as having a clear net increase in restaurants with few or no closures. On Delmar Boulevard, near where I live, there has been a net loss of restaurants, with 9 closures and 7 openings since January 2017. In general, many closed restaurants seem to be quickly replaced by new ones, and in some cases a closure is temporary (e.g. Rocket Fizz) or simply a move to another area (e.g. Snarf's).

The interactive map below lists all the openings (green) and closures (red). Let me know if you notice anything interesting.


Sunday, September 25, 2016

Viewer Complaints to PEMRA


Since 2012, the Pakistan Electronic Media Regulatory Authority (PEMRA) has made available the complaints it has received from citizens. As I'd already been looking for an excuse to practice scraping interesting data from unconventional sources, I decided to take a look at the concerns of Pakistanis in this post.

A little background first: PEMRA is similar to the FCC in the US and, much like the FCC, it is charged with improving standards in, access to and competitiveness of the communications industries. While its domain extends to all electronic media, only comments pertaining to TV channels are available online. Submitting a complaint requires filling in a simple online form and providing a phone number. Since December 2011, a total of 122,056 complaints have been filed with PEMRA. The ease of submission also means that a single complaint can be submitted multiple times or even be solicited from others.

The word-cloud below illustrates the content of a typical complaint and is constructed using the 5,711 complaints submitted since January of this year. This snapshot shows that most submissions politely ask PEMRA for bans and make references to Ramadan, controversial host Aamir Liaquat, religion and Geo among others.
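A word cloud is driven by word counts, so the step behind the figure can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: the stop-word list, the tokenizing rule and the sample complaints are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "is", "to", "of", "and", "this", "on", "in", "it"}

def word_frequencies(complaints):
    """Count how often each non-stop-word appears across all complaints."""
    counts = Counter()
    for text in complaints:
        # Lower-case the text and keep runs of letters only.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

# Made-up complaints for illustration, not taken from the PEMRA data.
sample = [
    "Please ban this show",
    "Ban the vulgar commercial",
    "This commercial is against our culture",
]
freqs = word_frequencies(sample)
print(freqs.most_common(2))
```

The resulting counts are what a word-cloud library would scale into font sizes.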


Figure 1: PEMRA Complaints since January 2016


Looking at individual comments reveals the broad range of people's concerns. Naturally, many comments refer to commercials
"TV Commercial of Pain Relief Gel (Volatren Gel) , the girl with mini skirt was playing badminton ... its totally against our Ethics & Culture & Religion. Please be strict on vulgar commercial, kindly arrange a commission who will see commercial before it gets ON-AIR we are not against any product , but we must respect our culture we are Muslim."
and TV shows.
"They are showing a very OBJECTIONABLE program by the name of "GAME OF THRONES" This program is not even suitable for Adults. Almost like showing Porn. Please ask the operators to block this particular program."
Others are concerned with editorial decisions made by particular channels and shows.
"...1. It was really shocking to see a reporter lying inside the grave of Abdul Sattar Edhi. It is sheer third class way of getting the high ratings of channel. 2. I request to please stop these idiots from doing such shameless pieces of work. Regards ..."
"Nadia Khan on her show was too ridiculous while talking to Muhammad Aamir the Pakistani Cricketer. She should be banned."
and of course there are also those that are just confusing.
"Sir i am family man my 2 son and 1 daughter qmobile z12 adds very nonsense adds please adds to remove of all channel your see adds is not good adds thank you cartoon network hindi remove this good work of good timing decision pemra zindabad pakistan zindabad allahfiz pemra zindabad pakistan zindabad thank you"

Looking at the time series of complaints since 2012 below, we see that complaints do tend to increase during the month of Ramadan (shaded bars) and that there is no clear time trend in the number of complaints. We also see spikes in response to particular broadcasts. For example, in April and May of 2014, Geo channels attracted close to 100,000 complaints following the shooting of one of its reporters.

Figure 2: PEMRA Complaints over time.

Figures 3 and 4 show the total number of complaints and the complaints per viewer made against TV channels since 2012. While the more meaningful measure is the number of complaints per viewer, I also include the total number of complaints because reliable ratings data are not readily available for all channels. I exclude channels that received fewer than 20 complaints between 2012 and 2016.
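The two measures can be sketched as below. The channel names and numbers are invented, and treating "viewers" as a simple audience count is an assumption, since the post does not say how viewership was measured; only the 20-complaint cutoff comes from the text.

```python
MIN_COMPLAINTS = 20  # cutoff stated in the post

def complaints_per_viewer(complaints, viewers):
    """Return complaints per viewer for channels that pass the cutoff.

    Channels without viewership data are skipped, mirroring the point
    that ratings are not available for every channel.
    """
    rates = {}
    for channel, n in complaints.items():
        if n < MIN_COMPLAINTS:
            continue  # too few complaints to be informative
        audience = viewers.get(channel)
        if audience:
            rates[channel] = n / audience
    return rates

# Hypothetical inputs for illustration only.
complaints = {"Channel A": 500, "Channel B": 120, "Channel C": 12, "Channel D": 80}
viewers = {"Channel A": 1_000_000, "Channel B": 100_000, "Channel C": 50_000}

rates = complaints_per_viewer(complaints, viewers)
for channel, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(channel, rate)
```

In this toy example Channel B ranks above Channel A per viewer despite having fewer complaints in total, which is why the per-viewer measure is the more meaningful one.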

Both figures show a similar picture, with Geo and ARY channels attracting the highest number of complaints. In general, news channels draw the most complaints per viewer, followed by entertainment and then all other channels.

Figure 3: PEMRA Complaints by TV Channel
Figure 4: PEMRA Complaints per Viewer

While not ground-breaking, I found this an interesting exercise in working with non-numeric data and web-scraping. The code used to get the data from PEMRA, along with the 2016 comments, is available on my GitHub.

Tuesday, August 9, 2016

The Distribution of Olympic Talent

With the 2016 Summer Olympics in full swing, I was disappointed by the lack of athletes from my own country, Pakistan, that I could support. With only seven athletes representing the sixth most populous nation in the world, I became curious about the disproportionate representation, either under or over, of other countries in Rio. Do Chinese and Indian athletes make up over 35% of all Olympians this year? Similarly, is the Jamaican athletic delegation reflective of its roughly 0.04 percent population share? If not, then why? This post addresses these questions.

Using data from the CIA World Factbook and the IOC, I obtain measures of global population shares and athlete shares in Rio for each country [1]. Using these shares, I consider the log difference [2] between the athlete and population shares. For example, with around 3 million inhabitants, Mongolia accounts for about 0.04 percent of the global population, while its 43 athletes account for 0.38 percent of all Olympians this year. Mongolians are therefore over-represented at this year's Olympics, with a large and positive log difference between these two shares. In contrast, Nigerians are under-represented, with population and athlete shares of 2.5 and 0.7 percent respectively.
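As a worked version of the two examples above, here is a minimal sketch of the log-difference calculation. It assumes natural logarithms, which the post does not state explicitly, and uses the rounded shares quoted in the text.

```python
import math

def log_difference(athlete_share, population_share):
    """Log difference between a country's athlete share and population share.

    Positive values mean over-representation, negative values mean
    under-representation. Both shares must be in the same units (here, percent).
    """
    return math.log(athlete_share) - math.log(population_share)

# Shares in percent, as quoted in the post.
mongolia = log_difference(0.38, 0.04)
nigeria = log_difference(0.7, 2.5)

print(round(mongolia, 2))  # 2.25
print(round(nigeria, 2))   # -1.27
```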



The above map illustrates the disproportionate representation of each country's Olympic delegation. Countries in green are over-represented and countries in purple are under-represented. Darker shades represent more uneven distributions of athletes and population. The map clearly shows that much of Asia and Africa is under-represented, while much of Europe, Oceania and the Americas are over-represented.

Eyeballing this map suggests that poorer nations tend to be under-represented relative to wealthier nations. This should not surprise most readers since, while it may be true that innate talent is independently and identically distributed (i.i.d.) across borders, the resources needed to develop this talent into Olympic athletes are certainly not. The most obvious, and most easily measured, of these resources is monetary input. After all, it is expensive to build athletic facilities, pay coaches and travel to international competitions. However, this is far from the only input required to create athletes. Nations need to encourage innate talent and create an environment where sports and athleticism are rewarded either socially or financially. I think of this residual resource as the cultural input necessary for the development of Olympic talent.

Although culture is hard to quantify, it is possible to obtain a rough estimate by controlling for a measure of monetary input. Indeed, by regressing our measure of over/under-representation on GDP per capita, we can interpret the residual (unexplained) component of representation as being attributable to cultural differences in the perception of athletics. Countries with the highest (lowest) residual component can then be interpreted as having high (low) levels of cultural input.

The figure below plots the log differences in athlete and population shares against log GDP per capita for each competing nation. I also include a fitted line obtained from regressing the log differences on log GDP per capita. It is clear that the relationship between these two measures is positive, which is consistent with spending on athletics being a normal good.






In addition, the figure identifies the countries whose residuals from this regression fall in the top or bottom 5 percent. These are the nations that are furthest, either above or below, from the trend line. Consider the example of China: Chinese athletes are under-represented at this year's Olympics, and after controlling for GDP per capita they are still under-represented relative to economies with similar levels of GDP per capita. This unexplained deviation from the trend can be thought of as a low cultural input in China [3]. Conversely, athletes from the Central African Republic are over-represented even after controlling for GDP per capita, which suggests a high cultural input. These countries can be thought of as outliers.
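The residual approach described above can be sketched as follows. This is only an illustration under stated assumptions: a simple one-variable least-squares fit written out by hand, invented country labels and numbers, and a `tail` parameter (hypothetical) that picks the most extreme 5 percent of residuals at each end.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def extreme_residuals(names, x, y, tail=0.05):
    """Return the names with the lowest and highest residuals from the fit."""
    intercept, slope = fit_line(x, y)
    resid = {
        name: yi - (intercept + slope * xi)
        for name, xi, yi in zip(names, x, y)
    }
    k = max(1, math.ceil(tail * len(names)))  # at least one country per tail
    ranked = sorted(resid, key=resid.get)
    return ranked[:k], ranked[-k:], resid

# Made-up data: log GDP per capita (x) and log difference in shares (y).
names = ["A", "B", "C", "D", "E", "F", "G"]
log_gdp = [7, 8, 9, 10, 11, 9, 9]
log_diff = [-2, -1, 0, 1, 2, 1.5, -1.5]

low, high, resid = extreme_residuals(names, log_gdp, log_diff)
print("low cultural input:", low)
print("high cultural input:", high)
```

In this toy data, countries F and G have the same GDP per capita as C, so their distance from the fitted line is what the post would read as cultural input.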

The table details the 5 countries with the lowest and highest cultural input into developing Olympic athletes. The bottom four countries share the common trait that they are all majority Muslim. While much progress is being made on this front, female athletics has traditionally not always been actively pursued in Muslim countries, closing the door to a large pool of potential Olympians, which could explain their low ranking. Furthermore, India, the fifth lowest country, is culturally very similar to both Pakistan and Bangladesh, suggesting that these three former British colonies share a common attitude towards Olympic sports.


At the other end, there is no such common trait among the top five countries. Instead, each appears to be a specialist in a given set of Olympic sports with a long tradition of producing Olympians. Jamaica dominates track events. New Zealand is included in this list since they compete in team sports (e.g. rugby and soccer), which increases their athlete share. A quick Google search suggests that Armenia and Estonia are specialists in weightlifting, wrestling and athletics.

I began looking into this data to understand if Pakistan was exceptional in its under-representation at this year's Olympics. While Pakistan does rank as having the lowest level of cultural resources, it has the company of a few other Muslim nations. We should not aim to be the next Jamaica or Armenia, but we should aim to move closer to the trend line, and not by lowering our GDP per capita.



1. I only include those countries that have an Olympic delegation. I exclude those countries that have fewer than 1 million inhabitants. The Independent and Refugee Olympic Athletes are also excluded in this analysis. 
2. I use this as an approximation for percentage differences between athlete and population shares.
3. While there are alternative measures of the monetary resources available to develop athletic talent, I think GDP per capita suffices for this blog post.

Saturday, August 6, 2016

Have Pakistani Test Openers Always Struggled?

Pakistan's recent loss in Manchester highlighted a number of the team's shortcomings. From dropped catches by the Pakistani side to a high English run rate, the differences between the two teams were clear. For me, the most striking difference was in the performance of the opening batsmen. With England's opening pair, Cook and Hales, scoring a combined total of 316 runs compared with Hafeez and Masood's 171 in the first two tests of the Investec series, our openers have struggled to produce runs. Frustrated by this trait in many recent Pakistani test squads, I analyze in this article the long-term trend in the performance of Pakistani openers and find that, since the combination of Aamir Sohail and Saeed Anwar, they have experienced a steady decline in their run-scoring ability while most of their opponents have significantly improved theirs [1].

Figure 1: Average Runs Scored by First or Second Batsman in Test Cricket
Using data from Cricinfo's Statsguru, figure 1 plots the average runs scored by either the first or second batsman for Pakistan and six other test-playing nations that have test records at least as long as Pakistan's [2]. The trend decline in the performance of Pakistani openers is immediately clear. Indeed, Pakistani openers in 2016 score around 7 fewer runs than they did in 2000. While this may seem small, the analogous measure for the Australian, Indian, English and South African squads displays the opposite trend. For example, Indian openers have steadily improved their average from around 33 in 2000 to 43 in 2016. These trends have resulted in Pakistani openers being ranked near the bottom, a stark contrast to their ranking in the late 90s and early 2000s. Currently, Pakistani test openers comfortably out-perform only the New Zealand top order, who have historically been low scorers and also feature a declining trend in average runs scored. More recently, the retirement of legendary openers such as Graeme Smith (South Africa), Matthew Hayden (Australia) and Andrew Strauss (England) has made Pakistani openers more competitive.
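Note 2 mentions that figure 1 is smoothed with 10-year moving averages. A minimal sketch of that kind of smoothing is given below. It assumes a trailing window that is shortened at the start of the series; the post does not say whether the window was trailing or centered, and the yearly averages here are invented.

```python
def trailing_moving_average(values, window=10):
    """Average each point with up to `window - 1` preceding points.

    Early points use however many observations are available so far.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Made-up yearly averages, smoothed with a 3-year window to keep the example short.
yearly_avg = [30, 40, 20, 50]
smoothed = trailing_moving_average(yearly_avg, window=3)
print(smoothed)
```

A long window such as 10 years dampens the year-to-year volatility of batting averages at the cost of reacting slowly to recent changes.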

While it is useful to consider the aggregate average, it is also instructive to study these same figures by opponent. Figure 2 shows the average runs scored by either opening batsman from 2000 to 2016 against each of the eight other test-playing nations. Since 2000, the Pakistani top order has performed best against Bangladesh and worst against England. The figure also shows the analogous runs scored by the opponents' top order against Pakistan. With the exception of Bangladesh, Zimbabwe and the West Indies, our opponents' openers easily out-perform our own top order, with the largest relative performance gap occurring when playing against India and England. So, while Pakistan has long prided itself on its crack bowling attack, it appears that our opponents' bowlers are much better at managing our openers than we are at managing theirs.

Figure 2: Avg. Runs Scored by First or Second Opener by Opponent


Finally, I consider the performance of individual players. The left panel of figure 3 shows the average runs scored by those openers who have opened at least 12 times (innings) for Pakistan since 2000. The restricted sample includes 11 players, with the highest and lowest average scores belonging to Saeed Anwar and Saleem Elahi respectively. The current openers, Shan Masood and Mohammad Hafeez, sit at opposite ends of the runs distribution, with Masood scoring around 10 fewer runs than Hafeez on average.

Given the tendency of openers to score both very high and very low scores, I also report a measure of consistency [3] for each player in the right panel of figure 3. Somewhat surprisingly, Shahid Afridi has been the most consistent test opener for Pakistan, with a score of at least 30 (around his average) in 8 of his 16 turns as a test opener. The current opening pair show similar, moderate levels of consistency, although Hafeez has many more test openings under his belt. The least consistent opener since 2000 has been Khurram Manzoor, who failed to reach double digits half of the time while scoring at least a half century 20 percent of the time he opened.
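Note 3 defines the consistency measure as the ratio of the median to the mean of a player's scores. A minimal sketch with invented innings is shown below; the reading that values near 1 indicate a more consistent batsman is my interpretation of the measure, and the scores are not real data.

```python
from statistics import mean, median

def consistency(scores):
    """Ratio of median to mean runs; lower values mean a few big innings
    are pulling the average well above the typical score."""
    return median(scores) / mean(scores)

# Two hypothetical openers with the same average of 30 runs.
steady = [30, 35, 25, 30, 30]
erratic = [0, 5, 10, 100, 35]

print(consistency(steady))   # 1.0
print(round(consistency(erratic), 2))  # 0.33
```

Both hypothetical players average 30, but the second gets there through one large innings, which the ratio picks up.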

Figure 3: Individual Openers in Test Cricket since 2000

Taken together, these data give me cause to be hopeful. While there is no doubt that our openers are currently struggling to perform, the data also remind me that this was not always the case. As Hafeez's average inches closer to Anwar's, one can only hope Masood's recent replacement can follow suit. Of course, openers do not play the game alone; fielders and bowlers have an equal responsibility to decrease the runs of our opponents' openers, but that's a whole 'nother story.


1. While the focus here is on test cricket, performing an analogous analysis on One Day International (ODI) batting records yields qualitatively similar results.
2. Data was extracted from Statsguru on July 29th 2016. Given the inherent volatility of the underlying data, figure 1 is smoothed using 10-year moving averages. Note that these figures report only those runs scored when batting at either number 1 or 2 in the order.
3. This measure is the ratio of the median to the mean runs scored by each player.