Playing with Numbers: Coming up with Objective Ratings of a Subjective Reality

1 Nov

Statistics are only meaningful to the extent that people can identify the phenomenon being measured, come up with sensible measurement scales for primary or secondary observable phenomena, and then interpret the results and display them in a lucid fashion. Oftentimes that’s too much to ask, and our world is now crumbling under heaps of pointless, incomprehensible statistics.

Increasingly, we are trying to understand the world around us via numbers. To this end, a host of research centers and organizations now annually release rankings on issues ranging from corruption to democracy to freedom of the press. These rankings are then featured on prime real estate across media and used in homilies, laudatory notes, and everything in between; to buttress indefensible claims; and to bring a sense of “objectivity” to a media saturated with rants of crazed morons.

“Lost in translation” are the subtleties of the data, the methods of data collection and analysis, and the caveats. What remain, oftentimes, are savaged numbers that peddle whatever theory you want them to hawk.

Understanding with numbers

The field of social science has been revolutionized in recent decades, with “positivist” approaches using statistics coming to dominate the field. The rise in importance of “numbers” in research is not incidental, for numbers, and particularly statistics, provide powerful new ways to analyze concepts. Today numbers are used to understand everything from democracy to emotions. But how do we go about measuring and assigning numbers to things which we haven’t yet even been able to define, much less explain?

Let me narrow my focus to the creation, interpretation, and usage of rankings to illustrate the problems with using statistics.

More Specifically, Rankings

Reporters Sans Frontières (Reporters Without Borders, henceforth RSF) came out with its annual “Worldwide Press Freedom Rankings”. The latest rankings place the USA at 53, along with Botswana and Tonga, and India at 105, while Jamaica and Liberia are ranked 26 and 83 respectively. The top-ranked South Asian country is Bhutan at 98. Intuitively, the rankings don’t make any sense, and a little digging into RSF’s methodology for compiling them explains why.

Media’s fascination with rankings

The rankings received wide attention and made it to the front pages of countless newspapers. There is a reason why rankings are the choice nourishment of media starved of any “real information”. Numbers capture, or so it is thought, a piece of “objective” information about “reality”. Their usage is buoyed by the fact that rankings are seductively simple and easy to interpret. Everyone seems to intuitively know the difference between first and second. All that needs to be done is to add the fluff and the requisite shock and horror, and the article is written.

On to the problems with rankings or the “rank smell”

How can you measure objectively when you need a subjective criterion to come up with a scale?

This is something I raised earlier when I talked about how we can understand concepts like democracy or, say, emotions using numbers. Researchers do it by assigning numbers to related observable phenomena: in the case of emotions it may be checking the heart rate, doing a brain scan, or counting the number of times you use certain words, while in the case of democracy it can be how frequently the elites change, or how many people vote in elections. But numerous problems remain, especially when we try to order these relatively hazily defined concepts. Say, for example, elite turnover in the US Congress has of late been fairly close to 2%, which doesn’t seem particularly democratic to me. And how does that compare with somewhere like India, where elite turnover may be higher, but where members of one family have held key positions in Indian politics since its inception?
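The elite-turnover proxy above can be made concrete. This is a minimal sketch, with an invented membership list, of turnover measured as the share of a legislature’s current members who were not in the previous one:

```python
# Toy illustration of "elite turnover" as a democracy proxy.
# The member names and the 2% figure below are invented for
# illustration, not actual US Congress data.

def turnover_rate(previous, current):
    """Fraction of the current body that was not in the previous one."""
    new_members = set(current) - set(previous)
    return len(new_members) / len(current)

previous = {f"member_{i}" for i in range(100)}
# Only two seats change hands between the two sessions.
current = {f"member_{i}" for i in range(2, 102)}

rate = turnover_rate(previous, current)  # 0.02, i.e. 2% turnover
```

Even this toy version shows the interpretive problem: the number alone says nothing about whether the 98 returning members owe their seats to popular support or to entrenched family networks.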

Relativity

To rank something is to determine its relative position. Rankings NEVER tell one about the absolute position of something, unless of course they are an incidental result of a score on a shared scale. For instance, RSF’s ranking of the USA at 53 in the worldwide press freedom rankings doesn’t tell one whether the US press enjoys freedom above, say, a particular bare threshold below which a functioning press can’t legitimately be said to exist. A lot of people have misinterpreted India’s slide from 80, in 2002, to 105. They believe it is a slide in absolute terms, but the rankings only tell us of a slide in relative terms. There may be an argument to be made that India is doing better than it was in 2002 in absolute terms, but not in relative terms. In other words, press freedom in India may have improved since 2002, but compared to other countries, India’s press is less free today.

The scale of things

To rank something, one has to use a common scale. Generally a scale, especially one measuring a complex concept like democracy, would be a composite of a variety of variables. One now needs to think about a couple of things. How does one weight the variables in the scale between time periods and between countries? For example, how do you compare higher rates of media usage in one country (and a possibly associated higher level of censorship) with, say, a country with low media usage and possibly lower total censorship? One may also argue that media penetration is lower by deliberate action (as in limits on foreign content owners’ right to broadcast) or other factors (poverty). One must also tackle the problem of assigning a “weight” to each facet.

Methodology

RSF’s rankings are based on a non-representative survey of pre-chosen experts. Hence it is more a poorly conducted opinion poll than a scientific survey. Statistics gets its power of generalizability from the concept of randomization. RSF’s methodology is more akin to polling television pundits on who will win the elections, and I am fairly sure the results would be wrong more often than right.
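The bias from surveying pre-chosen experts instead of a random sample can be simulated. In this sketch the population and its opinion scores are invented; the point is only that a convenience sample drawn from one like-minded group misses the population average badly, while a random sample of the same size lands close to it:

```python
import random

random.seed(0)

# Invented population of 10,000 "respondents": 20% belong to one
# school of thought that rates press freedom low (30); the rest
# rate it higher (70). True population mean is 62.
population = [30] * 2000 + [70] * 8000
true_mean = sum(population) / len(population)  # 62.0

# Convenience sample: only pre-chosen experts from the first group.
convenience = population[:500]
convenience_mean = sum(convenience) / len(convenience)  # 30.0, badly biased

# Random sample of the same size: close to the true mean.
random_sample = random.sample(population, 500)
random_mean = sum(random_sample) / len(random_sample)
```

Randomization, not sample size, is what lets a survey generalize: the biased sample here is just as large as the random one and still misses by more than 30 points.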

Secondly, the questionnaire includes questions about topics like Internet censorship. No explicit mechanism has been detailed by which these scores are weighted based on, say, Internet penetration in each country. If no cases of Internet censorship were reported in Ghana, and it consequently gets a higher ranking than a country Y whose press is freer but which did report one case of Internet censorship, the system is flawed. Let me give you another example. India has the largest number of newspapers in the world, and there is a good chance that the total number of journalists harassed there may well exceed that of Eritrea. It doesn’t automatically follow that the Eritrean press is freer. One may need to account not only for the number of journalists (for more journalists per capita may mean a freer press) but also for crimes against journalists per capita. In the same vein, we may need to account for countries which in general have a high crime rate and where journalists, by pure chance rather than, say, a government witch hunt, may have a higher chance of dying.
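The India-versus-Eritrea point is just per-capita normalization. A minimal sketch, with entirely invented figures (not real data for either country):

```python
# Raw incident counts vs per-capita rates. All figures are invented
# for illustration only.

def incidents_per_10k(incidents, journalists):
    """Incidents per 10,000 working journalists."""
    return incidents / journalists * 10_000

# A large press corps with more total incidents can still be far
# safer per journalist than a tiny, heavily policed one.
large_press = incidents_per_10k(incidents=120, journalists=400_000)  # 3.0
small_press = incidents_per_10k(incidents=15, journalists=2_000)     # 75.0
```

By raw counts the first country looks eight times worse; per journalist, it is twenty-five times safer. A ranking built on raw counts gets this ordering exactly backwards.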

One also needs to account for the fact that statistics on these crimes are hard to come by, especially in poor countries with barebones media, and there is a good chance that they are under-reported there.

On the positive side

Rankings do give one some estimate of a country’s relative freedom. Proximity to Saudi Arabia in the rankings does give us an idea of relative media freedom.

A lot of the criticism lobbed against India’s low placement in the RSF rankings has been prompted by people’s perception of India as a functioning democracy with a relatively free press. What go unmentioned are episodes like Tehelka and the one faced by Rajdeep Sardesai recently. India’s press, especially in small towns, is constantly under pressure from local politicians who monitor it aggressively.

What can RSF do?

I would like to see a more detailed report on each country, especially one marking the areas where a country like India is lagging behind. Release more data: aside from protecting sources, there should be no concerns about releasing more of it. Release it to the world so that policymakers and citizens can better understand where improvements need to be made.

Release a composite score index that is comparable across time rather than across countries. There are far too many problems comparing countries. Controlling for major variables like economic growth, we can get a fairly good estimate of how things have changed over a set of years.
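A time-comparable index is usually built by re-basing each country’s raw score against a fixed base year, so the series tracks absolute change regardless of how other countries moved. A minimal sketch, with hypothetical raw scores:

```python
# Re-base raw yearly scores against a fixed base year (= 100), so
# the series is comparable over time for one country, independent
# of its rank. Scores below are invented for illustration.

def rebase(series, base_year):
    """Return the series indexed so that base_year == 100."""
    base = series[base_year]
    return {year: value / base * 100 for year, value in series.items()}

raw = {2002: 48.0, 2003: 50.4, 2004: 54.0}  # hypothetical raw scores
indexed = rebase(raw, base_year=2002)
# indexed[2004] == 112.5: a 12.5% improvement in absolute terms,
# even if the country's rank fell over the same period.
```

This is exactly the distinction made in the Relativity section above: a country can improve against its own base year while still sliding in the rankings.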

Conclusion

Whenever we use numbers to understand concepts, we sacrifice something in our conceptual understanding. Some numbers, like demographics, are relatively non-debatable, though even there debates have arisen in defining who, say, counts as Caucasian. More debatable is how numbers are used in, say, the realm of content analysis. What does it mean when a person uses a particular word in a sentence? Does somebody who uses the word “evil” twice in describing Bush hate him twice as much as the person who uses it only once? The understanding and “counting” of words has largely been limited to simple linear addition. We haven’t yet tried understanding the strength of words as an equation of countless variables, or, more importantly, learned how to work with that much data, so we use shortcuts in our understanding.

Numbers can give one a sense of false objectivity. The ways numbers are trimmed and chopped to support a particular point of view leave them meaningless, yet powerful.

The problems that I describe above are twofold: errors in coming up with rankings, and errors in reporting them. In all, we need to be careful about the numbers we see and use. That doesn’t mean we need to distrust every statistic we see and bury our heads, but we can do well by being careful and honest.

Closing Thought

According to UNECA, Ethiopia “counted 75 000 computers in 2001 and 367 000 television sets in 2000. Only 2.8 % of the total number of households in the country had access to television and approximately 18.4 % of people had a radio station in 1999 and 2000.” These numbers do inform. They speak of poverty. For the West, obsessed with issues of liberty and running from its own increasingly authoritarian regimes, press freedom is “the” issue. In the hustle, it misses some of the more important numbers coming from other countries, numbers that tell different stories.