Research


Forces that govern people’s behavior in politics are diverse, a diversity not always appreciated by scientists stuck in disciplinary bunkers. While occasional interdisciplinary philandering, by scientists otherwise faithful to their disciplines, has contributed enormously to our understanding of the topic by fruitfully leveraging knowledge across disciplines, formalizing such interdisciplinary training may prove to be a good catalyst for increasing this salutary (though non-virtuous) behavior.

Training for academia should include logic, ethics (broadly, philosophy), methodology, writing, teaching, skill in using tools for statistics (R) and typesetting (LaTeX), and other miscellaneous but important skills such as project management and presenting. In addition, a student needs training in a specific specialization.

Given the extent of training needed, a student needs both time and strong mentoring.

Below, I expand upon three particular aspects of training:

Methods Training
Methods training in Political Science, Communication, and Psychology (the three parent disciplines of Political Communication) is generally unsatisfactory, hobbled by poor teaching, if not by instructors who lack command of the material. The only compensating aspect is that the courses are fairly applied in nature. For the firmer foundations in methodology that scholarship requires, training in a Statistics department is a sine qua non.
Courses: statistical inference, modeling and causal inference, stochastic methods, parametric and non-parametric analysis, Bayesian analysis. Applied: time series analysis, data mining, sampling, programming, and optimization.

Statistics covers one part of methodology. A separate, important part of methods training is measurement: what to measure and how best to measure it. Recommended courses: survey design and psychological measurement.

A course devoted to content analysis may prove useful as well.

Content
There exist at least three fields that directly relate to Political Communication – Psychology, Communication, and Political Science.
Psychology: group psychology, social psychology, cognition, neuropsychology, evolutionary psychology.
Communication: News and Politics, Political Communication (an assimilative course), Political Economy of Media, Media and Communication.
Political Science: historical, institutional, theoretical, and behavioral aspects of politics.

Courses in law and sociology would be useful as well.

Mentoring
Regardless of efforts to the contrary, there is still considerable variation among the students admitted. Students vary in their level of mental maturity, the specific skills in which they excel, etc. A proper and early assessment of a student’s weaknesses and strengths allows the faculty to develop a specific plan crafted to address each. Directed reading courses with one’s advisor in the initial year(s) provide an excellent opportunity for the student to learn, and for the advisor to address concerns above and beyond those discovered in reading.

Seminar series provide excellent places to learn from others – care in thinking, presentation skills, research questions, etc.

Those who cast aspersions on social science’s ability to produce valid, replicable findings need look no further than election pollsters in the US, who have nearly perfected the ability to produce accurate results. The exception, of course, is Rasmussen, which has perfected the art of producing reliably Republican findings.

But how are pollsters able to do that with samples of 1000, samples that, if collected naively, yield estimates that vary enough to be pointless? The short answer is that they use knowledge about how people behave and how they are distributed in the population, and adjust the results based on that knowledge. Given their ability to do so with great accuracy, it is likely that pollsters know more about how people vote, why they vote, etc. than political scientists do. It is also likely, however, that social science will remain somewhat unaware of the results, as most of that knowledge will be proprietary.
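One simple form of such an adjustment is post-stratification: compute the estimate within each demographic group and weight the groups by their known population shares. The sketch below is only an illustration of that idea, not a description of what any pollster actually does; the group labels, the sample, and the population shares are all made up.

```python
from collections import defaultdict

def poststratified_mean(responses, population_shares):
    """Weight each group's sample mean by its (assumed known) population share.

    responses: list of (group, value) pairs from the sample.
    population_shares: dict mapping group -> share of the population (sums to 1).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in responses:
        totals[group] += value
        counts[group] += 1
    estimate = 0.0
    for group, share in population_shares.items():
        if counts[group] == 0:
            raise ValueError(f"no respondents in group {group!r}")
        estimate += share * totals[group] / counts[group]
    return estimate

# Hypothetical sample: 1 = intends to vote for candidate A, 0 = otherwise.
# Young respondents are over-represented (6 of 10) relative to the population.
sample = ([("young", 1)] * 4 + [("young", 0)] * 2
          + [("old", 1)] * 1 + [("old", 0)] * 3)
shares = {"young": 0.3, "old": 0.7}  # hypothetical, e.g. taken from a census

naive = sum(value for _, value in sample) / len(sample)
adjusted = poststratified_mean(sample, shares)
print(naive, adjusted)
```

The naive mean treats the lopsided sample as if it mirrored the population; the adjusted estimate leans on outside knowledge of the group shares instead.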

Using these techniques in ‘hypothesis testing’
The traditional (frequentist) ‘model’ of hypothesis testing has shied away from using knowledge about the population. Typically, multiple parameters are estimated simultaneously, using ‘regression’ or any of its sibling methods. Bayesians of course embrace the concept of prior knowledge, though they typically shy away from using it fully. One modest and somewhat defensible way to test theories would be to ‘fix’ the relation of some variables with others, using prior knowledge. So in the domain of voting, one can get away from the vagaries of sampling and directly ‘fix’ black respondents as voting 90% for Democrats, with a modestly decreasing propensity as income rises. The reduction in variance that comes from fixing part of the model, based on data or theory, would allow the other parameters to be estimated more reliably.
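A minimal sketch of the idea of ‘fixing’ a relation, assuming a plain linear model with two predictors: the coefficient on one predictor is held at a value taken from prior knowledge, its contribution is subtracted from the outcome, and only the remaining parameters are estimated. The function names, the data, and the fixed value of 0.8 are hypothetical and chosen purely for illustration.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from least squares of y on a single x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_with_fixed_coefficient(y, x_fixed, beta_fixed, x_free):
    """Hold the coefficient on x_fixed at beta_fixed; estimate the rest."""
    # Remove the part of y that prior knowledge already accounts for.
    remainder = [yi - beta_fixed * xf for yi, xf in zip(y, x_fixed)]
    return simple_ols(x_free, remainder)

# Hypothetical, noise-free data generated as y = 0.1 + 0.8*x_fixed + 0.05*x_free.
x_fixed = [1, 0, 1, 0, 1, 0]
x_free = [1, 2, 3, 4, 5, 6]
y = [0.1 + 0.8 * a + 0.05 * b for a, b in zip(x_fixed, x_free)]

intercept, slope = fit_with_fixed_coefficient(y, x_fixed, 0.8, x_free)
print(intercept, slope)
```

With one coefficient fixed, the data are spent only on the intercept and the free slope, which is the sense in which the remaining parameters can be estimated more reliably. If the fixed value is wrong, the error is pushed into the other estimates.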

I made many mistakes in my graduate career. Distilled from reflection on those mistakes, and from thinking more generally, are a few tips (incomplete and imperfect) for early-stage and prospective graduate students:

  • Tools of the trade are five: logic, ethics (broadly, philosophy), statistics, writing, and skill in using tools for statistics (R) and writing (LaTeX). Spend time learning the tools of the trade.
  • Everything takes time. Spend time wisely.
  • Sub-clause: Don’t waste time on projects with no realistic future.
  • Model for dissertation: Markus Prior’s Post-Broadcast Democracy

Given that time is a scarce and valuable resource, deploy other resources to help you get more from your time. Some such resources include:

  • A good laptop, backup drive, and laser printer
  • Quiet, comfortable office space
  • You can’t read what you don’t have. Buy the books you want to read.

“College sophomores may not be people” (Carl Hovland quoting Tolman), yet research done on them continues to be a significant part of all research in social science. The status quo is a function not of want, but of cost and effort.

In 2007, Cindy Kam proposed using university staff as one way to move beyond the ‘narrow database’.

There exist two other convenient ways of expanding the database further: alumni and local community colleges. Using social networks, universities can also tap into the acquaintances of interested students and staff.

A common (university-wide) platform to recruit and manage panels of such respondents would yield not only greater quality control and convenience, but also cost savings.

Deliberative Poll™ works as follows: A random sample of people is surveyed. Out of the initial sample, a random subset is invited to deliberate, given balanced briefing materials, randomly assigned to small groups moderated by trained moderators, allowed the opportunity to quiz experts, and in the end surveyed again.

Reports and papers on Deliberative Polls often carry comparisons of participants to non-participants on a host of attitudinal and demographic variables (e.g. see here, and here). The analysis purports to answer whether people who came to the Deliberative Poll were different from those who didn’t, and to compare the participant sample to the population. This sounds about right, except for one thing: the comparison is made between participants and a pool of two likely distinct sub-populations, namely people who were never invited (probably a representative, random set) and people who were invited but never came. Under plausible and probable assumptions, such pooling biases against finding a result.

The key thing we want to measure is self-selection bias: was there a difference between people who accepted the invitation and those who did not? The correct way to estimate the bias would be as follows:
(Participant/Didn’t come) ~ socio-demographics (gender, education, income, party id, age) + knowledge + attitude extremity

Effect sizes can be provided to summarize the extent of bias. This kind of analysis can account for the fact that bias may not occur at first marginals (gender), but at second marginals (low-educated females). (All of this can be theory driven, or more descriptive in purpose.) The analysis also allows smaller effects to be seen, as variance within cells is reduced.
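To illustrate why pooling matters, here is a small sketch that computes a standardized mean difference (Cohen's d, one possible effect size; the model above would more naturally be fit as a regression) for participants against invited no-shows, and then against the pooled group of no-shows and never-invited respondents. The knowledge scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Standardized mean difference between two groups, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Hypothetical knowledge scores for the three groups.
participants = [6, 7, 8, 7, 6, 8]
invited_no_show = [4, 5, 6, 5, 4, 6]
never_invited = [5, 6, 7, 6, 5, 7]  # roughly a random draw from the population

# Self-selection comparison: came vs. invited but did not come.
d_selection = cohens_d(participants, invited_no_show)
# Pooled comparison, as in the reports: came vs. everyone else.
d_pooled = cohens_d(participants, invited_no_show + never_invited)
print(d_selection, d_pooled)
```

In this made-up example the pooled comparison understates the gap, because the never-invited group sits between those who came and those who declined.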

p values
When the conservative thing to do is to reject the null hypothesis, think a little less about p-values.

Assuming the initial survey approximates a ‘representative’ sample of the entire population, and assuming we want to infer how ‘representative’ the participants are of the entire population, it makes sense to just report mean differences without p values.

The survey sample estimates stand in for the entire population. Census numbers for the entire population come without standard errors, or with very low ones, so comparisons against them are always significant.

By comparing to an uncertain estimate of the population, one cannot say whether the participant sample was representative of the entire population. The comparison is unbiased but suffers from the following problem: the more uncertain the population estimate, the less likely one is to reject the null, and the more likely one is to conclude that the participant sample is representative. One way to deal with this is to take the 95% confidence band of the sample estimate of the population, calculate the maximum and minimum difference between that band and the participant sample, and report those.
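A rough sketch of that last suggestion, under the simplifying assumptions that the participant mean is treated as fixed and that a normal approximation (z = 1.96) is adequate for the survey estimate. The function name and the numbers are hypothetical.

```python
from math import sqrt

def difference_bounds(participant_mean, survey_mean, survey_sd, survey_n, z=1.96):
    """Smallest and largest participant-minus-population gap consistent with
    a 95% confidence band around the survey estimate of the population mean."""
    half_width = z * survey_sd / sqrt(survey_n)
    lower = survey_mean - half_width
    upper = survey_mean + half_width
    gaps = (participant_mean - upper, participant_mean - lower)
    return min(gaps), max(gaps)

# Hypothetical numbers: 60% of participants hold a view, versus 50% in an
# initial survey of 400 people (standard deviation 0.5).
smallest, largest = difference_bounds(0.60, 0.50, 0.5, 400)
print(smallest, largest)
```

Reporting both ends of the range makes the uncertainty in the population estimate visible, rather than letting a noisy estimate pass for evidence of representativeness.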

Name calling
Calling the analysis a ‘representativeness’ analysis seems misleading on two counts:

  1. While a clear representation question can be answered by some analysis, no such question is answered, or can be answered, by the analysis presented. Moreover, it isn’t clear whether it relates to some larger politically meaningful variable. For example, one question that can be posed is whether the participant sample resembles the population at large. To answer such a question, one would want to compare participant-sample estimates to census estimates (which have near-zero variance, so t-tests etc. would be pointless).
  2. In a series of papers in the 1970s, Kruskal and Mosteller (citations at the end) rightly excoriate the use of ‘representativeness’, which is fuzzy and open to much abuse.

Citations
Kruskal, W; Mosteller, F. (1979) Representative sampling I: non-scientific literature. Intern Stat Rev. 47:13-24.
Kruskal, W; Mosteller, F. (1979) Representative sampling II: scientific literature. Intern Stat Rev. 47:111-127.
Kruskal, W; Mosteller, F. (1979) Representative sampling III: the current statistical literature. Intern Stat Rev. 47:245-265.

Media scholars have long complained about the lack of good measures of media use. Survey self-reports have been shown to be notoriously unreliable, especially for news, where there is significant over-reporting, and without good measures, research lags. The same is true for most research in marketing.

Until recently, the state-of-the-art aggregate media use measures were Nielsen ratings, which put a ‘meter’ in a few households or asked people to keep a diary of what they saw. In short, the aggregate measures were pretty bad as well. Digital media, which allow for effortless tracking, and the rise of Internet polling, however, provide for the first time an opportunity to create ‘panels’ of respondents for whom we have near-perfect measures of media use. The proposal is quite simple: create a hybrid of Nielsen on steroids and the YouGov/Polimetrix or Knowledge Networks kind of recruiting of individuals.

Logistics: Give people free cable and Internet (~80/month) in return for 2 hours of their time per month and monitoring of their media consumption. Pay people who already have cable (~100/month) for installing a device and software. Recording channel information is enough for TV, but the Internet equivalent of a channel, the main domain, clearly isn’t, as people can self-select within websites. So we only need to monitor the channel for TV, but more than the domain for the Internet.
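To see why the main domain is too coarse, here is a small sketch that tallies the same browsing log at two levels of detail: by domain only, and by domain plus the first path segment. The URLs and the `tally` helper are made up for illustration; real site structures would need their own rules.

```python
from collections import Counter
from urllib.parse import urlsplit

def tally(urls, depth):
    """Count visits by domain plus the first `depth` path segments."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s][:depth]
        counts["/".join([parts.netloc] + segments)] += 1
    return counts

# Hypothetical browsing log for one panelist.
log = [
    "https://news.example.com/politics/story-1",
    "https://news.example.com/sports/story-2",
    "https://news.example.com/sports/story-3",
]

by_domain = tally(log, depth=0)   # the Internet analogue of a TV channel
by_section = tally(log, depth=1)  # reveals self-selection within the site
print(by_domain)
print(by_section)
```

At the domain level all three visits look like news consumption; one level down, only one of them is political news.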

While the number of devices on which people browse the Internet and watch TV has multiplied, there generally remains only one ‘pipe’ per house. We can install a monitoring device at the central hub for cable, and automatically install software for anyone who connects to the Internet router, or do passive monitoring on the router. Monitoring can also be done through applications on mobile devices.

Monetizability: Consumer companies (say Kellogg’s, Ford), communication researchers, and political hacks (e.g. how many watched campaign ads) will all pay for it. The crucial (if modest) innovation is the ability to survey people on a broad range of topics, in addition to getting great media use measures.

Addressing Privacy concerns:

  1. Limit recording to certain channels/websites, such as the ones on which customers advertise. This changing list can, say, be made subject to approval by the individual.
  2. Provide a web interface where people can look at and suppress the data before they are sent out. Of course, reassure people that all data are anonymous, to discourage such censoring.

Ensuring privacy may lead to some data censoring, and we can try to prorate the data we get in a couple of ways:

  • Survey people on media use
  • Use Television Rating Points (TRP) by sociodemographics to weight data.

I have on occasion used the American National Election Studies (ANES) cumulative file to do over-time comparisons. Roughly half of those times, I have found patterns that don’t make much sense. Only a small fraction of the times when the patterns didn’t make sense have I chosen to investigate the data more closely as a likely explanation for the aberrant patterns. The following ‘finding’ is the result of one such effort.

The ANES cumulative file (1948-2004) carries a variety of indices. In creating some of the indices, it appears that pre-election measures have been combined with post-election measures in some of the years. As if that weren’t enough, in at least one case the same index combines a pre-election measure with a post-election measure in some years, while using only post-election measures in other years. Here’s an example:

‘External Efficacy Index’ (VCF0648) is built out of two items:
http://www.electionstudies.org/studypages/cdf/anes_cdf_var.txt

Item 1: Public officials don’t care much what people like me think.
Item 2: People like me don’t have any say about what the government does.

Item 2 is asked both pre- and post-election in some cycles. In 1996, efficacy is built out of:

960568 (pre), 961244 (post)

http://www.electionstudies.org/studypages/1996prepost/nes1996var.txt

[You can identify post-election wave questions through the following coding category: Inap, no Post IW.] The post version of 960568 is 961245.

In 2000, by contrast, it is built out of 001527 (post) and 001528 (post).

http://www.electionstudies.org/studypages/2000prepost/anes2000prepost_var.txt
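A check of this kind is easy to automate once the components of an index are written down by year. The sketch below hard-codes only the two years and variable numbers discussed above; the data structure and function name are my own, and other years would have to be filled in from the codebooks.

```python
# Components of the External Efficacy Index (VCF0648), by study year,
# with the wave in which each item was asked (from the codebooks cited above).
INDEX_COMPONENTS = {
    1996: [("960568", "pre"), ("961244", "post")],
    2000: [("001527", "post"), ("001528", "post")],
}

def years_mixing_waves(components_by_year):
    """Return the years in which an index combines items from different waves."""
    return sorted(
        year
        for year, items in components_by_year.items()
        if len({wave for _, wave in items}) > 1
    )

mixed = years_mixing_waves(INDEX_COMPONENTS)
print(mixed)
```

Any year the function returns is one where the index is not comparable, on its face, with years built from a single wave.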

I have alerted the ANES staff, and it is likely that the next iteration of the cumulative file will fix this particular issue.
