By Nick Li
In my last post I argued that Barack Obama’s so-called move to the center is largely a fiction. This perception appears to be partly driven by a misunderstanding of his previous positions, for example on capital punishment. Some of it also appears to be the result of taking statements made in a speech out of context or putting a rather radical spin on them and then calling the subsequent clarification and nuance a flip-flop (whether on timetables for withdrawal from Iraq, delaying tax increases if the economy is in recession, or meeting without preconditions with leaders of hostile regimes). But a large part of it must be due to the perception that Obama is a very liberal politician at heart. This perception is largely driven by the often repeated claim that Obama is the most liberal (or left-wing for our international readers) senator in the US Senate, which according to McCain now means he is left of a self-described socialist, Senator Bernie Sanders of Vermont. The source for the claim is a study by the National Journal, a magazine for Washington insiders. In this post I will discuss the methodology behind the study and some other methodologies used to gauge, based on voting records, where to locate a politician on the political spectrum.
The basic ingredient for locating a politician on the political spectrum in a more "objective" way is to look at the actual record of votes that they cast in the legislature. This is called a "roll call." In the US Congress your options as a legislator are to vote "Yea," "Nay" or not show up to vote at all. The fact that legislators are not required to cast a vote on every bill has important implications. The National Journal methodology is discussed on their web page:
A panel of National Journal editors and reporters initially compiled a list of 216 key congressional roll-call votes for 2007 — 107 votes for the Senate and 109 for the House — and classified them as relating to economic, social, or foreign policy.
Polidata, a nonpartisan political data-analysis firm, downloaded lists from the House and Senate websites of how all the members voted on the key votes. Those lists were then sent to the Brookings Institution, where the Information Technology Services division performed the data processing and statistical analysis. The ratings system was first devised in 1981 under the direction of William Schneider, a political analyst and commentator, and a contributing editor to National Journal.
The votes in each issue area were subjected to a principal-components analysis, a statistical procedure designed to determine the degree to which each vote resembled other votes in the same category (the same members tending to vote together). Ten of the 216 votes (eight in the Senate and two in the House) were dropped from the analysis because they were statistically unrelated to others in the same issue area. These typically were votes that reflected regional and special-interest concerns, rather than general ideology.
The analysis also revealed which yea votes correlated with which nay votes within each issue area (members voting yea on certain issues tended to vote nay on others). The yea and nay positions on each roll call were then identified as conservative or liberal.
Each roll-call vote was assigned a weight from 1 (lowest) to 3 (highest), based on the degree to which it correlated with other votes in the same issue area. A higher weight means that a vote was more strongly correlated with other votes and was therefore a better test of economic, social, or foreign-policy ideology. The votes in each issue area were combined in an index (liberal or conservative votes as a percentage of total votes cast, with each vote weighted 1, 2, or 3).
Absences and abstentions were not counted; instead, the percentage base was adjusted to compensate for missed roll calls. A member who missed more than half of the votes in any issue category was scored as "missing" in that category (shown as an asterisk [*] in the vote-rating tables).
Members were then ranked from the most liberal to the most conservative in each issue area. These rankings were used to assign liberal and conservative percentile ratings to all members of Congress.
The liberal percentile score means that the member voted more liberal than that percentage of his or her colleagues in that issue area in 2007. The conservative figure means that the member voted more conservative than that percentage of his or her colleagues.
For example, a House member in the 30th percentile of liberals and the 60th percentile of conservatives on economic issues voted more liberal than 30 percent of the House and more conservative than 60 percent of the House on those issues, and was tied with the remaining 10 percent. The scores do not mean that the member voted liberal 30 percent of the time and voted conservative 60 percent of the time.
Percentile scores can range from a minimum of 0 to a maximum of 100. Some members, however, voted either consistently liberal or consistently conservative on every roll call. As a result, there are ties at both the liberal and the conservative ends of each scale. For that reason, the maximum percentiles are usually less than 100.
Members also receive a composite liberal score and a composite conservative score, each of which is an average of their six issue-based scores. Members who missed more than half of the votes in any of the three issue categories do not receive a composite score.
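To make the principal-components step in the quoted methodology more concrete, here is a minimal sketch in Python. It is not the National Journal's actual procedure, which is not described in enough detail to reproduce. The six senators, the four votes, and the function name `first_principal_component` are all invented for illustration, and the leading component is found by simple power iteration rather than a library routine.

```python
import math

def first_principal_component(votes, iterations=200):
    """Return (loadings, scores) for the leading principal component.

    votes: one row per senator, one column per roll call (1 = yea, 0 = nay).
    Uses plain power iteration on the covariance matrix of the demeaned data.
    """
    n, m = len(votes), len(votes[0])
    means = [sum(row[j] for row in votes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in votes]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each senator's score is the projection of their demeaned votes on v.
    scores = [sum(r[j] * v[j] for j in range(m)) for r in centered]
    return v, scores

# Hypothetical data: votes 0-2 split two blocs of three senators;
# vote 3 is a "regional" vote that cuts across both blocs.
votes = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
loadings, scores = first_principal_component(votes)
print([round(x, 3) for x in loadings])
print([round(s, 3) for s in scores])
```

In this toy data the three bloc votes load on the component with equal magnitude (the third with the opposite sign, since its yea is the other bloc's position), while the cross-cutting vote has a loading of essentially zero. That is the sense in which a vote can be "statistically unrelated" to the others. The sign of the component is arbitrary, so deciding which end is "liberal" still requires a judgment from outside the data.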
In assessing what they do, it is vital that we focus on the particular judgment calls that are implicit in this methodology:
(1) Choice of which roll call votes to use, i.e. the "key votes." This is by far the most subjective input to the National Journal ranking system because it is not based on any predetermined methodology but is the outcome of the editorial board and reporters of the National Journal ("several" of them) going through all 442 votes cast in the 110th Senate and picking 107 that they consider "key" votes.
(2) Choice of how to divide the roll call votes they do select into three separate issue groups – economic, social, and foreign policy. This is important because the outcome of the principal components analysis of each issue group is independent of what happens in the other groups – the choice of how and whether to break up the issue groups will affect the outcome. It is not even clear what is gained by dividing the roll calls into these three issue groups, or why three groups were selected as opposed to a broader or more refined classification. After all, it is only the average score across all three issue groups that is reported in the press. I have not seen the issue scores reported on the National Journal page (http://nj.nationaljournal.com/voteratings/#) or anywhere else – if the point is to achieve an ideological ranking on a single dimension it is not clear that independently analyzing issue areas and averaging is better than analyzing all of the data together. Perhaps more alarming is that 8 roll call votes are dropped at this stage of the analysis because they are "statistically unrelated to others in the same issue area." This represents about 7% of the sample, and it is not clear why their methodology should be dropping roll call votes that reflect "regional and special-interest concerns, rather than general ideology" when the 107 vote sample is already selected to represent "key votes." Though they may be adopting a consistent procedure over time by dropping votes that don’t fit a general pattern (in the technical language of principal components analysis this roughly means dropping the votes that load most weakly on the leading component), I have to question any methodology which subjectively selects "key votes" based on relevance to determining an ideological ranking, only to have some of these "key votes" dropped because in fact they turned out to be non-ideological after all.
(3) Use of principal components analysis. Principal components analysis is a statistical technique used to reduce the dimensionality of high dimensional data. In the case of the roll call data used by the National Journal, there might be about 33 different roll call votes in a particular issue group for 100 senators. Principal components analysis involves reducing these 33 different votes to a single index, by finding the eigenvector corresponding to the largest eigenvalue of the covariance matrix of the demeaned data and premultiplying the transpose of the original data (a 100×33 matrix) by the transpose of that eigenvector. This method is fairly common in political science and more technical fields as a way to reduce high dimensional data to lower dimensions, and is thus well suited to creating an index. However, it is not the only way to create an index. Below I discuss an alternate methodology that has more intuitive appeal.
(4) Use of importance weights. This is one of the least transparent elements of the methodology but potentially very important in determining the final rankings within each issue area. Rather than counting each roll call vote equally within an issue area (which is what a standard principal components analysis would do), they are assigned weights "from 1 (lowest) to 3 (highest), based on the degree to which it correlated with other votes in the same issue area… The votes in each issue area were combined in an index (liberal or conservative votes as a percentage of total votes cast, with each vote weighted 1, 2, or 3)." It is not clear at all from this description what criterion is used to determine the "degree" to which a vote is correlated with other votes, as the strength of a correlation is just a number between 0 and 1 (in absolute value). Clearly they have specified some arbitrary cut-offs for the correlation, though whether this is cardinal (based on some actual numbers that cut the correlation interval twice between 0 and 1) or ordinal (based on ranking the correlations for each vote and picking the X most correlated votes as weight 3, etc.) is unclear from the description above. Furthermore, the choice of weights 1, 2, or 3 is also arbitrary – the results would be different if we picked weights 2, 3, 4 or 1, 5, 10, etc. The fact that the single index for an issue area is based on simply adding up the points from these "importance weights" means that the arbitrary choices about these importance weights are crucial inputs.
(5) Converting to percentiles. With the single index for each issue area determined, this index is used to rank each senator compared to their peers. A senator assigned a liberal ranking of 50% has a liberal index that is higher than 50% of their fellow senators. This is also somewhat arbitrary as the next step in the process is to average the percentile scores. Directly averaging the index scores might yield different results.
(6) Averaging of three issue groups. The final score, used to determine that Barack Obama was the most liberal senator in 2007 and Hillary Clinton was the 16th most liberal senator (and used to determine that McCain did not cast enough votes to qualify for a ranking), is based on averaging the percentile rankings across the three issue areas, using equal weights (even though the issue areas have different numbers of votes). The choice of an unweighted average is arbitrary, as a vote-weighted average or a median or geometric average with a certain curvature parameter could have been constructed just as easily.
Now I can’t prove that they rigged the results to get Obama as most liberal. Indeed, the creators of the index are very sensitive to this charge. On their web-site they discuss it:
Q: How do you pick the votes?
Green: Toward the end of every year, several National Journal reporters and editors separately sift through all of the year’s roll-call votes to identify ones that might be appropriate for the vote ratings. The reporters and editors then meet and make the final selections.
Q: How do you determine which votes are "appropriate"?
Green: First we try to identify the most important House and Senate votes of the year. Then we look for votes that show ideological distinctions between members, even if the votes aren’t necessarily pivotal. Finally, we try to make sure that a wide range of issue areas are represented, such as abortion, the budget, energy, environment, immigration, Iraq, national security, and taxation.
Q: Can you give an example of votes that show ideological distinctions?
Green: The Senate voted last year on whether to repeal the federal minimum wage. The outcome of the vote was never in doubt — only 28 senators voted for the repeal; 69 voted against it. But the vote seemed to us to be worth including in the ratings because it showed the ideological differences between senators who thought that setting a minimum wage is an appropriate function of the federal government (what we termed the liberal position) and those who thought that such matters should be left to the states (what we termed the conservative position).
Q: Why don’t you base the ratings on all of the roll-call votes, rather than just some of them?
Green: Last year there were 1,186 roll-call votes in the House and 442 in the Senate. Many of them are on relatively minor matters and are noncontroversial. Other votes fall along regional or other nonideological lines. We think that a rating based on key votes is more informative.
Q: When you selected the Senate votes for 2007, did you know that Sen. Obama was going to have the most liberal rating?
Green: No. In fact, we didn’t even know whether he would qualify for a score. Under our system, a member of Congress gets a liberal and conservative score in each of three broad issue areas — economic policy, social policy, and foreign policy. A member must participate in at least half of the votes in a category to get a score in that category. If a member gets a score in all three categories, he or she also gets a composite score, essentially an average of the three scores. If a member doesn’t get a score in all three categories, he or she doesn’t get a composite score. Obama and other presidential candidates were absent a fair amount in 2007, so we weren’t sure if they would get composite scores. Obama’s composite score is the basis for his label as the most liberal senator in 2007.
Q: When you selected the votes, were you keeping track of how Obama (or any other member of Congress) had voted?
Green: No.
Q: Aren’t the labels "liberal" and "conservative" open to interpretation?
Green: Yes. On some matters, most people would agree on what constitutes a liberal position or a conservative position. On other matters, it’s not as clear-cut. Some critics of the war in Iraq, for instance, argue that opposition to the war is a conservative position because it reflects a belief in limited government involvement in international affairs. But in National Journal’s ratings, votes in opposition to the war are categorized as liberal. Labels such as "liberal" and "conservative" are just that — labels. They are subject to debate. But as long as National Journal thinks there’s a broad consensus about what these labels mean, we’ll continue using them in our vote ratings.
Q: You keep referring to Obama and Clinton. What about John McCain?
Green: He didn’t get a composite score for 2007 because he missed too many votes.
Q: Are you concerned that National Journal’s 2007 rating of Obama as the most liberal senator will become an issue in the presidential campaign?
Green: We can’t control how the vote ratings are used in the campaign. One reason for this Q&A is to try to anticipate possible questions and be as open as possible about how the ratings were determined.
Q: Didn’t you go through the same situation four years ago?
Green: Yes. In 2004, National Journal rated Democratic presidential nominee John Kerry as the most liberal senator in 2003. The rating quickly became a talking point in the campaign, with President Bush, Vice President Cheney, and other Republicans using it to attack Kerry. For his part, Kerry called the rating a "laughable characterization." He said it was "absolutely the most ridiculous thing I’ve ever seen in my life."
Q: Have you made any changes in the vote rating system since then?
Green: We made one change. We decided that in order for a member of Congress to receive a composite rating, he or she needed to vote often enough to qualify for scores in each of the three issue categories-economic policy, social policy, and foreign policy-that we measure. In Kerry’s case, he didn’t vote often enough in 2003 to merit scores in the social-policy and foreign-affairs categories. His overall ranking was based on his score in the economic category.
Q: Why did you make the change?
Green: We didn’t want to continue giving composite scores to members of Congress who missed most of the votes we selected.
Q: Why didn’t you make the change before Kerry’s rating was announced?
Green: The method we used to give Kerry a composite score was the method we had used in the past. To change the rules in the middle of the game, so to speak, after we learned Kerry’s ranking, would have exposed us to charges of manipulating our rules for partisan reasons. We instituted the change the following year, before we knew the scores of any lawmakers.
Q: Do you think that the National Journal vote ratings are a valid way to judge a member of Congress?
Green: It’s one way to assess a member of Congress, but by no means the only way. It’s important to look at a member’s effectiveness, character, judgment, and policy proposals, among other things. It’s also valuable to look at vote ratings from other organizations — from publications such as Congressional Quarterly and interest groups such as the League of Conservation Voters, the American Civil Liberties Union, and the American Conservative Union — to get a rounded view.
My issue with the National Journal system is that it seems unnecessarily complicated and requires too many subjective judgments and assumptions on the part of the creators. It is not clear at all how sensitive the final index is to the various assumptions that enter every stage. Some evidence for this can be seen by looking at scores over time. Since 2001 Hillary Clinton has been 25th, 12th, 8th, 34th, 20th, 32nd, and 16th most liberal. Barack Obama has been 16th, 10th, and now 1st in his three years in the Senate. Further evidence of sensitivity can be observed by examining the actual votes cast by Clinton and Obama on the 99 votes used to calculate the index. Out of a total of 99 votes, Obama and Clinton cast different votes twice:
18/S1: Establish a Senate Office of Public Integrity to handle ethics complaints against senators. Obama voted Yea and Clinton voted Nay. The bill failed, but more Democrats than Republicans voted for the bill, which explains why Obama’s vote counts as more liberal than Clinton’s vote. However, it is worth pointing out that four Republicans supported the bill, including Lindsey Graham and John McCain, and that numerous Democrats including Ted Kennedy and Bernie Sanders (a "self-declared socialist" who is technically an independent but caucuses with the Democrats) voted against the bill. Note that the construction of the index thus requires that senators who support greater monitoring of congressional ethics are "liberal" and that ethical conduct and regulation thus becomes a liberal position, simply by virtue of who voted for the bill.
189/S1348: Allow certain immigrants to stay in the United States while renewing their visas. Obama voted in favor and Clinton voted against. That’s it. Those two votes make Obama the most liberal senator and Clinton 16th.
[In fairness, there is probably another vote that matters – 349/HR1585 "Express the sense of the Senate that the Iranian revolutionary guard should be designated a terrorist organization." – which is the only vote which was actually discussed in the media, with Obama criticizing Clinton heavily for her Yea vote. While Obama did not vote on this bill, he opposed it subsequently during the primary campaign.]
Looking at these votes we see that the system is highly sensitive to a couple of votes, and one can certainly argue that the two votes on which they differed are not important enough or ideological enough for a reasonable person to conclude that Obama is 15 senators more liberal than Clinton. A major consequence of using only 99 votes instead of the full information available is that one or two votes can lead to very large differences in the rankings and the choice of which 99 votes to use will be extremely important. Considering that Clinton and Obama differed on 10 votes out of 442 it would be informative to know how they differed on the other 8 votes, and whether we conclude that those are somehow less important than the 2 that were included. Also, note that the use of issue groups is important here – all of the immigration issues are lumped into "social issues" rather than economic issues, while the ethics bill was deemed an "economic issue." This is not a very obvious classification to me and has important implications – if Obama was already the most liberal senator on economic issues, moving the immigration vote from economic to social issues will greatly increase his liberal rating while having no effect on the actual votes cast. In other words, if we grouped all of the most "liberal" Obama votes into one issue category, this would tend to lower his liberal score because his percentile rank cannot exceed 100 in that one category and his percentile rank in the other categories would drop.
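The sensitivity to a single vote can be illustrated with a small sketch. Everything in it is hypothetical: the four senators, the ten roll calls, and the importance weights are invented, and `liberal_index` is only my reading of the published description ("liberal votes as a percentage of total votes cast, with each vote weighted 1, 2, or 3"), not the National Journal's code.

```python
def liberal_index(record, weights):
    """Weighted share of liberal votes among votes cast.

    record:  1 (liberal vote), 0 (conservative vote) or None (missed).
    weights: importance weight of each roll call (1, 2 or 3).
    """
    cast = [(v, w) for v, w in zip(record, weights) if v is not None]
    total = sum(w for _, w in cast)
    return sum(w for v, w in cast if v == 1) / total

def ranking(records, weights):
    """Senators ordered from most to least liberal by the index."""
    return sorted(records,
                  key=lambda name: liberal_index(records[name], weights),
                  reverse=True)

weights = [3, 3, 2, 2, 1, 1, 1, 1, 1, 1]   # hypothetical importance weights
records = {
    "X": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],   # liberal on every vote
    "P": [1, 1, 1, 1, 0, 1, 1, 1, 1, 1],   # one conservative vote, weight 1
    "Q": [1, 1, 0, 1, 1, 1, 1, 1, 1, 1],   # one conservative vote, weight 2
    "Y": [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],   # identical to X except vote 0
}
order = ranking(records, weights)
print(order)
```

Senators X and Y disagree on exactly one roll call, yet they end up at opposite ends of this ranking because that roll call happens to carry the highest weight. When many senators have nearly identical records, as Democrats did on most party-line votes, a single differing vote can separate two of them by many places.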
This is just one of many examples of how this system (a) is easy to manipulate due to reliance on many subjective judgments, (b) relies on numerous assumptions, (c) is highly sensitive to these assumptions, (d) is unnecessarily complicated, and (e) does not use the full information available. I would expect nothing less of a system devised by one of the most idiotic CNN talking heads, Bill Schneider. Another knock on the system is that it does not give us results that correspond to our prior beliefs. It is widely believed that Russ Feingold and Bernie Sanders are the most liberal senators – while we do not require any and all studies to confirm such prior beliefs, any methodology that does not find this result is suspect.
This is partly related to an underlying bias in the methodology that kicks in for election years. It is not a coincidence that Kerry was found to be the most liberal senator in 2003 when he was running for President and that Obama finds himself in a similar position 4 years later. And it is not necessarily direct manipulation by the editors of the National Journal. When a Senator is running for President, they tend to miss a lot of votes – McCain missed so many votes in 2007 that he is not even ranked, and under the current system (revised after Kerry lost in 2004) Kerry would not have been ranked in 2003. These missed votes are not accounted for by the methodology. The votes which Presidential candidates do show up to cast are typically the very close, ideologically divisive party line votes, which are exactly the ones which get weighted more in this system. By not showing up to vote on the less ideologically divisive votes, Obama missed many opportunities to cast votes that would have made him appear "less liberal" to the National Journal Rankings. Thus there is a built-in bias towards finding that presidential candidates are more liberal, because their campaigning leads to a systematic form of selection bias in terms of which votes are missed.
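A little arithmetic shows how this selection effect works. The sketch below uses an invented senator and invented vote counts, and it ignores importance weights for simplicity. The only rule taken from the National Journal description is that missed votes are dropped from the percentage base.

```python
def liberal_share(record):
    """Share of liberal votes among votes actually cast (None = missed)."""
    cast = [v for v in record if v is not None]
    return sum(cast) / len(cast)

# Hypothetical senator: liberal on all 40 party-line votes, but sides with
# conservatives on 5 of the 20 less divisive votes.
party_line = [1] * 40
less_divisive = [1] * 15 + [0] * 5

full_attendance = party_line + less_divisive
campaigning = party_line + [None] * 20   # skips every less divisive vote

print(round(liberal_share(full_attendance), 3))
print(liberal_share(campaigning))
```

With full attendance this senator votes liberal on 55 of 60 roll calls. After skipping the 20 less divisive ones, the same senator votes liberal on 40 of 40 and still clears the "at least half" participation rule. Nothing about the senator's views has changed, only which votes were cast.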
There are many things the National Journal could do to make their index more convincing. One thing they could do is to simply get rid of their convoluted system and adopt something along the lines of the Poole et al. system which is discussed below. In other words, leave the analysis to political scientists who actually understand mathematics (instead of outsourcing your statistical analysis to Brookings) and who refrain as much as possible from making subjective judgments and who use the maximum information available. But if they wanted to keep the ridiculous "Bill Schneider" system, they should perform numerous robustness checks. This is a hallmark of all empirical analysis in economics and political science these days. The National Journal should try
(a) different sets of key votes, including allowing them to be generated randomly or using all the votes,
(b) different groupings of issue sets (including putting all votes in a single group),
(c) doing the group indexes without importance weights or using different importance weights,
(d) converting the group indexes to a common index without converting to within group percentiles first,
(e) using different weighting schemes for combining groups into a single index.
If the results were the same or very similar for all of these different specifications – which I would bet my annual income would not be the case – we would have much more confidence in the results and their robustness. The National Journal ratings are very influential, perhaps inexplicably so given the questionable methodology, and so it is incumbent on the editors of that journal to take their job very seriously and be responsible. However, it is also the responsibility of the media to call attention to the biases and limitations of the National Journal rankings and to discuss alternatives. There are many alternative methodologies that exist. The failure to present and discuss alternative rankings and the failure to challenge assertions based solely on the National Journal rankings is partly the result of the abject failure of the American media to take its job seriously and the lack of intellectual weight behind most political pundits.
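As an illustration of what one such robustness check might look like, here is a minimal sketch of check (c): recomputing the index under different importance weights. The two senators, six roll calls, and correlation tiers are made up, and the index formula is again only my reading of the published description.

```python
def liberal_index(record, tiers, scheme):
    """Weighted share of liberal votes cast under a given weighting scheme.

    record: 1 (liberal), 0 (conservative) or None (missed) per roll call.
    tiers:  correlation tier of each roll call (1 = lowest, 3 = highest).
    scheme: dict mapping tier -> weight, e.g. {1: 1, 2: 2, 3: 3}.
    """
    cast = [(v, scheme[t]) for v, t in zip(record, tiers) if v is not None]
    return sum(w for v, w in cast if v == 1) / sum(w for _, w in cast)

tiers = [3, 3, 2, 2, 1, 1]     # hypothetical tier of each roll call
records = {
    "A": [0, 1, 1, 1, 1, 1],   # conservative on one top-tier vote
    "B": [1, 1, 1, 1, 0, 0],   # conservative on two bottom-tier votes
}
schemes = {
    "1-2-3": {1: 1, 2: 2, 3: 3},
    "unweighted": {1: 1, 2: 1, 3: 1},
    "1-5-10": {1: 1, 2: 5, 3: 10},
}
most_liberal = {}
for label, scheme in schemes.items():
    scores = {name: liberal_index(rec, tiers, scheme)
              for name, rec in records.items()}
    most_liberal[label] = max(scores, key=scores.get)
print(most_liberal)
```

In this toy case B comes out more liberal under the 1-2-3 and 1-5-10 weights, while A comes out more liberal when every vote counts equally. Whether the real rankings are this fragile is exactly what the National Journal could report by running such checks on its own data.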
To end this blog entry I would like to briefly discuss what is widely considered to be one of the best and most thorough discussions of political ranking systems, which comes to us courtesy of University of California at San Diego Political Science Professor Keith T. Poole and his co-authors. The website is here. Poole has directly tackled the question of who is more liberal – Obama or Clinton – as well as the question of where these two senators lie in the overall US political spectrum here. It turns out that Obama and Clinton are very close on the political spectrum, with Obama marginally to the left of Clinton, and both are a little bit left of the average Democrat. There is lots of room to their left however. Interestingly there is also no overlap at all between the two political parties, reflecting the deep polarization of American politics – the most liberal Republican is pretty far right of the most conservative Democrat. Poole has also created his own year by year index for the US Senate, ranking senators from most liberal to most conservative. Poole’s results indicate that in the 110th Senate (2007) Obama was the 11th most liberal Senator, while Clinton was 20th and McCain was the 92nd most liberal (8th most conservative) Senator. This puts McCain closer to his party’s extreme wing than Obama, and the pattern is replicated in the 109th Senate where Obama is 21st most liberal, Clinton is 25th most liberal, and McCain is 98th (2nd most conservative). Russ Feingold, Chris Dodd, Ted Kennedy and Barbara Boxer are all consistently more liberal than Obama (Dodd and Feingold led the fight against retroactive immunity in FISA that created so much controversy between Obama and left-wing bloggers). In fairness to McCain, his rating has fluctuated a lot over the course of his career and it is only in the last three years, 2004-2007, that he has consistently been among the most conservative senators (i.e. the period during which he gave up on being a Maverick and shored up his support with the party’s conservative base to make another run at the presidency and this time win the Republican nomination).
Why do I have greater confidence in the Poole results? I present them not merely to show that there are alternative rankings but also to assert that Poole’s methodology is better. Part of the reason is greatly superior transparency – Poole publishes all the details of his methodology on his web-site and publishes them in peer-reviewed journals. He also publishes the data he uses and provides programs to replicate his results. Part of the reason is that Poole uses more data – he uses 388 of the 442 roll call votes cast in the 110th Senate, instead of the 107 (and eventually 99) used by the National Journal. He only drops some votes because of an explicitly stated cutoff – he only included votes that had at least a 0.5% minority, which means that at least one senator out of a hundred voted against the majority. [His program allows one to alter this cut-off, but it is stated explicitly and involves only one subjective judgment call as opposed to the many required to pick a list of 107 votes out of 442.] So part of the reason is that there is almost no subjectivity – the only two decisions required are (1) what the minority vote percentage cutoff is for a vote to be included, and (2) picking one senator who we think is more liberal than the average [it doesn’t matter whether this is Feingold, Clinton, Obama, etc. as this merely determines the starting point for the algorithm, which will converge to the same results as long as we don’t pick the "wrong" person like McCain.] That’s it.
What does the algorithm do? Rather than divide individual bills up into groups, assign importance weights, find percentiles, etc. it does something rather simple and intuitive. It counts every bill equally and orders Yeas and Nays. It then sorts through all of the possible rankings of senators to see which one generates the fewest errors relative to the votes that actually occur. For example, suppose there are three senators, Obama, Clinton, and McCain, whom we arbitrarily list in this order.
Suppose there are four votes:
YYN
NYY
NNN
NYN
Ranking them Obama, Clinton, McCain from most liberal to most conservative classifies the first three votes without error. The ranking will still generate one error, as the fourth vote has Clinton voting differently than Obama and McCain. (With only three senators and four votes, other orderings do equally well; a ranking is only pinned down once many senators and votes are involved.) The point of the algorithm is to minimize these errors when we use all 100 senators and all 388 (or however many) roll call votes. I find this methodology to be much more intuitive and transparent than principal components analysis, and infinitely better than the overly complicated National Journal system. Of course, these rankings measure different things, and the Poole classification implicitly assumes that all votes are equally important and all information should be used. I find this appealing because it removes subjective judgments, but there are many who may disagree. Indeed, in the extreme one could simply say only one or two votes "really" matter, and use that as a basis for classification – this is how many voters work, and there are many ranking systems (by liberal and conservative advocacy groups like the ACLU, Planned Parenthood, Sierra Club, NRA, Heritage Foundation, etc.) that only look at a few votes that are interesting to a particular constituency. However, when making a "broad" claim about a candidate – such as who is the most liberal or conservative – I find it more sensible to use all the available information and remove subjective judgment rather than relying on a methodology that is highly sensitive to the subjective decisions made about which votes to include and how to classify and weight them.
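The error-counting idea can be sketched in a few lines of Python. This is a brute-force toy of my own, not Poole's program: the function names are invented, and trying every possible ordering is only feasible for a handful of senators, so any real implementation has to search far more cleverly. For each roll call, the sketch assumes a single cut point in the ordering with Yeas on one side and Nays on the other, and counts how many senators fall on the wrong side.

```python
from itertools import permutations

def vote_errors(order, vote):
    """Fewest misclassified senators for one roll call given an ordering.

    order: senators from most liberal to most conservative.
    vote:  dict senator -> "Y" or "N".
    Tries every cut point and both polarities (Yeas on the left or right).
    """
    cast = [vote[s] for s in order]
    best = len(cast)
    for cut in range(len(cast) + 1):
        for left, right in (("Y", "N"), ("N", "Y")):
            errors = (sum(v != left for v in cast[:cut])
                      + sum(v != right for v in cast[cut:]))
            best = min(best, errors)
    return best

def total_errors(order, votes):
    """Sum of classification errors over all roll calls."""
    return sum(vote_errors(order, v) for v in votes)

senators = ("Obama", "Clinton", "McCain")
# The four votes from the text, in the column order Obama, Clinton, McCain.
votes = [dict(zip(senators, pattern))
         for pattern in ("YYN", "NYY", "NNN", "NYN")]

for order in permutations(senators):
    print(order, total_errors(order, votes))
```

Running this on the four votes above gives one error for the ordering Obama, Clinton, McCain, coming from the fourth vote. A toy example this small cannot single out one ordering, since each of the three senators is the odd one out on exactly one vote, so every ordering misclassifies one vote. With 100 senators and hundreds of roll calls, the error counts differ across orderings and the minimum becomes informative.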