I have a dataset that records, for each UN resolution, the country and its vote:
ResolutionID: 1, 2, 3, ...
Country: US, CA, MX, ...
Vote: yes, no, abstain
I want to create a variable that gives, for each country pair (e.g. US-CA, US-MX, MX-CA, ...), the correlation of their voting records, thus providing an index of each country pair's friendship or strategic alliance.
What R code do I have to use?
Citation of the dataset: Erik Voeten, "Data and Analyses of Voting in the UN General Assembly", Routledge Handbook of International Organization, edited by Bob Reinalda (published May 27, 2013).
CodePudding user response:
Using the algorithm for calculating the 'index of agreement' as proposed by Lijphart (1963), the below might be what you're after.
## set up sample data
set.seed(32446)
test = data.frame(rcid = rep(seq(3), each = 10),
                  country = rep(letters[seq(10)], 3),
                  vote = sample(c("yes", "no"), 30, replace = TRUE),
                  stringsAsFactors = FALSE) # keep votes as character (matters in R < 4.0)
test$vote[sample(seq(30), 3)] = "abstain" # add in a few abstentions
test = test[-sample(seq(30), 2), ] # remove some rows to mimic countries that did not vote
test
## set up the comparison df: one row per unordered country pair
allCountries = unique(test$country)
compdf = outer(allCountries, allCountries, paste)
compdf = data.frame(comp = compdf[lower.tri(compdf)])
## cycle through the resolutions, scoring each pair as per Lijphart (1963):
## 1 if the two countries agree, 0 if they vote opposite ways,
## and 0.5 if only one of them abstains
for (r in unique(test$rcid)) {
  votesR = test[test$rcid == r, ]
  # this resolution's votes in the order of allCountries (NA if a country did not vote)
  tempVotes = votesR$vote[match(allCountries, votesR$country)]
  tempVotes = outer(tempVotes, tempVotes,
    FUN = function(x, y) {
      ifelse(is.na(x) | is.na(y), NA, # NA if either country didn't vote
        ifelse(x == y, 1, # 1 if they agree
          ifelse(x == "abstain" | y == "abstain", 0.5, # 0.5 if only one side abstains
            0))) # 0 otherwise (opposite votes)
    })
  compdf = cbind(compdf, tempVotes[lower.tri(tempVotes)])
  names(compdf)[ncol(compdf)] = paste0("resolution_", r)
}
## calculate the mean score across resolutions (pairs with no shared votes give NaN)
result = data.frame(comp = compdf$comp,
                    result = rowMeans(compdf[, -1, drop = FALSE], na.rm = TRUE))
result$result = 1 - result$result # turn the agreement score into a distance score
## create distance matrix and plot dendrogram
library(ggplot2)
library(ggdendro)
distance = matrix(NA, length(allCountries), length(allCountries),
                  dimnames = list(allCountries, allCountries))
distance[lower.tri(distance)] = result$result # same lower-triangle order as compdf
distance
cluster = hclust(as.dist(distance))
ggdendrogram(cluster, rotate = FALSE, size = 2)
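To see what the scoring rule in the loop does for a single pair, here is a minimal sketch in base R. The two vote vectors and the helper name pair_score are made up for illustration; the scores follow the rule described in the comments above (1 for agreement, 0.5 if exactly one side abstains, 0 for opposite votes).

```r
# Hypothetical vote records of two countries over five resolutions
# (NA = the country did not take part in that vote)
us = c("yes", "no", "abstain", "yes", NA)
ca = c("yes", "yes", "yes", "abstain", "no")

# Illustrative helper: score each resolution for one pair of countries
pair_score = function(x, y) {
  ifelse(is.na(x) | is.na(y), NA_real_,              # skip if either side is missing
    ifelse(x == y, 1,                                # same vote
      ifelse(x == "abstain" | y == "abstain", 0.5,   # exactly one abstention
        0)))                                         # opposite votes
}

scores = pair_score(us, ca)
agreement = mean(scores, na.rm = TRUE) # averaged over resolutions both voted on
distance = 1 - agreement               # same conversion as in the code above
```

Here the per-resolution scores are 1, 0, 0.5, 0.5 and NA, so the agreement index is 0.5 and the distance is 0.5.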
Lijphart, A. (1963). The Analysis of Bloc Voting in the General Assembly: A Critique and a Proposal. The American Political Science Review, Vol. 57, No. 4, pp. 902-917. https://www.jstor.org/stable/1952608
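The question asks for a correlation rather than an agreement index. If that is really what is needed, one possible approach is to code the votes numerically, reshape to one column per country, and call cor(). The sketch below is only an illustration: the coding (yes = 1, abstain = 0, no = -1), the toy data, and the names votes, wide and corMat are all assumptions, not part of the dataset or of Lijphart's method.

```r
# Hypothetical long-format votes with the same three columns as the sample data
votes = data.frame(
  rcid = rep(1:4, each = 3),
  country = rep(c("US", "CA", "MX"), times = 4),
  vote = c("yes", "yes", "no",
           "no", "no", "yes",
           "yes", "abstain", "no",
           "no", "no", "no"),
  stringsAsFactors = FALSE
)

# Assumed numeric coding of the votes
coding = c(yes = 1, abstain = 0, no = -1)
votes$score = unname(coding[votes$vote])

# One row per resolution, one column per country (NA where a country did not vote)
wide = tapply(votes$score, list(votes$rcid, votes$country), mean)

# Pairwise Pearson correlations over the resolutions both countries voted on
corMat = cor(wide, use = "pairwise.complete.obs")
corMat
```

With this toy data the US-CA correlation is positive and the US-MX correlation is negative. Note that a country whose coded votes never vary has zero variance, so cor() returns NA for its pairs; the agreement index above does not have that problem.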