Calculate the frequency of two actors voting the same way

I'm trying to calculate how often, on average, Germany agreed with the US on a vote in the UN General Assembly since 1990. For this, I'm using the unvotes package (hosted on both CRAN and GitHub), which provides data on the voting history of countries in the United Nations General Assembly. I'm focusing on the un_votes and un_roll_calls datasets, which I merged.

So far I have this:

##count how often Germany and the US agreed on a resolution in each year:
countries <- c("United States", "Germany")

# year() comes from the lubridate package
by_country_year <- merged %>%
  group_by(year = year(date), country, unres, rcid, vote) %>%
  filter(country %in% countries, year >= 1990)

but I am completely lost as to how I can go ahead. Any leads?

CodePudding user response：

You could reshape your data frame into a wide format using pivot_wider() and then count the number of times both columns (US and Germany) have the same value, for example:

dt_wide <- by_country_year %>%
  ungroup() %>%  # drop the grouping before reshaping
  pivot_wider(id_cols = "rcid", names_from = country, values_from = "vote")
sum(dt_wide$Germany == dt_wide$`United States`, na.rm = TRUE)

Is this what you're looking for?
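
Since the question asks about agreement per year, the same reshaping idea could be extended to a yearly share of identical votes. This is only a minimal sketch: it assumes the merged data has rcid, country, vote and date columns, and the toy data frame below is a made-up stand-in, not real UN data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same column names as the merged data
toy <- tibble(
  rcid    = c(1, 1, 2, 2, 3, 3, 4, 4),
  country = rep(c("United States", "Germany"), times = 4),
  vote    = c("yes", "yes", "no", "abstain", "no", "no", "yes", "yes"),
  date    = as.Date(c("1990-03-01", "1990-03-01", "1990-06-01", "1990-06-01",
                      "1991-02-01", "1991-02-01", "1991-09-01", "1991-09-01"))
)

agreement_by_year <- toy %>%
  # extract the calendar year from the vote date
  mutate(year = as.integer(format(date, "%Y"))) %>%
  # one row per roll call, one vote column per country
  pivot_wider(id_cols = c(rcid, year), names_from = country, values_from = vote) %>%
  group_by(year) %>%
  summarise(
    n_votes   = n(),
    # share of roll calls in that year where both votes are identical
    agreement = mean(Germany == `United States`, na.rm = TRUE)
  )

# Average of the yearly agreement shares
avg_agreement <- mean(agreement_by_year$agreement)
```

Note that this treats a shared abstention as agreement; filter out abstentions first if that is not what you want.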

CodePudding user response：

You can filter according to dates and country, then group by rcid, before summarizing to create two separate columns for the US and German votes.

library(tidyverse)
library(unvotes)

merged <- un_votes %>% inner_join(un_roll_calls, by = "rcid") 

result <- merged %>% 
  filter(date >= as.Date('1990-01-01')) %>%
  filter(country %in% c('United States', 'Germany')) %>%
  group_by(rcid) %>%
  summarize(US_vote = vote[country == 'United States'],
            Germany_vote = vote[country == 'Germany'])

This lets us build a table of all votes showing how the two countries compare.

table(US = result$US_vote, Germany = result$Germany_vote)
#>          Germany
#> US        yes abstain  no
#>   yes     629      15   4
#>   abstain 262      89   0
#>   no      719     399 423

We can also test whether the proportion of agreement differs from what we might expect by chance. Let's first drop the abstentions and then use prop.test():

result <- result %>% filter(US_vote != 'abstain' & Germany_vote != 'abstain')

prop.test(sum(result$US_vote == result$Germany_vote), nrow(result))
#> 
#>  1-sample proportions test with continuity correction
#> 
#> data:  sum(result$US_vote == result$Germany_vote) out of nrow(result), null probability 0.5
#> X-squared = 60.611, df = 1, p-value = 6.955e-15
#> alternative hypothesis: true p is not equal to 0.5
#> 95 percent confidence interval:
#>  0.5693588 0.6155882
#> sample estimates:
#>         p 
#> 0.5926761

This means that on votes where neither abstained, Germany and the US voted the same way about 59% of the time, significantly more often than the 50% that prop.test() assumes under its default null hypothesis. I find this reassuring.

Created on 2022-10-04 with reprex v2.0.2
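
As a cross-check, the agreement shares can also be read straight off the printed contingency table. The sketch below simply retypes the counts from the table() output above into a matrix (the object names are illustrative) and computes the share of identical votes with and without abstentions.

```r
# Counts copied from the table() output above (rows = US, columns = Germany)
votes <- matrix(
  c(629,  15,   4,
    262,  89,   0,
    719, 399, 423),
  nrow = 3, byrow = TRUE,
  dimnames = list(US = c("yes", "abstain", "no"),
                  Germany = c("yes", "abstain", "no"))
)

# Share of identical votes, counting a shared abstention as agreement
agree_all <- sum(diag(votes)) / sum(votes)

# Same share after dropping every roll call where either country abstained
no_abstain <- votes[c("yes", "no"), c("yes", "no")]
agree_no_abstain <- sum(diag(no_abstain)) / sum(no_abstain)

round(c(all_votes = agree_all, without_abstentions = agree_no_abstain), 3)
```

The second figure (1052 of 1775 votes) matches the estimate reported by prop.test(), while including abstentions lowers the share to roughly 45%.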

CodePudding user response：

Using base R:

merged <- merge(un_votes, un_roll_calls, by = "rcid")
out <- xtabs(count ~ rcid + country + vote,
  transform(subset(merged,
    date >= as.Date("1990-01-01") &
      country %in% c('United States', 'Germany')), count = 1))
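
One possible way to get from per-country votes to an agreement share in base R is to put the two countries side by side with merge() and compare the columns. This is a sketch of an alternative route rather than a continuation of the xtabs() call above, and the toy data frame is a hypothetical stand-in for the filtered merged data.

```r
# Hypothetical toy data with the same column names as the merged data
toy <- data.frame(
  rcid    = c(1, 1, 2, 2, 3, 3),
  country = rep(c("United States", "Germany"), times = 3),
  vote    = c("yes", "yes", "no", "abstain", "no", "no")
)

# Split by country, keeping only the roll call id and the vote
us <- subset(toy, country == "United States", select = c(rcid, vote))
de <- subset(toy, country == "Germany", select = c(rcid, vote))

# One row per roll call, with one vote column per country
wide <- merge(us, de, by = "rcid", suffixes = c("_us", "_de"))

# Share of roll calls on which both countries cast the same vote
agreement <- mean(wide$vote_us == wide$vote_de)
```

merge() keeps only roll calls where both countries voted, so roll calls with a missing vote drop out of the share.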