Subsetting Data by Year and another column

Time:10-15

I am stuck trying to subset my data. I am working with unvoting.csv and am trying to group it by decade (50-59, 60-69, etc.) and, at the same time, look at PctAgreeUS to see how it changes over time. Are there any suggestions or baseline ways to code the data to create a new vector that holds this information? (Sorry if the terms I used are wrong; I am still getting familiar with it.)

CodePudding user response:

I think you can use the dplyr package for this. Load the library, then use dplyr::filter() to select the rows you need and dplyr::arrange() to order the data by specific columns.
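
As a rough sketch of that suggestion: the data frame below is a made-up stand-in for unvoting.csv, and the column names Year and PctAgreeUS are assumptions based on the question, so adjust them to whatever your file actually contains. Besides filter() and arrange(), it uses mutate(), group_by() and summarize() to bucket years into decades and average within each one.

```r
library(dplyr)

# Toy stand-in for unvoting.csv. The column names (Year, PctAgreeUS) are
# assumptions, and the values are invented purely for illustration.
votes <- data.frame(
  Year       = c(1952, 1957, 1963, 1968, 1971),
  PctAgreeUS = c(0.60, 0.50, 0.40, 0.30, 0.20)
)

by_decade <- votes %>%
  filter(!is.na(PctAgreeUS)) %>%            # drop rows with no agreement score
  mutate(decade = Year %/% 10 * 10) %>%     # 1952 -> 1950, 1963 -> 1960
  group_by(decade) %>%
  summarize(mean_agree = mean(PctAgreeUS)) %>%
  arrange(decade)                           # oldest decade first

by_decade
```

With real data you would replace the data.frame() call with something like read.csv("unvoting.csv").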

CodePudding user response:

We do not have your csv file, so it is impossible to know what the specific columns in your data frame are. Luckily, there is an R package called "unvotes" that contains the same information. Using that, you could carry out your analysis like this:

library(unvotes)
library(tidyverse)

result <- un_votes %>%
  left_join(un_roll_calls, by = "rcid") %>%
  group_by(rcid) %>%
  filter("United States" %in% country, .preserve = TRUE) %>%
  mutate(US_agree = vote == vote[country == "United States"],
         total = n() - 1) %>%
  filter(country != "United States") %>%
  summarize(date = first(date), 
            US_agree = sum(US_agree),
            total = first(total)) %>%
  mutate(decade = lubridate::year(date) %/% 10 * 10) %>%
  group_by(decade) %>%
  summarize(US_agree = sum(US_agree),
            total = sum(total),
            prop = prop.test(US_agree, total)$estimate,
            lower = prop.test(US_agree, total)$conf.int[1],
            upper = prop.test(US_agree, total)$conf.int[2]) %>%
  mutate(decade = factor(paste(decade, decade + 9, sep = "-")))

This gives the following result (note that these are proportions rather than percentages):

result
#> # A tibble: 8 x 6
#>   decade    US_agree  total  prop lower upper
#>   <fct>        <int>  <dbl> <dbl> <dbl> <dbl>
#> 1 1940-1949     7515  13087 0.574 0.566 0.583
#> 2 1950-1959    15610  27265 0.573 0.567 0.578
#> 3 1960-1969    23831  52205 0.456 0.452 0.461
#> 4 1970-1979    47515 123367 0.385 0.382 0.388
#> 5 1980-1989    33753 202498 0.167 0.165 0.168
#> 6 1990-1999    33482 120811 0.277 0.275 0.280
#> 7 2000-2009    35768 156401 0.229 0.227 0.231
#> 8 2010-2019    51337 162199 0.317 0.314 0.319
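
The two less obvious steps in that pipeline, the decade bucketing and the confidence interval, can be tried in isolation with base R. This is only a small sketch: the years vector is invented for illustration, while the counts passed to prop.test() are the 1940-1949 row from the table above.

```r
# Integer division floors each year to the first year of its decade.
years  <- c(1946, 1959, 1960, 2019)   # illustrative values, not from the data
decade <- years %/% 10 * 10
labels <- paste(decade, decade + 9, sep = "-")
labels

# prop.test() on the 1940-1949 counts (US_agree = 7515, total = 13087).
pt  <- prop.test(x = 7515, n = 13087)
est <- unname(pt$estimate)   # point estimate of the proportion
ci  <- pt$conf.int           # confidence interval, 95% by default
round(c(prop = est, lower = ci[1], upper = ci[2]), 3)
```

Multiplying prop, lower and upper by 100 would give percentages if you prefer those to proportions.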

This allows, for example, a plot of percentages by decade with 95% confidence intervals:

ggplot(result, aes(decade, prop)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, size = 1,
                color = "deepskyblue4") +
  geom_point(size = 2, col = "red3") +
  theme_minimal(base_size = 16) +
  scale_y_continuous(limits = 0:1, labels = scales::percent,
                     name = "Percentage agreement with USA") +
  labs(x = NULL, title = "Percentage agreement with US votes at UN",
       subtitle = "(Taken as percent of total votes per decade)")


Created on 2022-10-14 with reprex v2.0.2
