How to create a table in R that displays the percentage of observations per year equal to a certain-CodePudding

I'm working with a time series dataset on levels of opposition in authoritarian regimes, and I've included a sample of the data below. I would like to produce a table that displays the percentage of countries per year with a value of 1 for v2psoppaut_ord. Could someone tell me how to go about doing this? I'd like to save the table as a new df for plotting.

structure(list(year = 1900:1905, COWcode = c(70L, 70L, 70L, 70L, 
70L, 70L), country_name = c("Mexico", "Mexico", "Mexico", "Mexico", 
"Mexico", "Mexico"), country_text_id = c("MEX", "MEX", "MEX", 
"MEX", "MEX", "MEX"), v2x_regime = c(0L, 0L, 0L, 0L, 0L, 0L), 
    v2psoppaut_ord = c(2L, 2L, 2L, 2L, 2L, 2L)), row.names = c(NA, 
6L), class = "data.frame")

CodePudding user response:

Assuming each country appears only once per year, you could do something like this:

library(dplyr)

df %>% 
  group_by(year) %>% 
  summarize(prop = sum(v2psoppaut_ord == 1) / n())  # share of rows per year equal to 1

Here prop is the proportion of rows in each group where v2psoppaut_ord == 1. If each row in a group is a country, this gives you what you're looking for (multiply by 100 for a percentage). For this to work, your data should look something like this:

# Example layout: one row per country per year (values are random placeholders)
df <- data.frame(year = rep(1900:1902, each = 3),
                 country_name = rep(c("Mexico", "Canada", "US"), 3),
                 v2psoppaut_ord = sample(1:4, 9, replace = TRUE))
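
Because the sample data above is random, it can be hard to check the result by eye. Here is a minimal sketch with fixed, invented values (toy_df, pct_by_year and the scores themselves are made up for illustration, not taken from the real dataset). It uses mean() on the logical vector, which is equivalent to sum(...) / n() when there are no missing values, and multiplies by 100 to get a percentage:

```r
library(dplyr)

# Invented scores: 4 countries observed in each of 3 years
toy_df <- data.frame(
  year = rep(1900:1902, each = 4),
  country_name = rep(c("Mexico", "Canada", "US", "Cuba"), times = 3),
  v2psoppaut_ord = c(1L, 2L, 1L, 3L,  1L, 1L, 1L, 4L,  2L, 3L, 4L, 2L)
)

# Percentage of countries per year whose score equals 1
pct_by_year <- toy_df %>%
  group_by(year) %>%
  summarize(pct = 100 * mean(v2psoppaut_ord == 1))

print(pct_by_year)
# pct is 50 for 1900 (2 of 4), 75 for 1901 (3 of 4), 0 for 1902 (0 of 4)
```

The result has one row per year, so it can be saved and passed straight to a plotting function.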

CodePudding user response:

Try using dplyr (part of the tidyverse) to group your data by year, then summarize it by dividing the number of rows where v2psoppaut_ord equals 1 by the total number of rows in that group (here, the year), which n() returns. Save the result to a new df for plotting. It will have two columns: year and auth, with the latter giving the proportion (multiply by 100 to get a percentage) of countries with a value of 1 for the variable you indicated. Note that na.rm = TRUE only drops missing values from the numerator; n() still counts those rows in the denominator. The trailing ungroup() is a safeguard: after summarize() on a single grouping variable the result is already ungrouped, so the call does no harm.

library(tidyverse)

plot_df <- df %>%
  group_by(year) %>%
  # Share of rows per year with v2psoppaut_ord == 1; rows with NA still count in n()
  summarize(auth = sum(v2psoppaut_ord == 1, na.rm = TRUE) / n()) %>%
  ungroup()
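
If the real data contains missing scores, the choice of denominator matters. The sketch below uses invented values (na_df, na_summary and both column names are hypothetical) to compare the sum(...) / n() form used above with mean(..., na.rm = TRUE), which drops missing rows from the denominator as well:

```r
library(dplyr)

# Invented values for illustration; NA marks a missing score
na_df <- data.frame(
  year = rep(1900:1901, each = 4),
  v2psoppaut_ord = c(1L, NA, 2L, 1L,  1L, 1L, NA, NA)
)

na_summary <- na_df %>%
  group_by(year) %>%
  summarize(
    # Denominator is every row in the year, including rows with NA
    pct_all_rows = 100 * sum(v2psoppaut_ord == 1, na.rm = TRUE) / n(),
    # Denominator is only the rows with a non-missing score
    pct_non_missing = 100 * mean(v2psoppaut_ord == 1, na.rm = TRUE)
  )

print(na_summary)
# 1900: pct_all_rows = 50 (2 of 4), pct_non_missing is about 66.7 (2 of 3)
# 1901: pct_all_rows = 50 (2 of 4), pct_non_missing = 100 (2 of 2)
```

Which version is appropriate depends on whether a country with a missing score should count toward the yearly total; that is a judgment about the data, not something the code can decide.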