I have a dataframe with the following sample:
df = data.frame(x1 = c(2000a,2010a,2000b,2010b,2000c,2010c),
x2 = c(1,2,3,4,5,6))
I am trying to find a way to calculate the percent change for each "group" (a,b,c) using the change()
function. Below is my attempt:
percent_change = change(df,x2, NewVar = "percent_change", slideBy = 1,type = 'percent')
where slideBy is the lag variable that restarts the percent change calculation every other observation. This does not work, and I get the following error:
" Remember to put data in time order before running.
Leading total_units by 1 time units."
Would it be possible to adapt my x1 column to a time series or is there an easier way around this I am missing?
Thank you!
CodePudding user response:
This uses the data.table structure from the data.table package. First it sorts on x1, then does a row by row calculation of the percent change, grouping by the letter in x1.
library(data.table)
setDT(df)
df[order(x1),
100*x2/shift(x2,1L),
keyby=gsub("[0-9]","",x1)]
CodePudding user response:
Here is a tidyverse way to do this. First, use extract to separate x1 into year and group, then pivot_wider on the table. Now you can use mutate to create the percent change row.
library(dplyr)
library(tidyr)
df = data.frame(x1 = c("2000a","2010a","2000b","2010b","2000c","2010c"),x2 = c(1,2,3,4,5,6))
df_new = df %>%
extract(x1, c("year", "group"),regex="(\\d{4})(\\D{1})") %>%
pivot_wider(names_from = year, values_from=x2) %>%
mutate(percent_change=(`2010`-`2000`)/`2000`)