Although this may seem like a straightforward task to some, as a beginner in R it has been frustrating! The task is as follows. I have a table with the following columns:
colnames(gov_data)
[1] "year" "quarter" "employed"
[4] "newhires" "separations" "jobscreated"
[7] "jobsdestroyed" "state" "mw"
[10] "teen_wage" "teen_pop" "adult_wage"
[13] "teen_share_working" "unemp_primemale" "recession"
[16] "period"
Using state_list<-split(gov_data, gov_data$state)
I now have a list of data.tables corresponding to each state. Within each of these data.tables, I want to order by date. Here is how I did that. If this is inefficient, I welcome your alternatives!
orderfun <- function (x) {
x[order(period)]
}
state_list <- lapply(state_list, orderfun)  # assign the result; lapply() returns new tables rather than sorting in place
I now want to add a column labeled "change_mw" that holds the period-to-period change in the "mw" column. I know how to do that for a single data.table: I would create a lagged column holding the value of "mw" at t-1 and then take the difference between the two columns:
one_table[, mw_t_minus_1 := shift(mw, n = 1, type = "lag")][, change_mw := mw - mw_t_minus_1][, mw_t_minus_1 := NULL]
How can I do this across multiple data.tables in a list? Is it even possible to use the data.table [i,j,by] in this instance? How would you go about this task? Once again, your help is very much appreciated!
CodePudding user response:
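Here is an example that does something similar on mtcars; with proper demo data I could get closer to your case. Note that there is no need to split the table first: the by argument computes the lagged difference within each group.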
library(data.table)
dtCars <- data.table(mtcars, keep.rownames=TRUE)
dtCars[order(hp), change:= hp-shift(hp), by=cyl]
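Applied to the question's setup, the same idea might look like the sketch below. The toy gov_data here is made up purely for illustration (only the state, period and mw columns are used, and the values are invented), so treat it as an assumption about the shape of the real data rather than a tested solution. Option 1 skips the split entirely; Option 2 shows one way to do it if a list of data.tables is really needed.

```r
library(data.table)

# Hypothetical toy data standing in for gov_data (values are invented)
gov_data <- data.table(
  state  = c("A", "A", "A", "B", "B", "B"),
  period = c(2, 1, 3, 3, 1, 2),
  mw     = c(5.5, 5.0, 6.0, 7.5, 7.0, 7.0)
)

# Option 1: no split needed. Rows are visited in period order and the
# lagged difference is computed separately within each state.
gov_data[order(period), change_mw := mw - shift(mw, n = 1, type = "lag"), by = state]

# Option 2: if a list is required, split by state and update each table.
state_list <- split(gov_data[, .(state, period, mw)], by = "state")
state_list <- lapply(state_list, function(dt) {
  setorder(dt, period)               # sort this state's rows by period
  dt[, change_mw := mw - shift(mw)]  # shift() lags by one row by default
})
```

The first row of each state (in period order) gets NA, since there is no earlier value to difference against. Option 1 is usually the simpler choice, because the result stays in one table and can still be split afterwards if needed.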