I have a table (called money_table) with many columns with a financial year suffix in their name:
ID LunchMoney_1213 DinnerMondey_1213 LunchMoney_1314 DinnerMondy_1314
01 12 24 17 18
02 234 12 43 44
03 14 19 2 12
I need to make new columns adding the LunchMoney and DinnerMoney amounts for the relevant year, and delete the old columns. The idea is to end up with this:
ID TotalMoney_1213 TotalMoney_1314
01 36 35
02 246 87
03 33 14
I am using data.table because the table is very large, and I can do what I want using the following code:
money_table[,':='(TotalMoney_1213 = LunchMoney_1213 DinnerMoney_1213,
TotalMoney_1314 = LunchMoney_1314 DinnerMoney_1314)][,c(LunchMoney_1213,DinnerMoney_1213, LunchMoney_1314, DinnerMoney_1314) := NULL]
But there are many years, so writing it all out like this would take too long. I know there must be a way to use the numbers in the column names and do this more efficiently, but I haven't been able to figure it out.
Any help much appreciated.
CodePudding user response:
I think pivoting (longer), summarizing, then unpivoting (wider) works. (I wonder if keeping it in long form might be better in the long run, over to you.)
library(data.table)
money_table <- setDT(structure(list(ID = 1:3, LunchMoney_1213 = c(12L, 234L, 14L), DinnerMondey_1213 = c(24L, 12L, 19L), LunchMoney_1314 = c(17L, 43L, 2L), DinnerMondy_1314 = c(18L, 44L, 12L)), row.names = c(NA, -3L), class = "data.frame"))
dcast(
melt(money_table, id.vars = "ID"
)[, yr := paste0("TotalMoney_", gsub(".*_", "", variable))
][, .(value = sum(value)), by = .(ID, yr)
],
ID ~ yr, value.vars = "value")
# ID TotalMoney_1213 TotalMoney_1314
# <int> <int> <int>
# 1: 1 36 35
# 2: 2 246 87
# 3: 3 33 14
If you're already using magrittr
for other things (whether with dplyr
or not ... I use it with data.table
all the time), this can be slightly more readable as:
library(magrittr)
melt(money_table, id.vars = "ID") %>%
.[, yr := paste0("TotalMoney_", gsub(".*_", "", variable))] %>%
.[, .(value = sum(value)), by = .(ID, yr)] %>%
dcast(., ID ~ yr, value.vars = "value")
CodePudding user response:
We can try data.table
with split.default
> dt[, lapply(split.default(.SD, paste0("TotalMoney_", gsub(".*_", "", names(.SD)))), sum), ID]
ID TotalMoney_1213 TotalMoney_1314
1: 1 36 35
2: 2 246 87
3: 3 33 14
CodePudding user response:
I suggest transforming your dataframe to tidy format where each column is a variable:
money_table %>%
gather("key", "value", c(-ID)) %>% # Wide -> Long format
separate(key, into = c("type", "year"), sep = "_") %>% # Split what used to be colum names into type and year
spread(type, value) %>% # DinnerMoney and LunchMoney are now two separate variables with values for each year
group_by(ID, year) %>% # Group by ID and year
summarize(DinnerMoney = sum(DinnerMoney), # Sum up DinnerMoney for each year as well as LunchMoney for each year
LunchMoney = sum(LunchMoney)) %>%
mutate(total_value = DinnerMoney LunchMoney) # Get total value for each year
# A tibble: 6 x 5
# Groups: ID [3]
ID year DinnerMoney LunchMoney total_value
<int> <chr> <int> <int> <int>
1 1 1213 24 12 36
2 1 1314 18 17 35
3 2 1213 12 234 246
4 2 1314 44 43 87
5 3 1213 19 14 33
6 3 1314 12 2 14
If you want total values for each years as columns, you can pivot this table like this:
money_table %>%
select(ID, year, total_value) %>%
spread(year, total_value)
# A tibble: 3 x 3
# Groups: ID [3]
ID `1213` `1314`
<int> <int> <int>
1 1 36 35
2 2 246 87
3 3 33 14