Home > OS >  Calculating row sums in data frame based on column names
Calculating row sums in data frame based on column names

Time:09-02

I have a data frame with media spending for different media channels:

TV <- c(200,500,700,1000)
Display <- c(30,33,47,55)
Social <- c(20,21,22,23)
Facebook <- c(30,31,32,33)
Print <- c(50,51,52,53)
Newspaper <- c(60,61,62,63)

df_media <- data.frame(TV,Display,Social,Facebook, Print, Newspaper)

My goal is to calculate the row sums of specific columns based on their name. For example: Per definition Facebook falls into the category of Social, so I want to add the Facebook column to the Social column and just have the Social column left. The same goes for Newspaper which should be added to Print and so on.

The challenge is that the names and the number of columns that belong to one category change from data set to data set, e.g. the next data set could contain Social, Facebook and Instagram which should be all summed up to Social.

There is a list of rules, which define which media types (column names) belong to each other, but I have to admit that I'm a bit clueless and can only think about a long set of if commands right now, but I hope there is a better solution.

I'm thinking about putting all the names that belong to each other in vectors and use them to find and summarize the relevant columns, but I have no idea, how to execute this. Any help is appreciated.

CodePudding user response:

You could something along those lines, which allows columns to not be part of every data set (with intersect and setdiff).

  1. Define a set of rules, i.e. those columns that are going to be united/grouped together.

  2. Create a vector d of the remaining columns

  3. Compute the rowSums of every subset of the data set defined in the rules

  4. append the remaining columns

  5. cbind the columns of the list using do.call.

#Rules
rules = list(social = c("Social", "Facebook", "Instagram"),
             printed = c("Print", "Newspaper"))
d <- setdiff(colnames(df_media), unlist(rules)) #columns that are not going to be united

#data frame 
lapply(rules, function(x) rowSums(df_media[, intersect(colnames(df_media), x)])) |>
  append(df_media[, d]) |>
  do.call(cbind.data.frame, args = _)
  social printed   TV Display
1     50     110  200      30
2     52     112  500      33
3     54     114  700      47
4     56     116 1000      55
  • Related