Home > Back-end >  Create new dataframe column in R that conditions on row values without iterating?
Create new dataframe column in R that conditions on row values without iterating?

Time:12-18

So let's say I have the following dataframe "df":

names <- c("Bob","Mary","Ben","Lauren")
number <- c(1:4)
age <- c(20,33,34,45)
df <- data.frame(names,number,age)

Let's say I have another dataframe ("df2") with thousands of people and I want to sum the income of people in that other dataframe that have the given name, number and age of each row in "df". That is, for each row "i" of "df", I want to create a fourth column "TotalIncome" that is the sum of the income of all the people with the given name, age and number in dataframe "df2". In other words, for each row "i":

df$TotalIncome[i] <- sum(
  df2$Income[df2$Name == df1$Name[i] &
  df2$Numbers == df1$Numbers[i] &
  df2$Age == df1$Age[i]], na.rm=TRUE)

Is there a way to do this without having to iterate in a for loop for each row "i" and perform the above code? Is there a way to use apply() to calculate this for the entire vector rather than only iterating each line individually? The actual dataset I am working with is huge and iterating takes quite a while and I am hoping there is a more efficient way to do this in R.

Thanks!

CodePudding user response:

Have you considered use dplyr package? You can use some grammar with SQL-style and make this job quick and easy.

The code will be something like

library(dplyr)

df %>% left_join(df2) %>%
    group_by(name, numbers, age) %>%
    summarize(TotalIncome = sum(Income))

I suggest you to find the cheat sheets available on dplyr site or see the Wickham and Grolemund book.

  • Related