I have a df in which each entry has an assigned grouped ID, like:
id_person income id_family
001 100 A1
002 120 B2
003 100 B5
004 220 A1
005 300 A2
And now I need to assign a new variable for each id_person, summing up the variable income
of all entries with the same id_family
. Such as:
id_person income id_family family_income
001 100 A1 220
002 120 B2 120
003 100 B5 400
004 220 A1 230
005 300 B5 400
I do not wish to create a new grouped_df
, neither do I need to see this variable summarized for now.
How can I give this command using R?
CodePudding user response:
You may try
Data
df <- read.table(text = "id_person income id_family
001 100 A1
002 120 B2
003 100 B5
004 220 A1
005 300 B5", header = T)
id_person income id_family
1 1 100 A1
2 2 120 B2
3 3 100 B5
4 4 220 A1
5 5 300 B5
Code
library(dplyr)
df %>%
group_by(id_family) %>%
mutate(family_income = sum(income)) %>%
ungroup
id_person income id_family family_income
<int> <int> <chr> <int>
1 1 100 A1 320
2 2 120 B2 120
3 3 100 B5 400
4 4 220 A1 320
5 5 300 B5 400
CodePudding user response:
df %>%
group_by(id_family) %>%
summarise(Total = sum(income, na.rm = TRUE))
or the across()
function could be used:
df %>%
group_by(id_family) %>%
summarise(across(income, sum))
CodePudding user response:
Using data in a data.table, we can perform operations on variables by a grouping variable (in by=
), then assign that back to the data using the data.table assignment operator :=
library(data.table)
setDT(df)
df[, "family_income" := sum(income), by = id_family]
The data.table data structure is a pumped up version of the R data.frame, giving added functionality and efficiency gains. If DT
is your data.table, DT[i, j, by]
is the notation showing how we can use i
to sort or subset data, j
for selecting or computing on variables, and by
to perfrom j
-operations on groups. For example, for cars with over 100 horsepower, what is the mean fuel efficiency for automatic (0) and manual (1) cars?
dtcars <- data.table(mtcars)
dtcars[hp>100, mean(mpg), by=am]
Returns:
> dtcars[hp>100, mean(mpg), by=am]
am V1
1: 1 20.61429
2: 0 16.06875