I have the following data:
data <- data.frame(id_pers=c(1, 2, 3, 4, 5, 6),
Birthyear=c(2018, 2009, 2008, 2000, 1998, 2005),
family=c("Elliot", "Elliot", "Elliot", "Gerrard", "Gerrard", "Gerrard"))
I want to find the maximal difference in birth year within each family and store it in a new column, so that every member of the same family gets the same value.
The result should look like:
datanew <- data.frame(id_pers=c(1, 2, 3, 4, 5, 6),
Birthyear=c(2018, 2009, 2008, 2000, 1998, 2005),
family=c("Elliot", "Elliot", "Elliot", "Gerrard", "Gerrard", "Gerrard"),
maxdifference=c(10, 10, 10, 7, 7, 7))
Answer:
Using the tidyverse, you can first group by family ID, then compute the pairwise distances with dist() and take their maximum with max().
library(tidyverse)
data <- data.frame(id_pers=c(1, 2, 3, 4, 5, 6),
Birthyear=c(2018, 2009, 2008, 2000, 1998,2005),
family=c(1, 1, 1, 2, 2,2))
data %>% dplyr::group_by(family) %>%
dplyr::mutate(maxdifference = max(dist(Birthyear)))
# A tibble: 6 × 4
# Groups: family [2]
id_pers Birthyear family maxdifference
<dbl> <dbl> <dbl> <dbl>
1 1 2018 1 10
2 2 2009 1 10
3 3 2008 1 10
4 4 2000 2 7
5 5 1998 2 7
6 6 2005 2 7
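To see why max(dist(Birthyear)) gives the right answer, here is a small base R sketch on the Elliot family's birth years from the question. On a plain numeric vector, dist() treats each value as a one-dimensional point, so its entries are the absolute pairwise differences, and the largest of them is simply max minus min.

```r
# Birth years of the "Elliot" family from the question
years <- c(2018, 2009, 2008)

# dist() on a plain numeric vector gives the absolute pairwise differences
# |x_i - x_j| for every pair i < j (here: 2009-2018, 2008-2018, 2008-2009)
d <- dist(years)
print(as.vector(d))  # 9 10 1

# The largest pairwise difference always equals max(years) - min(years)
stopifnot(max(d) == max(years) - min(years))
print(max(d))  # 10
```

Note that dist() builds all n(n-1)/2 pairs, so for large groups the max() - min() or diff(range()) versions do less work while giving the same number.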
Answer:
Another way is to take the difference (diff()) of the range (range()):
data %>%
group_by(family) %>%
mutate(maxdifference = diff(range(Birthyear)))
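Broken into its pieces on the Gerrard family's birth years, range() returns the minimum and maximum as a length-2 vector and diff() subtracts them. The missing-value case below is a hypothetical extension (the question's data has no NA): range() propagates NA unless na.rm = TRUE is passed.

```r
# Birth years of the "Gerrard" family from the question
years <- c(2000, 1998, 2005)

r <- range(years)   # c(min, max)
print(r)            # 1998 2005
print(diff(r))      # 7

# Hypothetical: a missing birth year makes range() return NA ...
years_na <- c(2000, NA, 2005)
print(diff(range(years_na)))                # NA
# ... unless missing values are dropped explicitly
print(diff(range(years_na, na.rm = TRUE)))  # 5
```

Inside the grouped pipeline this would read mutate(maxdifference = diff(range(Birthyear, na.rm = TRUE))).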
Answer:
data %>% group_by(family) %>% mutate(maxdifference = max(Birthyear)-min(Birthyear))
Answer:
Obligatory base R one-liner (the \(x) lambda shorthand requires R 4.1 or later):
data$maxdifference = ave(data$Birthyear, data$family, FUN = \(years) max(years) - min(years))
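For completeness, here is the same base R approach run end to end on the question's original data, with the family names as strings. It uses function(years) instead of \(years) so it also runs on R versions older than 4.1. ave() applies FUN within each family and returns a vector aligned with the original rows, which is why it can be assigned straight into a new column.

```r
# The question's data, with family names quoted as strings
data <- data.frame(
  id_pers   = 1:6,
  Birthyear = c(2018, 2009, 2008, 2000, 1998, 2005),
  family    = c("Elliot", "Elliot", "Elliot", "Gerrard", "Gerrard", "Gerrard")
)

# ave() splits Birthyear by family, applies FUN to each group, and recycles
# each group's single result back onto that group's rows
data$maxdifference <- ave(data$Birthyear, data$family,
                          FUN = function(years) max(years) - min(years))
print(data)
```

The new column matches the expected maxdifference = c(10, 10, 10, 7, 7, 7) from the question.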