I have a df
that looks like this:
group year
1 2020
1 NA
1 0
2 2021
2 2006
3 NA
3 0
3 2010
3 2010
4 2006
4 2005
4 2010
And I want to group by group
and then find the minimum year while ignoring NAs and 0 entries:
group year minYr
1 2020 2020
1 NA 2020
1 0 2020
2 2021 2006
2 2006 2006
3 NA 2010
3 0 2010
3 2010 2010
3 2010 2010
4 2006 2005
4 2005 2005
4 2010 2005
My initial approach
df <- df %>% group_by(group) %>% mutate (minYr = min(year, na.rm = TRUE))
caused a runtime error and didn't take care of the zeros.
Does anyone have a better way of doing this?
CodePudding user response:
df1 %>%
group_by(group) %>%
mutate(minYr = min(year[year > 0], na.rm = TRUE)) %>%
ungroup()
# A tibble: 12 × 3
group year minYr
<dbl> <dbl> <dbl>
1 1 2020 2020
2 1 NA 2020
3 1 0 2020
4 2 2021 2006
5 2 2006 2006
6 3 NA 2010
7 3 0 2010
8 3 2010 2010
9 3 2010 2010
10 4 2006 2005
11 4 2005 2005
12 4 2010 2005
df1 <- structure(list(group = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
year = c(2020, NA, 0, 2021, 2006, NA, 0, 2010, 2010, 2006, 2005, 2010)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L))
CodePudding user response:
I'm sure there would be a way to do this in one workflow, but here is one approach...
library(dplyr, warn = FALSE)
df_min <-
df1 |>
filter(year != 0, !is.na(year)) |>
group_by(group) |>
mutate(minYr = min(year)) |>
ungroup() |>
select(-year) |>
distinct()
df1 |>
left_join(df_min)
#> Joining, by = "group"
#> # A tibble: 12 × 3
#> group year minYr
#> <dbl> <dbl> <dbl>
#> 1 1 2020 2020
#> 2 1 NA 2020
#> 3 1 0 2020
#> 4 2 2021 2006
#> 5 2 2006 2006
#> 6 3 NA 2010
#> 7 3 0 2010
#> 8 3 2010 2010
#> 9 3 2010 2010
#> 10 4 2006 2005
#> 11 4 2005 2005
#> 12 4 2010 2005
data
df1 <- structure(list(group = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
year = c(2020, NA, 0, 2021, 2006, NA, 0, 2010, 2010, 2006, 2005, 2010)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L))
Created on 2022-10-29 with reprex v2.0.2