Min() ignoring zeros and NA with dplyr-CodePudding

I have a df that looks like this:

And I want to group by group and then find the minimum year while ignoring NAs and 0 entries:

group year minYr
1     2020 2020
1     NA   2020
1     0    2020
2     2021 2006
2     2006 2006
3     NA   2010
3     0    2010
3     2010 2010
3     2010 2010
4     2006 2005
4     2005 2005
4     2010 2005

My initial approach df <- df %>% group_by(group) %>% mutate (minYr = min(year, na.rm = TRUE)) caused a runtime error and didn't take care of the zeros.

Does anyone have a better way of doing this?

CodePudding user response：

df1 %>%
  group_by(group) %>% 
  mutate(minYr = min(year[year > 0], na.rm = TRUE)) %>%
  ungroup()

# A tibble: 12 × 3
   group  year minYr
   <dbl> <dbl> <dbl>
 1     1  2020  2020
 2     1    NA  2020
 3     1     0  2020
 4     2  2021  2006
 5     2  2006  2006
 6     3    NA  2010
 7     3     0  2010
 8     3  2010  2010
 9     3  2010  2010
10     4  2006  2005
11     4  2005  2005
12     4  2010  2005

df1 <- structure(list(group = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4), 
                      year = c(2020, NA, 0, 2021, 2006, NA, 0, 2010, 2010, 2006, 2005, 2010)), 
                 class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L))

CodePudding user response：

I'm sure there would be a way to do this in one workflow, but here is one approach...

library(dplyr, warn = FALSE)

df_min <- 
  df1 |> 
  filter(year != 0, !is.na(year)) |> 
  group_by(group) |> 
  mutate(minYr = min(year)) |>
  ungroup() |> 
  select(-year) |> 
  distinct()

df1 |> 
  left_join(df_min)
#> Joining, by = "group"
#> # A tibble: 12 × 3
#>    group  year minYr
#>    <dbl> <dbl> <dbl>
#>  1     1  2020  2020
#>  2     1    NA  2020
#>  3     1     0  2020
#>  4     2  2021  2006
#>  5     2  2006  2006
#>  6     3    NA  2010
#>  7     3     0  2010
#>  8     3  2010  2010
#>  9     3  2010  2010
#> 10     4  2006  2005
#> 11     4  2005  2005
#> 12     4  2010  2005

data

df1 <- structure(list(group = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4), 
                      year = c(2020, NA, 0, 2021, 2006, NA, 0, 2010, 2010, 2006, 2005, 2010)), 
                 class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L))

^{Created on 2022-10-29 with reprex v2.0.2}