How to add a new column with values specific to grouped variables

I'm new to R and have found similar solutions to my problem, but I'm struggling to apply these to my code. Please help...

These data are simplified, as the id variables are many:

df = data.frame(id = rep(c("a_10", "a_11", "b_10", "b_11"), each = 5),
           site = rep(1:5, 4),
           value = sample(1:20))

The aim is to add another column labelled "year" whose values depend on "id". The real id names are numerous, so I'm trying to simplify the code by matching on the ending digits.

I can use dplyr to split the dataframe into each id variable using this code (repeated for each id variable):

df %>% 
  select(site, id, value) %>% 
  filter(grepl("10$", id)) %>% 
  mutate(Year = "2010")

Rather than using merge to recombine the dataframes into one, is there a simpler method?

I tried modifying case_when with mutate as described in a previous answer:

https://stackoverflow.com/a/63043920/12313457

mutate(year = case_when(grepl(c("10$", "11$", id) == c("2010", "2011"))))

Is something like this possible?

Thanks in advance

CodePudding user response：

You can use substr to get the final two digits of id and then paste0 this to "20" to recreate the year.

df |> dplyr::mutate(Year = paste0("20", substr(id, 3, 4)))
#>      id site value Year
#> 1  a_10    1     5 2010
#> 2  a_10    2    12 2010
#> 3  a_10    3     9 2010
#> 4  a_10    4     7 2010
#> 5  a_10    5    13 2010
#> 6  a_11    1     3 2011
#> 7  a_11    2     4 2011
#> 8  a_11    3    16 2011
#> 9  a_11    4     2 2011
#> 10 a_11    5     6 2011
#> 11 b_10    1    19 2010
#> 12 b_10    2    14 2010
#> 13 b_10    3    15 2010
#> 14 b_10    4    10 2010
#> 15 b_10    5    11 2010
#> 16 b_11    1    18 2011
#> 17 b_11    2     1 2011
#> 18 b_11    3    20 2011
#> 19 b_11    4    17 2011
#> 20 b_11    5     8 2011

Created on 2022-04-21 by the reprex package (v2.0.1)
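
One thing worth noting about substr(id, 3, 4): it reads fixed character positions, so it relies on every id having the same layout as "a_10". A minimal sketch of what happens otherwise, and of one way to count from the end of the string instead, is below. The id "site_a_10" is a hypothetical longer name made up for illustration; it is not from the question.

```r
# Illustrative ids; "site_a_10" is a hypothetical longer name, not from the question
ids <- c("a_10", "b_11", "site_a_10")

# Fixed positions 3-4 only give the year suffix when every id has the same layout;
# for the third id this picks up letters from the middle of the string
fixed <- substr(ids, 3, 4)

# Counting from the end of each string instead (substr recycles start/stop vectors)
n <- nchar(ids)
last_two <- substr(ids, n - 1, n)

# Prefix with "20" as in the answer above
year <- paste0("20", last_two)
print(year)
```

This still assumes the year suffix is always exactly two digits.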

CodePudding user response：

In case your id column has different string lengths you can use sub:

df %>%
  mutate(Year = paste0("20", sub('^.*_(\\d+)$', '\\1', id)))
#>      id site value Year
#> 1  a_10    1     2 2010
#> 2  a_10    2     7 2010
#> 3  a_10    3    16 2010
#> 4  a_10    4    10 2010
#> 5  a_10    5    11 2010
#> 6  a_11    1     5 2011
#> 7  a_11    2    13 2011
#> 8  a_11    3    14 2011
#> 9  a_11    4     6 2011
#> 10 a_11    5    12 2011
#> 11 b_10    1    17 2010
#> 12 b_10    2     1 2010
#> 13 b_10    3     4 2010
#> 14 b_10    4    15 2010
#> 15 b_10    5     9 2010
#> 16 b_11    1     8 2011
#> 17 b_11    2    20 2011
#> 18 b_11    3    19 2011
#> 19 b_11    4    18 2011
#> 20 b_11    5     3 2011

Created on 2022-04-21 by the reprex package (v2.0.1)
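
For completeness, the case_when idea from the question can also be made to work. case_when takes one `condition ~ value` pair per case rather than vectors of patterns and results, so each grepl call gets its own line. The following is only a sketch under that reading of the question, using the example data from above; the name df_year and the NA fallback for unmatched ids are assumptions added for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id = rep(c("a_10", "a_11", "b_10", "b_11"), each = 5),
  site = rep(1:5, 4),
  value = sample(1:20)
)

# One `condition ~ value` pair per suffix; conditions are checked in order
df_year <- df %>%
  mutate(year = case_when(
    grepl("10$", id) ~ "2010",
    grepl("11$", id) ~ "2011",
    TRUE ~ NA_character_  # assumption: ids matching neither pattern get NA
  ))

head(df_year)
```

This avoids splitting and re-merging the data frame, but it needs one line per year, so the substr or sub approaches above scale better when there are many suffixes.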