How to add a new column with values specific to grouped variables

Time:04-21

I'm new to R and have found similar solutions to my problem, but I'm struggling to apply these to my code. Please help...

The data below are simplified; the real dataset has many more id values:

df = data.frame(id = rep(c("a_10", "a_11", "b_10", "b_11"), each = 5),
           site = rep(1:5, 4),
           value = sample(1:20))

The aim is to add another column, "year", whose values are determined by "id". The real id names are numerous, so I'm trying to simplify the code by matching on the trailing digits only.

I can use dplyr to split the dataframe into each id variable using this code (repeated for each id variable):

df %>% 
  select(site, id, value) %>% 
  filter(grepl("10$", id)) %>% 
  mutate(Year = "2010")

Rather than using merge to recombine the data frames into one, is there a simpler method?

I tried modifying case_when with mutate as described in a previous answer:

https://stackoverflow.com/a/63043920/12313457

mutate(year = case_when(grepl(c("10$", "11$", id) == c("2010", "2011"))))

Is something like this possible?

Thanks in advance

User response:

You can use substr to extract the final two digits of id (characters 3 and 4 here) and then use paste0 to prepend "20", which recreates the year.

df |> dplyr::mutate(Year = paste0("20", substr(id, 3, 4)))
#>      id site value Year
#> 1  a_10    1     5 2010
#> 2  a_10    2    12 2010
#> 3  a_10    3     9 2010
#> 4  a_10    4     7 2010
#> 5  a_10    5    13 2010
#> 6  a_11    1     3 2011
#> 7  a_11    2     4 2011
#> 8  a_11    3    16 2011
#> 9  a_11    4     2 2011
#> 10 a_11    5     6 2011
#> 11 b_10    1    19 2010
#> 12 b_10    2    14 2010
#> 13 b_10    3    15 2010
#> 14 b_10    4    10 2010
#> 15 b_10    5    11 2010
#> 16 b_11    1    18 2011
#> 17 b_11    2     1 2011
#> 18 b_11    3    20 2011
#> 19 b_11    4    17 2011
#> 20 b_11    5     8 2011

Created on 2022-04-21 by the reprex package (v2.0.1)
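Note that substr(id, 3, 4) relies on the digits always sitting at positions 3 and 4. As a minimal base R sketch, assuming every id ends in exactly two digits, the same idea can be anchored to the end of the string with nchar. The set.seed() call and the helper variables ids and n are additions for illustration and are not part of the original answer.

```r
# Toy data from the question; set.seed() is added only for reproducibility.
set.seed(1)
df <- data.frame(id = rep(c("a_10", "a_11", "b_10", "b_11"), each = 5),
                 site = rep(1:5, 4),
                 value = sample(1:20))

# as.character() guards against id being a factor (the default before R 4.0).
ids <- as.character(df$id)
n <- nchar(ids)

# Take the last two characters of each id, whatever its length,
# and prepend "20" to build the year.
df$Year <- paste0("20", substr(ids, n - 1, n))

unique(df[, c("id", "Year")])
```

This stays correct if a prefix such as "abc_10" is longer than in the toy data, as long as the two-digit suffix assumption holds.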

User response:

If the strings in your id column have different lengths, you can use sub to capture the digits after the underscore:

df %>%
  mutate(Year = paste0("20", sub('^.*_(\\d+)$', '\\1', id)))
#>      id site value Year
#> 1  a_10    1     2 2010
#> 2  a_10    2     7 2010
#> 3  a_10    3    16 2010
#> 4  a_10    4    10 2010
#> 5  a_10    5    11 2010
#> 6  a_11    1     5 2011
#> 7  a_11    2    13 2011
#> 8  a_11    3    14 2011
#> 9  a_11    4     6 2011
#> 10 a_11    5    12 2011
#> 11 b_10    1    17 2010
#> 12 b_10    2     1 2010
#> 13 b_10    3     4 2010
#> 14 b_10    4    15 2010
#> 15 b_10    5     9 2010
#> 16 b_11    1     8 2011
#> 17 b_11    2    20 2011
#> 18 b_11    3    19 2011
#> 19 b_11    4    18 2011
#> 20 b_11    5     3 2011

Created on 2022-04-21 by the reprex package (v2.0.1)
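For completeness, the case_when idea from the question can also be made to work. The following is a sketch rather than a recommendation, assuming dplyr is installed: each condition is a separate grepl call paired with its value via ~, instead of passing vectors of patterns and years in one call. The fallback branch for unmatched ids is an addition for illustration.

```r
library(dplyr)

# Toy data from the question.
df <- data.frame(id = rep(c("a_10", "a_11", "b_10", "b_11"), each = 5),
                 site = rep(1:5, 4),
                 value = sample(1:20))

df <- df %>%
  mutate(year = case_when(
    grepl("10$", id) ~ "2010",   # ids ending in 10
    grepl("11$", id) ~ "2011",   # ids ending in 11
    TRUE ~ NA_character_         # anything else is left missing
  ))

distinct(df, id, year)
```

This needs one line per year, so with many distinct endings the substr or sub approaches above are likely to scale better.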
