R dplyr change numeric values in sequence of rows to other numeric values

Let's say I have the following dataset, and I want to recode the values 20010001 through 20010010 as 2001 through 2010 (20010001 becomes 2001, 20010002 becomes 2002, and so on).

How can I do this?

Sample data (df):

structure(list(x = c(20010001, 20010001, 20010002, 20010002, 
20010003, 20010003, 20010004, 20010004, 20010005, 20010005, 20010006, 
20010006, 20010007, 20010007, 20010008, 20010008, 20010009, 20010009, 
20010010, 20010010, 20, 2, 19, 18, 17, 16, 15, 14965, 14964
), y = c("2001", "ORIG", "2001", "ORIG", "2001", "ORIG", "2001", 
"ORIG", "2001", "ORIG", "2001", "ORIG", "2001", "ORIG", "2001", 
"ORIG", "2001", "ORIG", "2001", "ORIG", "2020", "2020", "2020", 
"2020", "2020", "2020", "2020", "2022", "2022")), class = "data.frame", row.names = c(NA, -29L))

Code:

library(tidyverse)

# To change a single value at a time
df["1", "x"] = 2010

# Now, how can I do this for a range of values without doing it one by one?

Answer:

Another possible solution.

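Explanation: the pattern matches 2001 followed by a run of zeros, but only when exactly two digits remain at the end of the value (the lookahead (?=\\d{2}$)). That matched prefix is replaced with 20, so 20010001 becomes 2001 and 20010010 becomes 2010. Values that do not contain 2001 are left as they are, and type.convert() then turns the new column back into numbers.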

library(tidyverse)

df %>% 
  mutate(z = str_replace(x, "2001[0]+(?=\\d{2}$)", "20")) %>% 
  type.convert(as.is = TRUE)

#>            x    y     z
#> 1   20010001 2001  2001
#> 2   20010001 ORIG  2001
#> 3   20010002 2001  2002
#> 4   20010002 ORIG  2002
#> 5   20010003 2001  2003
#> 6   20010003 ORIG  2003
#> 7   20010004 2001  2004
#> 8   20010004 ORIG  2004
#> 9   20010005 2001  2005
#> 10  20010005 ORIG  2005
#> 11  20010006 2001  2006
#> 12  20010006 ORIG  2006
#> 13  20010007 2001  2007
#> 14  20010007 ORIG  2007
#> 15  20010008 2001  2008
#> 16  20010008 ORIG  2008
#> 17  20010009 2001  2009
#> 18  20010009 ORIG  2009
#> 19 200100010 2001  2010
#> 20 200100010 ORIG  2010
#> 21        20 2020    20
#> 22         2 2020     2
#> 23        19 2020    19
#> 24        18 2020    18
#> 25        17 2020    17
#> 26        16 2020    16
#> 27        15 2020    15
#> 28     14965 2022 14965
#> 29     14964 2022 14964
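To check the pattern on its own, here is a minimal base R sketch using sub() instead of str_replace(). It assumes every code is 2001, then a run of zeros, then a two-digit year suffix. The ^ anchor is an addition that is not in the answer above, so that 2001 must start the value; base R also needs perl = TRUE for the (?=...) lookahead. The codes vector is illustrative and includes both the 8-digit form from the question's data and the 9-digit form (200100010) that appears in the printed output.

```r
# Illustrative codes (assumption): 8-digit form from the question's data,
# 9-digit form from the printed output, and values that must not change
codes <- c(20010001, 20010010, 200100010, 20, 14965)

# Write the numbers as plain digits ("%.0f" avoids scientific notation)
s <- sprintf("%.0f", codes)

# "^2001" + one or more zeros, followed by exactly two final digits -> "20"
# (the ^ anchor is an added safeguard; perl = TRUE enables the lookahead)
years <- as.numeric(sub("^2001[0]+(?=\\d{2}$)", "20", s, perl = TRUE))

print(years)
```

Both 20010010 and 200100010 end up as 2010 because the run of zeros may have any length, which is why this approach also recodes rows 19 and 20 of the printed output.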

Answer:

Perhaps there's more to it than this ...

library(dplyr)
df %>%
  mutate(x2 = if_else(between(x, 20010001, 20010010), x - 20008000, x))
#            x    y        x2
# 1   20010001 2001      2001
# 2   20010001 ORIG      2001
# 3   20010002 2001      2002
# 4   20010002 ORIG      2002
# 5   20010003 2001      2003
# 6   20010003 ORIG      2003
# 7   20010004 2001      2004
# 8   20010004 ORIG      2004
# 9   20010005 2001      2005
# 10  20010005 ORIG      2005
# 11  20010006 2001      2006
# 12  20010006 ORIG      2006
# 13  20010007 2001      2007
# 14  20010007 ORIG      2007
# 15  20010008 2001      2008
# 16  20010008 ORIG      2008
# 17  20010009 2001      2009
# 18  20010009 ORIG      2009
# 19 200100010 2001 200100010
# 20 200100010 ORIG 200100010
# 21        20 2020        20
# 22         2 2020         2
# 23        19 2020        19
# 24        18 2020        18
# 25        17 2020        17
# 26        16 2020        16
# 27        15 2020        15
# 28     14965 2022     14965
# 29     14964 2022     14964
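The range check above only catches values between 20010001 and 20010010, which is why the 9-digit 200100010 in rows 19 and 20 of the printed output is left unchanged. If the codes do not follow a single constant offset, an explicit lookup table is another option. The sketch below swaps in base R's match() for between(); the from and to vectors are illustrative and assume a one-to-one mapping in order.

```r
# Illustrative column (assumption): codes plus values that must stay unchanged
x <- c(20010001, 20010001, 20010002, 20010010, 20, 2, 14965)

# Lookup table: 20010001 -> 2001, 20010002 -> 2002, ..., 20010010 -> 2010
from <- 20010001:20010010
to   <- 2001:2010

# Position of each value in the table; NA when the value is not a code
idx <- match(x, from)

# Recode matched values and keep everything else unchanged
x_new <- ifelse(is.na(idx), x, to[idx])

print(x_new)

# For these codes the result agrees with the constant offset used above
all(x_new[!is.na(idx)] == x[!is.na(idx)] - 20008000)
```

Because ifelse() and match() are vectorised, the same expression can be used inside mutate(), e.g. mutate(x2 = ifelse(is.na(match(x, from)), x, to[match(x, from)])).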

Answer:

Here is an alternative approach using the stringr package:

The twist here is that only functions from the stringr package are used. str_c() is equivalent to paste0(), and str_sub() is roughly equivalent to substr(); I find str_sub() easier to use in some places, for example when extracting characters counted from the end of the string with negative positions. That's all there is to it: if x has 8 or more characters, we take its first character and its last 3 characters and paste them together. If x is shorter (for example, only 2 characters), it is left unchanged:

library(dplyr)
library(stringr)

df %>% 
  mutate(x = ifelse(nchar(x) >= 8, str_c(str_sub(x, 1, 1), str_sub(x, -3, -1)), x))

       x    y
1   2001 2001
2   2001 ORIG
3   2002 2001
4   2002 ORIG
5   2003 2001
6   2003 ORIG
7   2004 2001
8   2004 ORIG
9   2005 2001
10  2005 ORIG
11  2006 2001
12  2006 ORIG
13  2007 2001
14  2007 ORIG
15  2008 2001
16  2008 ORIG
17  2009 2001
18  2009 ORIG
19  2010 2001
20  2010 ORIG
21    20 2020
22     2 2020
23    19 2020
24    18 2020
25    17 2020
26    16 2020
27    15 2020
28 14965 2022
29 14964 2022
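For comparison, a base R sketch of the same rule uses paste0() and substr(); without negative positions, the last three characters have to be located with nchar(). Note that ifelse() in the answer above returns a character column, since its yes branch is character. Wrapping the result in as.numeric(), an addition not in the answer, turns it back into numbers. Like the answer, this assumes that codes with 8 or more digits carry the first digit of the year at the start and its last three digits at the end; the vector x is illustrative.

```r
# Illustrative values (assumption): long codes plus values that must not change
x <- c(20010001, 20010010, 200100010, 20, 14965)

# Work on the digits as text; "%.0f" avoids scientific notation
s <- sprintf("%.0f", x)

# First character + last three characters (base R has no negative positions)
short <- paste0(substr(s, 1, 1), substr(s, nchar(s) - 2, nchar(s)))

# Recode only values with 8 or more digits, then convert back to numbers
x_new <- as.numeric(ifelse(nchar(s) >= 8, short, s))

print(x_new)
```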