Let's say I have the following dataset, and I want to change the values in the range 20010001-20010010 to 2001-2010 (that is, 20010001 becomes 2001, 20010002 becomes 2002, and so on). How can I do this?
Sample data (df):
structure(list(x = c(20010001, 20010001, 20010002, 20010002,
20010003, 20010003, 20010004, 20010004, 20010005, 20010005, 20010006,
20010006, 20010007, 20010007, 20010008, 20010008, 20010009, 20010009,
20010010, 20010010, 20, 2, 19, 18, 17, 16, 15, 14965, 14964
), y = c("2001", "ORIG", "2001", "ORIG", "2001", "ORIG", "2001",
"ORIG", "2001", "ORIG", "2001", "ORIG", "2001", "ORIG", "2001",
"ORIG", "2001", "ORIG", "2001", "ORIG", "2020", "2020", "2020",
"2020", "2020", "2020", "2020", "2022", "2022")), class = "data.frame", row.names = c(NA, -29L))
Code:
library(tidyverse)
# To change a single value at a time (row "1" holds 20010001)
df["1", "x"] <- 2001
# Now, how can I do this for a whole range of values without doing it one by one?
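A base-R sketch, not taken from any of the answers: instead of assigning one cell at a time, build a lookup from the old codes to the new years and let match() find every row to replace in one step. It assumes the codes of interest are exactly the integers 20010001 to 20010010, and it uses a small subset of the sample data; the names old_codes, new_years, idx and hit are illustrative only.

```r
# Small subset of the question's sample data
df <- data.frame(
  x = c(20010001, 20010001, 20010002, 20010010, 20, 14965),
  y = c("2001", "ORIG", "2001", "ORIG", "2020", "2022")
)

# Illustrative lookup: each old code maps to the year in the same position
old_codes <- 20010001:20010010
new_years <- 2001:2010

# Position of each x in old_codes, or NA when x is not one of the codes
idx <- match(df$x, old_codes)
hit <- !is.na(idx)

# Replace all matching rows at once; other rows are untouched
df$x[hit] <- new_years[idx[hit]]
df
```

The lookup approach does not depend on the old and new values being related by simple arithmetic, so it also works if the mapping is irregular.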
Answer:
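Another possible solution, using a regular expression.
EXPLANATION: the pattern matches "2001" followed by one or more zeros, but only when exactly two digits remain at the end of the string (the (?=\\d{2}$) lookahead), and replaces that prefix with "20". For example, 20010001 becomes 2001 and 20010010 becomes 2010. Values without such a prefix are left unchanged, and type.convert() turns the resulting character column back into numbers.
library(tidyverse)
df %>%
  # "2001" + one or more zeros, followed by exactly two final digits -> "20"
  mutate(z = str_replace(x, "2001[0]+(?=\\d{2}$)", "20")) %>%
  # str_replace() returns character; convert the columns back to their natural types
  type.convert(as.is = TRUE)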
#> x y z
#> 1 20010001 2001 2001
#> 2 20010001 ORIG 2001
#> 3 20010002 2001 2002
#> 4 20010002 ORIG 2002
#> 5 20010003 2001 2003
#> 6 20010003 ORIG 2003
#> 7 20010004 2001 2004
#> 8 20010004 ORIG 2004
#> 9 20010005 2001 2005
#> 10 20010005 ORIG 2005
#> 11 20010006 2001 2006
#> 12 20010006 ORIG 2006
#> 13 20010007 2001 2007
#> 14 20010007 ORIG 2007
#> 15 20010008 2001 2008
#> 16 20010008 ORIG 2008
#> 17 20010009 2001 2009
#> 18 20010009 ORIG 2009
#> 19 20010010 2001 2010
#> 20 20010010 ORIG 2010
#> 21 20 2020 20
#> 22 2 2020 2
#> 23 19 2020 19
#> 24 18 2020 18
#> 25 17 2020 17
#> 26 16 2020 16
#> 27 15 2020 15
#> 28 14965 2022 14965
#> 29 14964 2022 14964
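To see what the pattern does on its own, here is a quick check with base R's sub() on a few illustrative strings. perl = TRUE is needed because base R's default regex engine does not support the (?=...) lookahead. The string "200100010" (three zeros) is included to show that the [0]+ quantifier also copes with a longer run of zeros.

```r
# Illustrative codes as character strings (not the full data set)
codes <- c("20010001", "20010010", "200100010", "20", "14965")

# "2001" followed by one or more zeros, but only when exactly two digits
# remain before the end of the string; that whole prefix becomes "20"
years <- sub("2001[0]+(?=\\d{2}$)", "20", codes, perl = TRUE)
years

# Back to numbers, as type.convert() does in the answer
as.integer(years)
```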
Answer:
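Perhaps there's more to it than this, but plain arithmetic seems enough: subtract 20008000 from the values between 20010001 and 20010010 (between() is inclusive) and leave everything else unchanged.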
library(dplyr)
df %>%
mutate(x2 = if_else(between(x, 20010001, 20010010), x - 20008000, x))
# x y x2
# 1 20010001 2001 2001
# 2 20010001 ORIG 2001
# 3 20010002 2001 2002
# 4 20010002 ORIG 2002
# 5 20010003 2001 2003
# 6 20010003 ORIG 2003
# 7 20010004 2001 2004
# 8 20010004 ORIG 2004
# 9 20010005 2001 2005
# 10 20010005 ORIG 2005
# 11 20010006 2001 2006
# 12 20010006 ORIG 2006
# 13 20010007 2001 2007
# 14 20010007 ORIG 2007
# 15 20010008 2001 2008
# 16 20010008 ORIG 2008
# 17 20010009 2001 2009
# 18 20010009 ORIG 2009
# 19 20010010 2001 2010
# 20 20010010 ORIG 2010
# 21 20 2020 20
# 22 2 2020 2
# 23 19 2020 19
# 24 18 2020 18
# 25 17 2020 17
# 26 16 2020 16
# 27 15 2020 15
# 28 14965 2022 14965
# 29 14964 2022 14964
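The constant 20008000 in this answer is simply 20010001 - 2001, so subtracting it shifts the whole block 20010001 to 20010010 onto 2001 to 2010. A base-R sketch of the same idea, without dplyr and with the offset computed rather than hard-coded; the vector x and the names offset and in_range are illustrative.

```r
x <- c(20010001, 20010005, 20010010, 20, 14965)

# Offset that maps the first code onto the first year
offset <- 20010001 - 2001  # 20008000

# Shift only the values inside the inclusive range; ifelse() is vectorised
in_range <- x >= 20010001 & x <= 20010010
x2 <- ifelse(in_range, x - offset, x)
x2
```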
Answer:
Here is an alternative approach using the stringr package.
The point here is to use only functions from stringr: str_c() is equivalent to paste0(), and str_sub() is roughly equivalent to substr(), but I find it easier to use in some places, such as extracting characters counted from the end of the string with negative positions. That's it.
If x has 8 or more characters, we extract its first character and its last 3 characters and paste them together. If x is shorter (for example, only 2 characters), it is left unchanged:
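library(dplyr)
library(stringr)
df %>%
  # 8+ characters: first character + last 3 characters ("2" + "001" -> "2001");
  # shorter values are kept. Note that x becomes a character column.
  mutate(x = ifelse(nchar(x) >= 8, str_c(str_sub(x, 1, 1), str_sub(x, -3, -1)), x))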
x y
1 2001 2001
2 2001 ORIG
3 2002 2001
4 2002 ORIG
5 2003 2001
6 2003 ORIG
7 2004 2001
8 2004 ORIG
9 2005 2001
10 2005 ORIG
11 2006 2001
12 2006 ORIG
13 2007 2001
14 2007 ORIG
15 2008 2001
16 2008 ORIG
17 2009 2001
18 2009 ORIG
19 2010 2001
20 2010 ORIG
21 20 2020
22 2 2020
23 19 2020
24 18 2020
25 17 2020
26 16 2020
27 15 2020
28 14965 2022
29 14964 2022
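Since str_c() corresponds to paste0() and str_sub() to substr(), the same rule can be sketched in base R. Base R's substr() does not accept negative positions, so the last three characters are addressed through nchar() instead; the vector x and the names s, n, short and x_new are illustrative.

```r
x <- c(20010001, 20010010, 20, 14965)
s <- as.character(x)
n <- nchar(s)

# First character plus the last three characters of each string
short <- paste0(substr(s, 1, 1), substr(s, n - 2, n))

# Use the shortened form only for codes with 8 or more characters
x_new <- ifelse(n >= 8, short, s)
x_new
```

As in the stringr version, the result is character; wrap it in as.numeric() if you need numbers again.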