I have several large data frames that contain one column (we can call it timeperiod) whose values are text strings. All of the values end in specific strings (like V.1to2 or V.2to3), but the beginnings differ. I want all values with the same ending to be replaced by a single new value, with a different new value for each ending. Here is an example:
With a data frame like this:
df <- data.frame (Location = c("a","b","c","d","e","f","g","h"),
timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2","D.V.2to3","A.V.3to4","H.V.3to4","A.V.4to5","D.V.4to5"))
Looking like this:
Location timeperiod
1 a A.V.1to2
2 b D.V.1to2
3 c A.V.1to2
4 d D.V.2to3
5 e A.V.3to4
6 f H.V.3to4
7 g A.V.4to5
8 h D.V.4to5
My expected/hoped for output would look like this:
df2
Location timeperiod
1 a 1
2 b 1
3 c 1
4 d 2
5 e 3
6 f 3
7 g 4
8 h 4
df2 <- data.frame (Location = c("a","b","c","d","e","f","g","h"),
timeperiod = c(1, 1, 1, 2, 3, 3, 4, 4))
I know about:
df$timeperiod[df$timeperiod =="A.V.1to2"] <- "1"
But because of the size of my data set, and because I need to repeat this for multiple data frames whose timeperiod prefixes are not consistent, I would like to use something like this with dplyr:
library(dplyr)
df$timeperiod <- revalue(df$timeperiod, c(ends_with(V.1to2)="1"))
df$timeperiod <- revalue(df$timeperiod, c(ends_with(V.2to3)="2"))
#etc..
That way I could repeat the process over many different values and across many different sheets. This doesn't work, though, and even this seems inefficient, so any solution that is faster than renaming every specific value would be sufficient.
Thanks for any help.
CodePudding user response:
We could use str_extract:
library(dplyr)
library(stringr)
df %>%
  mutate(timeperiod = str_extract(timeperiod, '\\d+'))
Location timeperiod
1 a 1
2 b 1
3 c 1
4 d 2
5 e 3
6 f 3
7 g 4
8 h 4
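Note that str_extract returns character values, while the expected df2 in the question holds numbers. The following is a minimal sketch of the same idea that also converts the result to integer. It assumes (this is not stated in the question) that some prefixes might themselves contain digits, so the pattern is anchored to the "NtoM" ending with a lookahead rather than taking the first digits found:

```r
library(dplyr)
library(stringr)

df <- data.frame(
  Location = c("a", "b", "c", "d", "e", "f", "g", "h"),
  timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
                 "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5")
)

df2 <- df %>%
  # Take the digits that are immediately followed by "to<digits>" at the
  # end of the string, then convert the extracted text to integer.
  mutate(timeperiod = as.integer(str_extract(timeperiod, "\\d+(?=to\\d+$)")))
```

Values that do not end in the "NtoM" form come back as NA, which makes unexpected labels easy to spot.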
CodePudding user response:
We can use dplyr and stringr. First extract the last 6 characters of timeperiod. Then group_by timeperiod, and finally use cur_group_id:
library(dplyr)
library(stringr)
df %>%
  mutate(timeperiod = str_extract(timeperiod, '.{6}$')) %>%
  group_by(timeperiod) %>%
  mutate(timeperiod = cur_group_id()) %>%
  ungroup()
# A tibble: 8 × 2
Location timeperiod
<chr> <int>
1 a 1
2 b 1
3 c 1
4 d 2
5 e 3
6 f 3
7 g 4
8 h 4
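As far as I understand, the group IDs here follow the sort order of the extracted endings, so they match the wanted values only because the endings happen to sort that way, and the fixed length of 6 assumes every ending is exactly that long. If each ending should map to a value you choose yourself, an explicit lookup may be safer. Below is a rough base R sketch; the lookup vector and the name df2 are illustrative choices, not part of the original answer:

```r
df <- data.frame(
  Location = c("a", "b", "c", "d", "e", "f", "g", "h"),
  timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
                 "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5")
)

# Hypothetical mapping from each ending to its replacement value.
lookup <- c("V.1to2" = 1, "V.2to3" = 2, "V.3to4" = 3, "V.4to5" = 4)

# Strip whatever precedes the "V.<n>to<m>" ending, keeping only the ending.
suffix <- sub("^.*(V\\.[0-9]+to[0-9]+)$", "\\1", df$timeperiod)

df2 <- df
# Index the lookup by name; endings missing from the lookup become NA.
df2$timeperiod <- unname(lookup[suffix])
```

Because the mapping is spelled out, the replacement values do not need to be consecutive or follow any sort order.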
CodePudding user response:
Maybe this is what you are looking for:
df <- data.frame (Location = c("a","b","c","d","e","f","g","h"),
timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2","D.V.2to3","A.V.3to4","H.V.3to4","A.V.4to5","D.V.4to5"))
df$timeperiod <- substr(gsub('[[:alpha:]]|[[:punct:]]', '', df$timeperiod), 1, 1)
df
Location timeperiod
1 a 1
2 b 1
3 c 1
4 d 2
5 e 3
6 f 3
7 g 4
8 h 4
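Since the question mentions repeating this over many data frames, the recoding can be wrapped in a function and applied to a list. This is only a sketch: the function name recode_timeperiod, the list frames, and the second data frame are made up for illustration. It also assumes period numbers may have more than one digit, which keeping only the first character would cut short:

```r
# Hypothetical helper: replace the timeperiod column with the number
# that precedes "to" at the end of each value.
recode_timeperiod <- function(d, col = "timeperiod") {
  # Lazily skip the prefix, capture the digits before "to<digits>" at the end.
  d[[col]] <- as.integer(
    sub("^.*?([0-9]+)to[0-9]+$", "\\1", d[[col]], perl = TRUE)
  )
  d
}

df <- data.frame(
  Location = c("a", "b", "c", "d", "e", "f", "g", "h"),
  timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
                 "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5")
)

# A second, made-up data frame with different prefixes.
other <- data.frame(
  Location = c("x", "y"),
  timeperiod = c("B.V.10to11", "Q.V.2to3")
)

frames <- list(sheet1 = df, sheet2 = other)
result <- lapply(frames, recode_timeperiod)
```

Values that do not match the pattern are left unchanged by sub, so as.integer turns them into NA with a warning rather than silently recoding them.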