I have a dataframe where some of the columns are named as dates. For example, something like this:
df_1 <- data_frame("id" = c('a','b','c','d'),
"gender" = c('m','f','f','m'),
"05/16/2017" = c(1,2,3,4),
"11/08/2016" = c(1,2,3,4),
"08/15/2016" = c(1,2,3,4))
df_1
# A tibble: 4 x 5
id gender `05/16/2017` `11/08/2016` `08/15/2016`
<chr> <chr> <dbl> <dbl> <dbl>
1 a m 1 1 1
2 b f 2 2 2
3 c f 3 3 3
4 d m 4 4 4
For the columns that are currently dates, in the format mm/dd/yyyy, I would like to extract the mm and yyyy components and use these to rename the columns to election_yyyy_mm. I.e., I would end up with a df that looks like this:
df_2 <- data_frame("id" = c('a','b','c','d'),
"gender" = c('m','f','f','m'),
"election_2017_05" = c(1,2,3,4),
"election_2016_11" = c(1,2,3,4),
"election_2016_08" = c(1,2,3,4))
df_2
# A tibble: 4 x 5
id gender election_2017_05 election_2016_11 election_2016_08
<chr> <chr> <dbl> <dbl> <dbl>
1 a m 1 1 1
2 b f 2 2 2
3 c f 3 3 3
4 d m 4 4 4
I think I have a partial solution involving stringr, but currently I have to run str_extract twice to get the mm and the yyyy components respectively. I'm also not sure how I can pass a vector to rename().
These are the two snippets I have so far:
stringr::str_extract(c("05/16/2017", "11/08/2016", "08/15/2016"), "^[^/]+")
[1] "05" "11" "08"
stringr::str_extract(c("05/16/2017", "11/08/2016", "08/15/2016"), "[0-9]{4}")
[1] "2017" "2016" "2016"
Can anyone help me a) extract both elements (the yyyy and mm bits) in one call to str_extract (or some other function), and b) pass the resulting vector to rename?
CodePudding user response:
We can use rename_with to rename with a function.
Inside the renaming function, we can first parse the characters as dates with mdy(), then extract the month() and year(). Finally, glue() the elements back together.
library(dplyr)
library(glue)
library(lubridate)
df_1 %>% rename_with( ~glue('election_{year(mdy(.x))}_{month(mdy(.x))}'),
matches("\\d{2}/\\d{2}/\\d{4}"))
output
# A tibble: 4 × 5
id gender election_2017_5 election_2016_11 election_2016_8
<chr> <chr> <dbl> <dbl> <dbl>
1 a m 1 1 1
2 b f 2 2 2
3 c f 3 3 3
4 d m 4 4 4
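The output above shows unpadded months (election_2017_5), while the question asks for election_2017_05. A minimal sketch of one way to keep the leading zero, run on a plain character vector: format() with %m always writes a two-digit month. Base R's as.Date() stands in for mdy() here so the snippet needs no packages, and the name date_cols is hypothetical.

```r
# Hypothetical vector standing in for the date-like column names
date_cols <- c("05/16/2017", "11/08/2016", "08/15/2016")

# Parse mm/dd/yyyy strings into Date objects (base R stand-in for mdy())
parsed <- as.Date(date_cols, format = "%m/%d/%Y")

# %Y is the four-digit year, %m the zero-padded month;
# the literal text "election_" passes through format() unchanged
new_names <- format(parsed, "election_%Y_%m")
new_names
```

Inside rename_with, the same idea could presumably be written as ~ format(mdy(.x), "election_%Y_%m") with the same matches() selector.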
We can also use stringr::str_extract_all, which returns every match for each element of a character vector. Using a modified regex from the OP's attempt, we can extract both month and year in a single call. Just extract either (|) the digits (\\d+) from the beginning (^) or end ($) of the string: "^\\d+|\\d+$".
The answer would be like this:
library(purrr) # for map_chr()
df_1 %>% rename_with( ~stringr::str_extract_all(.x, "^\\d+|\\d+$") %>%
map_chr(~glue('election_{.x[2]}_{.x[1]}')),
matches("\\d{2}/\\d{2}/\\d{4}"))
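To see what the single extraction call returns before it is wrapped in rename_with, here is a small sketch on a plain character vector. The name date_cols is hypothetical, and base vapply() is used in place of map_chr() and glue() only to keep the dependencies down to stringr.

```r
library(stringr)

# Hypothetical vector standing in for the date-like column names
date_cols <- c("05/16/2017", "11/08/2016", "08/15/2016")

# One list element per input string; each element holds the leading
# digits (month) followed by the trailing digits (year)
parts <- str_extract_all(date_cols, "^\\d+|\\d+$")

# Reorder each pair as year then month and build the new name
new_names <- vapply(
  parts,
  function(p) paste0("election_", p[2], "_", p[1]),
  character(1)
)
new_names
```

Because the month is taken as text rather than parsed as a number, its leading zero is kept.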
CodePudding user response:
Using tidyverse (dplyr and stringr), we can rename the columns like this:
library(dplyr)
df_1 %>%
rename_with(
.cols = contains("/"), # selects only the date columns
~ paste0(
"election_",
stringr::str_sub(.x, -4, -1), # last 4 digits/letters
"_",
stringr::str_sub(.x, 1, 2) # first 2 digits/letters
)
)
Result:
# A tibble: 4 x 5
id gender election_2017_05 election_2016_11 election_2016_08
<chr> <chr> <dbl> <dbl> <dbl>
1 a m 1 1 1
2 b f 2 2 2
3 c f 3 3 3
4 d m 4 4 4
CodePudding user response:
Here's a one-liner using regex:
names(df_1) <- sub("(\\d+).*?(\\d+)$", "election_\\2_\\1", names(df_1))
How this works: First, you divide the column names into two capture groups:
(\\d+): the first capture group, capturing the leading digits (the month)
.*?: anything thereafter until ...
(\\d+)$: ... the second capture group, capturing the last digits (the year).
Then, using sub's replacement argument, you add the string election_ to the matching names and refer back to the two capture groups in reversed order using the backreferences \\1 and \\2.
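The capture-group explanation above can be checked on a plain character vector. This is only a sketch: the name cols is hypothetical, and perl = TRUE is an addition not present in the answer's call, included here to pin the lazy quantifier to PCRE semantics.

```r
# Hypothetical vector mirroring the column names of df_1
cols <- c("id", "gender", "05/16/2017", "11/08/2016", "08/15/2016")

# Group 1 = leading digits, group 2 = trailing digits;
# names with no digits do not match and are returned unchanged
renamed <- sub("(\\d+).*?(\\d+)$", "election_\\2_\\1", cols, perl = TRUE)
renamed
```

Since non-matching names pass through untouched, the call can be applied to all of names(df_1) without selecting the date columns first.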
Using stringr:
library(stringr)
names(df_1) <- str_replace(names(df_1), "(\\d+).*?(\\d+)$", "election_\\2_\\1")
Result:
df_1
# A tibble: 4 × 5
id gender election_2017_05 election_2016_11 election_2016_08
<chr> <chr> <dbl> <dbl> <dbl>
1 a m 1 1 1
2 b f 2 2 2
3 c f 3 3 3
4 d m 4 4 4
CodePudding user response:
Here is an alternative approach:
library(dplyr)
library(stringr)
df_1 %>%
rename_with(~str_c('election',str_sub(.x, -4,-1),str_sub(.x,-10,-9), sep = "_"), where(is.numeric))
id gender election_2017_05 election_2016_11 election_2016_08
<chr> <chr> <dbl> <dbl> <dbl>
1 a m 1 1 1
2 b f 2 2 2
3 c f 3 3 3
4 d m 4 4 4
CodePudding user response:
Another approach with dplyr but without stringr.
Here using rename_with to select out columns with /, splitting the strings on / and using sapply to concatenate the result of the split back together as a vector that can be used for renaming.
df_1 %>%
rename_with(.cols = contains('/'),
~ strsplit(.x, '/') %>%
sapply(
function(x) paste0('election_',x[3],'_',x[1]), # x[3] is the year, x[1] the month (x[2] is the day)
simplify=TRUE)
)
Edited to remove as.character calls as explained by @GuedesBF in the comments.
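The split-and-paste step can also be tried on its own, outside rename_with. In this sketch the name date_cols is hypothetical, and vapply() is used instead of sapply() so that the result is guaranteed to be a character vector. For an mm/dd/yyyy string, the split yields the month first, the day second and the year third.

```r
# Hypothetical vector standing in for the date-like column names
date_cols <- c("05/16/2017", "11/08/2016", "08/15/2016")

# Each element becomes c(month, day, year)
pieces <- strsplit(date_cols, "/", fixed = TRUE)

# Keep the year (third piece) and the month (first piece); drop the day
new_names <- vapply(
  pieces,
  function(p) paste0("election_", p[3], "_", p[1]),
  character(1)
)
new_names
```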