I'm working through a task where I need to bind a few survey datasets, but unfortunately the survey questions are inconsistently numbered (wording is consistent). To solve this, I want to drop the question number from the start of each question.
Currently I am doing this manually with rename()
, but it is time consuming to repeat for every question across each dataset. Any tips to do this in a quicker, more efficient way?
Here's an example dataset and my current process:
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
`1. First Question` = c('a', 'b', 'c', 'd', 'e'),
`2. Second Question` = c(1, 1, 3, 0, 1),
`3. Third Question` = c(1, 2, 0, 2, 1),
Year = 2021) %>%
rename(`First Question` = `1. First Question`,
`Second Question` = `2. Second Question`,
`Third Question` = `3. Third Question`)
df2 <- data.frame(ID = c(1, 2, 3, 4, 5),
`1. First Question` = c('a', 'b', 'c', 'd', 'e'),
`2. Third Question` = c(2, 1, 3, 1, 2),
`3. Second Question` = c(2, 2, 1, 3, 2),
Year = 2022) %>%
rename(`First Question` = `1. First Question`,
`Second Question` = `3. Second Question`,
`Third Question` = `2. Third Question`)
end_df <- rbind(df1, df2)
CodePudding user response:
You can use rename_with
, which uses a function, here sub
, to change the column names based on a regex pattern:
df1 %>%
rename_with(~ sub("^X\\d\\.\\.", "", .))
ID First.Question Second.Question Third.Question Year
1 1 a 1 1 2021
2 2 b 1 2 2021
3 3 c 3 0 2021
4 4 d 0 2 2021
5 5 e 1 1 2021
As noted by @zephryl you can do it in one go for a list
of all your dataframes:
list(df1, df2) %>%
map(rename_with, ~ sub("^X\\d\\.\\.", "", .))
Data:
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
`1. First Question` = c('a', 'b', 'c', 'd', 'e'),
`2. Second Question` = c(1, 1, 3, 0, 1),
`3. Third Question` = c(1, 2, 0, 2, 1),
Year = 2021)
df2 <- data.frame(ID = c(1, 2, 3, 4, 5),
`1. First Question` = c('a', 'b', 'c', 'd', 'e'),
`2. Third Question` = c(2, 1, 3, 1, 2),
`3. Second Question` = c(2, 2, 1, 3, 2),
Year = 2022)
CodePudding user response:
Using a regex with stringr::str_remove()
inside dplyr::rename_with()
for each dataframe:
library(purrr)
library(dplyr)
library(stringr)
list(df1, df2) %>%
map(rename_with, ~ str_remove(.x, "^\\d\\.\\s")) %>%
bind_rows()
# A tibble: 10 × 5
ID `First Question` `Second Question` `Third Question` Year
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 a 1 1 2021
2 2 b 1 2 2021
3 3 c 3 0 2021
4 4 d 0 2 2021
5 5 e 1 1 2021
6 1 a 2 2 2022
7 2 b 2 1 2022
8 3 c 1 3 2022
9 4 d 3 1 2022
10 5 e 2 2 2022
CodePudding user response:
A base R alternative
colnames(df1)[2:4] <- sub("^[0-9]\\. ", "", colnames(df1)[2:4])
colnames(df2)[2:4] <- sub("^[0-9]\\. ", "", colnames(df2)[2:4])
rbind(df1, df2)
ID First Question Second Question Third Question Year
1 1 a 1 1 2021
2 2 b 1 2 2021
3 3 c 3 0 2021
4 4 d 0 2 2021
5 5 e 1 1 2021
6 1 a 2 2 2022
7 2 b 2 1 2022
8 3 c 1 3 2022
9 4 d 3 1 2022
10 5 e 2 2 2022
Important side note. Create the data frames with check.names = F
, otherwise names get substituted with something like this X1..First.Question
etc.
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
`1. First Question` = c('a', 'b', 'c', 'd', 'e'),
`2. Second Question` = c(1, 1, 3, 0, 1),
`3. Third Question` = c(1, 2, 0, 2, 1),
Year = 2021, check.names = F)