I am trying to format a data dictionary for recoding with dplyr.
Dictionary format:
> df <- data.frame(var=c("gender", "treatment"), label=c("1=male, 2=female", "1=control, 2=experimental"))
> df
var label
1 gender 1=male, 2=female
2 treatment 1=control, 2=experimental
I can't seem to find a way to add backticks around numbers and single quotes around character strings:
> end <- data.frame(var=c("gender", "treatment"), label=c("`1`='male', `2`='female'", "`1`='control', `2`='experimental'"))
> end
var label
1 gender `1`='male', `2`='female'
2 treatment `1`='control', `2`='experimental'
CodePudding user response:
You could do:
df %>%
mutate(label = gsub("(\\d)=([^,]*)", "`\\1`='\\2'", label))
#> var label
#> 1 gender `1`='male', `2`='female'
#> 2 treatment `1`='control', `2`='experimental'
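A small variant of the same idea, as a sketch in base R rather than a definitive fix: (\\d) matches a single digit, so a code such as 10 would only have its last digit wrapped in backticks. Using \\d+ is one way to cover multi-digit codes. The labels vector and the 10=other entry are invented for illustration and are not part of the original data.

```r
# Hypothetical labels; the "10=other" entry is made up to show a multi-digit code.
labels <- c("1=male, 2=female", "1=control, 2=experimental, 10=other")

# (\\d+) captures one or more digits before "=";
# ([^,]*) captures everything after "=" up to the next comma.
quoted <- gsub("(\\d+)=([^,]*)", "`\\1`='\\2'", labels)

print(quoted)
```

Because ([^,]*) takes everything up to the next comma, this version also keeps values containing spaces or hyphens intact, but it assumes the values themselves never contain a comma.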
CodePudding user response:
We may capture (with (...)) the digits (\\d+) and the lower case words ([a-z]+), and replace them with the backreference wrapped in backticks or single quotes, respectively:
df$label <- gsub("([a-z]+)", "'\\1'", gsub("(\\d+)", "`\\1`", df$label))
-output
df$label
[1] "`1`='male', `2`='female'" "`1`='control', `2`='experimental'"
CodePudding user response:
You can do this with two regular expressions, each of which extracts a specific portion of the text and replaces it with itself surrounded by the appropriate quotation marks:
library(tidyverse)
df %>%
  mutate(label = str_replace_all(label, "([0-9]+)", "`\\1`"),
         label = str_replace_all(label, "([a-zA-Z]+)", "'\\1'"))
var label
1 gender `1`='male', `2`='female'
2 treatment `1`='control', `2`='experimental'
[0-9]+ matches one or more digits. By surrounding it in () we "capture" it, and can access it in the replacement step with \\1, since it's the first capture group.
[a-zA-Z]+ matches one or more letters (either lower or upper case), so it will grab any single word. If you want to allow spaces, hyphens, etc. in that string, you can add them inside the brackets.
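As a rough sketch of that last suggestion, the character class can be widened to allow spaces and hyphens inside a value. The example uses base R gsub so it runs without extra packages; the same patterns should work in str_replace_all. The sample label is invented for illustration, and the pattern assumes every value starts with a letter.

```r
# Hypothetical label with a hyphen and with spaces inside the values.
labels <- "1=non-binary, 2=prefer not to say"

# Step 1: wrap each run of digits in backticks.
step1 <- gsub("([0-9]+)", "`\\1`", labels)

# Step 2: a value starts with a letter, then may continue with letters,
# spaces, or hyphens (the hyphen goes last in the brackets so it is literal).
step2 <- gsub("([a-zA-Z][a-zA-Z -]*)", "'\\1'", step1)

print(step2)
```

Requiring a leading letter keeps the space after each comma out of the match, so only the value itself ends up inside the quotes.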