Adding backticks around numbers and single quotes around letters in strings

Time:01-14

I am trying to format a data dictionary for recoding with dplyr.

Dictionary format:

> df <- data.frame(var=c("gender", "treatment"), label=c("1=male, 2=female", "1=control, 2=experimental"))
> df
        var                     label
1    gender          1=male, 2=female
2 treatment 1=control, 2=experimental

I can't seem to find a way to add backticks around numbers and single quotes around character strings:

> end <- data.frame(var=c("gender", "treatment"), label=c("`1`='male', `2`='female'", "`1`='control', `2`='experimental'"))
> end
        var                             label
1    gender          `1`='male', `2`='female'
2 treatment `1`='control', `2`='experimental'

Answer:

You could do:

library(dplyr)

df %>% 
  # (\\d+) captures the whole code; ([^,]*) captures the label up to the next comma
  mutate(label = gsub("(\\d+)=([^,]*)", "`\\1`='\\2'", label))
#>         var                             label
#> 1    gender          `1`='male', `2`='female'
#> 2 treatment `1`='control', `2`='experimental'
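A minimal sketch of the same single-pass idea wrapped in a reusable helper. The name quote_labels and the sample label (with a two-digit code and multi-word values) are hypothetical, invented for illustration. Matching one or more digits rather than a single digit keeps codes such as 10 intact, and because [^,]* accepts anything up to the next comma, labels containing spaces or hyphens are quoted whole. Base R is used so it runs without extra packages:

```r
# Hypothetical helper (not part of any package): backticks around each
# numeric code, single quotes around each label.
#   (\\d+)   captures the whole code, so multi-digit codes like 10 stay intact
#   ([^,]*)  captures everything after "=" up to the next comma
quote_labels <- function(x) {
  gsub("(\\d+)=([^,]*)", "`\\1`='\\2'", x)
}

# Illustrative dictionary row (invented, not from the question)
dict <- data.frame(
  var   = "response",
  label = "1=no treatment, 2=follow-up, 10=other",
  stringsAsFactors = FALSE
)

dict$label <- quote_labels(dict$label)
dict$label
#> [1] "`1`='no treatment', `2`='follow-up', `10`='other'"
```

In a pipeline the helper drops into mutate(label = quote_labels(label)). A label that itself contains a comma or a single quote would need a different approach.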

Answer:

We can capture ((...)) the digits (\\d+) and the lower-case words ([a-z]+), then replace each with its backreference surrounded by backticks or single quotes, respectively:

df$label <- gsub("([a-z]+)", "'\\1'", gsub("(\\d+)", "`\\1`", df$label))

Output:

df$label
[1] "`1`='male', `2`='female'"          "`1`='control', `2`='experimental'"
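To see what each of the two nested gsub() calls contributes, the expression can be unrolled into two steps. The names step1 and step2 are illustrative only; the input is the first label from the question:

```r
x <- "1=male, 2=female"

# Pass 1 (inner call): wrap every run of digits in backticks
step1 <- gsub("(\\d+)", "`\\1`", x)
step1
#> [1] "`1`=male, `2`=female"

# Pass 2 (outer call): wrap every run of lower-case letters in single quotes.
# The backticked codes contain no letters, so this pass leaves them alone.
step2 <- gsub("([a-z]+)", "'\\1'", step1)
step2
#> [1] "`1`='male', `2`='female'"
```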

Answer:

You can do this with two regular expressions, each of which finds a specific kind of text and replaces it with itself surrounded by the appropriate quotation marks:

library(tidyverse)

df %>%
    mutate(label = str_replace_all(label, "([0-9]+)", "`\\1`"),     # backticks around codes
           label = str_replace_all(label, "([a-zA-Z]+)", "'\\1'"))  # single quotes around words

        var                             label
1    gender          `1`='male', `2`='female'
2 treatment `1`='control', `2`='experimental'

[0-9]+ matches one or more digits. Surrounding it with () "captures" the match, which can then be referenced in the replacement as \\1 because it is the first capture group.

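[a-zA-Z]+ matches one or more letters (lower or upper case), so it grabs any single word. If you want to allow spaces, hyphens, etc. in the label, you can add them to the brackets, but keep a leading letter requirement (e.g. [a-zA-Z][a-zA-Z -]*); otherwise the space after each comma is matched and quoted too.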
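A minimal sketch comparing a naive widened letter class with one that must start with a letter, on an invented label containing a multi-word value and a hyphen. Base gsub() is used so the block runs on its own; str_replace_all() accepts the same patterns:

```r
x <- "1=no treatment, 2=follow-up"

# Pass 1: backticks around numeric codes
x <- gsub("([0-9]+)", "`\\1`", x)

# Naive pass 2: space is in the class, so the lone space after the comma
# is also a match and gets wrapped in quotes
naive <- gsub("([a-zA-Z -]+)", "'\\1'", x)
naive
#> [1] "`1`='no treatment',' '`2`='follow-up'"

# Safer pass 2: a match must begin with a letter, then may continue with
# letters, spaces or hyphens (the trailing "-" inside the brackets is literal)
safer <- gsub("([a-zA-Z][a-zA-Z -]*)", "'\\1'", x)
safer
#> [1] "`1`='no treatment', `2`='follow-up'"
```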
