How can I decode a column with text from another column in R?

Time:03-16

I have a data frame with encoded survey answers in the answer column and the keys as a single string in a character column:

df <- data.frame(answer = c(1, 2, 1, 3, 1),
                 key = c("1 = Answer One 2 = Answer Two 3 = Answer Three", "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI", 
                         "1 = Answer abc 2 = Answer def 3 = Answer ghi", "1 = Answer One 2 = Answer Two 3 = Answer Three",
                         "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"))

print(df)

  answer                                            key
1      1 "1 = Answer One 2 = Answer Two 3 = Answer Three"
2      2   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"
3      1   "1 = Answer abc 2 = Answer def 3 = Answer ghi"
4      3 "1 = Answer One 2 = Answer Two 3 = Answer Three"
5      1   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"

How can I decode the answer column with the data from the key column so that I get this result?

df_result <- data.frame(answer = c(1, 2, 1, 3, 1),
                 key = c("1 = Answer One 2 = Answer Two 3 = Answer Three", "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI", 
                         "1 = Answer abc 2 = Answer def 3 = Answer ghi", "1 = Answer One 2 = Answer Two 3 = Answer Three",
                         "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"),
                 answer_decoded = c("Answer One", "Answer DEF", "Answer abc", "Answer Three","Answer ABC"))

print(df_result)

  answer                                            key answer_decoded
1      1 "1 = Answer One 2 = Answer Two 3 = Answer Three"     "Answer One"
2      2   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"     "Answer DEF"
3      1   "1 = Answer abc 2 = Answer def 3 = Answer ghi"     "Answer abc"
4      3 "1 = Answer One 2 = Answer Two 3 = Answer Three"   "Answer Three"
5      1   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"     "Answer ABC"

I cannot use factor labels since I have too many different items to manually create them.

CodePudding user response:

We can extract the substring based on the 'answer' values. Use str_c to build the pattern to extract, i.e. paste the 'answer' value together with a space, an =, and one or more non-digit characters (\\D+). Then use trimws to remove the prefix up to and including the = and any surrounding whitespace.

library(stringr)
library(dplyr)
df %>%
   # extract e.g. "1 = Answer One ", then trim the "1 = " prefix and trailing spaces
   mutate(answer_decoded = trimws(str_extract(key, 
        str_c(answer, ' = \\D+')), whitespace = ".*=\\s+|\\s+"))

-output

  answer                                            key answer_decoded
1      1 1 = Answer One 2 = Answer Two 3 = Answer Three     Answer One
2      2   1 = Answer ABC 2 = Answer DEF 3 = Answer GHI     Answer DEF
3      1   1 = Answer abc 2 = Answer def 3 = Answer ghi     Answer abc
4      3 1 = Answer One 2 = Answer Two 3 = Answer Three   Answer Three
5      1   1 = Answer ABC 2 = Answer DEF 3 = Answer GHI     Answer ABC
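
The same pattern idea can be followed step by step in base R. This is only a sketch: it assumes single-digit codes, labels that contain no digits, and that every answer code appears in its key. The names patterns and matched are illustrative and not part of the answer above.

```r
# Same sample data as in the question
df <- data.frame(answer = c(1, 2, 1, 3, 1),
                 key = c("1 = Answer One 2 = Answer Two 3 = Answer Three",
                         "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI",
                         "1 = Answer abc 2 = Answer def 3 = Answer ghi",
                         "1 = Answer One 2 = Answer Two 3 = Answer Three",
                         "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"),
                 stringsAsFactors = FALSE)

# One pattern per row: the code, " = ", then one or more non-digits
patterns <- paste0(df$answer, " = \\D+")

# regexpr() takes a single pattern, so pair each pattern with its key
matched <- mapply(function(p, k) regmatches(k, regexpr(p, k)),
                  patterns, df$key, USE.NAMES = FALSE)

# Drop the leading "N = " and any trailing space
df$answer_decoded <- trimws(sub("^\\d+ = ", "", matched))
```

Looking at matched before the final step shows why the trimming is needed: the non-digit run also swallows the space in front of the next code.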

CodePudding user response:

strsplit each string on the "N = " bit, then select (`[`) the nth string (+ 1 because the split on a match at the start of the string leaves an empty first element):

mapply(`[`, strsplit(df$key, "(\\s*)\\d = "), df$answer + 1)
#[1] "Answer One"   "Answer DEF"   "Answer abc"   "Answer Three" "Answer ABC"  
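
If the codes are not guaranteed to run 1, 2, 3, ... in order, a variation on the split idea is to build a named lookup per row and index it by code rather than by position. The following is a minimal sketch, not part of the answer above: decode_one is a hypothetical helper, and it assumes every code is an integer followed by " = " and that labels never contain that pattern themselves.

```r
# Same sample data as in the question
df <- data.frame(answer = c(1, 2, 1, 3, 1),
                 key = c("1 = Answer One 2 = Answer Two 3 = Answer Three",
                         "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI",
                         "1 = Answer abc 2 = Answer def 3 = Answer ghi",
                         "1 = Answer One 2 = Answer Two 3 = Answer Three",
                         "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: decode one answer against one key string
decode_one <- function(answer, key) {
  # Codes: every run of digits directly followed by " = "
  codes <- regmatches(key, gregexpr("\\d+(?= = )", key, perl = TRUE))[[1]]
  # Labels: text between the "N = " separators; drop the empty first piece
  labels <- strsplit(key, "\\s*\\d+ = ")[[1]][-1]
  lookup <- setNames(labels, codes)
  # Index by code name; a code missing from the key gives NA
  unname(lookup[as.character(answer)])
}

df$answer_decoded <- mapply(decode_one, df$answer, df$key, USE.NAMES = FALSE)
```

Indexing by name means a key such as "10 = ... 20 = ..." still decodes correctly, which positional indexing with answer + 1 would not handle.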