Home > OS >  How to remove strings from a cell in R while keeping other numeric values?
How to remove strings from a cell in R while keeping other numeric values?

Time:08-03

I'm trying to clean survey data in R and want to isolate numeric values from a longer string:

373|Q1Item1|Q2Item1|Q3Item1|Q4Item1|Q5Item1|Q6Item1|470|Q1Item2|Q2Item2|Q3Item2|Q4Item2|Q5Item2|Q6Item2|740|Q1Item3|Q2Item3|Q3Item3|Q4Item3|Q5Item3|Q6Item3|753|Q1Item4|Q2Item4|Q3Item4|Q4Item4|Q5Item4|Q6Item4|543|Q1Item5|Q2Item5|Q3Item5|Q4Item5|Q5Item5|Q6Item5|1021|Q1Item6|Q2Item6|Q3Item6|Q4Item6|Q5Item6|Q6Item6|174|Q1Item7|Q2Item7|Q3Item7|Q4Item7|Q5Item7|Q6Item7|995|Q1Item8|Q2Item8|Q3Item8|Q4Item8|Q5Item8|Q6Item8|696|Q1Item9|Q2Item9|Q3Item9|Q4Item9|Q5Item9|Q6Item9|204|Q1Item10|Q2Item10|Q3Item10|Q4Item10|Q5Item10|Q6Item10

I'd like to remove all values beginning with Q as well as the vertical bars while retaining the numbers such that my output would be:

373, 470, 740, 753 ...

Any advice on how best to do this is greatly appreciated!

CodePudding user response:

Try this

gsub("\\|" , ", ", gsub("\\|Q\\d Item\\d \n*" , "" , x))

  • output
[1] "373, 470, 740, 753, 543, 1021, 174, 995, 696, 204"
  • data
x <- "373|Q1Item1|Q2Item1|Q3Item1|Q4Item1|Q5Item1|Q6Item1|470|Q1Item2|Q2Item2|Q3Item2|Q4Item2|Q5Item2|Q6Item2|740|Q1Item3|Q2Item3|Q3Item3|Q4Item3|Q5Item3|Q6Item3|753|Q1Item4|Q2Item4|Q3Item4|Q4Item4|Q5Item4|Q6Item4|543|Q1Item5|Q2Item5|Q3Item5|Q4Item5|Q5Item5|Q6Item5|1021|Q1Item6|Q2Item6|Q3Item6|Q4Item6|Q5Item6|Q6Item6|174|Q1Item7|Q2Item7|Q3Item7|Q4Item7|Q5Item7|Q6Item7|995|Q1Item8|Q2Item8|Q3Item8|Q4Item8|Q5Item8|Q6Item8|696|Q1Item9|Q2Item9|Q3Item9|Q4Item9|Q5Item9|Q6Item9|204|Q1Item10|Q2Item10|Q3Item10|Q4Item10|Q5Item10|Q6Item10"

CodePudding user response:

I was looking at it a little differently. data

x <- "373|Q1Item1|Q2Item1|Q3Item1|Q4Item1|Q5Item1|Q6Item1|470|Q1Item2|Q2Item2|Q3Item2|Q4Item2|Q5Item2|Q6Item2|740|Q1Item3|Q2Item3|Q3Item3|Q4Item3|Q5Item3|Q6Item3|753|Q1Item4|Q2Item4|Q3Item4|Q4Item4|Q5Item4|Q6Item4|543|Q1Item5|Q2Item5|Q3Item5|Q4Item5|Q5Item5|Q6Item5|1021|Q1Item6|Q2Item6|Q3Item6|Q4Item6|Q5Item6|Q6Item6|174|Q1Item7|Q2Item7|Q3Item7|Q4Item7|Q5Item7|Q6Item7|995|Q1Item8|Q2Item8|Q3Item8|Q4Item8|Q5Item8|Q6Item8|696|Q1Item9|Q2Item9|Q3Item9|Q4Item9|Q5Item9|Q6Item9|204|Q1Item10|Q2Item10|Q3Item10|Q4Item10|Q5Item10|Q6Item10"

Where a key:value pair seems developing

apply(matrix(rev(unlist(strsplit(stringr::str_extract_all(x, ('\\d{3,4}\\|Q\\d Item\\d '))[[1]], '|', fixed = TRUE))), ncol=2, byrow = TRUE), 2, rev) 
      [,1]       [,2]  
 [1,] "Q1Item1"  "373" 
 [2,] "Q1Item2"  "470" 
 [3,] "Q1Item3"  "740" 
 [4,] "Q1Item4"  "753" 
 [5,] "Q1Item5"  "543" 
 [6,] "Q1Item6"  "1021"
 [7,] "Q1Item7"  "174" 
 [8,] "Q1Item8"  "995" 
 [9,] "Q1Item9"  "696" 
[10,] "Q1Item10" "204"
  • Related