I'm trying to clean survey data in R and want to isolate numeric values from a longer string:
373|Q1Item1|Q2Item1|Q3Item1|Q4Item1|Q5Item1|Q6Item1|470|Q1Item2|Q2Item2|Q3Item2|Q4Item2|Q5Item2|Q6Item2|740|Q1Item3|Q2Item3|Q3Item3|Q4Item3|Q5Item3|Q6Item3|753|Q1Item4|Q2Item4|Q3Item4|Q4Item4|Q5Item4|Q6Item4|543|Q1Item5|Q2Item5|Q3Item5|Q4Item5|Q5Item5|Q6Item5|1021|Q1Item6|Q2Item6|Q3Item6|Q4Item6|Q5Item6|Q6Item6|174|Q1Item7|Q2Item7|Q3Item7|Q4Item7|Q5Item7|Q6Item7|995|Q1Item8|Q2Item8|Q3Item8|Q4Item8|Q5Item8|Q6Item8|696|Q1Item9|Q2Item9|Q3Item9|Q4Item9|Q5Item9|Q6Item9|204|Q1Item10|Q2Item10|Q3Item10|Q4Item10|Q5Item10|Q6Item10
I'd like to remove all values beginning with Q as well as the vertical bars while retaining the numbers such that my output would be:
373, 470, 740, 753 ...
Any advice on how best to do this is greatly appreciated!
CodePudding user response:
Try this
gsub("\\|" , ", ", gsub("\\|Q\\d Item\\d \n*" , "" , x))
- output
[1] "373, 470, 740, 753, 543, 1021, 174, 995, 696, 204"
- data
x <- "373|Q1Item1|Q2Item1|Q3Item1|Q4Item1|Q5Item1|Q6Item1|470|Q1Item2|Q2Item2|Q3Item2|Q4Item2|Q5Item2|Q6Item2|740|Q1Item3|Q2Item3|Q3Item3|Q4Item3|Q5Item3|Q6Item3|753|Q1Item4|Q2Item4|Q3Item4|Q4Item4|Q5Item4|Q6Item4|543|Q1Item5|Q2Item5|Q3Item5|Q4Item5|Q5Item5|Q6Item5|1021|Q1Item6|Q2Item6|Q3Item6|Q4Item6|Q5Item6|Q6Item6|174|Q1Item7|Q2Item7|Q3Item7|Q4Item7|Q5Item7|Q6Item7|995|Q1Item8|Q2Item8|Q3Item8|Q4Item8|Q5Item8|Q6Item8|696|Q1Item9|Q2Item9|Q3Item9|Q4Item9|Q5Item9|Q6Item9|204|Q1Item10|Q2Item10|Q3Item10|Q4Item10|Q5Item10|Q6Item10"
CodePudding user response:
I was looking at it a little differently. data
x <- "373|Q1Item1|Q2Item1|Q3Item1|Q4Item1|Q5Item1|Q6Item1|470|Q1Item2|Q2Item2|Q3Item2|Q4Item2|Q5Item2|Q6Item2|740|Q1Item3|Q2Item3|Q3Item3|Q4Item3|Q5Item3|Q6Item3|753|Q1Item4|Q2Item4|Q3Item4|Q4Item4|Q5Item4|Q6Item4|543|Q1Item5|Q2Item5|Q3Item5|Q4Item5|Q5Item5|Q6Item5|1021|Q1Item6|Q2Item6|Q3Item6|Q4Item6|Q5Item6|Q6Item6|174|Q1Item7|Q2Item7|Q3Item7|Q4Item7|Q5Item7|Q6Item7|995|Q1Item8|Q2Item8|Q3Item8|Q4Item8|Q5Item8|Q6Item8|696|Q1Item9|Q2Item9|Q3Item9|Q4Item9|Q5Item9|Q6Item9|204|Q1Item10|Q2Item10|Q3Item10|Q4Item10|Q5Item10|Q6Item10"
Where a key:value pair seems developing
apply(matrix(rev(unlist(strsplit(stringr::str_extract_all(x, ('\\d{3,4}\\|Q\\d Item\\d '))[[1]], '|', fixed = TRUE))), ncol=2, byrow = TRUE), 2, rev)
[,1] [,2]
[1,] "Q1Item1" "373"
[2,] "Q1Item2" "470"
[3,] "Q1Item3" "740"
[4,] "Q1Item4" "753"
[5,] "Q1Item5" "543"
[6,] "Q1Item6" "1021"
[7,] "Q1Item7" "174"
[8,] "Q1Item8" "995"
[9,] "Q1Item9" "696"
[10,] "Q1Item10" "204"