Home > Net >  Extracting columns from strings in R
Extracting columns from strings in R

Time:02-21

I have somewhat of a complex case that I am not able to parse out for myself (see structure and the dput below). What the df looks like:

String for each user

Each participant has a string associated with it. I would like to create columns for each participants from the strings. I see that this can be accomplished with extract() or separate() of the tidyverse package. However, the catch is that I first want to create additional rows for each ID based on ; separator.

So an intermediate step would look something like each IDs would have multiple rows of data. E.g., ID=1 has 2 semi-colons so it should have two rows of data. Basically, every substring after ; belongs into a new row.

After that, I want to separate values by , into columns.

So far, I'm trying to accomplish the first step.

mandatory_puzzles %>% mutate(Mandatory = str_split(Mandatory, pattern = ";"))

Which gives me a character list for each ID.

Character list for each ID

When I pull Mandatory column, I see that I am able to achieve an intermediary step.

Intermediate Step

How do I proceed from here?

Here's the data:

structure(list(Mandatory = c("Manipulate environment objects , 0 , 285.54 , 270.81 , 51 , 7.18 , 43.82 , 6 , 3 , 0 , 1 , 59 , 25.67449;Manipulate environment objects , 1 , 542.94 , 255 , 320.02 , 28.26 , 291.76 , 18 , 2 , 2 , 5 , 55 , 24.58493;Manipulate environment objects , 2 , 856.8 , 311.39 , 124.58 , 21.45 , 103.13 , 15 , 3 , 0 , 2 , 36 , 26.28657;Manipulate environment objects , 3 , 1129.1 , 267.08 , 58.62 , 12.65 , 45.96 , 10 , 2 , 0 , 4 , 48 , 25.64844;Manipulate environment objects , 1 , 1226.98 , 89.83 , 65.93 , 18.71 , 47.21 , 13 , 1 , 0 , 5 , 16 , 23.01501", 
"Manipulate environment objects , 0 , 282.28 , 254.48 , 96.46 , 7.23 , 89.23 , 7 , 2 , 1 , 1 , 47 , 60.15876;Manipulate environment objects , 1 , 457.42 , 170.06 , 66.17 , 7.99 , 58.18 , 10 , 2 , 0 , 4 , 39 , 59.98176;Manipulate environment objects , 2 , 947.78 , 484.34 , 640.47 , 29.06 , 611.41 , 29 , 3 , 3 , 1 , 25 , 60.04829;Manipulate environment objects , 3 , 1073.8 , 122.48 , 83.37 , 7.91 , 75.46 , 9 , 2 , 0 , 3 , 23 , 59.94974", 
"Manipulate environment objects , 0 , 161.66 , 147.03 , 30.98 , 5.56 , 25.42 , 6 , 2 , 0 , 1 , 60 , 59.344;Manipulate environment objects , 1 , 293.38 , 126.06 , 42.92 , 11.66 , 31.26 , 9 , 2 , 0 , 4 , 52 , 59.47927;Manipulate environment objects , 2 , 548.26 , 252.67 , 221.09 , 36.73 , 184.36 , 33 , 3 , 1 , 1 , 46 , 59.70362;Manipulate environment objects , 3 , 682.66 , 132.26 , 43.8 , 7.34 , 36.45 , 8 , 2 , 0 , 3 , 52 , 62.20168;Manipulate environment objects , 1 , 818.18 , 123.68 , 14.31 , 3.15 , 11.15 , 5 , 1 , 0 , 4 , 52 , 59.62266;Manipulate environment objects , 2 , 976.86 , 155.28 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 25 , 61.21601;Manipulate environment objects , 2 , 1008.76 , 29.8 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 13 , 59.28677;Manipulate environment objects , 2 , 1019.34 , 8.43 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 2 , 59.81207", 
"Manipulate environment objects , 1 , 59.06 , 48.72 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 27 , 75.03592;Manipulate environment objects , 1 , 93.14 , 27.5 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 6 , 74.96835;Manipulate environment objects , 1 , 223.1 , 125.77 , 60.6 , 6.03 , 54.57 , 8 , 2 , 0 , 2 , 37 , 74.98416;Manipulate environment objects , 2 , 485.24 , 258 , 141.3 , 17.31 , 124 , 20 , 3 , 0 , 5 , 34 , 74.94012;Manipulate environment objects , 3 , 644.2 , 156.72 , 74.81 , 12.27 , 62.54 , 11 , 2 , 0 , 1 , 36 , 74.96394", 
"Manipulate environment objects , 1 , 2002.74 , 108.46 , 24.89 , 3.86 , 21.03 , 9 , 2 , 0 , 2 , 55 , 59.97829;Manipulate environment objects , 2 , 2249.74 , 244.39 , 94.38 , 10.67 , 83.71 , 21 , 3 , 1 , 5 , 43 , 60.01971;Manipulate environment objects , 3 , 2417.84 , 165.16 , 29.56 , 6.43 , 23.13 , 12 , 2 , 0 , 1 , 53 , 59.98882;Manipulate environment objects , 3 , 2481.99 , 61.77 , 9.57 , 2.28 , 7.29 , 5 , 1 , 0 , 2 , 16 , 60.0066;Manipulate environment objects , 1 , 2614.9 , 123.53 , 25.84 , 4.89 , 20.95 , 10 , 2 , 0 , 4 , 53 , 59.98421;Manipulate environment objects , 2 , 2818.64 , 201.51 , 76.99 , 12.27 , 64.72 , 26 , 4 , 1 , 2 , 44 , 60.03096;Manipulate environment objects , 3 , 2913.61 , 92.3 , 28.49 , 4.66 , 23.83 , 11 , 2 , 0 , 4 , 16 , 60.08723;Manipulate environment objects , 1 , 3039.14 , 116.06 , 50.32 , 11.83 , 38.48 , 22 , 2 , 0 , 6 , 53 , 59.97778;Manipulate environment objects , 2 , 3057.44 , 16.34 , 8.94 , 1.72 , 7.22 , 4 , 1 , 0 , 1 , 5 , 59.97208;Manipulate environment objects , 2 , 3217.52 , 153.71 , 37.96 , 4.72 , 33.25 , 14 , 3 , 1 , 4 , 43 , 60.01913;Manipulate environment objects , 3 , 3359.48 , 140.2 , 51.85 , 8.67 , 43.18 , 18 , 2 , 1 , 6 , 50 , 60.06129;Manipulate environment objects , 1 , 3480.78 , 114.93 , 23.72 , 4.31 , 19.41 , 13 , 2 , 0 , 2 , 52 , 59.95073;Manipulate environment objects , 2 , 3638.04 , 154.73 , 38.13 , 7.31 , 30.82 , 19 , 4 , 0 , 6 , 44 , 60.00352;Manipulate environment objects , 3 , 3764.88 , 124.92 , 25.51 , 6.16 , 19.35 , 12 , 2 , 0 , 2 , 51 , 60.01723;Manipulate environment objects , 1 , 3883.16 , 112.63 , 20.66 , 4.06 , 16.6 , 9 , 2 , 0 , 4 , 54 , 59.95847;Manipulate environment objects , 2 , 4025.9 , 140.86 , 68.49 , 13.15 , 55.33 , 24 , 3 , 1 , 1 , 43 , 59.98686;Manipulate environment ..."
), ID = 1:5), row.names = c(NA, -5L), class = c("tbl_df", "tbl", 
"data.frame"))


I appreciate any help you can provide here.

CodePudding user response:

We may do this with separate_rows to split into rows based on the ; and then split at the , into columns with separate

library(dplyr)
library(tidyr)
library(stringr)
out <- mandatory_puzzles %>%
    separate_rows( Mandatory, sep = "\\s*;\\s*")%>% 
    separate(Mandatory, into = str_c("col", 
     seq_len(str_count(first(.$Mandatory), ",")  1)), 
         sep = "\\s*,\\s*", convert = TRUE, extra = "merge")
  • Related