Home > Mobile >  R Extract specific text from column into multiple columns
R Extract specific text from column into multiple columns

Time:04-27

I have a dataframe exported from the web with this format

id vals
1  {7,12,58,1}
2  {1,2,5,7}  
3  {15,12} 

I would like to extract ONLY the numbers (ignore curlys and commas) into multiple columns like this

id val_1 val_2 val_3 val_4 val_5
1  7     12    58    1
2  1     2     5     7  
3  15    12  

Even though the Max of values we got was 4 I want to always go up to value val_5.

Thanks!

CodePudding user response:

We could use str_extract_all for this:

library(dplyr)
library(stringr)

df %>% 
  mutate(vals = str_extract_all(vals, '\\d ', ''))

or as @akrun suggest in the comments

df %>% 
  mutate(vals = str_extract_all(vals, '\\d ', '')) %>% 
  do.call(data.frame, .)
  id vals.1 vals.2 vals.3 vals.4
1  1      7     12     58      1
2  2      1      2      5      7
3  3     15     12   <NA>   <NA>

data:

df <- structure(list(id = 1:3, vals = c("{7,12,58,1}", "{1,2,5,7}", 
"{15,12}")), class = "data.frame", row.names = c(NA, -3L))

CodePudding user response:

Another possible tidyverse option, where we remove the curly brackets, then separate the rows on the ,, then pivot to wide form. Then, we can create the additional column (using add_column from tibble) based on the max value in the column names (which is 4 in this case), and then can create val_5.

library(tidyverse)

df %>%
  mutate(vals = str_replace_all(vals, "\\{|\\}", "")) %>%
  separate_rows(vals, sep=",") %>%
  group_by(id) %>%
  mutate(ind = row_number()) %>%
  pivot_wider(names_from = ind, values_from = vals, names_prefix = "val_") %>%
  add_column(!!(paste0("val_", parse_number(names(.)[ncol(.)]) 1)) := NA)

Output

  id val_1 val_2 val_3 val_4 val_5
1  1     7    12    58     1    NA
2  2     1     2     5     7    NA
3  3    15    12  <NA>  <NA>    NA

Data

df <- read.table(text = "id vals
1  {7,12,58,1}
2  {1,2,5,7}
3  {15,12} ", header = T)
  • Related