I have a fairly straightforward question and I'm hoping there's a simple answer that I just haven't stumbled upon yet.
I'm attempting to use tidyr::separate() to create two columns within a data.frame from a single character string column (using a comma as a delimiter). The issue is that the data contains multiple commas; however, the left-most value is wrapped in quotes. Is there a way to separate this value into two columns while respecting the contents within the quotes?
#trying to re-create the issue
band_members <- data.frame(col = paste0('"Paul,George,John,Ringo','"',',','Beatles'))
print(band_members)
----------------------------------
col
----------------------------------
"Paul,George,John,Ringo",Beatles
----------------------------------
#trying to separate
library(tidyr) # also provides the %>% pipe
new_dat <- band_members %>% tidyr::separate(col = col,into = c('members','band'),sep = ',')
print(new_dat)
------------------
members band
--------- --------
"Paul George
------------------
This is not ideal: the split happens at the first comma and the remaining pieces are dropped. What I'd like instead:
------------------------------------
members band
-------------------------- ---------
"Paul,George,John,Ringo" Beatles
------------------------------------
Any help would be greatly appreciated!
CodePudding user response:
If the format is always like "members",band, using sep = '",' instead of "," may help. The split consumes the closing quote, so it is pasted back on afterwards.
library(dplyr) # for mutate()
band_members %>%
  tidyr::separate(col = col,into = c('members','band'),sep = '",') %>%
  mutate(members = paste0(members, "\""))
members band
1 "Paul,George,John,Ringo" Beatles
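To see why the closing quote has to be pasted back on, here is a minimal base R sketch of the same split applied to a single string. It assumes the input always has the form "members",band; the names `x`, `parts`, `members` and `band` are only for illustration.

```r
# Illustrative single string in the assumed format: "members",band
x <- '"Paul,George,John,Ringo",Beatles'

# Split on the literal two-character delimiter ", (closing quote + comma)
parts <- strsplit(x, '",', fixed = TRUE)[[1]]

# The delimiter is consumed by the split, so the first piece
# has lost its closing quote and it is added back here
members <- paste0(parts[1], '"')
band <- parts[2]

print(members)
print(band)
```

If the second field could itself contain the sequence `",`, the split would produce more than two pieces, so this only holds under the stated format assumption.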
CodePudding user response:
You can use tidyr::extract() rather than separate(), and then it's just a case of finding the right regex:
band_members %>%
  tidyr::extract(col, c("members", "band"), "^\"(.*?)\",(.*?)$")
Result:
members band
1 Paul,George,John,Ringo Beatles
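For anyone unsure what the regex is doing, the sketch below applies the same pattern to one string with base R's regexec() and regmatches(). The names `x`, `pattern` and `m` are illustrative only, and perl = TRUE is an assumption made here so that the lazy quantifier is handled by PCRE.

```r
x <- '"Paul,George,John,Ringo",Beatles'

# ^"      opening quote at the start of the string
# (.*?)   group 1: as little as possible before the next part of the pattern
# ",      closing quote followed by the delimiter comma
# (.*?)$  group 2: the rest of the string
pattern <- '^"(.*?)",(.*?)$'

# regmatches() returns the full match first, then each capture group
m <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
members <- m[2]
band <- m[3]

print(members)
print(band)
```

Because the quotes sit outside the capture groups, they are not part of `members`. Moving them inside the first group, as in '^(".*?"),(.*?)$', should keep them, which is closer to the output asked for in the question.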