Remove comma inside quotes

Time:11-18

I have strings like:

string <- "1, 2, \"something, else\""

I want to use tidyr::separate_rows() with sep = ",", but the comma inside the quoted portion of the string is tripping me up. I'd like to remove the comma between something and else (and only that comma).

Here's a more complex toy example:

string <- c("1, 2, \"something, else\"", "3, 5, \"more, more, more\"", "6, \"commas, are fun\", \"no, they are not\"")

string
#[1] "1, 2, \"something, else\""                   
#[2] "3, 5, \"more, more, more\""                  
#[3] "6, \"commas, are fun\", \"no, they are not\""

I want to get rid of all commas inside the embedded quotations. Desired output:

[1] "1, 2, \"something else\""                  
[2] "3, 5, \"more more more\""                  
[3] "6, \"commas are fun\", \"no they are not\""

CodePudding user response:

You can define a small function to do the replacement.

library(stringr)

rmcom <- function(x) gsub(",", "", x)

# Match each quoted field on its own ("[^"]*"), so the comma between two quoted fields is kept
str_replace_all(string, "\"[^\"]*\"", rmcom)
[1] "1, 2, \"something else\""
[2] "3, 5, \"more more more\""
[3] "6, \"commas are fun\", \"no they are not\""
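
If you'd rather not depend on stringr, here is a minimal base R sketch of the same idea: find each quoted field separately and strip the commas inside it. The name `cleaned` is only for illustration, and the sketch assumes the quotes are balanced and never escaped inside a field.

```r
string <- c(
  "1, 2, \"something, else\"",
  "3, 5, \"more, more, more\"",
  "6, \"commas, are fun\", \"no, they are not\""
)

# Locate every quoted field: a quote, any run of non-quote characters, a quote
quoted <- gregexpr("\"[^\"]*\"", string)

# Remove commas inside each match only, then write the matches back in place
cleaned <- string
regmatches(cleaned, quoted) <- lapply(
  regmatches(string, quoted),
  function(m) gsub(",", "", m, fixed = TRUE)
)

cleaned
```

Because the pattern uses `[^\"]*` rather than `.*`, a match can never run from the first quote of one field to the last quote of another, so the separator between two quoted fields is left alone.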

CodePudding user response:

Best I can do:

stringr::str_replace_all(string,"(?<=\\\".{1,15})(,)(?=.+?\\\")","")

it's: (?<= ) = look behind

\\\" = a \ and a " (which the regex engine reads as a literal ")

.{1,15} = between 1 and 15 characters (see note)

(,) = the comma is what we want to target

(?= ) = look ahead

.+? = one or more characters but as few as possible

\\\" = a \ and a "

note: look behind cannot be unbounded, so we can't use .+? here. Adjust the max of 15 for your dataset.

edit: Andre Wildberg's solution is better - I stupidly forgot that the "" defining the string are not part of the string, so made it much more complex than it needed to be.
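
To make the escaping point concrete, here is a small sketch of what R actually stores for these literals. The variable names are only for illustration, and `perl = TRUE` is used so the regex behaviour is that of PCRE.

```r
# The quotes that delimit an R string literal are not part of the string
plain_quote   <- "\""      # one character: "
escaped_quote <- "\\\""    # two characters: \ and "

nchar(plain_quote)    # 1
nchar(escaped_quote)  # 2

# As regex patterns, both match a literal double quote in the input
has_plain   <- grepl(plain_quote,   "say \"hi\"", perl = TRUE)
has_escaped <- grepl(escaped_quote, "say \"hi\"", perl = TRUE)
```

So "\"" is already enough to target a double quote in a pattern; the extra backslashes are harmless but not needed.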

CodePudding user response:

Alternatively, we could invert the problem (keeping the comma, which might be useful) and pass a regex directly to separate_rows so that it splits only at commas NOT inside quotes:

library(tidyr)

df |>
  separate_rows(stringcol, sep = '(?!\\B"[^\"]*), (?![^"]*\"\\B)')

Regex expression from: Regex find comma not inside quotes

Alternatively: Regex to pick characters outside of pair of quotes

Output:

# A tibble: 9 × 1
  stringcol             
  <chr>                 
1 "1"                   
2 "2"                   
3 "\"something, else\"" 
4 "3"                   
5 "5"                   
6 "\"more, more, more\""
7 "6"                   
8 "\"commas, are fun\"" 
9 "\"no, they are not\""

Data:

library(tibble)

df <- tibble(stringcol = string)
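
For comparison, here is a base R sketch of the same "split only outside quotes" idea. The pattern is not the one from the linked answers: it is a swapped-in lookahead that accepts a comma only when an even number of double quotes follows it. It assumes quotes are balanced and never escaped inside a field; `outside_quotes` and `pieces` are illustrative names.

```r
string <- c(
  "1, 2, \"something, else\"",
  "3, 5, \"more, more, more\"",
  "6, \"commas, are fun\", \"no, they are not\""
)

# A comma (plus trailing whitespace) followed by an even number of quotes
# up to the end of the string, i.e. a comma that is not inside a quoted field
outside_quotes <- ",\\s*(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"

# Split every element and flatten to one piece per row
pieces <- unlist(strsplit(string, outside_quotes, perl = TRUE))

pieces
```

The commas inside the quoted fields survive, so this gives the same nine pieces as the tibble above without modifying the original text.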