How to remove square brackets [] from R data frame without regex?

Time:06-22

I am importing a CSV file into R as a data frame. The file contains some characters that I want to remove.

library(stringr)
mydf$V1 <- str_replace_all(mydf$V1, "'", "")

I was able to remove the single quotes this way, but I can't remove the square brackets. I don't want to use Pandas or regex; is there a way to do this with base R, dplyr/tidyverse/stringr, or a similar library?

Answer:

Somewhat mysterious, but

gsub("[][]","", x)

will do it (i.e., remove [ and ]). The outer square brackets define a set of characters; the inner square brackets are the set to remove. Writing the inner brackets in the 'natural' order ([[]]) doesn't work because a ] is only treated as a literal member of the set when it comes first; anywhere else it closes the set. So [[]] is parsed as a set containing only [, followed by a literal ], and it matches only the two-character sequence [].
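A quick base R check of the two orderings, using R's default regex engine (the test strings are made up for illustration). The comments describe how each bracket expression is parsed:

```r
x <- c("a[b]c", "x[]y", "[[1]]")

# "[][]": the "]" right after the opening "[" is a literal member of the
# set, so the set is { ], [ } and every bracket is removed
gsub("[][]", "", x)
# returns c("abc", "xy", "1")

# "[[]]": the set is just { [ }, and the trailing "]" is a literal, so the
# pattern only matches the two-character sequence "[]"
gsub("[[]]", "", x)
# returns c("a[b]c", "xy", "[[1]]")
```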

stringr uses the ICU regex engine, which rejects this pattern (stringr::str_remove() fails with "Missing closing bracket on a bracket expression"), but

stringr::str_remove_all(x,"[\\[\\]]")

does. (The backslashes have to be doubled because, inside an R string literal, a single backslash starts an escape sequence such as \n for newline; "\\[" becomes the two characters \[ that the regex engine then sees.)
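To see what the doubled backslashes actually produce, here is a small sketch. So that it runs without stringr installed, it swaps in base R's gsub(perl = TRUE) (the PCRE engine) for str_remove_all(); that substitution is a convenience for illustration, not the answer's method:

```r
pattern <- "[\\[\\]]"

# The R parser turns each "\\" into one backslash, so the regex engine
# receives the six characters [\[\]]
writeLines(pattern)   # prints [\[\]]
nchar(pattern)        # 6

# PCRE treats \[ and \] inside a character class as literal brackets
gsub(pattern, "", c("a[b]c", "[[1]]"), perl = TRUE)
# returns c("abc", "1")
```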

R 4.0.0 added raw strings, which avoid the doubled backslashes, but a raw string needs its own delimiters inside the quotes: r"(...)", r"[...]" or r"{...}". Writing r"[\[\]]" therefore uses the square brackets as the delimiters, so the regex engine receives \[\], which only matches the literal sequence []. Wrapping the pattern in parentheses, r"([\[\]])", gives the intended [\[\]].
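A minimal check, assuming R 4.0.0 or later, of what the two raw-string spellings actually contain. It again uses base R's gsub(perl = TRUE) only so that no extra packages are needed:

```r
# Requires R >= 4.0.0 for raw string literals
intended <- r"([\[\]])"   # parentheses are the delimiters; content is [\[\]]
accident <- r"[\[\]]"     # square brackets are the delimiters; content is \[\]

identical(intended, "[\\[\\]]")   # TRUE
identical(accident, "\\[\\]")     # TRUE: a pattern for the literal text "[]"

gsub(intended, "", "a[b]c", perl = TRUE)   # "abc"
gsub(accident, "", "a[b]c", perl = TRUE)   # "a[b]c" (no "[]" to remove)
```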

If you really don't want to use regular expressions, then

gsub(fixed = TRUE, "[", "",
   gsub(fixed = TRUE, "]", "", x)
)

or

x |> str_remove_all(fixed("[")) |> str_remove_all(fixed("]"))

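should work. Both fixed = TRUE and stringr's fixed() match the characters literally, so no regular expression is involved. (The native pipe |> requires R 4.1.0 or later, and str_remove_all() and fixed() come from stringr.)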
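Applied to the question's data frame, the fixed-string approach could be wrapped in a small helper that also covers the single quotes. The helper name strip_chars and the sample data are hypothetical, made up for illustration:

```r
# Hypothetical helper: remove every occurrence of each literal string in
# `chars`, using fixed = TRUE so no regular expressions are involved
strip_chars <- function(x, chars = c("[", "]", "'")) {
  for (ch in chars) {
    x <- gsub(ch, "", x, fixed = TRUE)
  }
  x
}

# Made-up stand-in for the imported CSV
mydf <- data.frame(V1 = c("['a', 'b']", "['c']"), V2 = 1:2)
mydf$V1 <- strip_chars(mydf$V1)
mydf$V1
# returns c("a, b", "c")
```

Looping over the characters keeps each gsub() call a plain literal replacement, which is the point of avoiding regex here.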
