I am importing a CSV into R to create a data frame. The CSV file contains some characters I want to eliminate.
mydf$V1 <- str_replace_all(mydf$V1, "'", "")
I was able to get rid of the single quotes this way, but I can't remove square brackets. I don't want to use Pandas or regex; is there a way to do this with base R, dplyr/tidyverse/stringr, or a similar library?
CodePudding user response:
Somewhat mysterious, but

gsub("[][]", "", x)

will do it (i.e., remove `[` and `]`). The outer square brackets define a character class, and the inside of the class is the set of characters to remove; a `]` placed immediately after the opening `[` is treated as a literal character rather than closing the class. Putting the inner brackets in the 'correct' order (`[[]]`) doesn't work because that pattern is parsed as the class `[[]` (matching only `[`) followed by a literal `]`, so it matches the two-character string `[]` instead of either bracket on its own.
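A quick check with made-up data (the question's actual values aren't shown, so this vector is hypothetical):

```r
# Hypothetical values resembling the question's CSV column
x <- c("['apple']", "[banana]", "plain")

# "[][]" is a character class containing "]" and "[";
# a "]" right after the opening "[" is taken literally
gsub("[][]", "", x)
#> [1] "'apple'" "banana"  "plain"
```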
This regex does not work in `stringr::str_remove()` or `str_remove_all()` — ICU, the regex engine stringr uses, rejects it with "Missing closing bracket on a bracket expression" — but

stringr::str_remove_all(x, "[\\[\\]]")

does. (The double backslashes are required because in ordinary string literals R uses a single backslash to introduce escape sequences like newline, `\n`.)
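For example (again with made-up data; assumes the stringr package is installed):

```r
library(stringr)

x <- c("[1, 2]", "a[b]c")

# "[\\[\\]]" is a class matching either bracket; each "\\" in the
# source reaches the regex engine as a single backslash
str_remove_all(x, "[\\[\\]]")
#> [1] "1, 2" "abc"
```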
In principle the relatively new "raw strings" feature (R >= 4.0.0) can simplify the expression, but note that the characters after `r"` are delimiters, not part of the string: `r"[\[\]]"` uses `[` and `]` as the delimiters and therefore produces the string `\[\]`, a different regex entirely. Choosing other delimiters, e.g. `r"([\[\]])"`, yields the intended pattern `[\[\]]`.
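To illustrate (assumes R >= 4.0.0 for raw strings and that stringr is installed):

```r
library(stringr)

# r"(...)" delimits a raw string: no doubled backslashes needed inside
pattern <- r"([\[\]])"

# Identical to the escaped version
identical(pattern, "[\\[\\]]")
#> [1] TRUE

str_remove_all("[x]", pattern)
#> [1] "x"
```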
If you really don't want to use regular expressions, then
gsub(fixed = TRUE, "[", "",
gsub(fixed = TRUE, "]", "", x)
)
or
x |> str_remove_all(fixed("[")) |> str_remove_all(fixed("]"))
should work.
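Both versions behave the same on a small hypothetical vector (the pipeline needs R >= 4.1.0 for the native pipe and the stringr package):

```r
library(stringr)

x <- c("[a]", "b[c]d")

# Nested gsub with fixed = TRUE: patterns are taken literally, no regex
gsub("[", "", gsub("]", "", x, fixed = TRUE), fixed = TRUE)
#> [1] "a"   "bcd"

# Equivalent stringr pipeline using fixed() to disable regex matching
x |> str_remove_all(fixed("[")) |> str_remove_all(fixed("]"))
#> [1] "a"   "bcd"
```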