How could I replace whitespace of varying length with a single space in R?
R has plenty of methods for replacing characters, e.g. gsub, tidyr, or the stringr package. However, I cannot figure out how best to replace whitespace of varying length with a single \n space.
Here is a data.frame in R.
df = data.frame(column1 = c("A B", "C  D", "E   F", "G    H", "I     J", "K      L", "M       N"))
Note that the number of spaces varies:
print(df)
#     column1
# 1       A B
# 2      C  D
# 3     E   F
# 4    G    H
# 5   I     J
# 6  K      L
# 7 M       N
My naïve approach would be to use gsub():
df$column1 <- gsub("[[:space:]]", " ", df$column1)
However, this replaces each whitespace character one-for-one, so the runs of varying length are not collapsed into a single space.
CodePudding user response:
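You could also use str_squish from the stringr package (loaded with the tidyverse). It collapses each run of whitespace into a single space and also trims leading and trailing whitespace: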
library(tidyverse)
df %>%
  mutate(column1 = str_squish(column1))
  column1
1     A B
2     C D
3     E F
4     G H
5     I J
6     K L
7     M N
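If loading the tidyverse is not an option, roughly the same effect can be had in base R. This is only a sketch: the helper name squish_base is made up for illustration and is not part of any package, and the sample data is a shortened version of the question's.

```r
# Hypothetical helper (not from any package): collapse each run of
# whitespace to one space, then strip leading/trailing whitespace.
squish_base <- function(x) {
  trimws(gsub("[[:space:]]+", " ", x))
}

# Shortened sample data with 1 to 4 spaces, plus padding on the last entry
df <- data.frame(column1 = c("A B", "C  D", "E   F", " G    H "))
df$column1 <- squish_base(df$column1)
df$column1
# [1] "A B" "C D" "E F" "G H"
```

The trimws() step is what mimics str_squish's trimming of the ends; drop it if leading or trailing whitespace should be kept as a single space.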
CodePudding user response:
We may need to use + (i.e. one or more) instead of just [[:space:]]:
df$column1 <- gsub("[[:space:]]+", "\n", df$column1)
df$column1
[1] "A\nB" "C\nD" "E\nF" "G\nH" "I\nJ" "K\nL" "M\nN"
If the replacement should be a single space, change the \n to " ":
df$column1 <- gsub("[[:space:]]+", " ", df$column1)
df$column1
[1] "A B" "C D" "E F" "G H" "I J" "K L" "M N"
The OP's code with [[:space:]] alone (no quantifier) replaces every whitespace character individually, whether the replacement is a space or \n:
> gsub("[[:space:]]", "\n", df$column1)
[1] "A\nB" "C\n\nD" "E\n\n\nF" "G\n\n\n\nH" "I\n\n\n\n\nJ" "K\n\n\n\n\n\nL" "M\n\n\n\n\n\n\nN"
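To make the effect of the quantifier easier to see, here is a small side-by-side sketch. The vector x is made up for illustration (one to three spaces), and "\\s+" is shown only as a commonly used shorthand for the same idea.

```r
x <- c("A B", "C  D", "E   F")

# Without a quantifier, each whitespace character is replaced one-for-one,
# so replacing spaces with spaces leaves the strings unchanged.
no_quant <- gsub("[[:space:]]", " ", x)
identical(no_quant, x)
# [1] TRUE

# With +, each run of whitespace is matched as a whole and replaced once.
with_quant <- gsub("[[:space:]]+", " ", x)
with_quant
# [1] "A B" "C D" "E F"
nchar(with_quant)
# [1] 3 3 3

# Shorthand form of the same pattern
short_form <- gsub("\\s+", " ", x)
identical(short_form, with_quant)
# [1] TRUE
```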