How to replace whitespace of varying length with one space?

Time:09-21

How could I replace whitespace of varying length with a single space in R?

R has plenty of methods for replacing characters, e.g. gsub, tidyr, or the stringr package. However, I cannot figure out how to best replace whitespace (with varying lengths) with a single \n space.

Here is a data.frame in R.

df = data.frame(column1= c("A B", "C  D", "E   F", "G    H", "I     J", "K      L", "M       N"))

Note that the number of spaces varies:

print(df)
#     column1
# 1       A B
# 2      C  D
# 3     E   F
# 4    G    H
# 5   I     J
# 6  K      L
# 7 M       N

My naïve approach would be to use gsub():

df$column1 <- gsub("[[:space:]]", " ", df$column1) 

However, this replaces each whitespace character one-for-one, so runs of spaces keep their original length instead of collapsing to a single space.

CodePudding user response:

You could also use str_squish:

library(tidyverse)

df %>%
   mutate(column1 = str_squish(column1))

  column1
1     A B
2     C D
3     E F
4     G H
5     I J
6     K L
7     M N
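Note that str_squish also removes leading and trailing whitespace, not just repeated interior whitespace. If stringr is not available, a rough base-R sketch of similar behaviour might look like the following. The helper name squish_base and the sample vector are hypothetical, chosen only for illustration, and this is not claimed to match str_squish in every edge case.

```r
# Hypothetical helper (not part of any package): collapse each run of
# whitespace into one space, then trim whitespace from both ends.
squish_base <- function(x) {
  trimws(gsub("[[:space:]]+", " ", x))
}

# Sample input with leading/trailing spaces, tabs, and a newline.
x <- c("  A   B  ", "C\t\tD", "E \n F")

squish_base(x)
# [1] "A B" "C D" "E F"
```

Applied to the question's data, this would be df$column1 <- squish_base(df$column1).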

CodePudding user response:

We may need to use +, i.e. one or more, instead of just [[:space:]]:

df$column1 <- gsub("[[:space:]]+", "\n", df$column1) 
df$column1
[1] "A\nB" "C\nD" "E\nF" "G\nH" "I\nJ" "K\nL" "M\nN"

If the replacement should be a single space, change the \n to " ":

df$column1 <- gsub("[[:space:]]+", " ", df$column1) 
df$column1
[1] "A B" "C D" "E F" "G H" "I J" "K L" "M N"

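The OP's pattern, [[:space:]] without a quantifier, matches each whitespace character separately, so every one of them is replaced individually, whether the replacement is a space or \n: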

> gsub("[[:space:]]", "\n", df$column1)
[1] "A\nB"             "C\n\nD"           "E\n\n\nF"         "G\n\n\n\nH"       "I\n\n\n\n\nJ"     "K\n\n\n\n\n\nL"   "M\n\n\n\n\n\n\nN"
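To make the effect of the quantifier concrete, here is a small side-by-side sketch on a made-up vector x (not the OP's data frame). It compares string lengths after replacing with and without +.

```r
# Made-up sample: one, two, and three spaces between the letters.
x <- c("A B", "C  D", "E   F")

# No quantifier: every whitespace character is replaced one-for-one,
# so the strings keep their original lengths.
no_quant <- gsub("[[:space:]]", " ", x)

# With +: each run of whitespace is matched once and replaced once.
with_quant <- gsub("[[:space:]]+", " ", x)

nchar(no_quant)
# [1] 3 4 5
nchar(with_quant)
# [1] 3 3 3
```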