I have this code that splits the column on the last space (the second space in "San Jose CA"), but I don't know how to modify it to split on the first space only. I'm not very familiar with regex.
library(tidyr)
df <- data.frame(Location = c("San Jose CA", "Fremont CA", "Santa Clara CA"))
separate(df, Location, into = c("city", "state"), sep = " (?=[^ ]+$)")  # splits at the last space
#          city state
# 1    San Jose    CA
# 2     Fremont    CA
# 3 Santa Clara    CA
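To see why this pattern splits where it does, here is a small base R check. It is only a sketch: it assumes base R (no tidyr), R 3.2 or later for lengths(), and a hypothetical locations vector holding the sample data. gregexpr() with perl = TRUE reports every place the lookahead pattern matches, and the position of the first space is shown for comparison.

```r
# Sample data from the question, as a plain character vector
locations <- c("San Jose CA", "Fremont CA", "Santa Clara CA")

# " (?=[^ ]+$)": a space followed by non-space characters up to the end of
# the string. Only the last space in each string satisfies this.
last_hits <- gregexpr(" (?=[^ ]+$)", locations, perl = TRUE)
n_matches <- lengths(last_hits)   # number of matches per string
last_pos  <- unlist(last_hits)    # 1-based position of the matching space

# Position of the first space in each string, for comparison
first_pos <- as.integer(regexpr(" ", locations, fixed = TRUE))

print(n_matches)  # 1 1 1
print(last_pos)   # 9 8 12
print(first_pos)  # 4 8 6
```

Only "Fremont CA", which has a single space, has its first and last spaces in the same place, so it is the only row that splits the same way under both approaches.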
Answer:
If you want to stick with separate(), try:
separate(df, Location, into=c("city", "state"), sep=" (?=[A-Z]{2}$)")
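If tidyr is not available, the same lookahead also works with base R's strsplit() when perl = TRUE is set, because the pattern consumes a real space rather than matching an empty string. A minimal sketch, assuming the question's sample data in a hypothetical locations vector:

```r
locations <- c("San Jose CA", "Fremont CA", "Santa Clara CA")

# Split at the space that is followed by exactly two capitals at the end
parts <- strsplit(locations, " (?=[A-Z]{2}$)", perl = TRUE)

# Collect the first and second piece of each string into a data frame
result <- data.frame(
  city  = vapply(parts, `[`, character(1), 1),
  state = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
print(result)
```

Note that [A-Z]{2}$ only splits when a string ends in two uppercase letters, so an entry such as "Fremont ca" would stay in one piece; the question's [^ ]+$ has no such restriction.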
For a base R option, we can also use sub():
df$city <- sub("\\s+[A-Z]{2}$", "", df$Location)   # drop the trailing state code
df$state <- sub("^.*\\s+", "", df$Location)        # drop everything up to the last space
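The same sub() approach can also produce the first-space split the question asks for. A sketch, assuming the sample data in a hypothetical locations vector: remove everything from the first whitespace run onward to get the first word, and remove everything up to and including that run to get the remainder.

```r
locations <- c("San Jose CA", "Fremont CA", "Santa Clara CA")

# First word: delete the first whitespace run and everything after it
first_word <- sub("\\s+.*$", "", locations)

# Remainder: delete the leading word and the whitespace that follows it
remainder <- sub("^\\S+\\s+", "", locations)

print(first_word)  # "San" "Fremont" "Santa"
print(remainder)   # "Jose CA" "CA" "Clara CA"
```

If a string contains no whitespace, neither pattern matches and sub() returns it unchanged in both vectors, so that case may need separate handling.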
Answer:
You can use
library(tidyr)
df <- data.frame(Location = c("San Jose CA", "Fremont CA", "Santa Clara CA"))
df_new <- separate(df, Location, into = c("city", "state"), sep = "^\\S*\\K\\s+")  # split at the first whitespace run only
Output:
> df_new
     city    state
1     San  Jose CA
2 Fremont       CA
3   Santa Clara CA
The ^\S*\K\s+ regex matches:
^ - start of string
\S* - zero or more non-whitespace chars
\K - match reset operator that discards the text matched so far from the overall match memory buffer
\s+ - one or more whitespace chars.
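The effect of \K can be checked in base R, since regexpr() with perl = TRUE reports the match start after the reset point. A small sketch comparing the pattern with and without \K on one sample string, then splitting around the reported match (x, city and state are illustrative names, not part of the answer above):

```r
x <- "Santa Clara CA"

# Without \K the overall match covers the first word and the whitespace
without_k <- regmatches(x, regexpr("^\\S*\\s+", x, perl = TRUE))

# With \K the match is reset after ^\S*, so only the whitespace remains
m <- regexpr("^\\S*\\K\\s+", x, perl = TRUE)
with_k <- regmatches(x, m)

# Split around the reported start position and match length
start <- as.integer(m)
len <- attr(m, "match.length")
city <- substr(x, 1, start - 1)
state <- substr(x, start + len, nchar(x))

print(without_k)       # "Santa "
print(with_k)          # " "
print(c(city, state))  # "Santa" "Clara CA"
```

Because the reported match is only the separator, anything that splits around matches keeps the first word intact instead of consuming it.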
NOTE: If your strings can have leading whitespace and you want to ignore it, add \\s* right after ^ and use
sep = "^\\s*\\S+\\K\\s+"
Here, \S+ requires at least one non-whitespace char to exist before the whitespace the string is split on.
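To compare the two patterns on a string with leading whitespace, here is a base R sketch. split_once() is a hypothetical helper, not part of tidyr: it splits each string once, around the match that regexpr() reports for a \K-based pattern, and leaves strings with no match whole.

```r
# Hypothetical helper: split each string once around the regex match
split_once <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  start <- as.integer(m)
  len <- attr(m, "match.length")
  matched <- start > 0
  left <- ifelse(matched, substr(x, 1, start - 1), x)
  right <- ifelse(matched, substr(x, start + len, nchar(x)), NA_character_)
  data.frame(left = left, right = right, stringsAsFactors = FALSE)
}

x <- c("  San Jose CA", "Fremont CA")

# Original pattern: the leading blanks are the first whitespace run
loose <- split_once(x, "^\\S*\\K\\s+")

# Pattern with \s* and \S+: the split falls after the first word
strict <- split_once(x, "^\\s*\\S+\\K\\s+")

print(loose)
print(strict)
```

With the first pattern the leading blanks become the separator and the left-hand part comes out empty; with the second, the split falls after "San" as intended, and trimws() can strip the blanks that stay on the left-hand side.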