I have a character column that encodes a location. The location could be of the form "city, state, country" or "state, country", or "country". I'd like to split these up into three columns but to ensure that if there are less than three elements each of the values goes into the correct state or country column.
Here is my data and what I've tried:
tib <- tribble(~obs, ~location,
1, "Miami, Florida, United States",
2, "Astrakhan Oblast, Russia",
3, "Mozambique")
separate(tib, location, c("city", "state", "country"), ", ")
Result:
# A tibble: 3 × 4
obs city state country
<dbl> <chr> <chr> <chr>
1 1 Miami Florida United States
2 2 Astrakhan Oblast Russia NA
3 3 Mozambique NA NA
In some sense, I'd like to run the separate
function in reverse order so that the result looks as follows:
# A tibble: 3 × 4
obs city state country
<dbl> <chr> <chr> <chr>
1 1 Miami Florida United States
2 2 NA Astrakhan Oblast Russia
3 3 NA NA Mozambique
Update:
This is a viable option but I was hoping for something simpler:
tib %>% mutate(country = str_extract(location, "[A-Za-z ] $"),
state = str_extract(location, "(?<=\\,)[A-Za-z ] (?=\\,)"),
city = str_extract(location, "^[A-Za-z ] (?=\\,)"))
CodePudding user response:
For your specific example you can use the fill
argument in separate()
to change to filling in missing values from the left instead of the right.
tidyr::separate(tib, location, c("city", "state", "country"), ", ", fill = "left")
# A tibble: 3 x 4
obs city state country
<dbl> <chr> <chr> <chr>
1 1 Miami Florida United States
2 2 NA Astrakhan Oblast Russia
3 3 NA NA Mozambique
CodePudding user response:
Here is an alternative way using separate_rows
: This was created with the help of dear @akrun Using complete to fill groups with NA to have same length as the maximum group. First try was to use complete
:
library(dplyr)
library(tidyr)
tib %>%
separate_rows("location", sep = ", ") %>%
group_by(obs) %>%
mutate(new = rev(c("country", "state", "city")[row_number()])) %>%
ungroup %>%
pivot_wider(names_from = new, values_from = location)
output:
obs city state country
<dbl> <chr> <chr> <chr>
1 1 Miami Florida United States
2 2 NA Astrakhan Oblast Russia
3 3 NA NA Mozambique