I have a dataset containing information on a range of cities, but there is no column which says what country the city is located in. In order to perform the analysis, I need to add an extra column which has the name of the country.
population city
500,000 Oslo
750,000 Bristol
500,000 Liverpool
1,000,000 Dublin
I expect the output to look like this:
population city country
500,000 Oslo Norway
750,000 Bristol England
500,000 Liverpool England
1,000,000 Dublin Ireland
How can I add a column of country names based on the city and population to a large dataset in R?
CodePudding user response:
I am adapting Tom Hoel's answer, as suggested by Ian Campbell. If this is selected I am happy to mark it as community wiki.
library(maps)
library(dplyr)
data("world.cities")
df <- readr::read_table("population city
500,000 Oslo
750,000 Bristol
500,000 Liverpool
1,000,000 Dublin")
df |>
inner_join(
select(world.cities, name, country.etc, pop),
by = c("city" = "name")
) |> group_by(city) |>
filter(
abs(pop - population) == min(abs(pop - population))
)
# A tibble: 4 x 4
# Groups: city [4]
# population city country.etc pop
# <dbl> <chr> <chr> <int>
# 1 500000 Oslo Norway 821445
# 2 750000 Bristol UK 432967
# 3 500000 Liverpool UK 468584
# 4 1000000 Dublin Ireland 1030431
CodePudding user response:
As stated by others, the cities exists in other countries too as well.
library(tidyverse)
library(maps)
data("world.cities")
df <- read_table("population city
500,000 Oslo
750,000 Bristol
500,000 Liverpool
1,000,000 Dublin")
df %>%
merge(., world.cities %>%
select(name, country.etc),
by.x = "city",
by.y = "name")
# A tibble: 7 × 3
city population country.etc
<chr> <dbl> <chr>
1 Bristol 750000 UK
2 Bristol 750000 USA
3 Dublin 1000000 USA
4 Dublin 1000000 Ireland
5 Liverpool 500000 UK
6 Liverpool 500000 Canada
7 Oslo 500000 Norway
CodePudding user response:
I think your best bet would be to add a new column in your dataset called country and fill it out, this is part of the CRSIP-DM process data preparation so this is not uncommon. If that does not answer your question please let me know and i will do my best to help.