Home > front end >  adding columns to dataframe based on the values in existing column in R
adding columns to dataframe based on the values in existing column in R

Time:04-23

I'm working in Rstudio and have a data frame similiar to the following:

Favorite<-c("Apple","Lemon","Orange","Salat","Onion", "Apple","Strawberry","Celery","Blueberry","Sweetpotatoes","Strawberry",
                "Oragne","Celery","Sweetpotatoes","Onion","Blueberry","Strawberry","Salad")
PersonID<-c(67,82,67,21,02,12,90,23,65,32,44,67,56,77,30,198,20,99)
all_Data<-data.frame(PersonID,Favorite)

> head(all_Data)
  PersonID Favorite
1       67    Apple
2       82    Lemon
3       67   Orange
4       21    Salat
5        2    Onion
6       12    Apple

I want to add 3 more column and they should contains the following:

If a row in all_Data$Favorite is Apple or Blueberry then all_Data$Country = Ireand, all_Data$Continent= Europe and all_Data$city=Belfast

If a row in all_Data$Favorite is Strawberry then all_Data$Country = Holland, all_Data$Continent= Europe and all_Data$city=Emmen

If a row in all_Data$Favorite is Lemon or Orange then all_Data$Country = France, all_Data$Continent= Europe and all_Data$city=Menton

If a row in all_Data$Favorite is Salad or Onion then all_Data$Country = Sweeden, all_Data$Continent= Europe and all_Data$city=Malmoe

If a row in all_Data$Favorite is Lemon or Orange then all_Data$Country = France, all_Data$Continent= Europe and all_Data$city=Menton

If a row in all_Data$Favorite is Sweetpotatoes then all_Data$Country = USA, all_Data$Continent= America and all_Data$city=Verona

If a row in all_Data$Favorite is Celery then all_Data$Country = Germany, all_Data$Continent= Europe and all_Data$city=Berlin

library(tidyverse)

all_Data |> 
  mutate(ctry_cont = case_when(
    str_detect(Favorite, "Appl|Blueb")  ~ "Ireland|Europe",
    str_detect(Favorite, "Straw")       ~ "Brazillian|South's of America",
    str_detect(Favorite, "Lemon|Orang") ~ "France|Europe",
    str_detect(Favorite, "Salad|Onion") ~ "Sweden|Europe",
    str_detect(Favorite, "Sweetpot")    ~ "United of state|America",
    str_detect(Favorite, "Celery")      ~ "Germany|Europe",
    TRUE                                ~ "Other|Other"
  )) |> 
  separate(ctry_cont, c("country", "continent"))

After running the code above I get the following warning and data where we see half of the value of United of Kingdom and United of America. I have also added words with apostrophe since in my original data there are word with apostrophes, but it is also not visible:

     PersonID      Favorite    country continent
1        67         Apple    Ireland    Europe
2        82         Lemon     France    Europe
3        67        Orange     France    Europe
4        21         Salat      Other     Other
5         2         Onion     Sweden    Europe
6        12         Apple    Ireland    Europe
7        90    Strawberry Brazillian     South
8        23        Celery    Germany    Europe
9        65     Blueberry    Ireland    Europe
10       32 Sweetpotatoes     United        of
11       44    Strawberry Brazillian     South
12       67        Oragne      Other     Other
13       56        Celery    Germany    Europe
14       77 Sweetpotatoes     United        of
15       30         Onion     Sweden    Europe
16      198     Blueberry    Ireland    Europe
17       20    Strawberry Brazillian     South
18       99         Salad     Sweden    Europe

    Warning message:
Expected 2 pieces. Additional pieces discarded in 5 rows [7, 10, 11, 14, 17].
    

I also tried to add sep=""at the last step of the code. it gives an error.

separate(ctry_cont, c("country", "continent"), sep="")

CodePudding user response:

You could do something like this ...

Favorite <- c(
  "Apple",
  "Lemon",
  "Orange",
  "Salad",
  "Onion",
  "Apple",
  "Strawberry",
  "Celery",
  "Blueberry",
  "Sweetpotatoes",
  "Strawberry",
  "Orange",
  "Celery",
  "Sweetpotatoes",
  "Onion",
  "Blueberry",
  "Strawberry",
  "Salad"
)

PersonID <-
  c(67, 82, 67, 21, 02, 12, 90, 23, 65, 32, 44, 67, 56, 77, 30, 198, 20, 99)

all_Data <- data.frame(PersonID, Favorite)

library(tidyverse)

all_Data |> 
  mutate(ctry_cont = case_when(
    str_detect(Favorite, "Appl|Blueb")  ~ "Ireland, Europe",
    str_detect(Favorite, "Straw")       ~ "Holland, Europe",
    str_detect(Favorite, "Lemon|Orang") ~ "France, Europe",
    str_detect(Favorite, "Salad|Onion") ~ "Sweden, Europe",
    str_detect(Favorite, "Sweetpot")    ~ "United States, North America",
    str_detect(Favorite, "Celery")      ~ "Germany, Europe",
    TRUE                                ~ "Other, Other"
  )) |> 
  separate(ctry_cont, c("country", "continent"), sep = ", ")
#>    PersonID      Favorite       country     continent
#> 1        67         Apple       Ireland        Europe
#> 2        82         Lemon        France        Europe
#> 3        67        Orange        France        Europe
#> 4        21         Salad        Sweden        Europe
#> 5         2         Onion        Sweden        Europe
#> 6        12         Apple       Ireland        Europe
#> 7        90    Strawberry       Holland        Europe
#> 8        23        Celery       Germany        Europe
#> 9        65     Blueberry       Ireland        Europe
#> 10       32 Sweetpotatoes United States North America
#> 11       44    Strawberry       Holland        Europe
#> 12       67        Orange        France        Europe
#> 13       56        Celery       Germany        Europe
#> 14       77 Sweetpotatoes United States North America
#> 15       30         Onion        Sweden        Europe
#> 16      198     Blueberry       Ireland        Europe
#> 17       20    Strawberry       Holland        Europe
#> 18       99         Salad        Sweden        Europe

Created on 2022-04-22 by the reprex package (v2.0.1)

  • Related