Home > Software engineering >  New Column Based on Conditions
New Column Based on Conditions

Time:11-26

To set the scene, I have a set of data where two columns of the data have been mixed up. To give a simple example:

df1 <- data.frame(Name = c("Bob", "John", "Mark", "Will"), City=c("Apple", "Paris", "Orange", "Berlin"), Fruit=c("London", "Pear", "Madrid", "Orange"))
df2 <- data.frame(Cities = c("Paris", "London", "Berlin", "Madrid", "Moscow", "Warsaw"))

As a result, we have two small data sets:

> df1
  Name   City  Fruit
1  Bob  Apple London
2 John  Paris   Pear
3 Mark Orange Madrid
4 Will Berlin Orange

> df2
  Cities
1  Paris
2 London
3 Berlin
4 Madrid
5 Moscow
6 Warsaw

My aim is to create a new column where the cities are in the correct place using df2. I am a bit new to R so I don't know how this would work.

I don't really know where to even start with this sort of a problem. My full dataset is much larger and it would be good to have an efficient method of unpicking this issue!

CodePudding user response:

If the 'City' values are only different. We may loop over the rows, create a logical vector based on the matching values with 'Cities' from 'df2', and concatenate with the rest of the values by getting the matched values second in the order

df1[] <- t(apply(df1, 1, function(x) 
          {
         i1 <- x %in% df2$Cities
          i2 <- !i1
          x1 <- x[i2]
        c(x1[1], x[i1], x1[2])}))

-output

> df1
  Name   City  Fruit
1  Bob London  Apple
2 John  Paris   Pear
3 Mark Madrid Orange
4 Will Berlin Orange

CodePudding user response:

using dplyr package this is a solution, where it looks up the two City and Fruit values in df1, and takes the one that exists in the df2 cities list. if none of the two are a city name, an empty string is returned, you can replace that with anything you prefer.

library(dplyr)
df1$corrected_City <- case_when(df1$City %in% df2$Cities ~ df1$City,
                                df1$Fruit%in% df2$Cities ~ df1$Fruit,
                                TRUE ~ "")

output, a new column created as you wanted with the city name on that row.

> df1
  Name   City  Fruit corrected_City
1  Bob  Apple London         London
2 John  Paris   Pear          Paris
3 Mark Orange Madrid         Madrid
4 Will Berlin Orange         Berlin

CodePudding user response:

Another way is:

library(dplyr)
library(tidyr)

df1 %>% 
  mutate(across(1:3, ~case_when(. %in%  df2$Cities  ~ .), .names = 'new_{col}')) %>%
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ')
 Name   City  Fruit New_Col
1  Bob  Apple London  London
2 John  Paris   Pear   Paris
3 Mark Orange Madrid  Madrid
4 Will Berlin Orange  Berlin
  •  Tags:  
  • r
  • Related