Home > Back-end >  New column based on rows of other dataframe
New column based on rows of other dataframe

Time:05-12

I am wondering how I can create a new column and use the information of the corresponding row in a different dataset as input for that column.

Suppose I have these two datasets:

newDf <- data.frame(c("Juice 1", "Juice 2", "Juice 3", "Juice 4","Juice 5"),
                    c("Banana", "Banana", "Orange", "Pear", "Apple"),
                    c("Pear", "Orange", "Pear", "Apple", "Pear"),
                    c("Orange", "Mango", "Banana", "Banana", "Banana"))
colnames(newDf) <- c("Juice", "Fruit 1", "Fruit 2", "Fruit 3")

newDf2 <- data.frame(c("Juice 6", "Juice 3", "Juice 2", "Juice 8","Juice 1"),
                     c(NA,NA,NA,NA,NA),
                     c(NA,NA,NA,NA,NA),
                     c(NA,NA,NA,NA,NA))
colnames(newDf2) <- c("Juice", "Fruit 1", "Fruit 2", "Fruit 3")

Resulting in:

> print(newDf)
    Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 1  Banana    Pear  Orange
2 Juice 2  Banana  Orange   Mango
3 Juice 3  Orange    Pear  Banana
4 Juice 4    Pear   Apple  Banana
5 Juice 5   Apple    Pear  Banana

> print(newDf2)
    Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 6      NA      NA      NA
2 Juice 3      NA      NA      NA
3 Juice 2      NA      NA      NA
4 Juice 8      NA      NA      NA
5 Juice 1      NA      NA      NA

In my newDf2 I want to assign the fruits from newDf. However, as you can see the Juices in the first column are not in the same order, so I have to assign based on the Juice.

I want to get to this outcome:

    Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 6      NA      NA      NA
2 Juice 3  Orange    Pear  Banana
3 Juice 2  Banana  Orange   Mango
4 Juice 8      NA      NA      NA
5 Juice 1  Banana    Pear  Orange

Can somebody help me with this?

Thanks!

EDIT:

In my actual dataset, the columns do not all start with Fruit btw... So they all have different names. Could you help me find a solution that works per total column name instead of the ones only starting with 'Fruit'?

CodePudding user response:

We may use match within across - loop across the 'Fruit' columns in 'newDf2', match the 'Juice' column values from the 'newDf', use that index to extract the values of the corresponding columns (cur_column() - returns the column name of the looped column) from 'newDf`

library(dplyr)
newDf2 <- newDf2 %>% 
  mutate(across(starts_with('Fruit'), 
       ~ newDf[[cur_column()]][match(Juice, newDf$Juice)]))

-output

newDf2
  Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 6    <NA>    <NA>    <NA>
2 Juice 3  Orange    Pear  Banana
3 Juice 2  Banana  Orange   Mango
4 Juice 8    <NA>    <NA>    <NA>
5 Juice 1  Banana    Pear  Orange

If column names doesn't have any pattern, then use column index. Assuming that the first column in both datasets are Juice, exclude that column in across with -1

newDf2 <- newDf2 %>% 
  mutate(across(-1, 
       ~ newDf[[cur_column()]][match(Juice, newDf$Juice)]))
  • Related