I am wondering how I can create a new column and use the information of the corresponding row in a different dataset as input for that column.
Suppose I have these two datasets:
newDf <- data.frame(c("Juice 1", "Juice 2", "Juice 3", "Juice 4","Juice 5"),
c("Banana", "Banana", "Orange", "Pear", "Apple"),
c("Pear", "Orange", "Pear", "Apple", "Pear"),
c("Orange", "Mango", "Banana", "Banana", "Banana"))
colnames(newDf) <- c("Juice", "Fruit 1", "Fruit 2", "Fruit 3")
newDf2 <- data.frame(c("Juice 6", "Juice 3", "Juice 2", "Juice 8","Juice 1"),
c(NA,NA,NA,NA,NA),
c(NA,NA,NA,NA,NA),
c(NA,NA,NA,NA,NA))
colnames(newDf2) <- c("Juice", "Fruit 1", "Fruit 2", "Fruit 3")
Resulting in:
> print(newDf)
Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 1 Banana Pear Orange
2 Juice 2 Banana Orange Mango
3 Juice 3 Orange Pear Banana
4 Juice 4 Pear Apple Banana
5 Juice 5 Apple Pear Banana
> print(newDf2)
Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 6 NA NA NA
2 Juice 3 NA NA NA
3 Juice 2 NA NA NA
4 Juice 8 NA NA NA
5 Juice 1 NA NA NA
In my newDf2
I want to assign the fruits from newDf
. However, as you can see the Juices in the first column are not in the same order, so I have to assign based on the Juice.
I want to get to this outcome:
Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 6 NA NA NA
2 Juice 3 Orange Pear Banana
3 Juice 2 Banana Orange Mango
4 Juice 8 NA NA NA
5 Juice 1 Banana Pear Orange
Can somebody help me with this?
Thanks!
EDIT:
In my actual dataset, the columns do not all start with Fruit btw... So they all have different names. Could you help me find a solution that works per total column name instead of the ones only starting with 'Fruit'?
CodePudding user response:
We may use match
within across
- loop across
the 'Fruit' columns in 'newDf2', match
the 'Juice' column values from the 'newDf', use that index to extract the values of the corresponding columns (cur_column()
- returns the column name of the looped column) from 'newDf`
library(dplyr)
newDf2 <- newDf2 %>%
mutate(across(starts_with('Fruit'),
~ newDf[[cur_column()]][match(Juice, newDf$Juice)]))
-output
newDf2
Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 6 <NA> <NA> <NA>
2 Juice 3 Orange Pear Banana
3 Juice 2 Banana Orange Mango
4 Juice 8 <NA> <NA> <NA>
5 Juice 1 Banana Pear Orange
If column names doesn't have any pattern, then use column index. Assuming that the first column in both datasets are Juice
, exclude that column in across
with -1
newDf2 <- newDf2 %>%
mutate(across(-1,
~ newDf[[cur_column()]][match(Juice, newDf$Juice)]))