I have a multipart lookup table problem in R. I have a data frame, where the number in each column represents an item name. The item name can be found in the corresponding look up table.
Data:
> food.dat
Fruit Vegetable Meat Dairy
1 1 2 2 3
2 3 2 1 1
3 3 2 2 2
4 2 2 1 1
5 1 1 1 2
Lookup Table:
> food.lookup
FoodItem Number FoodName
1 Fruit 1 Banana
2 Fruit 2 Apple
3 Fruit 3 Mango
4 Vegetable 1 Carrot
5 Vegetable 2 Broccoli
6 Meat 1 Chicken
7 Meat 2 Fish
8 Dairy 1 Cheese
9 Dairy 2 Yogurt
10 Dairy 3 IceCream
Note that the number is not unique amongst foods. For example, a 1 represents a different FoodName in column Fruit (Banana) and a different FoodName in column Vegetable (Carrot).
I would like to recode the food.dat dataframe to have the FoodName value from the lookup table. If possible I would also like to be able to use a simple function and supply a FoodName and return a dataframe from food.dat which includes only rows that include that specified FoodName.
Thank you for your time and thoughts :)
CodePudding user response:
split
the named vector
by the 'FoodItem' into a list
from the 'food.lookup'. Loop across
the 'food.dat' columns, extract the list
element and replace the values by matching
library(dplyr)
lst1 <- with(food.lookup, split(setNames(FoodName, Number), FoodItem))
food.dat %>%
mutate(across(all_of(names(lst1)), ~ lst1[[cur_column()]][as.character(.)]))
-output
Fruit Vegetable Meat Dairy
1 Banana Broccoli Fish IceCream
2 Mango Broccoli Chicken Cheese
3 Mango Broccoli Fish Yogurt
4 Apple Broccoli Chicken Cheese
5 Banana Carrot Chicken Yogurt
data
food.dat <- structure(list(Fruit = c(1L, 3L, 3L, 2L, 1L), Vegetable = c(2L,
2L, 2L, 2L, 1L), Meat = c(2L, 1L, 2L, 1L, 1L), Dairy = c(3L,
1L, 2L, 1L, 2L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5"))
food.lookup <- structure(list(FoodItem = c("Fruit", "Fruit",
"Fruit", "Vegetable",
"Vegetable", "Meat", "Meat", "Dairy", "Dairy", "Dairy"), Number = c(1L,
2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 3L), FoodName = c("Banana", "Apple",
"Mango", "Carrot", "Broccoli", "Chicken", "Fish", "Cheese", "Yogurt",
"IceCream")), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10"))
CodePudding user response:
On a similar note, you can make use of the "position" of your different names. For this split the looup table into the respective food-types (or type them out by hand). Then simply use the indexing to set the results.
Below it is done for one example. You can readily expand this to all. I store the results in Dairy2 so you can compare and see how the indexing works.
dairy <- c("Cheese","Yogurt","IceCream")
food.dat <- data.frame(Dairy = c(3,1,2,1,2))
food.dat$Dairy2 = dairy[food.dat$Dairy]
food.dat
Dairy Dairy2
1 3 IceCream
2 1 Cheese
3 2 Yogurt
4 1 Cheese
5 2 Yogurt
CodePudding user response:
We can pivot the data to a long format, with one food item by row, join to the lookup table, and pivot back to wide format
library(tidyr)
library(dplyr)
food.dat %>%
tibble::rowid_to_column() %>%
pivot_longer(-rowid, names_to = "FoodItem",
values_to = "Number") %>%
left_join(food.lookup) %>%
pivot_wider(id_cols = rowid, names_from = FoodItem,
values_from = FoodName)
#> # A tibble: 5 x 5
#> rowid Fruit Vegetable Meat Dairy
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 Banana Broccoli Fish IceCream
#> 2 2 Mango Broccoli Chicken Cheese
#> 3 3 Mango Broccoli Fish Yogurt
#> 4 4 Apple Broccoli Chicken Cheese
#> 5 5 Banana Carrot Chicken Yogurt
With data:
food.dat <- read.table(text =
'Fruit Vegetable Meat Dairy
1 2 2 3
3 2 1 1
3 2 2 2
2 2 1 1
1 1 1 2', header = TRUE)
food.lookup <- read.table(text =
'FoodItem Number FoodName
Fruit 1 Banana
Fruit 2 Apple
Fruit 3 Mango
Vegetable 1 Carrot
Vegetable 2 Broccoli
Meat 1 Chicken
Meat 2 Fish
Dairy 1 Cheese
Dairy 2 Yogurt
Dairy 3 IceCream', header = TRUE)