I have a dataset similar to "mydf" and currently use the process below to produce the desired data.frame, "desired_outcome". This is a simplified example; the real dataset has roughly 30 "Letter" groups, hence my interest in streamlining the code.
library(dplyr)
mydf <- data.frame("Location" = factor(c("A10", "A10", "A11", "A11a", "A12", "B10", "B11", "B12")))
A_locs <- factor(c("A10", "A11", "A11a", "A12"))
B_locs <- factor(c("B10", "B11", "B12"))
mylst <- list("A's" = A_locs, "B's" = B_locs)
mydf$Letter <- NA #initialize new field within my data.frame
mydf$Letter[mydf$Location %in% mylst[[1]]] <- names(mylst)[1]
mydf$Letter[mydf$Location %in% mylst[[2]]] <- names(mylst)[2]
mydf
desired_outcome <- data.frame("Location" = factor(c("A10", "A10", "A11", "A11a", "A12", "B10", "B11", "B12")),
"Letter" = factor(c("A's", "A's", "A's", "A's", "A's", "B's", "B's", "B's")))
I've had the idea of employing a FOR LOOP, although I would strongly prefer to find a more clever way to do this. The loop below does NOT produce the desired result, but is generally what I envisioned the loop itself looking like:
for (i in 1:length(names(mylst))){
if(mydf$Location[i] %in% mylst[[i]]) {
mydf$Letter <- names(mylst)[i]}
return(mydf)
}
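For reference, the loop above fails for three reasons: it uses the list index i to pick a single row of mydf$Location, it overwrites the whole Letter column instead of only the matching rows, and return() is called outside of a function. A minimal sketch of a working loop, assuming the same mydf and mylst as above (the names nm and in_group are only illustrative), iterates over the list names and assigns by logical index:

```r
# Rebuild the example data so the sketch is self-contained.
mydf <- data.frame(Location = factor(c("A10", "A10", "A11", "A11a",
                                       "A12", "B10", "B11", "B12")))
mylst <- list("A's" = factor(c("A10", "A11", "A11a", "A12")),
              "B's" = factor(c("B10", "B11", "B12")))

mydf$Letter <- NA_character_  # initialize the new column

# Loop over the groups in the list, not over the rows of the data frame.
for (nm in names(mylst)) {
  # TRUE for every row whose Location belongs to the current group
  in_group <- mydf$Location %in% mylst[[nm]]
  # Assign the group name to those rows only
  mydf$Letter[in_group] <- nm
}
mydf
```

This scales to any number of list elements without adding lines, although it is still an explicit loop.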
I've looked into employing functions such as lapply or sapply, but I am unfamiliar with these. Are there any clever methods I may use here to:
1.) clean up the code I've written and
2.) avoid long blocks of manual, line-by-line assignment without resorting to a FOR loop?
CodePudding user response:
If you are open to a tidyverse approach, you could try
library(tidyverse)
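mydf %>%
  # Build a named lookup vector (location -> group name) and index it by label.
  # as.character() avoids indexing by the factor's integer codes, which only
  # works by coincidence when the lookup order matches the level order.
  mutate(Letter = deframe(map_dfr(mylst, tibble, .id = "name")[2:1])[as.character(Location)])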
This returns
  Location Letter
1      A10    A's
2      A10    A's
3      A11    A's
4     A11a    A's
5      A12    A's
6      B10    B's
7      B11    B's
8      B12    B's
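
The same lookup idea can also be written without any packages. The following is a minimal base R sketch, not part of the answer above: it assumes mydf and mylst are defined as in the question, and the name lookup is purely illustrative. It flattens the list into one named character vector and then indexes that vector by location label:

```r
# Rebuild the example data so the sketch is self-contained.
mydf <- data.frame(Location = factor(c("A10", "A10", "A11", "A11a",
                                       "A12", "B10", "B11", "B12")))
mylst <- list("A's" = factor(c("A10", "A11", "A11a", "A12")),
              "B's" = factor(c("B10", "B11", "B12")))

# Flatten the list into one lookup vector:
# names are the locations, values are the group labels ("A's", "B's", ...).
lookup <- setNames(rep(names(mylst), lengths(mylst)),
                   unlist(lapply(mylst, as.character), use.names = FALSE))

# Index by character, not by factor: a factor would be used as integer codes.
mydf$Letter <- factor(unname(lookup[as.character(mydf$Location)]))
mydf
```

Any Location that does not appear in mylst ends up as NA in Letter, which makes missing mappings easy to spot with anyNA(mydf$Letter).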