Suppose I have a data set with sales:
library(tidyverse)
initial_data <- tribble(
~ billing_loc, ~ type, ~ billing_comment,
"New York", "RE", "aaaa ssss 003tt",
"London", "ZO", "BO",
"Paris", "ZO", "003sd, 003Wf; 003ghdiscount"
)
First of all, I want to extract all the project ids which are provided in the billing_comment column. A project id always starts with "003" and has a length of 5. I did it with the following code:
modified_data <- initial_data %>%
mutate(project_id = str_extract_all(billing_comment, "003.."))
Secondly, I want to use left_join to look up for the information about the ids from the second table and insert it in the list-column:
project_id_database <- tribble(
~ project_id, ~ type, ~ owner,
"003tt", "ZO", "Juan",
"003sd", "ZO", "Mike",
"003aA", "RE", "Brent",
"003Wf", "ZO", "Brent",
"003gh", "RE", "Anna",
"003qQ", "ZO", "Donald"
)
Is there a way to use left_join for nested data without unnesting it and get a list-column containing tibbles with all information about those ids (don't be confused with <lgl [1] in the second row, it can either return NA or some string saying that the id is missing)?
billing_loc type billing_comment project_data
<chr> <chr> <chr> <list>
1 New York RE aaaa ssss 003tt <tibble [1 x 3]>
2 London ZO BO <lgl [1]>
3 Paris ZO 003sd, 003Wf; 003ghdiscount <tibble [3 x 3]>
I found a way to do it with unnest()
and then use left_join
but I think there should be a more effective solution.
Finally, it would be nice if there were a solution to add a column with only one id that matches a criteria in column "type" (it several ids matches the criteria it should return the first). For that purpose I used map()
but I also think this way is not as effective as it could be because I used the column "status" for that:
test_data_nest %>%
mutate(final_project = map(data, ~ filter(., status == "Match"))) %>%
mutate(final_project = map(final_opp, 1))
CodePudding user response:
I think separating the rows, extracting the project_id
s, attaching the details, nesting, and then joining back would be simpler and faster than trying to map
:
initial_data %>%
separate_rows(billing_comment) %>%
mutate(project_id = str_extract(billing_comment, "003..")) %>%
inner_join(project_id_database %>% select(-type), by="project_id") %>%
group_by(billing_loc, type) %>%
nest() %>%
right_join(initial_data, by=c("billing_loc", "type"))
## A tibble: 3 x 4
## Groups: billing_loc, type [3]
# billing_loc type data billing_comment
# <chr> <chr> <list> <chr>
#1 New York RE <tibble [1 x 3]> aaaa ssss 003tt
#2 Paris ZO <tibble [3 x 3]> 003sd, 003Wf; 003ghdiscount
#3 London ZO <NULL> BO
CodePudding user response:
If you want to do this without unnesting the data in long format you can use -
library(tidyverse)
initial_data %>%
mutate(project_id = str_extract_all(billing_comment, "003.."),
data = map(project_id,
~project_id_database[match(.x, project_id_database$project_id), ]))
# A tibble: 3 x 5
# billing_loc type billing_comment project_id data
# <chr> <chr> <chr> <list> <list>
#1 New York RE aaaa ssss 003tt <chr [1]> <tibble [1 × 3]>
#2 London ZO BO <chr [0]> <tibble [0 × 3]>
#3 Paris ZO 003sd, 003Wf; 003ghdiscount <chr [3]> <tibble [3 × 3]>