Home > front end >  Comparing unordered sets across multiple rows in r
Comparing unordered sets across multiple rows in r

Time:05-18

I would like to compare unordered sets of data across multiple rows, identifying the non-matching data from each row into new columns. For instance, my data is structured like

org_id <- c("1234", "1234", "1234", "1234", "2345","2345", "2345")
original_value <- c("food", "dental care", "diapers", " ", "care", "housing", "utilities")
new_value <- c("dental care", "emergency food", "diapers", "dental care", "housing", "utilities", "care")
date_change <- c("2018-01-31", "2018-01-31", "2018-01-31", "2018-01-31","2018-01-31", "2018-01-31", "2018-01-31")
df <- data.frame(org_id, original_value,new_value, date_change)

where each row represents a change to an organization's services, and "date_change" refers to the date that the change occurred. You will notice that when you look at the changes associated with the first organization, some of them just represent changes in the order of the listed services and not a change in the services (e.g. "dental care" for organization "1234"). I would like an output that identifies the actual removed values and the actual added values in new columns like the example below:

org_id2 <- c("1234", "1234")
removed_value <- c("food", " ")
added_value <- c("emergency food","housing")
date_change2 <- c("2018-01-31","2018-01-31")
df2 <- data.frame(org_id2, removed_value, added_value, date_change2)

Any thoughts on how to approach this problem? Thanks!

CodePudding user response:

Perhaps something like this:

df %>% 
  pivot_longer(2:3) %>% 
  group_by(org_id,date_change,value) %>% 
  filter(n()==1 & trimws(value)!="") %>% 
  pivot_wider(id_cols = org_id:date_change,names_from = name, values_from = value)

Output:

  org_id date_change original_value new_value     
  <chr>  <chr>       <chr>          <chr>         
1 1234   2018-01-31  food           emergency food

If you want in long format, and want to retain empty string, etc, you can do this:

df %>% 
  pivot_longer(2:3,names_to="action") %>% 
  group_by(org_id,date_change,value) %>% 
  filter(n()==1) %>% 
  mutate(action=if_else(action=="original_value", "removed", "added"))

Output:

  org_id date_change action  value           
  <chr>  <chr>       <chr>   <chr>           
1 1234   2018-01-31  removed "food"          
2 1234   2018-01-31  added   "emergency food"
3 1234   2018-01-31  removed " "             
  • Related