Home > other >  Identifying values from one database to use in another database
Identifying values from one database to use in another database

Time:10-30

I am working on a project in which I need to work with 2 databases, identify values from one database to use in another.

I have a dataframe 1,

df1<-data.frame("ID"=c(1,2,3),"Condition A"=c("B","B","A"),"Condition B"=c("1","1","2"),"Year"=c(2002,1988,1995))

and a dataframe 2,

df2 <- data.frame("Condition A"=c("A","A","B","B"),"Condiction B"=c("1","2","1","2"),"<1990"=c(20,30,50,80),"1990-2000"=c(100,90,80,30),">2000"=c(300,200,800,400))

I would like to add a new column to df1 called "Value", in which, for each ID (from df1), collects the values from column 3,4 or 5 from df2 (depending on the year), and following conditions A and B available in both databases. The end result would be something like this:

df1<-data.frame("ID"=c(1,2,3),"Condition A"=c("B","B","A"),"Condition B"=c("1","1","2"),"Year"=c(2002,1988,1995),"Value"=c(800,50,90))

thanks!

CodePudding user response:

I think we can simply left_join, then mutate with case_when, then drop the undesired columns with select:

library(dplyr)

left_join(df1, df2, by=c("Condition.A", "Condition.B"))%>%
  mutate(Value=case_when(Year<1990 ~ X.1990,
                         Year<2000 ~ X1990.2000,
                         Year>=2000 ~ X.2000))%>%
  select(-starts_with("X"))

  ID Condition.A Condition.B Year Value
1  1           B           1 2002   800
2  2           B           1 1988    50
3  3           A           2 1995    90

EDIT: I edited your code, removing the "Condiction" typo

CodePudding user response:

You could use

library(dplyr)
library(tidyr)
df2 %>% 
  rename(Condition.B = Condiction.B) %>% 
  pivot_longer(matches("\\d {4}")) %>%
  right_join(df1, by = c("Condition.A", "Condition.B")) %>% 
  filter(name == case_when(
    Year < 1990 ~ "X.1990",
    Year > 2000 ~ "X.2000",
    TRUE        ~ "X1990.2000")) %>% 
  select(ID, Condition.A, Condition.B, Year, Value = value) %>% 
  arrange(ID)

This returns

# A tibble: 3 x 5
     ID Condition.A Condition.B  Year Value
  <dbl> <chr>       <chr>       <dbl> <dbl>
1     1 B           1            2002   800
2     2 B           1            1988    50
3     3 A           2            1995    90
  • At first we rename the misspelled column Condiction.B of df2 and bring it into a "long format" based on the "<1990", "1990-2000", ">2000" columns. Note that those columns can't be named like this, they are automatically renamed to X.1990, X1990.2000 and X.2000.
  • Next we use a right join with df1 on the two Condition columns.
  • Finally we filter just the matching years based on a hard coded case_when function and do some clean up (selecting and arranging).

CodePudding user response:

We could do it this way:

  1. Condiction must be a typo so I changed it to Condition
  2. in df1 create a helper column that assigns each your to the group which is a column name in df2
  3. bring df2 in long format
  4. finally apply left_join by by=c("Condition.A", "Condition.B", "helper"="name")
library(dplyr)
library(tidyr)
df1 <- df1 %>% 
  mutate(helper = case_when(Year >=1990 & Year <=2000 ~"X1990.2000",
                            Year <1990 ~ "X.1990",
                            Year >2000 ~ "X.2000"))

df2 <- df2 %>% 
  pivot_longer(
    cols=starts_with("X")
  )

df3 <- left_join(df1, df2, by=c("Condition.A", "Condition.B", "helper"="name")) %>% 
  select(-helper)
  ID Condition.A Condition.B Year value
1  1           B           1 2002   800
2  2           B           1 1988    50
3  3           A           2 1995    90
  • Related