Appending an ID column to a data frame based on reference columns between two data frames

Time:06-06

I have two data frames. One of them contains an ID column, while the other does not. Both have a NumID column that can be used as a reference, plus a date column that can be used too. I would like to use the NumID and the first date for each ID in df to append an ID column to df2.

library(lubridate)
library(tidyverse)
library(purrr)


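# 25 consecutive days; data.frame() recycles them to match the 75 ID/NumID rows
date <- rep_len(seq(dmy("01-01-2011"), dmy("25-01-2011"), by = "days"), 25)
ID <- rep(c("A","B", "C"), 25)
# NumID is stored as text with a leading zero
NumID <- rep(c("01000", "02000", "03000"), 25)
df <- data.frame(date = date,
                 ID,
                 NumID)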


date2 <- c("01-01-2011", "2011-01-02", "2011-01-03")
NumID2 <- c("1000", "2000", "3000")
df2 <- data.frame(date = date2,
                 NumID = NumID2)

My expected output would look something like this:

ID2 <- c("A","B", "C")
expected <- data.frame(date = date2,
                  NumID = NumID2,
                  ID = ID2)

CodePudding user response:

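The date column of 'df2' mixes two date formats, and its NumID values lack the leading zero used in 'df'. One option is to convert the dates to Date class with parse_date, pad NumID to five digits, and then do a join

library(parsedate)
library(dplyr)
df2$date <- as.Date(parse_date(df2$date))
# or use `lubridate::parse_date_time` with formats
# df2$date <- as.Date(lubridate::parse_date_time(df2$date, c("dmy", "ymd")))

# pad NumID with leading zeros so it matches the keys in 'df'
df2$NumID <- sprintf('%05d', as.integer(df2$NumID))

left_join(df2, df, by = c("date", "NumID"))

-output

        date NumID ID
1 2011-01-01 01000  A
2 2011-01-02 02000  B
3 2011-01-03 03000  C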

Or with a chain (%>%), starting from the original 'df2'

df2 %>%
   mutate(date = as.Date(parse_date(date)), NumID = sprintf('%05d', as.integer(NumID))) %>%
   left_join(df, by = c("date", "NumID"))

-output

        date NumID ID
1 2011-01-01 01000  A
2 2011-01-02 02000  B
3 2011-01-03 03000  C
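
If you would rather not depend on a date-guessing package, the same idea can be sketched in base R. This is only a rough sketch under some assumptions: the helper name parse_mixed_date is made up for illustration, it assumes the only two formats present are dd-mm-yyyy and yyyy-mm-dd, and it assumes NumID always needs padding to five digits. The data is rebuilt here so the block runs on its own.

```r
# Rebuild the question's data with base R only (df has 75 rows).
dates <- seq(as.Date("2011-01-01"), as.Date("2011-01-25"), by = "day")
df <- data.frame(
  date  = rep(dates, times = 3),
  ID    = rep(c("A", "B", "C"), times = 25),
  NumID = rep(c("01000", "02000", "03000"), times = 25)
)
df2 <- data.frame(
  date  = c("01-01-2011", "2011-01-02", "2011-01-03"),
  NumID = c("1000", "2000", "3000")
)

# Hypothetical helper (not from any package): choose the format from the
# shape of each string instead of guessing. Strings that start with a
# four-digit year are read as yyyy-mm-dd, everything else as dd-mm-yyyy.
parse_mixed_date <- function(x) {
  is_ymd <- grepl("^[0-9]{4}-", x)
  out <- rep(as.Date(NA), length(x))
  out[is_ymd]  <- as.Date(x[is_ymd],  format = "%Y-%m-%d")
  out[!is_ymd] <- as.Date(x[!is_ymd], format = "%d-%m-%Y")
  out
}

df2$date  <- parse_mixed_date(df2$date)
# Pad NumID with leading zeros so it matches the keys in df.
df2$NumID <- sprintf("%05d", as.integer(df2$NumID))

# all.x = TRUE keeps every row of df2, like a left join.
result <- merge(df2, df, by = c("date", "NumID"), all.x = TRUE)
print(result)
```

Checking the string shape first avoids a quiet pitfall: as.Date("01-01-2011", format = "%Y-%m-%d") does not return NA but a date in year 1, so simply trying one format and falling back on NA can give wrong dates.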