I have two data frames. One contains an ID column, while the other does not. Both have a NumID column that can be used as a reference, and a date column that can be used too. I would like to use the NumID and the first date for each ID in df to append an ID column to df2.
library(lubridate)
library(tidyverse)
library(purrr)
date <- rep_len(seq(dmy("01-01-2011"), dmy("25-01-2011"), by = "days"), 25)
ID <- rep(c("A","B", "C"), 25)
NumID <- rep(c("01000", "02000", "03000"), 25)
df <- data.frame(date = date,
                 ID,
                 NumID)
date2 <- c("01-01-2011", "2011-01-02", "2011-01-03")
NumID2 <- c("1000", "2000", "3000")
df2 <- data.frame(date = date2,
                  NumID = NumID2)
My expected output would look something like this:
ID2 <- c("A","B", "C")
expected <- data.frame(date = date2,
                       NumID = NumID2,
                       ID = ID2)
CodePudding user response:
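There are multiple date formats in the date column of df2 (dd-mm-yyyy and yyyy-mm-dd). An option is to convert it to Date class with parse_date and then do a join. The NumID values in df2 also lack the leading zero used in df, so they need zero-padding before the join can match.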
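library(parsedate)
library(dplyr)
# parse the mixed formats, then drop the time component
df2$date <- as.Date(parse_date(df2$date))
# or use `lubridate::parse_date_time` with formats
# df2$date <- as.Date(lubridate::parse_date_time(df2$date, c("dmy", "ymd")))
# zero-pad NumID to five characters so it matches df$NumID
df2$NumID <- sprintf("%05d", as.integer(df2$NumID))
left_join(df2, df, by = c("date", "NumID"))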
-output
        date NumID ID
1 2011-01-01 01000  A
2 2011-01-02 02000  B
3 2011-01-03 03000  C
Or with a chain (%>%):
df2 %>%
  mutate(date = as.Date(parse_date(date)),
         NumID = sprintf("%05d", as.integer(NumID))) %>%
  left_join(df, by = c("date", "NumID"))
-output
        date NumID ID
1 2011-01-01 01000  A
2 2011-01-02 02000  B
3 2011-01-03 03000  C
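If installing parsedate or dplyr is not an option, the same idea can be sketched in base R. This is only a rough illustration, not the approach used above: parse_mixed_date is a hypothetical helper written for this example (it is not part of any package), and it assumes the only two formats present are dd-mm-yyyy and yyyy-mm-dd. The example data is rebuilt without lubridate, with the 25 dates repeated three times to mirror the recycling that data.frame performs in the question.

```r
# Hypothetical helper (not from any package): parse a character vector that
# mixes "dd-mm-yyyy" and "yyyy-mm-dd" strings, choosing the format per element.
parse_mixed_date <- function(x) {
  out <- rep(as.Date(NA), length(x))
  is_dmy <- grepl("^[0-9]{2}-[0-9]{2}-[0-9]{4}$", x)
  is_ymd <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  out[is_dmy] <- as.Date(x[is_dmy], format = "%d-%m-%Y")
  out[is_ymd] <- as.Date(x[is_ymd], format = "%Y-%m-%d")
  out  # anything matching neither pattern stays NA
}

# Example data from the question, rebuilt without lubridate.
df <- data.frame(
  date  = rep(seq(as.Date("2011-01-01"), as.Date("2011-01-25"), by = "day"),
              times = 3),
  ID    = rep(c("A", "B", "C"), times = 25),
  NumID = rep(c("01000", "02000", "03000"), times = 25)
)
df2 <- data.frame(
  date  = c("01-01-2011", "2011-01-02", "2011-01-03"),
  NumID = c("1000", "2000", "3000")
)

# Normalise both join keys in df2 so they match df.
df2$date  <- parse_mixed_date(df2$date)
df2$NumID <- sprintf("%05d", as.integer(df2$NumID))

# all.x = TRUE keeps every row of df2, like left_join().
result <- merge(df2, df, by = c("date", "NumID"), all.x = TRUE)

# Rows of df2 that found no partner in df would show NA in ID.
unmatched <- result[is.na(result$ID), ]
print(result)
```

Checking unmatched after the merge is a cheap way to notice a key that was not normalised: if the NumID padding step is skipped, every row ends up there with ID set to NA. Choosing the date format per element also avoids relying on a single format being guessed for the whole vector.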