I want to create a new variable that marks which rows of my data are duplicates, treating the oldest data point as the "original". My dataframe is not ordered by date, but by ID.
ID Name Number Datetime (dd/mm/yyy/hh/MM)
1 ace 114 15.03.2019 15:26
2 bert 197 18.03.2019 07:28
3 vance 245 16.03.2019 14:03
4 chad 116 17.03.2019 02:02
5 chad 116 18.03.2019 18:23
6 ace 114 12.03.2019 23:15
Ordering the dataframe works, and marking the duplicated rows also works, but not in combination, which leads to the originals not being the first presentation. Even if I order the dataframe before marking the representations, the dataframe seems to be unordered for the next command, and linking the two commands with %>% does not work either.
df %>% arrange(Datetime)
df$representations <- if_else(duplicated(df$number, .keep_all =TRUE), 1, 0)
df$represntations <- df %>%
arrange(Datetime) %>%
if_else(duplicated(df$number, .keep_all =TRUE), 1, 0)
How can I make sure that the original is always the oldest data point for each number (like this)?
ID Name Number Datetime (dd/mm/yyy/hh/MM) representation
1 ace 114 15.03.2019 15:26 1
2 bert 197 18.03.2019 07:28 0
3 vance 245 16.03.2019 14:03 0
4 chad 116 17.03.2019 02:02 0
5 chad 116 18.03.2019 18:23 1
6 ace 114 12.03.2019 23:15 0
CodePudding user response:
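Try the code below. It sorts by date so that duplicated() treats the oldest row of each Number as the original, then restores the ID order. Note that the result has to be assigned back to df, that the column is called Number (R is case-sensitive), and that duplicated() has no .keep_all argument (that one belongs to distinct()).
library(dplyr)
df <- df %>%
  # Datetime should be a real date-time (e.g. POSIXct); text sorts alphabetically
  arrange(Datetime) %>%
  mutate(representations = if_else(duplicated(Number), 1, 0)) %>%
  arrange(ID)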
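In case it helps, here is a minimal reproducible sketch of the same idea on the sample data from the question. It assumes the Datetime column is stored as text in the form "dd.mm.yyyy HH:MM" and parses it with as.POSIXct first; the tz = "UTC" setting is only an assumption to keep the example deterministic. It also shows why the original attempt had no effect: `df %>% arrange(Datetime)` on its own only prints a sorted copy, so the result must be assigned.

```r
library(dplyr)

# Sample data from the question; Datetime starts out as text
df <- data.frame(
  ID = 1:6,
  Name = c("ace", "bert", "vance", "chad", "chad", "ace"),
  Number = c(114, 197, 245, 116, 116, 114),
  Datetime = c("15.03.2019 15:26", "18.03.2019 07:28", "16.03.2019 14:03",
               "17.03.2019 02:02", "18.03.2019 18:23", "12.03.2019 23:15")
)

df <- df %>%
  # Parse the text so that sorting is chronological rather than alphabetical
  mutate(Datetime = as.POSIXct(Datetime, format = "%d.%m.%Y %H:%M", tz = "UTC")) %>%
  arrange(Datetime) %>%
  # duplicated() is FALSE for the first (oldest) row of each Number, TRUE afterwards
  mutate(representations = if_else(duplicated(Number), 1, 0)) %>%
  # Restore the original ID order
  arrange(ID)

df$representations
# 1 0 0 0 1 0
```

IDs 1 and 5 are flagged because an older row with the same Number exists (IDs 6 and 4 respectively), which matches the desired output in the question.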
CodePudding user response:
library(dplyr)
df %>%
  arrange(`Datetime(dd/mm/yyy/hh/MM)`) %>%
  mutate(flag = duplicated(Number)*1) %>%
  arrange(ID)
  ID  Name Number Datetime(dd/mm/yyy/hh/MM) flag
1  1   ace    114                15.03.2019    1
2  2  bert    197                18.03.2019    0
3  3 vance    245                16.03.2019    0
4  4  chad    116                17.03.2019    0
5  5  chad    116                18.03.2019    1
6  6   ace    114                12.03.2019    0
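A possible alternative that avoids sorting and re-sorting altogether: group by Number and flag every row whose timestamp is later than the earliest one in its group. This is only a sketch under assumptions, not what the answers above do. The helper column name `parsed` and the result name `flagged` are made up for illustration, and if two rows of the same Number share the exact earliest timestamp, both would be treated as originals.

```r
library(dplyr)

# Sample data from the question; Datetime is assumed to be stored as text
df <- data.frame(
  ID = 1:6,
  Name = c("ace", "bert", "vance", "chad", "chad", "ace"),
  Number = c(114, 197, 245, 116, 116, 114),
  Datetime = c("15.03.2019 15:26", "18.03.2019 07:28", "16.03.2019 14:03",
               "17.03.2019 02:02", "18.03.2019 18:23", "12.03.2019 23:15")
)

flagged <- df %>%
  # Hypothetical helper column holding the parsed timestamp
  mutate(parsed = as.POSIXct(Datetime, format = "%d.%m.%Y %H:%M", tz = "UTC")) %>%
  group_by(Number) %>%
  # 0 for the earliest row of each Number, 1 for every later row
  mutate(flag = as.integer(parsed != min(parsed))) %>%
  ungroup()

flagged$flag
# 1 0 0 0 1 0
```

The row order of the input is left untouched, so no final arrange(ID) is needed.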
CodePudding user response:
I ended up using this code, and the sample I checked was correct, thank you! (as.Date originally changed the year from 2019 to 2020, although the order was still correct. The cause was the format "%d.%m.%y": %y reads a two-digit year, so %Y is needed for a four-digit year.)
library(dplyr)
# split time and date, so as.Date can be used
emerge$date <- as.Date(sapply(strsplit(as.character(emerge$Falleinzeitdatum.Notfall), " "), "[", 1), format = "%d.%m.%Y")
# arrange as proposed; duplicated() takes no .keep_all argument, so it is dropped
emerge <- emerge %>%
  arrange(date) %>%
  mutate(re = if_else(duplicated(Patientennummer), 1, 0))
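On the year turning from 2019 into 2020: a small sketch of how the format string affects parsing, using one timestamp from the question. It also shows as.POSIXct as a way to keep the time of day, which matters when the same number appears twice on one day; with as.Date alone those rows tie and simply keep their existing order. The variable names and tz = "UTC" are assumptions for the example.

```r
x <- "15.03.2019 15:26"

# %y reads only two digits of the year ("20"), so the result lands in 2020
two_digit <- as.Date(x, format = "%d.%m.%y")

# %Y reads the full four-digit year
four_digit <- as.Date(x, format = "%d.%m.%Y")

# as.POSIXct keeps the time of day, so same-day rows can still be ordered
stamp <- as.POSIXct(x, format = "%d.%m.%Y %H:%M", tz = "UTC")

format(two_digit)                  # "2020-03-15"
format(four_digit)                 # "2019-03-15"
format(stamp, "%Y-%m-%d %H:%M")    # "2019-03-15 15:26"
```

With a POSIXct column there is no need to split the string into date and time before sorting.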