Suppose I have a data frame named df like this:
| id   | name  | age |
|------|-------|-----|
| 0123 | Joe   | 20  |
| 0123 | Kyle  | 45  |
| 0333 | Susan | 24  |
| 0333 | Molly | 80  |
How can I split this df into two so that neither resulting data frame contains a duplicated id value? I am looking for a result like this:
| id   | name  | age |
|------|-------|-----|
| 0123 | Joe   | 20  |
| 0333 | Susan | 24  |

| id   | name  | age |
|------|-------|-----|
| 0333 | Molly | 80  |
| 0123 | Kyle  | 45  |
Let me know if you can help!
CodePudding user response:
You can use split with a grouping factor built from the running count of each id (1 for its first occurrence, 2 for its second, and so on):
split(df, ~ ave(id, id, FUN = seq_along))
$`1`
id name age
1 0123 Joe 20
3 0333 Susan 24
$`2`
id name age
2 0123 Kyle 45
4 0333 Molly 80
It does not matter how many duplicates you have. The n-th occurrence of each id ends up in the n-th data frame, so no data frame contains the same id twice:
split(rbind(df, df),~ave(id, id, FUN = seq_along))
$`1`
id name age
1 0123 Joe 20
3 0333 Susan 24
$`2`
id name age
2 0123 Kyle 45
4 0333 Molly 80
$`3`
id name age
5 0123 Joe 20
7 0333 Susan 24
$`4`
id name age
6 0123 Kyle 45
8 0333 Molly 80
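In case it helps, here is a minimal self-contained sketch of the same idea. It rebuilds the example data (I am assuming id is stored as character, since the leading zeros are shown) and passes the grouping vector to split directly instead of using the formula form, which as far as I know needs a fairly recent version of R. The names occurrence, parts, df_first and df_second are just illustrative.

```r
# Rebuild the example data; ids are kept as character to preserve leading zeros
df <- data.frame(
  id   = c("0123", "0123", "0333", "0333"),
  name = c("Joe", "Kyle", "Susan", "Molly"),
  age  = c(20, 45, 24, 80),
  stringsAsFactors = FALSE
)

# Occurrence number of each row within its id: 1 for the first, 2 for the second, ...
occurrence <- ave(seq_along(df$id), df$id, FUN = seq_along)

# One data frame per occurrence number; the list is named "1", "2", ...
parts <- split(df, occurrence)

# Pull the individual data frames out of the list by name
df_first  <- parts[["1"]]
df_second <- parts[["2"]]
```

Using seq_along(df$id) as the first argument to ave keeps the result an integer vector even when id is character.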
CodePudding user response:
a_list <- split(df, duplicated(df$id))
This works only if each id occurs at most twice.
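To illustrate that limitation, here is a small sketch with made-up data (df3 and the extra row for "Ann" are hypothetical, not from the question) in which one id occurs three times. Every repeat after the first lands in the TRUE part, so that part still contains a duplicated id.

```r
# Hypothetical data where id "0123" occurs three times
df3 <- data.frame(
  id   = c("0123", "0123", "0123", "0333"),
  name = c("Joe", "Kyle", "Ann", "Susan"),
  age  = c(20, 45, 31, 24),
  stringsAsFactors = FALSE
)

# duplicated() is FALSE for the first occurrence of an id and TRUE for all later ones
a_list <- split(df3, duplicated(df3$id))

# The "TRUE" part holds both repeats of "0123"
repeats <- a_list[["TRUE"]]
still_duplicated <- anyDuplicated(repeats$id) > 0
```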
CodePudding user response:
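Here is a dplyr solution:
library(dplyr)
# Keep the first row for each id
df1 <- df %>%
  distinct(id, .keep_all = TRUE)
# Keep the rows of df that do not appear in df1 (joins on all shared columns)
df2 <- anti_join(df, df1)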
> df1
id name age
1 123 Joe 20
2 333 Susan 24
> df2
id name age
1 123 Kyle 45
2 333 Molly 80
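Note that with this approach df2 would still contain duplicates if an id occurred three or more times. One possible variation, sketched below under the assumption that dplyr is installed, numbers the rows within each id and filters on that number. The occurrence column and the numbered data frame are names I made up for the example.

```r
library(dplyr)

# Rebuild the example data; ids are kept as character to preserve leading zeros
df <- data.frame(
  id   = c("0123", "0123", "0333", "0333"),
  name = c("Joe", "Kyle", "Susan", "Molly"),
  age  = c(20, 45, 24, 80),
  stringsAsFactors = FALSE
)

# Number the rows within each id: 1 for the first occurrence, 2 for the second, ...
numbered <- df %>%
  group_by(id) %>%
  mutate(occurrence = row_number()) %>%
  ungroup()

# Each occurrence number gives one data frame without duplicated ids
df1 <- filter(numbered, occurrence == 1)
df2 <- filter(numbered, occurrence == 2)
```

If there can be more than two occurrences, split(numbered, numbered$occurrence) should return the whole set as a list.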