Keep the unique values for each person
I have a data frame:
name | size |
---|---|
john | 16 |
khaled | 15 |
john | 15 |
Alex | 16 |
john | 16 |
In the output, I need to remove the duplicated size values for each name:
name | size |
---|---|
john | 16 |
khaled | 15 |
john | 15 |
Alex | 16 |
What is the best function or library to do that?
CodePudding user response:
You can use the distinct() function from the dplyr package to remove duplicate rows from a data frame. Your table has no color column, so the duplicates have to be identified by name and size together: a row is dropped only when the same name has already appeared with the same size. For example:
library(dplyr)
# Keep one row per (name, size) combination
df_unique <- distinct(df, name, size)
This returns a new data frame, df_unique, with one row for each name and size combination, in the order in which each combination first appears in df. Because name and size are the only columns, distinct(df) with no column arguments gives the same result. Note that distinct(df, size) on its own would not work here: it returns only the size column and keeps one row per size regardless of name.
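As a quick check, here is a minimal sketch that rebuilds the sample table from the question and applies distinct() to its name and size columns. The object name `df` and the `data.frame()` construction are assumptions, since the question only shows the table and not how it was created.

```r
library(dplyr)

# Sample data copied from the question; the object name `df` is an assumption
df <- data.frame(
  name = c("john", "khaled", "john", "Alex", "john"),
  size = c(16, 15, 15, 16, 16)
)

# Keep one row per (name, size) combination, in order of first appearance
df_unique <- distinct(df, name, size)

print(df_unique)
```

The result should have four rows (john 16, khaled 15, john 15, Alex 16), which matches the desired output in the question: only the second john / 16 row is dropped.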
distinct() always keeps the first occurrence of each combination, so no extra step is needed for that. If you also want the result sorted, you can combine distinct() with the arrange() function from dplyr. For example:
library(dplyr)
# Sort by name and size, then keep one row per (name, size) combination
df_unique <- df %>%
  arrange(name, size) %>%
  distinct(name, size)
This code first sorts the rows of df by name and then by size, and then uses distinct() to drop repeated name and size combinations. The result contains the same rows as the unsorted version, only in sorted order rather than in the order of the original data. If df had additional columns that you wanted to keep, you could add .keep_all = TRUE to distinct(), which retains the other columns from the first row of each combination.
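To see what sorting changes, the sketch below runs distinct() with and without arrange() on the sample table from the question, using its name and size columns. As before, the object name `df` and the way it is built are assumptions made for illustration.

```r
library(dplyr)

# Sample data copied from the question; the object name `df` is an assumption
df <- data.frame(
  name = c("john", "khaled", "john", "Alex", "john"),
  size = c(16, 15, 15, 16, 16)
)

# Without sorting: rows appear in order of first occurrence
unsorted <- distinct(df, name, size)

# With sorting: rows are ordered by name, then size, before de-duplication
sorted <- df %>%
  arrange(name, size) %>%
  distinct(name, size)

print(unsorted)
print(sorted)
```

Both results should contain the same four name and size combinations. The only difference is row order, so the arrange() step is worth adding only if a sorted result is actually wanted.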