I have a dataframe with several columns. I need to filter it by the value in one column (let's call it col1), but for each value I need to pick the row that has the least value in another column (e.g., col2). I know how to take distinct rows by a column value (basically, dplyr's distinct(col1)), but I'm not sure how it chooses which row to return when several rows match, and I don't know how to guide it.
For example, given this dataframe:
col1 col2
a 10
b 12
a 8
b 14
a 15
c 6
a 3
return the unique rows by col1 that have the least value in col2, i.e.:
col1 col2
a 3
b 12
c 6
CodePudding user response:
You can try base R's aggregate():
> aggregate(. ~ col1, df, min)
col1 col2
1 a 3
2 b 12
3 c 6
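One caveat, since the question mentions several columns: aggregate() applies min to every non-grouping column separately, so with more than two columns the values in one output row may come from different input rows. Below is a minimal sketch of a base R alternative that keeps whole rows. It assumes the data frame is named df as above, and col3 is a hypothetical extra column added only to show the difference.

```r
# Example data from the question, plus a hypothetical third column (col3)
df <- data.frame(
  col1 = c("a", "b", "a", "b", "a", "c", "a"),
  col2 = c(10, 12, 8, 14, 15, 6, 3),
  col3 = c(1, 2, 3, 4, 5, 6, 7)
)

# aggregate() takes the minimum of each column independently per group
agg <- aggregate(. ~ col1, df, min)

# Alternative: sort by col2, then keep the first row seen for each col1 value
sorted <- df[order(df$col2), ]
whole_rows <- sorted[!duplicated(sorted$col1), ]

# Put the result back in col1 order for readability
whole_rows <- whole_rows[order(whole_rows$col1), ]
print(whole_rows)
```

For group a, agg pairs col2 = 3 with the smallest col3 in the group, while whole_rows keeps the col3 value that actually sits on the row with col2 = 3.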
CodePudding user response:
Using dplyr, the solution is to group by your column and then keep only the rows with the minimum value in the other column:
df %>% group_by(col1) %>% filter(col2 == min(col2))
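Note that filter(col2 == min(col2)) keeps every row tied for the minimum, and the result stays grouped. If exactly one row per group is wanted, a possible variant is slice_min() with with_ties = FALSE. This is only a sketch, assuming dplyr 1.0.0 or later and the example data from the question:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  col1 = c("a", "b", "a", "b", "a", "c", "a"),
  col2 = c(10, 12, 8, 14, 15, 6, 3)
)

result <- df %>%
  group_by(col1) %>%
  # Keep one row per group: the one with the smallest col2.
  # with_ties = FALSE returns a single row even if the minimum is tied.
  slice_min(col2, n = 1, with_ties = FALSE) %>%
  # Drop the grouping so later steps operate on the whole table
  ungroup()

print(result)
```

All other columns of the selected row are carried along unchanged, which matters when the dataframe has more than two columns.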