I have a dataframe with columns A, B, C and D. A is the coarsest, with many rows sharing the same A value; B is finer, C is finer still, and so on. I want to select n rows, one for each unique combination of A and B values.
A B C D
0 a1 b1 c1 d1
1 a1 b1 c2 d2
2 a1 b2 c3 d3
3 a1 b2 c4 d4
4 a2 b3 c5 d5
5 a2 b3 c6 d6
6 a2 b4 c7 d7
7 a2 b4 c8 d8
In the above example, I want to select rows such that I have one row for each unique A-value, B-value combination.
An example of such a selection would be:
A B C D
0 a1 b1 c1 d1
2 a1 b2 c3 d3
4 a2 b3 c5 d5
6 a2 b4 c7 d7
How do I do this elegantly with pandas?
CodePudding user response:
You could group, then use .head():
df.groupby(['A', 'B']).head(1)
A B C D
0 a1 b1 c1 d1
2 a1 b2 c3 d3
4 a2 b3 c5 d5
6 a2 b4 c7 d7
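To try this end to end, here is a minimal sketch that rebuilds the example frame from the question and applies the same call. The variable name first_rows is my own; the data is copied from the question.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"],
    "B": ["b1", "b1", "b2", "b2", "b3", "b3", "b4", "b4"],
    "C": [f"c{i}" for i in range(1, 9)],
    "D": [f"d{i}" for i in range(1, 9)],
})

# head(1) keeps the first row of every (A, B) group and
# preserves the original index and row order.
first_rows = df.groupby(["A", "B"]).head(1)
print(first_rows)
```

Passing a larger number to head() would keep that many rows per group instead of one.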
CodePudding user response:
You could simply call drop_duplicates on the subset of columns A and B. Depending on the keep argument, you get either the first or the last row of each combination. Alternatively, group your data and use groupby.sample to get a random row from each group.
df.drop_duplicates(subset=['A', 'B'])
# or
df.groupby(['A', 'B']).sample()
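As a rough illustration of how the three variants differ, the sketch below runs them on the example frame from the question. The variable names and the random_state value are my own choices, added only to make the random pick reproducible; groupby.sample needs pandas 1.1 or later.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"],
    "B": ["b1", "b1", "b2", "b2", "b3", "b3", "b4", "b4"],
    "C": [f"c{i}" for i in range(1, 9)],
    "D": [f"d{i}" for i in range(1, 9)],
})

# keep="first" is the default: the first row of each (A, B) pair survives.
first = df.drop_duplicates(subset=["A", "B"])

# keep="last" keeps the last row of each (A, B) pair instead.
last = df.drop_duplicates(subset=["A", "B"], keep="last")

# One randomly chosen row per (A, B) group; random_state fixes the draw.
random_rows = df.groupby(["A", "B"]).sample(n=1, random_state=0)

print(first.index.tolist())  # [0, 2, 4, 6]
print(last.index.tolist())   # [1, 3, 5, 7]
print(random_rows)
```

All three return one row per combination with the original index labels intact, so the choice mostly comes down to which row of each group you want.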