Select n rows with unique combinations of certain columns

Time:01-16

I have a dataframe with columns A, B, C and D. A is the coarsest, with many rows sharing the same A value; B is finer, C is finer still, and so on. I want to select n rows, each with a unique combination of A and B values.

    A    B    C    D
0   a1   b1   c1   d1
1   a1   b1   c2   d2
2   a1   b2   c3   d3
3   a1   b2   c4   d4
4   a2   b3   c5   d5
5   a2   b3   c6   d6
6   a2   b4   c7   d7
7   a2   b4   c8   d8

In the above example, I want to select rows such that I have one row for each unique combination of A and B values.

An example of such a selection would be

    A    B    C    D
0   a1   b1   c1   d1
2   a1   b2   c3   d3
4   a2   b3   c5   d5
6   a2   b4   c7   d7

How do I do this elegantly with pandas?

CodePudding user response:

You could group, then use .head():

df.groupby(['A', 'B']).head(1)
    A   B   C   D
0  a1  b1  c1  d1
2  a1  b2  c3  d3
4  a2  b3  c5  d5
6  a2  b4  c7  d7
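
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example frame from the question and applies the same groupby/head approach. The variable names (df, first_per_pair) are only illustrative and the data is copied from the question's table.

```python
import pandas as pd

# Example frame copied from the question (illustrative reconstruction).
df = pd.DataFrame({
    "A": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"],
    "B": ["b1", "b1", "b2", "b2", "b3", "b3", "b4", "b4"],
    "C": [f"c{i}" for i in range(1, 9)],
    "D": [f"d{i}" for i in range(1, 9)],
})

# head(1) keeps the first row of each (A, B) group and preserves
# the original row order and index labels.
first_per_pair = df.groupby(["A", "B"]).head(1)
print(first_per_pair)
```

Passing a larger number to head() would keep up to that many rows per combination instead of one.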

CodePudding user response:

You could simply call drop_duplicates on the subset of columns A and B. The keep argument controls whether you get the first or the last row of each combination. Alternatively, group your data and use groupby.sample to get a random row from each group.

df.drop_duplicates(subset=['A', 'B'])

#or

df.groupby(['A', 'B']).sample()
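
As a rough sketch of both options on the question's data: keep="last" picks the last row of each combination, and sample(n=1) picks a random one. The frame is rebuilt from the question's table, the variable names are illustrative, and random_state=0 is only an assumption added here to make the random pick repeatable.

```python
import pandas as pd

# Example frame copied from the question (illustrative reconstruction).
df = pd.DataFrame({
    "A": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"],
    "B": ["b1", "b1", "b2", "b2", "b3", "b3", "b4", "b4"],
    "C": [f"c{i}" for i in range(1, 9)],
    "D": [f"d{i}" for i in range(1, 9)],
})

# Keep the last row of each (A, B) combination instead of the first.
last_per_pair = df.drop_duplicates(subset=["A", "B"], keep="last")

# Draw one random row from each (A, B) group; random_state fixes the draw.
random_per_pair = df.groupby(["A", "B"]).sample(n=1, random_state=0)

print(last_per_pair)
print(random_per_pair)
```

Note that groupby.sample needs a reasonably recent pandas (it was added in 1.1), whereas drop_duplicates works on older versions as well.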