Home > Net >  pandas, for each unique value in a column, get first three rows in which examples of each unique val
pandas, for each unique value in a column, get first three rows in which examples of each unique val

Time:05-10

I have a dataframe with two columns (more but only two of interest) such as below:

Column 1 Column 2
Doesn't A1
Matter A2
What A1
Is A2
In A3
This A2
Column A4
But A2
I A1
Do A2
Need A3
It A2
In A4
Output A5

And I simply want to get a list of all unique values in column 2 - however - I want 3 instances/examples of each value (it doesn't have to be first 3 but first 3 will work just as fine).

I know how to get unique elements of a dataframe obviously with .unique() method but not 3 repeats of each unique value (up to 3 so if only 1 or 2 exists - I want to get just get those).

And I want to get

Column 1 Column 2
Doesn't A1
What A1
I A1
Matter A2
Is A2
This A2
In A3
Need A3
Column A4
In A4
Output A5

The order of column 2 does not matter. I do want to get the unique terms in adjacent rows (grouped together) but there is no sequential order in between groups that matters. So the Set of A3 could come before the set of A1 rows.

CodePudding user response:

You can use grouping (groupby manual) and then take only n first elements of each group (groupby.head manual).

Let's create the data:

data = {'Column 1': ['Doesn\'t', 'Matter', 'What', 'Is', 'In', 'This', 'Column', 'But', 'I', 'Do', 'Need', 'It', 'In', 'Output'], 'Column 2': ['A1', 'A2', 'A1', 'A2', 'A3', 'A2', 'A4', 'A2', 'A1', 'A2', 'A3', 'A2', 'A4', 'A5']}
df = pd.DataFrame(data)

Group it, take first 3 elements of each group, sort by Column 2 values:

df.groupby(by='Column 2').head(3).sort_values(by='Column 2')

CodePudding user response:

You can use the rank function to rank rows by partitioning on "Column 2" values. Filtering on rank values less than 4 for each partition may get what you need:

df.loc[df.groupby(['Column 2'])['Column 2'].rank(method='first')<=3]

Input sample used for testing:

df = pd.DataFrame([
     ["Doesn't", "A1"],
     ["Matter" , "A2"],
     ["What"   , "A1"],
     ["Is"     , "A2"],
     ["In"     , "A3"],
     ["This"   , "A2"],
     ["Column" , "A4"],
     ["But"    , "A2"],
     ["I"      , "A1"],
     ["Do"     , "A2"],
     ["Need"   , "A3"],
     ["It"     , "A2"],
     ["In"     , "A4"],
     ["Output" , "A5"],
], columns = ["Column 1", "Column 2"])
  • Related