Home > front end >  Python Pandas groupby and iloc
Python Pandas groupby and iloc

Time:05-13

I have a dataframe contains data column, customer column and size like this:

Customer Date Size
Cust1 20/10/2021 4
Cust1 21/10/2021 5
Cust1 22/10/2021 6
Cust1 23/10/2021 6
Cust2 20/10/2021 4
Cust2 21/10/2021 5
Cust2 22/10/2021 6
Cust2 23/10/2021 6
Cust3 20/10/2021 4
Cust3 21/10/2021 5
Cust3 22/10/2021 6
Cust3 23/10/2021 6

I need to extract every nth date from a customer and delete the rest. In this example - every 2nd:

|Customer|Date|Size|
|--------|----|----|
|Cust1   |20/10/2021|4|
|Cust1   |22/10/2021|6|
|Cust2   |20/10/2021|4|
|Cust2   |22/10/2021|6|
|Cust3   |20/10/2021|4|
|Cust3   |22/10/2021|6|

Sorry for the bad format, but table formatting doesn't work for the 2nd table.

In reality it's every 10th and day, starting from the most recent one. Tried with group and iloc but isn't working:

df_10 = df.iloc[::10, :]

AttributeError: 'DataFrameGroupBy' object has no attribute 'iloc'

I don't insist to use groupby at all to be honest, but can't find working solution so far.

Thank you

CodePudding user response:

You can use:

df.loc[df.groupby('Customer').cumcount().mod(2).eq(0)]

output:

   Customer        Date  Size
0     Cust1  20/10/2021     4
2     Cust1  22/10/2021     6
4     Cust2  20/10/2021     4
6     Cust2  22/10/2021     6
8     Cust3  20/10/2021     4
10    Cust3  22/10/2021     6

explanation:

df.groupby('Customer').cumcount() creates a count per group ([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]), then mod(2) takes the modulo to give [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1], and eq(0) enables to select the 0 values.

If you want to take every N rows, starting from the K's one (first being 0): .mod(N).eq(K)

CodePudding user response:

Perhaps this can help? Mark every Nth row per group using pandas

With a dataset that resembles yours one solution could be:

df= pd.DataFrame({'Customer': ['Cust1']*20   ['Cust2']*20,
         'date': range(1,40 1),
         'size':[10]*40})
n=3 # for every 3th row
df.groupby('Customer', group_keys=False).apply(lambda x: x.iloc[n-1::n])

´´´

  • Related