Python Pandas groupby and iloc-CodePudding

I have a dataframe contains data column, customer column and size like this:

Customer	Date	Size
Cust1	20/10/2021	4
Cust1	21/10/2021	5
Cust1	22/10/2021	6
Cust1	23/10/2021	6
Cust2	20/10/2021	4
Cust2	21/10/2021	5
Cust2	22/10/2021	6
Cust2	23/10/2021	6
Cust3	20/10/2021	4
Cust3	21/10/2021	5
Cust3	22/10/2021	6
Cust3	23/10/2021	6

I need to extract every nth date from a customer and delete the rest. In this example - every 2nd:

|Customer|Date|Size|
|--------|----|----|
|Cust1   |20/10/2021|4|
|Cust1   |22/10/2021|6|
|Cust2   |20/10/2021|4|
|Cust2   |22/10/2021|6|
|Cust3   |20/10/2021|4|
|Cust3   |22/10/2021|6|

Sorry for the bad format, but table formatting doesn't work for the 2nd table.

In reality it's every 10th and day, starting from the most recent one. Tried with group and iloc but isn't working:

df_10 = df.iloc[::10, :]

AttributeError: 'DataFrameGroupBy' object has no attribute 'iloc'

I don't insist to use groupby at all to be honest, but can't find working solution so far.

Thank you

CodePudding user response：

You can use:

df.loc[df.groupby('Customer').cumcount().mod(2).eq(0)]

output:

   Customer        Date  Size
0     Cust1  20/10/2021     4
2     Cust1  22/10/2021     6
4     Cust2  20/10/2021     4
6     Cust2  22/10/2021     6
8     Cust3  20/10/2021     4
10    Cust3  22/10/2021     6

explanation:

df.groupby('Customer').cumcount() creates a count per group ([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]), then mod(2) takes the modulo to give [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1], and eq(0) enables to select the 0 values.

If you want to take every N rows, starting from the K's one (first being 0): .mod(N).eq(K)

CodePudding user response：

Perhaps this can help? Mark every Nth row per group using pandas

With a dataset that resembles yours one solution could be:

df= pd.DataFrame({'Customer': ['Cust1']*20   ['Cust2']*20,
         'date': range(1,40 1),
         'size':[10]*40})
n=3 # for every 3th row
df.groupby('Customer', group_keys=False).apply(lambda x: x.iloc[n-1::n])

´´´