Pandas: how to filter date time after groupby?

Time:12-05

Here is my original Table

A      Day-1
A      Day-2
A      Day-5
B      Day-3
B      Day-7

I would like to keep only the last several days of each group. For example, if I keep the last 4 days, the table becomes

A      Day-2
A      Day-5
B      Day-7

For 'A', the last day is Day-5, so the last 4 days are Day-2, Day-3, Day-4 and Day-5. Therefore 'A Day-2' and 'A Day-5' are kept.

For 'B', the last day is Day-7, so the last 4 days are Day-4, Day-5, Day-6 and Day-7. Therefore only 'B Day-7' is kept.

Here is what I tried:

df.groupby(['name']).tail(1)

This shows the last day of each group, but how can I filter the original rows by date relative to this new dataframe?

A      Day-5
B      Day-7

CodePudding user response:

Extract the integer day value from the column, then group by name and use transform with 'last' to broadcast the last day of each name to every row. Finally, compare each day with that last day and keep the rows where the difference is less than 4. This assumes the rows are sorted by day within each name.

day = df['day'].str.split('-').str[-1].astype(int)
mask = day > (day.groupby(df['name']).transform('last') - 4)

df[mask]

Result

  name    day
1    A  Day-2
2    A  Day-5
4    B  Day-7
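
Note that transform('last') takes the last row of each group in its current order, so the mask above relies on the days already being sorted within each name. Here is a minimal sketch of the same filter using transform('max') instead, run on deliberately shuffled rows. It assumes the question's table is loaded with the column names name and day used in this answer; the variable names latest, n_days and result are only for illustration.

```python
import pandas as pd

# The question's table with rows deliberately out of order (assumed column names).
df = pd.DataFrame({
    "name": ["A", "B", "A", "B", "A"],
    "day": ["Day-5", "Day-7", "Day-1", "Day-3", "Day-2"],
})

# Numeric part of "Day-N".
day = df["day"].str.split("-").str[-1].astype(int)

# Latest day per name, broadcast back to every row; 'max' does not depend on row order.
latest = day.groupby(df["name"]).transform("max")

# Keep rows whose day is greater than the group's latest day minus n_days.
n_days = 4
result = df[day > latest - n_days]
print(result)
```

With sorted input this should select the same rows as the 'last' version; the difference only matters when the rows are not ordered by day.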

CodePudding user response:

Example

import pandas as pd

data = {'col1': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'},
        'col2': {0: 'Day-1', 1: 'Day-2', 2: 'Day-5', 3: 'Day-3', 4: 'Day-7'}}
df = pd.DataFrame(data)

df

    col1    col2
0   A       Day-1
1   A       Day-2
2   A       Day-5
3   B       Day-3
4   B       Day-7

Code

# numeric part of 'Day-N'
s = df['col2'].str.split('-').str[-1].astype('int')
# True where the day is greater than the group's max day minus 4
cond1 = s.groupby(df['col1']).transform(lambda x: x.gt(x.max()-4))
df[cond1]

result:

    col1    col2
1   A       Day-2
2   A       Day-5
4   B       Day-7
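
If the column holds real dates rather than Day-N labels, the same idea should carry over with a pd.Timedelta in place of the integer offset. The sketch below is only an illustration: the date column and its values are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical data: same shape as the question, but with real dates.
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-05", "2023-01-03", "2023-01-07"]
    ),
})

# Latest date per group, aligned to the original rows.
latest = df.groupby("col1")["date"].transform("max")

# Keep rows later than each group's latest date minus 4 days.
recent = df[df["date"] > latest - pd.Timedelta(days=4)]
print(recent)
```

Because the comparison is done on datetime values, no string splitting is needed in this case.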