Here is my original table:
A Day-1
A Day-2
A Day-5
B Day-3
B Day-7
I would like to keep only the last several days for each name. For example, if I keep the last 4 days, the table becomes:
A Day-2
A Day-5
B Day-7
For 'A', the last day is Day-5, so the last 4 days are Day-2, Day-3, Day-4 and Day-5. Therefore 'A Day-2' and 'A Day-5' are kept.
For 'B', the last day is Day-7, so the last 4 days are Day-4, Day-5, Day-6 and Day-7. Therefore only 'B Day-7' is kept.
Here is what I tried:
df.groupby(['name']).tail(1)
This returns the last day of each group, but how do I filter the original dataframe by day using this result?
A Day-5
B Day-7
CodePudding user response:
Extract the integer day value from the column, then group by name and transform with last to broadcast the last day per name. Finally, compare each day with its group's last day and keep the rows where the difference is less than 4.
# 'Day-5' -> 5
day = df['day'].str.split('-').str[-1].astype(int)
# keep rows whose day falls within the 4 days ending at the group's last day
mask = day > (day.groupby(df['name']).transform('last') - 4)
df[mask]
Result
  name    day
1    A  Day-2
2    A  Day-5
4    B  Day-7
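One caveat worth sketching: transform('last') picks the last row of each group by position, so the snippet above relies on the rows already being sorted by day within each name. The example below is only an illustration on a shuffled copy of the question's data (the shuffled order is an assumption, not something from the question); it compares 'last' with 'max', which does not depend on row order.

```python
import pandas as pd

# Same rows as in the question, but deliberately out of order (assumed for illustration)
df = pd.DataFrame({
    "name": ["A", "B", "A", "B", "A"],
    "day": ["Day-5", "Day-7", "Day-1", "Day-3", "Day-2"],
})

# 'Day-5' -> 5
day = df["day"].str.split("-").str[-1].astype(int)

# 'last' is positional: for A it picks 2, for B it picks 3
last_by_position = day.groupby(df["name"]).transform("last")
# 'max' is by value: for A it picks 5, for B it picks 7
last_by_value = day.groupby(df["name"]).transform("max")

mask_last = day > last_by_position - 4  # keeps every row on this unsorted data
mask_max = day > last_by_value - 4      # keeps only the intended rows

print(df[mask_max])
```

If the data is known to be sorted, both variants give the same rows; otherwise either sort first or use 'max'.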
CodePudding user response:
Example
import pandas as pd

data = {'col1': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'},
        'col2': {0: 'Day-1', 1: 'Day-2', 2: 'Day-5', 3: 'Day-3', 4: 'Day-7'}}
df = pd.DataFrame(data)
df
  col1   col2
0    A  Day-1
1    A  Day-2
2    A  Day-5
3    B  Day-3
4    B  Day-7
Code
# 'Day-5' -> 5
s = df['col2'].str.split('-').str[-1].astype('int')
# True where the day is within the 4 days ending at the group's maximum day
cond1 = s.groupby(df['col1']).transform(lambda x: x.gt(x.max()-4))
df[cond1]
result:
  col1   col2
1    A  Day-2
2    A  Day-5
4    B  Day-7
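If the window size needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: the function name keep_last_n_days and its parameters are hypothetical, and it assumes every value in the day column ends in an integer (as in 'Day-7').

```python
import pandas as pd

def keep_last_n_days(df, group_col, day_col, n):
    """Keep rows whose day is within the n days ending at each group's latest day.

    Hypothetical helper; assumes every value in day_col ends in an integer.
    """
    # Pull the trailing digits out of strings such as 'Day-7'
    day = df[day_col].str.extract(r"(\d+)$", expand=False).astype(int)
    # Broadcast each group's latest day back onto its rows
    latest = day.groupby(df[group_col]).transform("max")
    return df[day > latest - n]

df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B"],
    "col2": ["Day-1", "Day-2", "Day-5", "Day-3", "Day-7"],
})

result = keep_last_n_days(df, "col1", "col2", n=4)
print(result)
```

With n=4 this keeps the same three rows as above; with n=1 it keeps only the latest day of each group, which matches what groupby(...).tail(1) returns on sorted data.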