I am not very familiar in python pandas programming.
I have a dataframe like this:
| No | Date | hours | Flag |
|:------|:-----------|:---------|:----------|
| 101 | 13.01.2021 | 2:45 | 0 |
| 101 | 13.01.2021 | 1:20 | 0 |
| 101 | 15.02.2021 | 3:15 | 0 |
| 101 | 16.02.2021 | 0:30 | 0 |
| 101 | 02.03.2021 | 1:20 | 0 |
| 101 | 02.03.2021 | 2:40 | 0 |
| 101 | 19.06.2021 | 2:05 | 0 |
| 101 | 19.06.2021 | 1:45 | 0 |
| 101 | 19.06.2021 | 1:25 | 0 |
| 101 | 19.06.2021 | 0:45 | 0 |
Then I want to extract all rows with the two latest dates. Alternativ with the latest date, the three latest dates etc.
A dataframe (with the two latest dates) like this should be the target:
| No | Date | hours | Flag |
|:------|:-----------|:---------|:----------|
| 101 | 02.03.2021 | 1:20 | 0 |
| 101 | 02.03.2021 | 2:40 | 0 |
| 101 | 19.06.2021 | 2:05 | 0 |
| 101 | 19.06.2021 | 1:45 | 0 |
| 101 | 19.06.2021 | 1:25 | 0 |
| 101 | 19.06.2021 | 0:45 | 0 |
I tried:
df.nlargest(2, 'Date', keep='all')
this produces sth. like this:
| No | Date | hours | Flag |
|:------|:-----------|:---------|:----------|
| 101 | 19.06.2021 | 2:05 | 0 |
| 101 | 19.06.2021 | 1:45 | 0 |
| 101 | 19.06.2021 | 1:25 | 0 |
| 101 | 19.06.2021 | 0:45 | 0 |
Can anybody help me please?
Thanks.
CodePudding user response:
Filter 2 largest Datetimes by Series.nlargest
and then filter all Date
s which match by Series.isin
and boolean indexing
:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df[df['Date'].isin(df['Date'].nlargest(2))]
print (df)
No Date hours Flag
6 101 2021-06-19 2:05 0
7 101 2021-06-19 1:45 0
8 101 2021-06-19 1:25 0
9 101 2021-06-19 0:45 0
CodePudding user response:
@PaulS yes this works, Thank you