I have a table looking similarly to the one below. I want to be able to drop row when current value date is not the max date in the table. In this case the row where current value is not null should be dropped because the date is not the max date.
date | history | current |
---|---|---|
6/1/2019 | 0 | NULL |
9/1/2019 | 0 | NULL |
12/1/2019 | 0 | NULL |
3/1/2020 | 0 | NULL |
6/1/2020 | 470.006 | NULL |
9/1/2020 | 248.028 | NULL |
12/1/2020 | 246.276 | NULL |
3/1/2021 | NULL | 273.7215714 |
4/1/2021 | 269.182 | NULL |
7/1/2021 | 297.919 | NULL |
10/1/2021 | 312.68 | NULL |
1/1/2022 | 300.932 | NULL |
4/1/2022 | 340.767 | NULL |
CodePudding user response:
If your DataFrame is called df
, then you can keep only the max date values by using a mask:
df = df[df['date'] == df['date'].max()]
Here you are saying that you only want rows from df
where the value in the column df['date']
is equal to the max value from df['date']
.
If you only want this where the 'current' column is not NULL
, then you can do the following:
df = df[(df['current'] == 'NULL') | (df['date'] == df['date'].max())]
CodePudding user response:
You could do:
df = df[(df['current'].isnull()) | (df['date'] == df['date'].max())]
CodePudding user response:
Not sure if I understand what you want to do, but here's my take.
Convert your dates to pandas datetime objects and replace NULL
values with NaN
.
import numpy as np
import pandas as pd
df = df.replace('NULL', np.nan)
df.date = pd.to_datetime(df.date)
Then you can do:
df.query('not (~current.isna() and date != date.max())')
This drops the row with current
value 273.7215714
since its date is not the maximum date. If you add another row with the maximum date (2022-04-01
) and any non-null current
value, it should be kept.
The resulting table looks like this (row 7 from original df dropped):
date | history | current | |
---|---|---|---|
0 | 2019-06-01 00:00:00 | 0 | nan |
1 | 2019-09-01 00:00:00 | 0 | nan |
2 | 2019-12-01 00:00:00 | 0 | nan |
3 | 2020-03-01 00:00:00 | 0 | nan |
4 | 2020-06-01 00:00:00 | 470.006 | nan |
5 | 2020-09-01 00:00:00 | 248.028 | nan |
6 | 2020-12-01 00:00:00 | 246.276 | nan |
8 | 2021-04-01 00:00:00 | 269.182 | nan |
9 | 2021-07-01 00:00:00 | 297.919 | nan |
10 | 2021-10-01 00:00:00 | 312.68 | nan |
11 | 2022-01-01 00:00:00 | 300.932 | nan |
12 | 2022-04-01 00:00:00 | 340.767 | nan |