Home > Software engineering >  Drop row if value date is not a max date
Drop row if value date is not a max date

Time:06-02

I have a table looking similarly to the one below. I want to be able to drop row when current value date is not the max date in the table. In this case the row where current value is not null should be dropped because the date is not the max date.

date history current
6/1/2019 0 NULL
9/1/2019 0 NULL
12/1/2019 0 NULL
3/1/2020 0 NULL
6/1/2020 470.006 NULL
9/1/2020 248.028 NULL
12/1/2020 246.276 NULL
3/1/2021 NULL 273.7215714
4/1/2021 269.182 NULL
7/1/2021 297.919 NULL
10/1/2021 312.68 NULL
1/1/2022 300.932 NULL
4/1/2022 340.767 NULL

CodePudding user response:

If your DataFrame is called df, then you can keep only the max date values by using a mask:

df = df[df['date'] == df['date'].max()]

Here you are saying that you only want rows from df where the value in the column df['date'] is equal to the max value from df['date'].

If you only want this where the 'current' column is not NULL, then you can do the following:

df = df[(df['current'] == 'NULL') | (df['date'] == df['date'].max())]

CodePudding user response:

You could do:

df = df[(df['current'].isnull()) | (df['date'] == df['date'].max())]

CodePudding user response:

Not sure if I understand what you want to do, but here's my take.

Convert your dates to pandas datetime objects and replace NULL values with NaN.

import numpy as np
import pandas as pd

df = df.replace('NULL', np.nan)
df.date = pd.to_datetime(df.date)

Then you can do:

df.query('not (~current.isna() and date != date.max())')

This drops the row with current value 273.7215714 since its date is not the maximum date. If you add another row with the maximum date (2022-04-01) and any non-null current value, it should be kept.

The resulting table looks like this (row 7 from original df dropped):

date history current
0 2019-06-01 00:00:00 0 nan
1 2019-09-01 00:00:00 0 nan
2 2019-12-01 00:00:00 0 nan
3 2020-03-01 00:00:00 0 nan
4 2020-06-01 00:00:00 470.006 nan
5 2020-09-01 00:00:00 248.028 nan
6 2020-12-01 00:00:00 246.276 nan
8 2021-04-01 00:00:00 269.182 nan
9 2021-07-01 00:00:00 297.919 nan
10 2021-10-01 00:00:00 312.68 nan
11 2022-01-01 00:00:00 300.932 nan
12 2022-04-01 00:00:00 340.767 nan
  • Related