Home > Back-end >  Drop entire df where last row is nan
Drop entire df where last row is nan

Time:10-13

I want to remove data from all rows and columns if the last row in a specified column is NaN. At present, I'm only deleting the last row from all columns.

With below, if the last row in Volume is NaN, then I want to delete all rows from every column. If NaN appears anywhere else in the Volume, then leave as is.

df1 displays the last row as NaN, while df2 contains NaN elsewhere in the column. I only want to remove all rows based on the last row only.

Additionally, if there are no NaN's, then leave as is.

df1 = pd.DataFrame({'Datetime' : ['2022-10-10 01:44:00','2022-10-10 01:45:00','2022-10-10 01:46:00','2022-10-10 01:47:00','2022-10-10 01:48:00'],
                   'Val' : [0.97,0.97,0.95,0.93,0.96],
                   'Volume' : [0,0,np.NaN,0,np.NaN],
                  })

df2 = pd.DataFrame({'Datetime' : ['2022-10-10 01:44:00','2022-10-10 01:45:00','2022-10-10 01:46:00','2022-10-10 01:47:00','2022-10-10 01:48:00'],
                   'Val' : [0.97,0.97,0.95,0.93,0.96],
                   'Volume' : [0,0,np.NaN,0,0.0,
                  })

df1 = df1.loc[:df1['Volume'].last_valid_index()]

df2 = df2.loc[:df2['Volume'].last_valid_index()]

df1 out:

              Datetime   Val  Volume
0  2022-10-10 01:44:00  0.97     0.0
1  2022-10-10 01:45:00  0.97     0.0
2  2022-10-10 01:46:00  0.95     NaN
3  2022-10-10 01:47:00  0.93     0.0

df1 intended output:

Empty DataFrame
Columns: []
Index: []

df2 out:

              Datetime   Val  Volume
0  2022-10-10 01:44:00  0.97     0.0
1  2022-10-10 01:45:00  0.97     0.0
2  2022-10-10 01:46:00  0.95     NaN
3  2022-10-10 01:47:00  0.93     0.0
4  2022-10-10 01:48:00  0.96     0.0

df2 intended out:

              Datetime   Val  Volume
0  2022-10-10 01:44:00  0.97     0.0
1  2022-10-10 01:45:00  0.97     0.0
2  2022-10-10 01:46:00  0.95     NaN
3  2022-10-10 01:47:00  0.93     0.0
4  2022-10-10 01:48:00  0.96     0.0

CodePudding user response:

Select last value of column Volume by Series.iat and test missing value by scalar by isna, if True create empty DataFrame:

df = pd.DataFrame() if pd.isna(df['Volume'].iat[-1]) else df
print (df)
Empty DataFrame
Columns: []
Index: []

CodePudding user response:

You can wrap the logic in functions if you want to generalise the functionality and keep the main code simple.

def make_final_column_entry_isna_check(column_name):
    def final_column_entry_isna_check(dataframe):
        return dataframe.tail(1)[column_name].isna().bool()
    return final_column_entry_isna_check

# Create a function which will check the 'Volume' column
volume_data_check = make_final_column_entry_isna_check('Volume')

def validated_dataframe(dataframe):
    return pd.DataFrame() if volume_data_check(dataframe) else dataframe

# Main code
df = validated_dataframe(df)

df2 = validated_dataframe(df2)
  • Related