I want to remove data from all rows and columns if the last row in a specified column is NaN. At present, I'm only deleting the last row from all columns.
With below, if the last row in Volume
is NaN, then I want to delete all rows from every column. If NaN appears anywhere else in the Volume
, then leave as is.
df1
displays the last row as NaN, while df2
contains NaN elsewhere in the column. I only want to remove all rows based on the last row only.
Additionally, if there are no NaN's, then leave as is.
df1 = pd.DataFrame({'Datetime' : ['2022-10-10 01:44:00','2022-10-10 01:45:00','2022-10-10 01:46:00','2022-10-10 01:47:00','2022-10-10 01:48:00'],
'Val' : [0.97,0.97,0.95,0.93,0.96],
'Volume' : [0,0,np.NaN,0,np.NaN],
})
df2 = pd.DataFrame({'Datetime' : ['2022-10-10 01:44:00','2022-10-10 01:45:00','2022-10-10 01:46:00','2022-10-10 01:47:00','2022-10-10 01:48:00'],
'Val' : [0.97,0.97,0.95,0.93,0.96],
'Volume' : [0,0,np.NaN,0,0.0,
})
df1 = df1.loc[:df1['Volume'].last_valid_index()]
df2 = df2.loc[:df2['Volume'].last_valid_index()]
df1 out:
Datetime Val Volume
0 2022-10-10 01:44:00 0.97 0.0
1 2022-10-10 01:45:00 0.97 0.0
2 2022-10-10 01:46:00 0.95 NaN
3 2022-10-10 01:47:00 0.93 0.0
df1 intended output:
Empty DataFrame
Columns: []
Index: []
df2 out:
Datetime Val Volume
0 2022-10-10 01:44:00 0.97 0.0
1 2022-10-10 01:45:00 0.97 0.0
2 2022-10-10 01:46:00 0.95 NaN
3 2022-10-10 01:47:00 0.93 0.0
4 2022-10-10 01:48:00 0.96 0.0
df2 intended out:
Datetime Val Volume
0 2022-10-10 01:44:00 0.97 0.0
1 2022-10-10 01:45:00 0.97 0.0
2 2022-10-10 01:46:00 0.95 NaN
3 2022-10-10 01:47:00 0.93 0.0
4 2022-10-10 01:48:00 0.96 0.0
CodePudding user response:
Select last value of column Volume
by Series.iat
and test missing value by scalar by isna
, if True create empty DataFrame:
df = pd.DataFrame() if pd.isna(df['Volume'].iat[-1]) else df
print (df)
Empty DataFrame
Columns: []
Index: []
CodePudding user response:
You can wrap the logic in functions if you want to generalise the functionality and keep the main code simple.
def make_final_column_entry_isna_check(column_name):
def final_column_entry_isna_check(dataframe):
return dataframe.tail(1)[column_name].isna().bool()
return final_column_entry_isna_check
# Create a function which will check the 'Volume' column
volume_data_check = make_final_column_entry_isna_check('Volume')
def validated_dataframe(dataframe):
return pd.DataFrame() if volume_data_check(dataframe) else dataframe
# Main code
df = validated_dataframe(df)
df2 = validated_dataframe(df2)