I need to filter out/eliminate the first n rows, up to the point where the NaN values stop, from a much bigger DataFrame like df2 below.
import numpy as np
import pandas as pd

main_df = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000],
'Pappa':[np.nan,np.nan,"2","ai"],
'Puppo':["Glob","Java","n","Godo"],
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(main_df,index=index_labels2)
I tried with:
maino_df = main_df.loc[:, (main_df.iloc[0] != np.nan) & (main_df.iloc[0, :] < 1000)]
to obtain:
main_dfnew = {
'Courses':["Python","Go"],
'Discount':[1200,2000],
'Pappa':["2","ai"],
'Puppo':["n","Godo"],
}
index_labels2=['r3','r5']
df2 = pd.DataFrame(main_dfnew, index=index_labels2)
but this also eliminates the columns where NaN is present.
CodePudding user response:
IIUC, you want to drop the first rows that have NaNs, and keep all the rows from the first row that has no NaNs onward?
NB: I am assuming real NaNs here. If not, first use replace
or another method to convert the values to NaN, or a comparison that matches the data you consider invalid.
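As a minimal sketch of that conversion step (the data here is hypothetical, not from the question): if the missing values are the literal string "nan" rather than real NaN, replace them first so notna()/dropna() can see them.

```python
import numpy as np
import pandas as pd

# Hypothetical data where "nan" is a string, not a real missing value
df = pd.DataFrame({"A": ["nan", "x", "y"], "B": [1, 2, 3]})

# Convert the string marker to a real NaN
df = df.replace("nan", np.nan)

print(df["A"].isna().tolist())  # [True, False, False]
```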
You could use:
df3 = df2[df2.notna().all(axis=1).cummax()]
output:
Courses Discount Pappa Puppo
r3 Python 1200 2 n
r5 Go 2000 ai Godo
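To see why this works, here is the one-liner above unpacked step by step on the question's df2: notna().all(axis=1) flags rows with no NaN, and cummax() turns that flag permanently True from the first complete row onward, so earlier incomplete rows are dropped but nothing after them is.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "Courses": ["Spark", "Java", "Python", "Go"],
    "Discount": [2000, 2300, 1200, 2000],
    "Pappa": [np.nan, np.nan, "2", "ai"],
    "Puppo": ["Glob", "Java", "n", "Godo"],
}, index=["r1", "r6", "r3", "r5"])

complete = df2.notna().all(axis=1)   # True where the row has no NaN
print(complete.tolist())             # [False, False, True, True]

mask = complete.cummax()             # stays True once a complete row is seen
print(mask.tolist())                 # [False, False, True, True]

df3 = df2[mask]
print(df3.index.tolist())            # ['r3', 'r5']
```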
If you just want to remove all the rows with NaNs, use dropna:
df3 = df2.dropna(axis=0)
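The difference matters when NaNs appear after complete rows. A minimal sketch with hypothetical data: dropna removes every row containing a NaN, regardless of position, whereas the cummax approach only trims the leading incomplete rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a NaN row appears after a complete row
df = pd.DataFrame({"A": [np.nan, 1, np.nan, 3], "B": [9, 8, 7, 6]})

# dropna drops rows 0 and 2; only the fully complete rows survive
print(df.dropna(axis=0).index.tolist())  # [1, 3]

# the cummax approach drops only row 0 (before the first complete row)
print(df[df.notna().all(axis=1).cummax()].index.tolist())  # [1, 2, 3]
```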