I need to filter out/eliminate the first n rows, up to the point where the NaN values stop, from a much bigger DataFrame like df2 below.
import numpy as np
import pandas as pd

main_df = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000],
'Pappa':[np.nan,np.nan,"2","ai"],
'Puppo':["Glob","Java","n","Godo"],
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(main_df,index=index_labels2)
I tried with:
maino_df = main_df.loc[:, (main_df.iloc[0] != np.nan) & (main_df.iloc[0, :] < 1000)]
to obtain:
main_dfnew = {
'Courses':["Python","Go"],
'Discount':[1200,2000],
'Pappa':["2","ai"],
'Puppo':["n","Godo"],
}
index_labels2=['r3','r5']
df2 = pd.DataFrame(main_dfnew, index=index_labels2)
but this also eliminates the columns where NaN is present.
CodePudding user response:
IIUC, you want to drop the first rows that have NaNs, and keep all the rows from the first row that has no NaNs onward?
NB: I am assuming real NaNs here. If not, first use replace
or another method to convert the values to NaN, or a comparison that matches the data you consider invalid.
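As a minimal sketch of that conversion step (the data here is hypothetical, not from the question): if the missing values are the literal string "nan" rather than real NaN, replace them first so notna()/dropna() can see them.

```python
import numpy as np
import pandas as pd

# Hypothetical data where "nan" is a string, not a real missing value
df = pd.DataFrame({"A": ["nan", "x", "y"], "B": [1, 2, 3]})

# Convert the string marker to a real NaN
df = df.replace("nan", np.nan)

print(df["A"].isna().tolist())  # [True, False, False]
```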
You could use:
df3 = df2[df2.notna().all(axis=1).cummax()]
output:
Courses Discount Pappa Puppo
r3 Python 1200 2 n
r5 Go 2000 ai Godo
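To see why this works, here is the one-liner above unpacked step by step on the question's df2: notna().all(axis=1) flags rows with no NaN, and cummax() turns that flag permanently True from the first complete row onward, so earlier incomplete rows are dropped but nothing after them is.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "Courses": ["Spark", "Java", "Python", "Go"],
    "Discount": [2000, 2300, 1200, 2000],
    "Pappa": [np.nan, np.nan, "2", "ai"],
    "Puppo": ["Glob", "Java", "n", "Godo"],
}, index=["r1", "r6", "r3", "r5"])

complete = df2.notna().all(axis=1)   # True where the row has no NaN
print(complete.tolist())             # [False, False, True, True]

mask = complete.cummax()             # stays True once a complete row is seen
print(mask.tolist())                 # [False, False, True, True]

df3 = df2[mask]
print(df3.index.tolist())            # ['r3', 'r5']
```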
If you just want to remove all the rows with NaNs, use dropna:
df3 = df2.dropna(axis=0)
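The difference matters when NaNs appear after complete rows. A minimal sketch with hypothetical data: dropna removes every row containing a NaN, regardless of position, whereas the cummax approach only trims the leading incomplete rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a NaN row appears after a complete row
df = pd.DataFrame({"A": [np.nan, 1, np.nan, 3], "B": [9, 8, 7, 6]})

# dropna drops rows 0 and 2; only the fully complete rows survive
print(df.dropna(axis=0).index.tolist())  # [1, 3]

# the cummax approach drops only row 0 (before the first complete row)
print(df[df.notna().all(axis=1).cummax()].index.tolist())  # [1, 2, 3]
```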