How to slice/index a DataFrame

Time:02-15

I need to drop the first n rows of a much bigger DataFrame, up to and including the last leading row that contains NaN. df2 below is a small example.

import numpy as np
import pandas as pd

main_df = {
    'Courses': ["Spark", "Java", "Python", "Go"],
    'Discount': [2000, 2300, 1200, 2000],
    'Pappa': [np.nan, np.nan, "2", "ai"],
    'Puppo': ["Glob", "Java", "n", "Godo"],
}
index_labels2 = ['r1', 'r6', 'r3', 'r5']
df2 = pd.DataFrame(main_df, index=index_labels2)

I tried:

  maino_df = main_df.loc[:, (main_df.iloc [0] != np.nan) & ((main_df.iloc [0,:] < 1000))]

to obtain:

main_dfnew = {
    'Courses': ["Python", "Go"],
    'Discount': [1200, 2000],
    'Pappa': ["2", "ai"],
    'Puppo': ["n", "Godo"],
}
index_labels2 = ['r3', 'r5']
df2 = pd.DataFrame(main_dfnew, index=index_labels2)

but it also eliminates the columns that contain NaN.

CodePudding user response:

IIUC, you want to drop the leading rows that contain NaNs, and keep every row from the first row that has no NaNs onward?

NB. I am assuming real NaNs here. If they are not, first use replace (or another method) to convert them to NaN, or use a comparison that matches the values you consider invalid.
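As a minimal sketch of that conversion, assume the missing values were stored as the literal string "nan". This is a hypothetical variant of the question's data, and the names raw and clean are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: missing values stored as the text "nan"
raw = pd.DataFrame(
    {
        "Courses": ["Spark", "Java", "Python", "Go"],
        "Pappa": ["nan", "nan", "2", "ai"],
    },
    index=["r1", "r6", "r3", "r5"],
)

# Replace the placeholder string with a real NaN so notna()/dropna() can detect it
clean = raw.replace("nan", np.nan)

print(clean["Pappa"].isna().tolist())  # [True, True, False, False]
```

If the placeholder is something else (an empty string, "-", etc.), the same call should work with that value in place of "nan".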

You could use:

df3 = df2[df2.notna().all(axis=1).cummax()]

output:

   Courses  Discount Pappa Puppo
r3  Python      1200     2     n
r5      Go      2000    ai  Godo
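To see why this works, the mask can be built one step at a time. The sketch below rebuilds the question's df2; the intermediate names complete and keep are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {
        "Courses": ["Spark", "Java", "Python", "Go"],
        "Discount": [2000, 2300, 1200, 2000],
        "Pappa": [np.nan, np.nan, "2", "ai"],
        "Puppo": ["Glob", "Java", "n", "Godo"],
    },
    index=["r1", "r6", "r3", "r5"],
)

# True for rows in which every column holds a value
complete = df2.notna().all(axis=1)

# The cumulative maximum of a boolean Series stays True
# from the first True onward
keep = complete.cummax()

# Boolean indexing keeps only the rows flagged True
df3 = df2[keep]

print(keep.tolist())       # [False, False, True, True]
print(df3.index.tolist())  # ['r3', 'r5']
```

Only rows are filtered, so all four columns survive, including Pappa.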

If you just want to remove all the rows with NaNs, use dropna:

df3 = df2.dropna(axis=0)
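The two approaches give the same result on df2, but they differ when a NaN appears after the first complete row. A small sketch on made-up data (the frame df and its columns a and b are hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: a NaN in the first row and another in the third
df = pd.DataFrame(
    {
        "a": [np.nan, 1.0, np.nan, 3.0],
        "b": ["w", "x", "y", "z"],
    }
)

# cummax approach: drops only the leading incomplete rows
trimmed = df[df.notna().all(axis=1).cummax()]

# dropna: drops every row that contains a NaN
dropped = df.dropna(axis=0)

print(trimmed.index.tolist())  # [1, 2, 3]
print(dropped.index.tolist())  # [1, 3]
```

Row 2 survives the cummax version because it comes after the first complete row, while dropna removes it.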