I have converted data frame rows to lists and in those list there are NaN values which I would like to remove. This is my attempt but the NaN values are not removed
import pandas as pd
df = pd.read_excel('MainFile.xlsx', dtype = str)
df_list = df.values.tolist()
print(df_list)
print('=' * 50)
for l in df_list:
newlist = [x for x in l if x != 'nan']
print(newlist)
Here's a snapshot of the original data
I could find a solution using these lines (but I welcome any ideas)
for l in df_list:
newlist = [x for x in l if x == x]
print(newlist)
CodePudding user response:
It is not working because you are trying to compare it to string 'nan'.
If excel cell is empty it is returned as NaN
value in pandas.
You can use numpy library, to compare it with NaN
:
import numpy as np
for l in df_list:
newlist = [x for x in l if x != np.nan]
print(newlist)
EDIT:
If you want to get all values from the dataframe, which are not NaN
, you can just do:
df.stack().tolist()
If you want to print values with the loop (as in your example), you can do:
for l in df.columns:
print(list(df[l][df[l].notna()]))
To create nested list with a loop:
main = []
for l in df.T.columns:
new_list = list(df.T[l][df.T[l].notna()])
main.append(new_list)
print(main)
CodePudding user response:
You can always try the approach that is proposed here:
import numpy as np
newlist = [x for x in df_list if np.isnan(x) == False]
print(newlist)
I hope that this will help.