Home > Software design >  Skip nan values when reading excel as dict using pandas
Skip nan values when reading excel as dict using pandas

Time:09-29

I have an excel table with a sheet whose content is as follows:

A     B        C        D       
1     039#     AA8      MA21   
2     088#              MA22   
3     077#              MA23      
4                       MA24   
5                    

I can correctly read the table as follows:

>>> df = pd.read_excel(xlsx_path)
>>> df      
       A     B    C     D
    0  1  039#  AA8  MA21
    1  2  088#  nan  MA22
    2  3  077#  nan  MA23
    3  4   nan  nan  MA24
    4  5   nan  nan   nan

What I really want is to convert this table into a dictionary of lists where each list contains only the values ​​from cells filled in the excel sheet, however, when using the to_dict method, what I get is a list for each column with values unwanted nan:

>>> df.to_dict('list')
    {'A': [1, 2, 3, 4, 5],
     'B': ['039#', '088#', '077#', 'nan', 'nan'],
     'C': ['AA8', 'nan', 'nan', 'nan', 'nan'],
     'D': ['MA21', 'MA22', 'MA23', 'MA24', 'nan']}

What I want to get is the following:

>>> df.to_dict('list')
    {'A': [1, 2, 3, 4, 5],
     'B': ['039#', '088#', '077#'],
     'C': ['AA8'],
     'D': ['MA21', 'MA22', 'MA23', 'MA24']}

Of course once I get the dict I could iterate through each list filtering out the nan values, however I would like to know if there is a better approach that is closer to the example above.

CodePudding user response:

I'm not currently in an appropriate place to do a test, but if the need arises, be quick. You are trying to change to dict, but pandas it works by indexing, so when you perform the excel reading, it transforms into dataframe of rows and columns, the values that don't exist will be replaced by nan, when you transform that to dict, the The function is not performing wrong, it will work according to the number of lines, keeping the nans, for you to remove it, you will have to manually access the dictionary and somehow manipulate your list to drop the missing values. Sorry if I can't send a function to help right now.

CodePudding user response:

you can mask NaN

import pandas as pd
import numpy as np
df = pd.read_excel(xlsx_path)
df = df.replace(r'^\s $', np.nan, regex=True).dropna(how='all')
df.to_dict('list')

or else if you just want to ignore. Ref check all options here

import pandas as pd
import numpy as np
df = pd.read_excel(xlsxpath,na_filter=False)

df.to_dict('list')
  • Related