I have an excel table with a sheet whose content is as follows:
A B C D
1 039# AA8 MA21
2 088# MA22
3 077# MA23
4 MA24
5
I can correctly read the table as follows:
>>> df = pd.read_excel(xlsx_path)
>>> df
A B C D
0 1 039# AA8 MA21
1 2 088# nan MA22
2 3 077# nan MA23
3 4 nan nan MA24
4 5 nan nan nan
What I really want is to convert this table into a dictionary of lists where each list contains only the values from cells filled in the excel sheet, however, when using the to_dict
method, what I get is a list for each column with values unwanted nan
:
>>> df.to_dict('list')
{'A': [1, 2, 3, 4, 5],
'B': ['039#', '088#', '077#', 'nan', 'nan'],
'C': ['AA8', 'nan', 'nan', 'nan', 'nan'],
'D': ['MA21', 'MA22', 'MA23', 'MA24', 'nan']}
What I want to get is the following:
>>> df.to_dict('list')
{'A': [1, 2, 3, 4, 5],
'B': ['039#', '088#', '077#'],
'C': ['AA8'],
'D': ['MA21', 'MA22', 'MA23', 'MA24']}
Of course once I get the dict I could iterate through each list filtering out the nan
values, however I would like to know if there is a better approach that is closer to the example above.
CodePudding user response:
I'm not currently in an appropriate place to do a test, but if the need arises, be quick. You are trying to change to dict, but pandas it works by indexing, so when you perform the excel reading, it transforms into dataframe of rows and columns, the values that don't exist will be replaced by nan, when you transform that to dict, the The function is not performing wrong, it will work according to the number of lines, keeping the nans, for you to remove it, you will have to manually access the dictionary and somehow manipulate your list to drop the missing values. Sorry if I can't send a function to help right now.
CodePudding user response:
you can mask NaN
import pandas as pd
import numpy as np
df = pd.read_excel(xlsx_path)
df = df.replace(r'^\s $', np.nan, regex=True).dropna(how='all')
df.to_dict('list')
or else if you just want to ignore. Ref check all options here
import pandas as pd
import numpy as np
df = pd.read_excel(xlsxpath,na_filter=False)
df.to_dict('list')