Extracting specific data from list format data extracted from dictionary format read from Excel file

Time:09-19

I am extracting specific data from an Excel file that I read into a nested dictionary format. After selecting part of that data as a first step, I obtained the following pandas Series, in which each row is a list of dictionaries (some of the lists are empty).

dataset 

0                                                      []
1                                                      []
2                                                      []
3                [{'A': 1, 'B': 2, 'C': 'information1'}]
4       [{'A': 3, 'B': 4, 'C': 'information2'}, {'...

type(dataset[0])
=> list
type(dataset)
=> pandas.core.series.Series

Now I am trying to extract the value stored under a specific key ('C') from each dictionary. Running the following list comprehension on a single row works:

[d['C'] for d in dataset[3]]
=> ['information1']

However, the data has over 40k rows, so I wrote the following function and applied it to the whole Series. Doing so raises an error:

def get_c(d):
    return [d['C'] for d in dataset]
dataset2 = dataset.apply(get_c)

=> TypeError: list indices must be integers or slices, not str

Any help would be greatly appreciated.

CodePudding user response:

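Since the column values are Python lists, you can explode the column so that each dictionary gets its own row, read the value for a key from each dictionary with Series.str['key'], drop the missing values left by the empty lists with dropna(), and finally call to_list() to get the values as a list: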

>>> df[0].explode().str['C'].dropna().to_list()

['information1', 'information2']
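
For reference, the TypeError in the question comes from get_c looping over the whole `dataset` instead of over its argument: each `d` is then an entire row (a list), and `d['C']` indexes a list with a string. Below is a minimal sketch of both the flat result from the one-liner above and a per-row variant of get_c. The sample Series is an assumption rebuilt from the question's printout, and the second dictionary in row 4 (including 'information3') is made up for illustration, since the original output is truncated there.

```python
import pandas as pd

# Hypothetical sample mirroring the Series printed in the question.
# The second dict in the last row is invented; the original is truncated.
dataset = pd.Series([
    [],
    [],
    [],
    [{"A": 1, "B": 2, "C": "information1"}],
    [{"A": 3, "B": 4, "C": "information2"},
     {"A": 5, "B": 6, "C": "information3"}],
])

# Flat list of every 'C' value: explode puts one dict per row
# (empty lists become NaN), .str['C'] looks up the key, dropna removes the gaps.
flat = dataset.explode().str["C"].dropna().to_list()


def get_c(row):
    # Iterate over the list held in this row, not over the whole Series.
    # Dicts without a 'C' key are skipped instead of raising KeyError.
    return [d["C"] for d in row if "C" in d]


# One list of 'C' values per original row, keeping the original index.
per_row = dataset.apply(get_c)

print(flat)
print(per_row)
```

The explode version loses the grouping by row, so the apply version may be the better fit if you need to know which row each value came from.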