I am extracting specific data from a nested dictionary. After a first selection step, I obtained the following Series, in which each row is a list (either empty or containing dictionaries).
dataset
0 []
1 []
2 []
3 [{'A': 1, 'B': 2, 'C': 'information1'}]
4 [{'A': 3, 'B': 4, 'C': 'information2'}, {'...
type(dataset[0])
=> list
type(dataset)
=> pandas.core.series.Series
I now want to extract the values stored under the key 'C' from each row. Running the following code on a single row works:
[d['C'] for d in dict_test2[1]]
=> ['information1']
However, the data has over 40k rows, so I wrote the following function to apply to every row. Running it raises an error:
def get_c(d):
    return [d['C'] for d in dataset]
dataset2 = dataset.apply(get_c)
=> TypeError: list indices must be integers or slices, not str
Any help would be greatly appreciated.
Answer:
Since the column values are Python lists, you can explode the column, get the dictionary values using Series.str['key'], then call dropna, and finally call to_list() to get the values as a list:
>>> df[0].explode().str['C'].dropna().to_list()
['information1', 'information2']
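The TypeError in the question most likely comes from get_c looping over the whole `dataset` instead of the row it receives: each element of the Series is a list, so `d['C']` ends up indexing a list with a string. Below is a minimal sketch of both a corrected apply and the explode approach. The sample values are made up to resemble the question's data (the second dictionary in the last row is invented, since the original is truncated), and the names `row`, `item`, `per_row` and `flat` are only illustrative.

```python
import pandas as pd

# Hypothetical sample resembling the question's Series: each row is a list of dicts.
dataset = pd.Series([
    [],
    [],
    [],
    [{'A': 1, 'B': 2, 'C': 'information1'}],
    # The second dict here is invented for illustration only.
    [{'A': 3, 'B': 4, 'C': 'information2'}, {'A': 5, 'B': 6, 'C': 'information3'}],
])


def get_c(row):
    # Iterate over the row that apply passes in, not over the whole Series.
    return [item['C'] for item in row]


# Option 1: keep one list of 'C' values per row.
per_row = dataset.apply(get_c)

# Option 2: flatten everything into a single list of 'C' values.
# explode turns empty lists into NaN, which dropna then removes.
flat = dataset.explode().str['C'].dropna().tolist()

print(per_row.tolist())
print(flat)
```

The apply version keeps the row alignment (empty rows stay as empty lists), while the explode version discards it, so which one fits depends on whether the row index still matters afterwards.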