I have a df column where each cell contais a dictionary, so when I apply value_counts to this column, I obviolsy get the results for the number of occurrencies of each dictionary. But what I need is to get the number of occirrences of the separate values. a column cells look something like this:
col1
1 [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The
Smiths'}]
2 [{'name': 'Mark'}, {'name': 'Barbara}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}]
So basically what I need as result is how many Susans, Johns etc. there are in the entire columns
Any help will be appreciated
CodePudding user response:
The data is actually a list of dictionaries on each line. You can build a dataframe from each row. Then the names are contained in a column which can be converted to a list, exploded and then perform a value_counts
on that:
df = pd.DataFrame({'col1': [ [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The Smiths'}], \
[{'name': 'Mark'}, {'name': 'Barbara'}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}] ] })
print(df)
col1
0 [{'name': 'John'}, {'name': 'Mark'}, {'name': ...
1 [{'name': 'Mark'}, {'name': 'Barbara'}, {'name...
df.apply(lambda x: pd.DataFrame(x['col1']).squeeze().to_list(), axis=1).explode().value_counts()
John 2
Mark 2
Susan 1
Mr.Bean 1
The Smiths 1
Barbara 1
Poly 1
Nick 1
CodePudding user response:
You can try this, using @jch setup:
df = pd.DataFrame({'col1': [ [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The Smiths'}], \
[{'name': 'Mark'}, {'name': 'Barbara'}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}] ] })
pd.DataFrame(df['col1'].to_list()).stack().str['name'].value_counts()
Output:
John 2
Mark 2
Susan 1
Mr.Bean 1
The Smiths 1
Barbara 1
Poly 1
Nick 1
dtype: int64
Let's use pandas DataFrame constructor, stack
to reshape to single column, then using the selector from .str
accessor to get the values from dictionaries and lastly value_counts
.