I have a df column where each cell contais a dictionary, so when I apply value_counts to this column, I obviolsy get the results for the number of occurrencies of each dictionary. But what I need is to get the number of occirrences of the separate values. a column cells look something like this:
col1
1 [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The
Smiths'}]
2 [{'name': 'Mark'}, {'name': 'Barbara}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}]
So basically what I need as result is how many Susans, Johns etc. there are in the entire columns
Any help will be appreciated
CodePudding user response:
You can try this, using @jch setup:
df = pd.DataFrame({'col1': [ [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The Smiths'}], \
[{'name': 'Mark'}, {'name': 'Barbara'}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}] ] })
pd.DataFrame(df['col1'].to_list()).stack().str['name'].value_counts()
Output:
John 2
Mark 2
Susan 1
Mr.Bean 1
The Smiths 1
Barbara 1
Poly 1
Nick 1
dtype: int64
Let's use pandas DataFrame constructor, stack
to reshape to single column, then using the selector from .str
accessor to get the values from dictionaries and lastly value_counts
.
CodePudding user response:
The data is actually a list of dictionaries on each line. You can build a dataframe from each row. Then the names are contained in a column which can be converted to a list, exploded and then perform a value_counts
on that:
df = pd.DataFrame({'col1': [ [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The Smiths'}], \
[{'name': 'Mark'}, {'name': 'Barbara'}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}] ] })
print(df)
Output:
col1
0 [{'name': 'John'}, {'name': 'Mark'}, {'name': ...
1 [{'name': 'Mark'}, {'name': 'Barbara'}, {'name...
value_count :
df.apply(lambda x: pd.DataFrame(x['col1']).squeeze().to_list(), axis=1).explode().value_counts()
Output :
John 2
Mark 2
Susan 1
Mr.Bean 1
The Smiths 1
Barbara 1
Poly 1
Nick 1
CodePudding user response:
We can use explode()
function to transform each element of a list-like to a row, replicating the index values. Then we can use json_normalize
to convert each key in the dictionary to transform it to a column. Then value_counts()
can be used to count each unique value in the dataFrame.
df = pd.DataFrame({'col1': [ [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The Smiths'}], \
[{'name': 'Mark'}, {'name': 'Barbara'}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}] ] })
print(pd.json_normalize(df.col1.explode()).value_counts())
Result :
name
John 2
Mark 2
Barbara 1
Mr.Bean 1
Nick 1
Poly 1
Susan 1
The Smiths 1
If you want to get the count of any one name, say John
name = 'John'
print(pd.json_normalize(df.col1.explode()).eq(name).sum())
Result :
2