Home > Back-end >  Pandas : Is there a way to count the number of occurrences of values in a given column which contain
Pandas : Is there a way to count the number of occurrences of values in a given column which contain

Time:06-24

I have a df column where each cell contais a dictionary, so when I apply value_counts to this column, I obviolsy get the results for the number of occurrencies of each dictionary. But what I need is to get the number of occirrences of the separate values. a column cells look something like this:

                      col1

1   [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The 
   Smiths'}]

2   [{'name': 'Mark'}, {'name': 'Barbara}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}]

So basically what I need as result is how many Susans, Johns etc. there are in the entire columns

Any help will be appreciated

CodePudding user response:

You can try this, using @jch setup:

df = pd.DataFrame({'col1': [ [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The Smiths'}], \
[{'name': 'Mark'}, {'name': 'Barbara'}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}] ] })


pd.DataFrame(df['col1'].to_list()).stack().str['name'].value_counts()

Output:

John          2
Mark          2
Susan         1
Mr.Bean       1
The Smiths    1
Barbara       1
Poly          1
Nick          1
dtype: int64

Let's use pandas DataFrame constructor, stack to reshape to single column, then using the selector from .str accessor to get the values from dictionaries and lastly value_counts.

CodePudding user response:

The data is actually a list of dictionaries on each line. You can build a dataframe from each row. Then the names are contained in a column which can be converted to a list, exploded and then perform a value_counts on that:

df = pd.DataFrame({'col1': [ [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The Smiths'}], \
[{'name': 'Mark'}, {'name': 'Barbara'}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}] ] })

print(df)

Output:

                                                col1
0  [{'name': 'John'}, {'name': 'Mark'}, {'name': ...
1  [{'name': 'Mark'}, {'name': 'Barbara'}, {'name...

value_count :

df.apply(lambda x: pd.DataFrame(x['col1']).squeeze().to_list(), axis=1).explode().value_counts()

Output :

John          2
Mark          2
Susan         1
Mr.Bean       1
The Smiths    1
Barbara       1
Poly          1
Nick          1

CodePudding user response:

We can use explode() function to transform each element of a list-like to a row, replicating the index values. Then we can use json_normalize to convert each key in the dictionary to transform it to a column. Then value_counts() can be used to count each unique value in the dataFrame.

df = pd.DataFrame({'col1': [ [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The Smiths'}], \
                             [{'name': 'Mark'}, {'name': 'Barbara'}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}] ] })
print(pd.json_normalize(df.col1.explode()).value_counts())

Result :

name      
John          2
Mark          2
Barbara       1
Mr.Bean       1
Nick          1
Poly          1
Susan         1
The Smiths    1

If you want to get the count of any one name, say John

name = 'John'
print(pd.json_normalize(df.col1.explode()).eq(name).sum())

Result :

2
  • Related