Home > Software engineering >  Is there a way to count the number of occurrences of values in a given column?
Is there a way to count the number of occurrences of values in a given column?

Time:06-24

I have a df column where each cell contais a dictionary, so when I apply value_counts to this column, I obviolsy get the results for the number of occurrencies of each dictionary. But what I need is to get the number of occirrences of the separate values. a column cells look something like this:

                      col1

1   [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The 
   Smiths'}]

2   [{'name': 'Mark'}, {'name': 'Barbara}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}]

So basically what I need as result is how many Susans, Johns etc. there are in the entire columns

Any help will be appreciated

CodePudding user response:

The data is actually a list of dictionaries on each line. You can build a dataframe from each row. Then the names are contained in a column which can be converted to a list, exploded and then perform a value_counts on that:

df = pd.DataFrame({'col1': [ [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The Smiths'}], \
[{'name': 'Mark'}, {'name': 'Barbara'}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}] ] })

print(df)

                                                col1
0  [{'name': 'John'}, {'name': 'Mark'}, {'name': ...
1  [{'name': 'Mark'}, {'name': 'Barbara'}, {'name...

df.apply(lambda x: pd.DataFrame(x['col1']).squeeze().to_list(), axis=1).explode().value_counts()

John          2
Mark          2
Susan         1
Mr.Bean       1
The Smiths    1
Barbara       1
Poly          1
Nick          1

CodePudding user response:

You can try this, using @jch setup:

df = pd.DataFrame({'col1': [ [{'name': 'John'}, {'name': 'Mark'}, {'name': 'Susan'}, {'name': 'Mr.Bean'}, {'name': 'The Smiths'}], \
[{'name': 'Mark'}, {'name': 'Barbara'}, {'name': 'Poly'}, {'name': 'John'}, {'name': 'Nick'}] ] })


pd.DataFrame(df['col1'].to_list()).stack().str['name'].value_counts()

Output:

John          2
Mark          2
Susan         1
Mr.Bean       1
The Smiths    1
Barbara       1
Poly          1
Nick          1
dtype: int64

Let's use pandas DataFrame constructor, stack to reshape to single column, then using the selector from .str accessor to get the values from dictionaries and lastly value_counts.

  • Related