Count occurrences of elements in Python list inside pandas dataframe rows


I'm trying to count the occurrences of each string in a list, for each row of a DataFrame.

 ---- --------------------------- 
| Id |           Col1            |
 ---- --------------------------- 
| N1 | ['a', 'b', 'c', 'a']      |
| N2 | ['b', 'b', 'b']           |
| N3 | []                        |
| N4 | ['a', 'b', 'c', 'a', 'c'] | 
| N5 | []                        |
 ---- --------------------------- 

As a result, I want to get something like this:

 ---- --------------------------- ----------------------- 
| Id |           Col1            |         Col2          |
 ---- --------------------------- ----------------------- 
| N1 | ['a', 'b', 'c', 'a']      | {'a':2, 'b':1, 'c':1} |
| N2 | ['b', 'b', 'b']           | {'b':3}               |
| N3 | []                        | {} or None            |
| N4 | ['a', 'b', 'c', 'a', 'c'] | {'a':2, 'b':1, 'c':2} |
| N5 | []                        | {} or None            |
 ---- --------------------------- ----------------------- 

I have already tried using Counter from the collections library on the DataFrame in several ways, but nothing seems to work.

import pandas as pd

d = {'Id': ['N1', 'N2', 'N3', 'N4', 'N5'],
     'Col1': [['a', 'b', 'c', 'a'], ['b', 'b', 'b'], [], ['a', 'b', 'c', 'a', 'c'], []]}
df = pd.DataFrame(data=d)

CodePudding user response:

Very simple, map Counter over the column:

from collections import Counter

df['col_2'] = df.Col1.map(Counter)

>>> df
   Id             Col1                     col_2
0  N1     [a, b, c, a]  {'a': 2, 'b': 1, 'c': 1}
1  N2        [b, b, b]                  {'b': 3}
2  N3               []                        {}
3  N4  [a, b, c, a, c]  {'a': 2, 'b': 1, 'c': 2}
4  N5               []                        {}
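
If a plain dict is preferred over Counter objects, with None for empty lists (which the question allows), a minimal sketch could look like the following. The helper name count_or_none is hypothetical, and the frame is rebuilt from the question's data so the snippet runs on its own:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "Id": ["N1", "N2", "N3", "N4", "N5"],
    "Col1": [["a", "b", "c", "a"], ["b", "b", "b"], [],
             ["a", "b", "c", "a", "c"], []],
})


def count_or_none(items):
    """Hypothetical helper: plain dict of counts, or None for an empty list."""
    counts = dict(Counter(items))  # Counter is a dict subclass; dict() drops the subclass
    return counts or None          # an empty dict is falsy, so this yields None


df["Col2"] = df["Col1"].map(count_or_none)
```

Returning counts alone instead of counts or None would keep {} for the empty rows.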

CodePudding user response:

Check the code below, which applies Counter row by row:

import pandas as pd
from collections import Counter

# axis=1 passes each row to the lambda; Counter tallies that row's list
df['Col2'] = df.apply(lambda x: Counter(x['Col1']), axis=1)




CodePudding user response:

What about this one-liner:

df['Col2'] = df['Col1'].apply(lambda x: pd.Series(x).value_counts().to_dict())


   Id             Col1                      Col2
0  N1     [a, b, c, a]  {'a': 2, 'b': 1, 'c': 1}
1  N2        [b, b, b]                  {'b': 3}
2  N3               []                        {}
3  N4  [a, b, c, a, c]  {'a': 2, 'c': 2, 'b': 1}
4  N5               []                        {}
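
Note that the key order for N4 differs from the Counter output: value_counts sorts by count in descending order, while Counter keeps elements in first-seen order. The dicts still compare equal, since dict equality ignores order. A small check, assuming only the N4 list from the question (the variable names are illustrative):

```python
from collections import Counter

import pandas as pd

items = ["a", "b", "c", "a", "c"]  # the N4 row from the question

via_counter = dict(Counter(items))
via_value_counts = pd.Series(items).value_counts().to_dict()

# Same keys and counts, regardless of the order in which keys are listed
same_counts = via_counter == via_value_counts
```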

CodePudding user response:

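import pandas as pd

col = []
for _, row in df.iterrows():
    # set() removes duplicates, so each distinct element is counted once
    col.append({elem: row['Col1'].count(elem) for elem in set(row['Col1'])})

# reuse df's index so the new column lines up row by row
df = df.join(pd.Series(col, name='Col2', index=df.index))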

Alternative solution:

def add_value_counts(col):
    col['Col2'] = pd.Series(col['Col1']).value_counts().to_dict()
    return col

df = df.T.apply(add_value_counts, axis=0).T # transposes df and iterates over rows