Count occurrences of elements in a Python list inside pandas DataFrame rows

Time:08-05

I'm trying to count the occurrences of each string in a list, for each row of a DataFrame.

 ---- --------------------------- 
| Id |           Col1            |
 ---- --------------------------- 
| N1 | ['a', 'b', 'c', 'a']      |
| N2 | ['b', 'b', 'b']           |
| N3 | []                        |
| N4 | ['a', 'b', 'c', 'a', 'c'] | 
| N5 | []                        |
 ---- --------------------------- 

As a result I want to get something like this:

 ---- --------------------------- ----------------------- 
| Id |           Col1            |         Col2          |
 ---- --------------------------- ----------------------- 
| N1 | ['a', 'b', 'c', 'a']      | {'a':2, 'b':1, 'c':1} |
| N2 | ['b', 'b', 'b']           | {'b':3}               |
| N3 | []                        | {} or None            |
| N4 | ['a', 'b', 'c', 'a', 'c'] | {'a':2, 'b':1, 'c':2} |
| N5 | []                        | {} or None            |
 ---- --------------------------- ----------------------- 

I have already tried using Counter from the collections library on the DataFrame in several ways, but nothing seems to work.

import pandas as pd

d = {'Id': ['N1', 'N2', 'N3', 'N4', 'N5'],
     'Col1': [['a', 'b', 'c', 'a'], ['b', 'b', 'b'], [], ['a', 'b', 'c', 'a', 'c'], []]}
df = pd.DataFrame(data=d)

CodePudding user response:

Very simple, map Counter over the column:

from collections import Counter

df['col_2'] = df.Col1.map(Counter)

>>> df
   Id             Col1                     col_2
0  N1     [a, b, c, a]  {'a': 2, 'b': 1, 'c': 1}
1  N2        [b, b, b]                  {'b': 3}
2  N3               []                        {}
3  N4  [a, b, c, a, c]  {'a': 2, 'b': 1, 'c': 2}
4  N5               []                        {}
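
The question allows either {} or None for empty lists. Counter is a dict subclass, so the mapped column already behaves like a dict. If plain dicts with None for empty rows are preferred, a minimal sketch might look like the following. The helper name count_items is hypothetical, and the column is named Col2 here to match the question.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "Id": ["N1", "N2", "N3", "N4", "N5"],
    "Col1": [["a", "b", "c", "a"], ["b", "b", "b"], [], ["a", "b", "c", "a", "c"], []],
})


def count_items(items):
    """Hypothetical helper: plain dict of counts, or None for an empty list."""
    # An empty dict is falsy, so `or None` turns {} into None.
    return dict(Counter(items)) or None


df["Col2"] = df["Col1"].map(count_items)
```

Dropping the `or None` part would keep {} for the empty rows instead.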

CodePudding user response:

Check the code below, which uses Counter:

import pandas as pd 

from collections import Counter

df['Col2'] = df.apply(lambda x: Counter(x['Col1']), axis=1)

df

CodePudding user response:

What about this one-liner:

df['Col2'] = df['Col1'].apply(lambda x: pd.Series(x).value_counts().to_dict())

Output:

   Id             Col1                      Col2
0  N1     [a, b, c, a]  {'a': 2, 'b': 1, 'c': 1}
1  N2        [b, b, b]                  {'b': 3}
2  N3               []                        {}
3  N4  [a, b, c, a, c]  {'a': 2, 'c': 2, 'b': 1}
4  N5               []                        {}
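
The N4 row above lists its keys as a, c, b because value_counts sorts by count in descending order, whereas Counter keeps keys in order of first appearance. The counts themselves are the same. A small sketch of that comparison, with variable names chosen only for illustration:

```python
from collections import Counter

import pandas as pd

items = ["a", "b", "c", "a", "c"]

# Counter keeps keys in order of first appearance.
by_counter = dict(Counter(items))

# value_counts sorts by count, descending, before converting to a dict.
by_value_counts = pd.Series(items, dtype=object).value_counts().to_dict()

# Dict equality ignores key order, so both approaches agree on the counts.
same_counts = by_counter == by_value_counts

# An explicit dtype avoids the empty-Series dtype warning some pandas versions emit.
empty_counts = pd.Series([], dtype=object).value_counts().to_dict()
```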

CodePudding user response:

import pandas as pd

Solution:

col = []
for i, row in df.iterrows():
    col.append(
        {elem : row['Col1'].count(elem) for elem in set(row['Col1'])} # set removes duplicates
    )

df = df.join(pd.Series(col, name='Col2'))
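
One caveat with the join above: pd.Series(col, name='Col2') gets a default 0..n-1 index, so the join only lines up when df also uses the default index. If df were indexed differently (an assumption, since the question uses the default index), passing index=df.index keeps the rows aligned. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [["a", "b", "c", "a"], ["b", "b", "b"], []]},
    index=["N1", "N2", "N3"],  # non-default index, chosen for illustration
)

# Same counting logic as the loop above, written as a comprehension.
col = [{elem: items.count(elem) for elem in set(items)} for items in df["Col1"]]

# Reusing df.index keeps each dict aligned with the row it was computed from.
df = df.join(pd.Series(col, name="Col2", index=df.index))
```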

Alternative solution:

def add_value_counts(col):
    col['Col2'] = pd.Series(col['Col1']).value_counts().to_dict()
    return col

df = df.T.apply(add_value_counts, axis=0).T # transpose so each original row becomes a column, apply per column, then transpose back