I'm trying to count the occurrences of each string in a list, for each row of a DataFrame.
+----+---------------------------+
| Id | Col1                      |
+----+---------------------------+
| N1 | ['a', 'b', 'c', 'a']      |
| N2 | ['b', 'b', 'b']           |
| N3 | []                        |
| N4 | ['a', 'b', 'c', 'a', 'c'] |
| N5 | []                        |
+----+---------------------------+
As a result I want to get something like this:
+----+---------------------------+-----------------------+
| Id | Col1                      | Col2                  |
+----+---------------------------+-----------------------+
| N1 | ['a', 'b', 'c', 'a']      | {'a':2, 'b':1, 'c':1} |
| N2 | ['b', 'b', 'b']           | {'b':3}               |
| N3 | []                        | {} or None            |
| N4 | ['a', 'b', 'c', 'a', 'c'] | {'a':2, 'b':1, 'c':2} |
| N5 | []                        | {} or None            |
+----+---------------------------+-----------------------+
I have already tried using Counter from the collections library inside the DataFrame in several ways, but nothing seems to work.
import pandas as pd

d = {'Id': ['N1', 'N2', 'N3', 'N4', 'N5'],
     'Col1': [['a', 'b', 'c', 'a'], ['b', 'b', 'b'], [], ['a', 'b', 'c', 'a', 'c'], []]}
df = pd.DataFrame(data=d)
CodePudding user response:
Very simple:
from collections import Counter
df['col_2'] = df.Col1.map(Counter)
>>> df
   Id             Col1                     col_2
0  N1     [a, b, c, a]  {'a': 2, 'b': 1, 'c': 1}
1  N2        [b, b, b]                  {'b': 3}
2  N3               []                        {}
3  N4  [a, b, c, a, c]  {'a': 2, 'b': 1, 'c': 2}
4  N5               []                        {}
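If plain dicts are preferred over Counter objects, with None for empty lists as the question allows, something like the following might work. This is only a sketch: the helper name count_items and the column name Col2 are my own choices, not part of the answer above.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'Id': ['N1', 'N2', 'N3', 'N4', 'N5'],
    'Col1': [['a', 'b', 'c', 'a'], ['b', 'b', 'b'], [],
             ['a', 'b', 'c', 'a', 'c'], []],
})


def count_items(items):
    # Hypothetical helper: plain dict of counts, or None for an empty list
    counts = dict(Counter(items))
    return counts or None  # an empty dict is falsy, so it becomes None


df['Col2'] = df['Col1'].map(count_items)
```

Dropping the `or None` part keeps empty dicts instead, which matches the output shown above.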
CodePudding user response:
Check the code below, which uses Counter:
import pandas as pd
from collections import Counter
df['Col2'] = df.apply(lambda x: Counter(x['Col1']), axis=1)
df
CodePudding user response:
What about this one-liner:
df['Col2'] = df['Col1'].apply(lambda x: pd.Series(x).value_counts().to_dict())
Output:
   Id             Col1                      Col2
0  N1     [a, b, c, a]  {'a': 2, 'b': 1, 'c': 1}
1  N2        [b, b, b]                  {'b': 3}
2  N3               []                        {}
3  N4  [a, b, c, a, c]  {'a': 2, 'c': 2, 'b': 1}
4  N5               []                        {}
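Note that the key order for N4 here differs from the Counter-based answers: value_counts sorts by count, while Counter keeps keys in first-seen order. Dicts compare equal regardless of key order, so this usually does not matter. A quick check on the N4 list (variable names are my own):

```python
from collections import Counter

import pandas as pd

items = ['a', 'b', 'c', 'a', 'c']

# Counter keeps keys in the order they were first seen
via_counter = dict(Counter(items))

# value_counts sorts by frequency before the conversion to a dict
via_value_counts = pd.Series(items).value_counts().to_dict()

# Dict equality ignores key order, so both results count as the same
same = via_counter == via_value_counts
print(same)
```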
CodePudding user response:
from numpy import nan
import pandas as pd
Solution:
col = []
for i, row in df.iterrows():
    col.append(
        {elem: row['Col1'].count(elem) for elem in set(row['Col1'])}  # set removes duplicates
    )
df = df.join(pd.Series(col, name='Col2'))
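The same per-row counting can probably be written without iterrows by looping over the column directly. A minimal sketch, assuming the sample data from the question; it still relies on list.count and set like the loop above, only the iteration differs:

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['N1', 'N2', 'N3', 'N4', 'N5'],
    'Col1': [['a', 'b', 'c', 'a'], ['b', 'b', 'b'], [],
             ['a', 'b', 'c', 'a', 'c'], []],
})

# One dict per row: each distinct element mapped to its count in that list
df['Col2'] = [
    {elem: items.count(elem) for elem in set(items)}
    for items in df['Col1']
]
```

Since the result is assigned as a new column, no join is needed.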
Alternative solution:
def add_value_counts(col):
    col['Col2'] = pd.Series(col['Col1']).value_counts().to_dict()
    return col

df = df.T.apply(add_value_counts, axis=0).T  # transposes df and iterates over rows