How can to count the number of occurrences of comma-separated values from the whole list of columns
data frame is like this:
id column
1
2 1
3 1
4 1,2
5 1,2
6 1,2,4
7 1,2,4
8 1,2,4,6
9 1,2,4,6
10 1,2,4,6,8
11 1,2,4,6,8
Desired output is:
id column count
1. 10
2 1. 7
3 1. 0
4 1,2. 6
5 1,2. 0
6 1,2,4. 4
7 1,2,4. 0
8 1,2,4,6. 2
9 1,2,4,6. 0
10 1,2,4,6,8 0
11 1,2,4,6,8 0
Tried this:
df = pd.read_csv('parentsplit/parentlist.csv')
df['count'] = df['parent_list'].str.split(',', expand=True).stack().value_counts()
its not working.
CodePudding user response:
You can do as follows:
df['count'] = df['id'].apply(lambda x: df['column'].fillna('X').str.contains(str(x)).sum())
This is basically counting the number of occurence of each id
in the column.
Output:
id column count
0 1 None 10
1 2 1 8
2 3 1 0
3 4 1,2 6
4 5 1,2 0
5 6 1,2,4 4
6 7 1,2,4 0
7 8 1,2,4,6 2
8 9 1,2,4,6 0
9 10 1,2,4,6,8 0
10 11 1,2,4,6,8 0
CodePudding user response:
Split and explode the column, then count
the occurrences using value_counts
then map the counts onto id
column
s = df['column'].str.split(',').explode().value_counts()
df['count'] = df['id'].astype(str).map(s).fillna(0)
id column count
0 1 None 10.0
1 2 1 8.0
2 3 1 0.0
3 4 1,2 6.0
4 5 1,2 0.0
5 6 1,2,4 4.0
6 7 1,2,4 0.0
7 8 1,2,4,6 2.0
8 9 1,2,4,6 0.0
9 10 1,2,4,6,8 0.0
10 11 1,2,4,6,8 0.0