I have the dataframe below:
import pandas as pd

details = {
    'container_id': [1, 2, 3, 4, 5, 6],
    'container': ['black box', 'orange box', 'blue box', 'black box', 'blue box', 'white box'],
    'fruits': ['apples, black currant', 'oranges', 'peaches, oranges', 'apples', 'apples, peaches, oranges', 'black berries, peaches, oranges, apples'],
}
# Create a DataFrame object
df = pd.DataFrame(details)
I want to count how often each fruit appears, as a list.
I tried this code:
df['fruits'].str.split(expand=True).stack().value_counts()
but I get "black" counted 2 times, instead of 1 for black currant and 1 for black berries.
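To see where the extra "black" comes from, here is a minimal plain-Python sketch (it uses collections.Counter in place of pandas, purely for illustration). A split with no argument splits on whitespace, which is also what Series.str.split does by default, so "black currant" becomes the two tokens "black" and "currant". It also exposes a second, quieter problem: "apples," keeps its trailing comma and is counted separately from "apples".

```python
from collections import Counter

# Same strings as the 'fruits' column in the question
fruits = [
    'apples, black currant',
    'oranges',
    'peaches, oranges',
    'apples',
    'apples, peaches, oranges',
    'black berries, peaches, oranges, apples',
]

# split() with no argument splits on runs of whitespace
whitespace_tokens = Counter(tok for s in fruits for tok in s.split())
print(whitespace_tokens['black'])    # → 2 (from both multi-word fruits)
print(whitespace_tokens['apples,'])  # → 2 (comma still attached)

# Splitting on the real delimiter keeps multi-word names together
comma_tokens = Counter(tok for s in fruits for tok in s.split(', '))
print(comma_tokens['black currant'], comma_tokens['black berries'])  # → 1 1
```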
Answer:
You can do it the way you did, but specify the delimiter. Be aware that splitting on ',' alone leaves a leading space on every item after the first; only a comma-plus-space delimiter avoids that. To be safe, add a str.strip step:
df['fruits'].str.split(',', expand=False).explode().str.strip().value_counts()
Or your way, with ', ' as the delimiter (you can also apply str.strip after stack if you want to):
df['fruits'].str.split(', ', expand=True).stack().value_counts()
Output:
apples 4
oranges 4
peaches 3
black currant 1
black berries 1
Name: fruits, dtype: int64
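As a rough illustration of what the split / explode / strip / value_counts chain does at each step, here is a plain-Python analogue (not pandas itself; the list comprehensions stand in for explode and Counter stands in for value_counts). It shows why the strip step matters: without it, " oranges" and "oranges" would be counted as different fruits.

```python
from collections import Counter

fruits = [
    'apples, black currant',
    'oranges',
    'peaches, oranges',
    'apples',
    'apples, peaches, oranges',
    'black berries, peaches, oranges, apples',
]

# Like .str.split(','): one list per row; later items keep a leading space
split_rows = [s.split(',') for s in fruits]
print(split_rows[2])  # → ['peaches', ' oranges']

# Like .explode() followed by .str.strip(): flatten, then trim whitespace
flat = [item.strip() for row in split_rows for item in row]

# Like .value_counts(): count each distinct fruit, most common first
counts = Counter(flat)
print(counts.most_common())
# → [('apples', 4), ('oranges', 4), ('peaches', 3), ('black currant', 1), ('black berries', 1)]
```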
Answer:
Specify the comma separator followed by an optional space:
df['fruits'].str.split(r',\s?', expand=True).stack().value_counts()
OUTPUT:
apples 4
oranges 4
peaches 3
black currant 1
black berries 1
dtype: int64
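The pattern ',\s?' in the last answer is a regular expression (pandas treats a multi-character pat as a regex): a comma followed by at most one whitespace character. A quick check with Python's re.split, standing in for the pandas call, shows it accepts both ", " and a bare ","; the sample strings are made up for the demonstration. Writing it as a raw string keeps Python from warning about the "\s" escape. If the data might contain several spaces after a comma, ',\s*' absorbs all of them.

```python
import re

# Raw string so '\s' reaches the regex engine untouched
pattern = r',\s?'

print(re.split(pattern, 'apples, black currant'))  # → ['apples', 'black currant']
print(re.split(pattern, 'apples,black currant'))   # → ['apples', 'black currant']

# \s? consumes at most one space; \s* consumes any run of whitespace
print(re.split(pattern, 'apples,  oranges'))       # → ['apples', ' oranges']
print(re.split(r',\s*', 'apples,  oranges'))       # → ['apples', 'oranges']
```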