Let's stay I have a grocery list with one column titled "Groceries". In each row there is a list of strings, for example.
Groceries |
---|
apples, bananas, oranges |
apples, bananas, bananas, pears |
oranges, pears, bananas |
Is there a way to count each string and add a "tally" is a new dataframe or similar thing with the appropriately labeled item? The dataframe would then look like:
apples | oranges | bananas | pears |
---|---|---|---|
1 | 1 | 1 | 0 |
1 | 0 | 2 | 1 |
1 | 1 | 0 | 1 |
I can't find a function that will recognize strings and count them in the appropriate row/column with the string name. I am also pretty new to Python and am not sure what would go into creating a function that would do this.
CodePudding user response:
You can split
the string on commas, explode
to multiple rows, get_dummies
to transform to 0/1, and groupby.sum
to aggregate:
out = (pd
.get_dummies(df['Groceries'].str.split(',\s*').explode())
.groupby(level=0).sum()
)
Or similar with crosstab
:
s = df['Groceries'].str.split(',\s*').explode()
out = pd.crosstab(s.index, s)
output:
apples bananas oranges pears
0 1 1 1 0
1 1 2 0 1
2 0 1 1 1