Home > Back-end >  Is there a Python function for counting the number of strings in a cell and reporting these in a new
Is there a Python function for counting the number of strings in a cell and reporting these in a new

Time:09-21

Let's stay I have a grocery list with one column titled "Groceries". In each row there is a list of strings, for example.

Groceries
apples, bananas, oranges
apples, bananas, bananas, pears
oranges, pears, bananas

Is there a way to count each string and add a "tally" is a new dataframe or similar thing with the appropriately labeled item? The dataframe would then look like:

apples oranges bananas pears
1 1 1 0
1 0 2 1
1 1 0 1

I can't find a function that will recognize strings and count them in the appropriate row/column with the string name. I am also pretty new to Python and am not sure what would go into creating a function that would do this.

CodePudding user response:

You can split the string on commas, explode to multiple rows, get_dummies to transform to 0/1, and groupby.sum to aggregate:

out = (pd
 .get_dummies(df['Groceries'].str.split(',\s*').explode())
 .groupby(level=0).sum()
)

Or similar with crosstab:

s = df['Groceries'].str.split(',\s*').explode()
out = pd.crosstab(s.index, s)

output:

   apples  bananas  oranges  pears
0       1        1        1      0
1       1        2        0      1
2       0        1        1      1
  • Related