I am trying to analyze the results from a survey I administered. Each question had checkboxes of possible responses, and users could select as many or as few as they liked.
Data comes in looking like this
df = pd.DataFrame({
'Result': ['Blue, Yellow, Red',
'Red',
'Green, Blue']
})
Result |
---|
Blue, Yellow, Red |
Red |
Green, Blue |
Ultimately, I want to know how many times each choice (color) was selected, but using value_counts()
at this stage tells me the number of times each exact choice was made (returns 1,1,1, in this example).
I think I need to make it look like this:
Result |
---|
Blue |
Yellow |
Red |
Red |
Green |
Blue |
So I need to split each index on a comma, using str.split(',')
but I can't guarantee how many choices will have been selected.
Everything I've found so far deals with splitting one column into multiple columns, but I want to split one column and stack the results into one column.
CodePudding user response:
You can try something like this:
df.Result.str.split(', ', expand = True).stack().reset_index(drop = True).value_counts()
Blue 2
Red 2
Yellow 1
Green 1
dtype: int64
CodePudding user response:
You could also use str.split
explode
value_counts
:
out = df['Result'].str.split(', ').explode().value_counts()
Output:
Blue 2
Red 2
Yellow 1
Green 1
Name: Result, dtype: int64