Home > Net >  Split and stack a multi-value column
Split and stack a multi-value column

Time:03-29

I am trying to analyze the results from a survey I administered. Each question had checkboxes of possible responses, and users could select as many or as few as they liked.

Data comes in looking like this

df = pd.DataFrame({
    'Result': ['Blue, Yellow, Red',
                'Red',
                'Green, Blue']
})

Result
Blue, Yellow, Red
Red
Green, Blue

Ultimately, I want to know how many times each choice (color) was selected, but using value_counts() at this stage tells me the number of times each exact choice was made (returns 1,1,1, in this example).

I think I need to make it look like this:

Result
Blue
Yellow
Red
Red
Green
Blue

So I need to split each index on a comma, using str.split(',') but I can't guarantee how many choices will have been selected.

Everything I've found so far deals with splitting one column into multiple columns, but I want to split one column and stack the results into one column.

CodePudding user response:

You can try something like this:

df.Result.str.split(', ', expand = True).stack().reset_index(drop = True).value_counts()

Blue      2
Red       2
Yellow    1
Green     1
dtype: int64

CodePudding user response:

You could also use str.split explode value_counts:

out = df['Result'].str.split(', ').explode().value_counts()

Output:

Blue      2
Red       2
Yellow    1
Green     1
Name: Result, dtype: int64
  • Related