I have a survey which contains several Check all that apply questions. I am using Tableau to do visualization but I failed to split the data into different columns due to the limitation of Tableau Desktop(Seems you need to use Tableau Prep). So I have been thinking about using python to get this done, the table is like this:
Q: Which would you like to know more about? (select all that apply)
id answer
001 A; B; C
002 A
003 B; C
004 A; C
So, I'd like to get the result like this:
Answer Count
A 3
B 2
C 3
Can anyone help?
CodePudding user response:
You can try split
the semicolon column to list then explode at last value_counts
out = (df.assign(answer=df['answer'].str.split('; '))
.explode('answer')
.value_counts('answer')
.to_frame('Count')
.reset_index()))
print(out)
answer Count
0 A 3
1 C 3
2 B 2
In case, explode
is too new for your pandas, you can try .apply(pd.Series).stack()
instead.
out = (df['answer'].str.split('; ')
.apply(pd.Series).stack()
.value_counts()
.to_frame('Count')
.reset_index()
.rename(columns={'index': 'Answer'}))
CodePudding user response:
You can split
, explode
and count the values:
print(df['answer'].str.split('; ').explode().value_counts().to_frame('Count').rename_axis('Answer'))
Count
Answer
A 3
C 3
B 2
Alternative using collections.Counter
(as explode
is not available in your version of Pandas):
from collections import Counter
res = pd.DataFrame.from_dict(Counter(df['answer'].str.split('; ').agg(sum)), orient='index', columns=['Count']).rename_axis('Answer')