I have a CSV sheet with a lot of columns (column names are long questions). I am writing a code for value.counts() of each question (Number of "yes" and "no") in each column. My code is:
import pandas as pd
df = pd.read_csv("fixed_site.csv", encoding= 'unicode_escape')
q1 = df['Any shortage of supplies? (Vaccines/Syringes)'].value_counts()
You can see the column name i.e "Any shortage of supplies? (Vaccines/Syringes)". There are many other very long questions as compared to this.
I have two questions. 1. How to avoid writing long questions manually. 2. After doing the counts() function, I want to create a CSV in which in the first column there will be "Question" and next two columns there will be values of "Yes" and "No". like below
q1_name = "Any shortage of supplies? (Vaccines/Syringes)"
q1_analysis = [q1_name, q1["Yes"], q34["No"]]
fixedsite_analysis = pd.DataFrame(data=[total_visits], columns=["QUESTION","YES", "NO"])
fixedsite_analysis.to_csv("fixedsite_analysis.csv", index = False)
How to do it simply with less code (not copying or writing the name of every column). Thank you for your help
CodePudding user response:
Suppose the following dataframe:
df = pd.DataFrame({'Q1': list('YYYN'), 'Q2': list('NNYY'), 'Q3': list('YNNN')})
print(df)
# Output
Q1 Q2 Q3
0 Y N Y
1 Y N N
2 Y Y N
3 N Y N
Use melt
and pivot_table
to reshape your dataframe:
out = (df.melt(var_name='question', value_name='answer').assign(dummy=1)
.pivot_table('dummy', 'question', 'answer', aggfunc='count')
.rename_axis(columns=None).reset_index())
print(out)
# Output
question N Y
0 Q1 1 3
1 Q2 2 2
2 Q3 3 1
The dummy
variable is set to allow pivot_table
to count value 'Yes' or 'No'.
CodePudding user response:
for column in df.columns:
yes, no = df[column].value_counts()['YES'], df[column].value_counts()['NO']
result = pd.DataFrame({'question': column,
'YES': yes,
'NO': no},
index=[0])
result.to_csv(f'{column}.csv', index=False)