who to get name of column from dataframe automatically-CodePudding

I have a CSV sheet with a lot of columns (column names are long questions). I am writing a code for value.counts() of each question (Number of "yes" and "no") in each column. My code is:

import pandas as pd
df = pd.read_csv("fixed_site.csv", encoding= 'unicode_escape')
q1 = df['Any shortage of supplies? (Vaccines/Syringes)'].value_counts()

You can see the column name i.e "Any shortage of supplies? (Vaccines/Syringes)". There are many other very long questions as compared to this.

I have two questions. 1. How to avoid writing long questions manually. 2. After doing the counts() function, I want to create a CSV in which in the first column there will be "Question" and next two columns there will be values of "Yes" and "No". like below

q1_name = "Any shortage of supplies? (Vaccines/Syringes)"    
q1_analysis = [q1_name, q1["Yes"], q34["No"]]
fixedsite_analysis = pd.DataFrame(data=[total_visits], columns=["QUESTION","YES", "NO"])
fixedsite_analysis.to_csv("fixedsite_analysis.csv", index = False)

How to do it simply with less code (not copying or writing the name of every column). Thank you for your help

CodePudding user response：

Suppose the following dataframe:

df = pd.DataFrame({'Q1': list('YYYN'), 'Q2': list('NNYY'), 'Q3': list('YNNN')})
print(df)

# Output
  Q1 Q2 Q3
0  Y  N  Y
1  Y  N  N
2  Y  Y  N
3  N  Y  N

Use melt and pivot_table to reshape your dataframe:

out = (df.melt(var_name='question', value_name='answer').assign(dummy=1) 
         .pivot_table('dummy', 'question', 'answer', aggfunc='count') 
         .rename_axis(columns=None).reset_index())
print(out)

# Output
  question  N  Y
0       Q1  1  3
1       Q2  2  2
2       Q3  3  1

The dummy variable is set to allow pivot_table to count value 'Yes' or 'No'.

CodePudding user response：

for column in df.columns:
    yes, no = df[column].value_counts()['YES'], df[column].value_counts()['NO']
    result = pd.DataFrame({'question': column, 
          'YES': yes, 
          'NO': no}, 
                      index=[0])
    result.to_csv(f'{column}.csv', index=False)