I have the below dataframe that comes from a Microsoft forms, that I am downloading from a Sharepoint site using Python, and I need to save to our local database for further analysis.
Input DF:
ID Q1 Q2 QN
1 1 1 0
2 1 1 1
3 0 0 1
4 0 1 0
Note that there are more than 50 columns, and rows some hundreds for now, but are going to increase during time.
Before sending the data to SQL Server I'd like to transpose the dataframe so that it is normalized for database storing. The output I'd like to get is the following:
Output DF:
ID Question Answer
1 Q1 1
1 Q2 1
1 QN 0
2 Q1 1
2 Q2 1
2 QN 1
3 Q1 0
3 Q2 0
3 QN 1
4 Q1 0
4 Q2 1
4 QN 0
I know that by simply transposing columns to rows using df.T I won't obtain much; can anyone please address some possible approach? Even with plain Python lists/dictionaries or whatever, does not need to be necessarily handled with pandas. Thanks
CodePudding user response:
You can use melt
:
df_new = df.set_index("ID").melt(var_name="Question", value_name="Answer", ignore_index=False).sort_index()
output:
Question Answer
ID
1 Q1 1
1 Q2 1
1 QN 0
2 Q1 1
2 Q2 1
2 QN 1
3 Q1 0
3 Q2 0
3 QN 1
4 Q1 0
4 Q2 1
4 QN 0
CodePudding user response:
A slightly different way to @anon01's answer would be to use the id_var
parameter in df.melt
instead of setting index explicitly (documentation).
df.melt('ID', ['Q1','Q2','QN'], 'Question', 'Answer')
ID Question Answer
0 1 Q1 1
1 2 Q1 1
2 3 Q1 0
3 4 Q1 0
4 1 Q2 1
5 2 Q2 1
6 3 Q2 0
7 4 Q2 1
8 1 QN 0
9 2 QN 1
10 3 QN 1
11 4 QN 0