I am using azure databricks, getting differents excel forms storaged in a blob. I need to keep 3 columns as it is and group as a list other multiples (and differents for each form) responses columns.
My main goal here is to transforme those diferents columns in one unique with a object that the keys are the title of the questions and the value is the response.
I have the following dataframe:
id | name | question_1 | question_2 | question_3 | |
---|---|---|---|---|---|
1 | mark | [email protected] | response_11 | response_21 | response_31 |
3 | elon | [email protected] | response_12 | response_22 | response_32 |
I would like to have the following output.
id | name | responses | |
---|---|---|---|
1 | mark | [email protected] | {'question1':'response'11','question2':'response21','question3':'response_31'} |
2 | elon | [email protected] | {'question1':'response'12','question2':'response22','question3':'response_32'} |
3 | zion | [email protected] | {'question1':'response'13','question2':'response23','question3':'response_33'} |
How i could get that using pandas? i already did the following:
baseCols = ['id','name','email']
def getFormsColumnsName(df):
df_response_columns = df.columns.values.tolist()
for deleted_column in cols:
df_response_columns.remove(deleted_column)
return df_response_columns
formColumns = getFormsColumnsName(df)
df = df.astype(str)
df['responses'] = df[formColumns].values.tolist()
display(df)
But this give me that strange list of responses:
id | name | responses | |
---|---|---|---|
1 | mark | mark@email | 0: "response11"1: "response12"2: "response13"3: "['response11', 'response12', 'response13' "[]"]" |
i dont know what i should do to get what i expected.
Thank you in advance!
CodePudding user response:
You can get your responses
column by using pd.DataFrame.to_dict("records")
.
questions = df.filter(like="question")
responses = questions.to_dict("records")
out = df.drop(questions, axis=1).assign(responses=responses)
output:
id name email responses
0 1 mark [email protected] {'question_1': 'response_11', 'question_2': 'response_21', 'question_3': 'response_31'}
1 3 elon [email protected] {'question_1': 'response_12', 'question_2': 'response_22', 'question_3': 'response_32'}