Home > database >  Convert Multiples Columns to a List inside a single column in pandas
Convert Multiples Columns to a List inside a single column in pandas

Time:12-27

I am using azure databricks, getting differents excel forms storaged in a blob. I need to keep 3 columns as it is and group as a list other multiples (and differents for each form) responses columns.
My main goal here is to transforme those diferents columns in one unique with a object that the keys are the title of the questions and the value is the response. I have the following dataframe:

id name email question_1 question_2 question_3
1 mark [email protected] response_11 response_21 response_31
3 elon [email protected] response_12 response_22 response_32

I would like to have the following output.
id name email responses
1 mark [email protected] {'question1':'response'11','question2':'response21','question3':'response_31'}
2 elon [email protected] {'question1':'response'12','question2':'response22','question3':'response_32'}
3 zion [email protected] {'question1':'response'13','question2':'response23','question3':'response_33'}

How i could get that using pandas? i already did the following:
    baseCols = ['id','name','email']
    def getFormsColumnsName(df):
        df_response_columns = df.columns.values.tolist()
        for deleted_column in cols:
            df_response_columns.remove(deleted_column)
        return df_response_columns
    formColumns = getFormsColumnsName(df)
    df = df.astype(str)
    df['responses'] = df[formColumns].values.tolist()
    display(df)

But this give me that strange list of responses:
id name email responses
1 mark mark@email 0: "response11"1: "response12"2: "response13"3: "['response11', 'response12', 'response13' "[]"]"

i dont know what i should do to get what i expected.
Thank you in advance!

CodePudding user response:

You can get your responses column by using pd.DataFrame.to_dict("records").

questions = df.filter(like="question")
responses = questions.to_dict("records")
out = df.drop(questions, axis=1).assign(responses=responses)

output:

   id  name           email                                          responses
0   1  mark  [email protected]  {'question_1': 'response_11', 'question_2': 'response_21', 'question_3': 'response_31'}
1   3  elon  [email protected]  {'question_1': 'response_12', 'question_2': 'response_22', 'question_3': 'response_32'}
  • Related