Find minimum respondent's age of each question-CodePudding

given df a dataframe containing the answers to a questionnaire

import pandas as pd
import numpy as np
df = pd.DataFrame(data=[[10,np.nan,'Answer 1','Answer 2'],
                        [40,"Answer 4","Answer 3",'Answer 2'],
                        [20,"Answer 2", np.nan,'Answer 1']], 
                        columns = ['Age','Question 1','Question 2', 'Question 3'])


df

    Age Question 1  Question 2  Question 3
0   10  NaN         Answer 1    Answer 2
1   40  Answer 4    Answer 3    Answer 2
2   20  Answer 2    NaN         Answer 1

I would like to create a second dataframe containing for each question which was the minimum age of the respondent

Disclaimer: very ugly code ahead!

df2 = pd.DataFrame(data=df.columns.tolist(),columns=["Questions"])
for col in df2["Questions"]:
    if col != "Age":
        df2.loc[df2["Questions"]==col,"min_age"] = df.loc[:,["Age",col]].dropna()["Age"].min()

df2

    Question    min_age
0   Age         NaN
1   Question 1  20.0
2   Question 2  10.0
3   Question 3  10.0

CodePudding user response：

You can find the index of the first non-NAN value with Age as ordered index:

df2 = df.set_index('Age').sort_index().notna()
out = df2.idxmax().where(df2.any())

output:

Question 1    20
Question 2    10
Question 3    10
dtype: int64

CodePudding user response：

Another possible solution, based on pandas.melt and pandas.pivot_table:

(df.melt(id_vars='Age').dropna()
 .pivot_table(index='variable', values='Age', aggfunc='min')
 .reset_index(names='Question'))

Output:

     Question  Age
0  Question 1   20
1  Question 2   10
2  Question 3   10

CodePudding user response：

This could be another solution:

(pd.melt(df, id_vars='Age', value_vars=df.columns[1:])
 .dropna()
 .groupby('variable')
 .agg({'Age' : np.min}))

            Age
variable       
Question 1   20
Question 2   10
Question 3   10