Home > Net >  How to create new columns from a column value in pandas based on IDs
How to create new columns from a column value in pandas based on IDs

Time:03-08

I want to create new columns based on a given column value and merge the rows with same subject names within a given date, for example for a given pandas dataframe

subject questionid answer datetime
First q1 fa1 2021-02-27T
First q2 fa2 2021-02-27T
First q3 fa3 2021-02-27T
First q4 fa4 2021-02-28T
Second q1 sa1 2021-02-27T
Second q2 2021-02-27T
Second q3 sa3 2021-02-27T
Second q4 2021-02-27T

I want my final dataframe to contain columns based on questionid where each row contains data in each date range i.e. within 24 hours of datetime for a single subject. **

So, the final output should look like this,

**

subject q1 q2 q3 q4 datetime
First fa1 fa2 fa3 nan 2021-02-27T
First nan nan nan fa4 2021-02-28T
Second sa1 nan sa3 nan 2021-02-27T

Notice that the first subject has two rows because of different dates the data were collected on.

I was able to create columns based on the questionid in a new dataframe by using

df.pivot(columns='questionid', values='answer')

But the final dataframe looks like this,

subject questionid answer datetime q1 q2 q3 q4
First q1 fa1 2021-02-27T fa1 nan nan nan
First q2 fa2 2021-02-27T nan fa2 nan nan
First q3 fa3 2021-02-27T nan nan fa3 nan
First q4 fa4 2021-02-28T nan nan nan fa4
Second q1 sa1 2021-02-27T sa1 nan nan nan
Second q2 2021-02-27T nan nan nan nan
Second q3 sa3 2021-02-27T nan nan sa3 nan
Second q4 2021-02-27T nan nan nan nan

Is there an easier way without looping through the rows to create the desired output. Again

CodePudding user response:

You're almost there, use both ['subject', 'datetime'] as index in pivot:

(df.pivot(index=['subject', 'datetime'], columns='questionid', values='answer')
   .reset_index()
)

output:

  subject     datetime   q1   q2   q3   q4
0   First  2021-02-27T  fa1  fa2  fa3  NaN
1   First  2021-02-28T  NaN  NaN  NaN  fa4
2  Second  2021-02-27T  sa1  NaN  sa3  NaN

CodePudding user response:

I think that should do it:

pd.pivot_table(index=['subject', 'datetime'], values='answer', columns='questionid')
  • Related