I have a pandas.DataFrame that looks like that:
index | projectid | question | answer |
---|---|---|---|
0 | 1 | 'q1' | 'str1' |
1 | 1 | 'q2' | 'str2' |
2 | 1 | 'q3' | 'str3' |
3 | 2 | 'q1' | 'str4' |
4 | 2 | 'q3' | 'str6' |
And I would like to format it like that:
index | projectid | question1 | answer1 | question2 | answer2 | question3 | answer3 |
---|---|---|---|---|---|---|---|
0 | 1 | 'q1' | 'str1' | 'q2' | 'str2' | 'q3' | 'str3' |
1 | 2 | 'q1' | 'str4' | None | None | 'q3' | 'str6' |
Not every project has the same number of question but questions are shared for each project. So when a specific question isn't in a project, I would like cells to be filled up with None values.
I didn't found any way to do it with join or concat, but I don't know how to properly use it.
I would like to improve my pandas skills so my question is:
Is there any way to do it with pandas treatment or doing it manually by treating my DataFrames with iterrows is the only way ?
Thank you !
CodePudding user response:
You can use cumcount
before pivoting to get your suffixes:
df['idx'] = df.groupby('projectid').cumcount() 1
df = df.pivot(index='projectid',columns='idx')[['question','answer']]
df.columns = [''.join(map(str, col)) for col in df.columns]
print(df)
Output::
question1 question2 question3 answer1 answer2 answer3
projectid
1 'q1' 'q2' 'q3' 'str1' 'str2' 'str3'
2 'q1' 'q3' NaN 'str4' 'str6' NaN