Flatten the result of SQL GROUP BY and create new columns from unique values-CodePudding

I'm pretty new to pandas and python and having some issues on something that I feel might be more straightforward than it feels.

I'm using pandas to create a dataframe off a SQL query. This query contains joins and group bys. The result looks as follows after running pd.read_sql:

id	region	type	question_code	question_score
1	DA	B	FA1	1
1	DA	B	FA2	3
1	DA	B	FA3	6
2	SA	X	FA1	2
2	SA	X	FA2	5
2	SA	X	FA3	3
3	PL	M	FA1	2
3	PL	M	FA2	4
3	PL	M	FA3	5

Every row has some common attributes, like id, region or type (repeated for on every row for every distinct id).

What I'm trying to achieve is to "flatten" these rows on their id and create new columns from question_code with the associated value as the question_score as follows:

id	region	type	FA1	FA2	FA3
1	DA	B	1	3	6
2	SA	X	2	5	3
3	PL	M	2	4	5

Is this possible to achieve using pandas?

CodePudding user response：

add the identifying columns into the pandas index
then unstack() the question_code
full code and output below

import pandas as pd
import io

df = pd.read_csv(io.StringIO("""id,region,type,question_code,question_score
1,DA,B,FA1,1
1,DA,B,FA2,3
1,DA,B,FA3,6
2,SA,X,FA1,2
2,SA,X,FA2,5
2,SA,X,FA3,3
3,PL,M,FA1,2
3,PL,M,FA2,4
3,PL,M,FA3,5"""))

df.set_index(["id","region","type","question_code"]).unstack("question_code").droplevel(0,1).reset_index()

	id	region	type	FA1	FA2	FA3
0	1	DA	B	1	3	6
1	2	SA	X	2	5	3
2	3	PL	M	2	4	5

id	region	type	question_code	question_score
1	DA	B	FA1	1
1	DA	B	FA2	3
1	DA	B	FA3	6
2	SA	X	FA1	2
2	SA	X	FA2	5
2	SA	X	FA3	3
3	PL	M	FA1	2
3	PL	M	FA2	4
3	PL	M	FA3	5

id	region	type	question_code	question_score
1	DA	B	FA1	1
1	DA	B	FA2	3
1	DA	B	FA3	6
2	SA	X	FA1	2
2	SA	X	FA2	5
2	SA	X	FA3	3
3	PL	M	FA1	2
3	PL	M	FA2	4
3	PL	M	FA3	5

id	region	type	question_code	question_score
1	DA	B	FA1	1
1	DA	B	FA2	3
1	DA	B	FA3	6
2	SA	X	FA1	2
2	SA	X	FA2	5
2	SA	X	FA3	3
3	PL	M	FA1	2
3	PL	M	FA2	4
3	PL	M	FA3	5