I'm pretty new to pandas and python and having some issues on something that I feel might be more straightforward than it feels.
I'm using pandas to create a dataframe off a SQL query. This query contains joins and group bys. The result looks as follows after running pd.read_sql
:
id | region | type | question_code | question_score |
---|---|---|---|---|
1 | DA | B | FA1 | 1 |
1 | DA | B | FA2 | 3 |
1 | DA | B | FA3 | 6 |
2 | SA | X | FA1 | 2 |
2 | SA | X | FA2 | 5 |
2 | SA | X | FA3 | 3 |
3 | PL | M | FA1 | 2 |
3 | PL | M | FA2 | 4 |
3 | PL | M | FA3 | 5 |
Every row has some common attributes, like id
, region
or type
(repeated for on every row for every distinct id
).
What I'm trying to achieve is to "flatten" these rows on their id
and create new columns from question_code
with the associated value as the question_score
as follows:
id | region | type | FA1 | FA2 | FA3 |
---|---|---|---|---|---|
1 | DA | B | 1 | 3 | 6 |
2 | SA | X | 2 | 5 | 3 |
3 | PL | M | 2 | 4 | 5 |
Is this possible to achieve using pandas?
CodePudding user response:
- add the identifying columns into the pandas index
- then
unstack()
the question_code - full code and output below
import pandas as pd
import io
df = pd.read_csv(io.StringIO("""id,region,type,question_code,question_score
1,DA,B,FA1,1
1,DA,B,FA2,3
1,DA,B,FA3,6
2,SA,X,FA1,2
2,SA,X,FA2,5
2,SA,X,FA3,3
3,PL,M,FA1,2
3,PL,M,FA2,4
3,PL,M,FA3,5"""))
df.set_index(["id","region","type","question_code"]).unstack("question_code").droplevel(0,1).reset_index()
id | region | type | FA1 | FA2 | FA3 | |
---|---|---|---|---|---|---|
0 | 1 | DA | B | 1 | 3 | 6 |
1 | 2 | SA | X | 2 | 5 | 3 |
2 | 3 | PL | M | 2 | 4 | 5 |