Home > OS >  Flatten the result of SQL GROUP BY and create new columns from unique values
Flatten the result of SQL GROUP BY and create new columns from unique values

Time:03-07

I'm pretty new to pandas and python and having some issues on something that I feel might be more straightforward than it feels.

I'm using pandas to create a dataframe off a SQL query. This query contains joins and group bys. The result looks as follows after running pd.read_sql:

id region type question_code question_score
1 DA B FA1 1
1 DA B FA2 3
1 DA B FA3 6
2 SA X FA1 2
2 SA X FA2 5
2 SA X FA3 3
3 PL M FA1 2
3 PL M FA2 4
3 PL M FA3 5

Every row has some common attributes, like id, region or type (repeated for on every row for every distinct id).

What I'm trying to achieve is to "flatten" these rows on their id and create new columns from question_code with the associated value as the question_score as follows:

id region type FA1 FA2 FA3
1 DA B 1 3 6
2 SA X 2 5 3
3 PL M 2 4 5

Is this possible to achieve using pandas?

CodePudding user response:

  • add the identifying columns into the pandas index
  • then unstack() the question_code
  • full code and output below
import pandas as pd
import io

df = pd.read_csv(io.StringIO("""id,region,type,question_code,question_score
1,DA,B,FA1,1
1,DA,B,FA2,3
1,DA,B,FA3,6
2,SA,X,FA1,2
2,SA,X,FA2,5
2,SA,X,FA3,3
3,PL,M,FA1,2
3,PL,M,FA2,4
3,PL,M,FA3,5"""))

df.set_index(["id","region","type","question_code"]).unstack("question_code").droplevel(0,1).reset_index()
id region type FA1 FA2 FA3
0 1 DA B 1 3 6
1 2 SA X 2 5 3
2 3 PL M 2 4 5
  • Related