Home > Net >  How to unpivot pandas dataframe
How to unpivot pandas dataframe

Time:11-06

I have a Pandas dataframe which looks as follows: Starting Table:

Kode Country Procedure male male female female
Kode Country Procedure two-year-old three-year-old two-year-old three-year-old
1a US proc_1 4 6 3 6
1a UK proc_1 2 3 5 1
1b US proc_2 15 3 5 2
1b UK proc_2 8 4 7 3

CSV:

Code;Country;Procedure;male;male;female;female
Code;Country;Procedure;two-year-old;three-year-old;two-year-old;three-year-old
1a;US;proc_1;4;6;3;6
1a;UK;proc_1;2;3;5;1
1b;US;proc_2;15;3;5;2
1b;UK;proc_2;8;4;7;3

My python code:

df = pd.read_csv('test.csv', 
                 header=[0,1],delimiter= ';',
                 engine='python', 
                 encoding= 'unicode_escape')

In fact I have two row headers How to pivot pandas dataframe to get the following result?

Resulting Table:

Code Country Procedure Gender Age Frequency
1a US proc_1 male two-year-old 4
1a US proc_1 male three-year-old 6
1a US proc_1 female two-year-old 3
1a US proc_1 female three-year-old 6
1a UK proc_1 male two-year-old 2
1a UK proc_1 male three-year-old 3
1a UK proc_1 female two-year-old 5
1a UK proc_1 female three-year-old 1
1b ...

CodePudding user response:

You'll likely want to go with melt which is sort of the opposite of pivot.

  • We specify the identifer variables: the first three columns
  • The rest of the columns become value variables and we assign them names, and a name for the new value column
import io
import pandas as pd

piv = pd.read_csv(io.StringIO("""Code;Country;Procedure;male;male;female;female
Code;Country;Procedure;two-year-old;three-year-old;two-year-old;three-year-old
1a;US;proc_1;4;6;3;6
1a;UK;proc_1;2;3;5;1
1b;US;proc_2;15;3;5;2
1b;UK;proc_2;8;4;7;3"""), header=[0, 1], delimiter=";")


unpivoted = piv.melt(id_vars=list(piv.columns[:3]),
                     var_name=['gender', 'age'],
                     value_name='frequency')

# cleanup the tupleized columns
unpivoted.columns = [(col[0] if isinstance(col, tuple) else col)
                     for col in unpivoted.columns]

CodePudding user response:

Using "melt" with "pivot_table", you can try the following:

df_melt=df.melt(id_vars=['Kode','Country','Procedure'],var_name='Gender')

Extract two-year-old, three-year-old into its own column "Age_group", and remove non-numeric rows from the "value" column"

df_melt['Age_group']=df_melt['value'].str.extract('(.*-year-old)')
df_melt['Age_group']=df_melt['Age_group'].ffill()
df_melt=df_melt[pd.to_numeric(df_melt['value'],errors='coerce').notnull()]

If the original data frame is red into an editor as is, the columns named the same would be renamed as male, male.1, female, female.1, etc. To remove characters after "." from the "Gender" column

df_melt['Gender']=df_melt['Gender'].map(lambda x: x[0: x.find('.')] if '.' in x else x)

Pivot result

result=df_melt.pivot_table(index=['Kode','Country','Procedure','Gender','Age_group'],values='value')
  • Related