Home > OS >  Transform dataframe based on columns and rows
Transform dataframe based on columns and rows

Time:12-16

I have the following dataframe

CustomerNr Target Source Percentage
1001 A C 0.2
1004 D np.nan 0.3
1005 C D 0.4
1010 A D 0.5
import numpy as np
df = pd.DataFrame([[1001, 'A','C',0.2], [1004, 'D',np.nan,0.3],[1005, 'C','D',0.4], 
                   [1010, 'A','D',0.5]], columns=['CustomerNr','Target','Source','Percentage'])

to this one (any ideas how to formulate the title for this problem by the way)

import numpy as np
df = pd.DataFrame([['1001 Target' , 'A',0.2],
                   ['1001 Source' , 'C',0.2], 
                   ['1004 Target', 'D',0.3],
                   ['1004 Source', np.nan,0.3],
                   ['1005 Target', 'C',0.4],
                   ['1005 Source', 'D',0.4],
                   ['10010 Target', 'A',0.5],
                   ['10010 Source', 'D',0.5],
                  ], columns=['CustomerNr Scope','Value','Percentage'])
CustomerNr Scope Value Percentage
1001 Target A 0.2
1001 Source C 0.2
1004 Target D 0.3
1004 Source NaN 0.3
1005 Target C 0.4
1005 Source D 0.4
10010 Target A 0.5
10010 Source D 0.5

CodePudding user response:

You could use pandas stack to achieve this:

(df.set_index(["CustomerNr", "Percentage"])
         .rename_axis("Scope", axis=1)
         .stack(dropna=False)
         .rename("Value")
         .reset_index()
         .assign(CustomerNrScope=lambda df: df[["CustomerNr", "Scope"]].astype(str).apply(" ".join, axis=1)))

Or concatenate the tables for source and target:

df_new = pd.concat([df[["CustomerNr", tscol, "Percentage"]]
               .rename(columns={tscol: "Value"})
               .assign(Scope=tscol)
           for tscol in ["Target", "Source"]])
df_new["CustomerNr Scope"] = df_new.CustomerNr.astype(str)   " "   df_new.Scope


# result
   CustomerNr Value  Percentage   Scope CustomerNr Scope
0        1001     A         0.2  Target      1001 Target
1        1004     D         0.3  Target      1004 Target
2        1005     C         0.4  Target      1005 Target
3        1010     A         0.5  Target      1010 Target
0        1001     C         0.2  Source      1001 Source
1        1004   NaN         0.3  Source      1004 Source
2        1005     D         0.4  Source      1005 Source
3        1010     D         0.5  Source      1010 Source


Or (building on Zephyrus' answer) use melt with Percentage as additional id_var to get the the desired table right away (assuming that Percentage uniquely depends on CustomerNr):

pd.melt(df, id_vars=['CustomerNr', "Percentage"],
        value_vars=['Target', 'Source'],
        var_name='Scope')

CodePudding user response:

You can use pandas melt to unpivot your dataframe:

df_melted =  pd.melt(df, id_vars=['CustomerNr'], value_vars=['Target', 'Source'], var_name='Scope')

This doesn't include the 'Percentage' column, but you can merge that back into the new dataframe:

df_melted = df_melted.merge(df[[ 'CustomerNr', 'Percentage']], left_on='CustomerNr', right_on='CustomerNr' )

If you want your `'CustomerNr'` column and `'Scope`' column together you can easily add them together to one column. 
  • Related