I have the following dataframe
CustomerNr | Target | Source | Percentage |
---|---|---|---|
1001 | A | C | 0.2 |
1004 | D | np.nan | 0.3 |
1005 | C | D | 0.4 |
1010 | A | D | 0.5 |
import numpy as np
df = pd.DataFrame([[1001, 'A','C',0.2], [1004, 'D',np.nan,0.3],[1005, 'C','D',0.4],
[1010, 'A','D',0.5]], columns=['CustomerNr','Target','Source','Percentage'])
to this one (any ideas how to formulate the title for this problem by the way)
import numpy as np
df = pd.DataFrame([['1001 Target' , 'A',0.2],
['1001 Source' , 'C',0.2],
['1004 Target', 'D',0.3],
['1004 Source', np.nan,0.3],
['1005 Target', 'C',0.4],
['1005 Source', 'D',0.4],
['10010 Target', 'A',0.5],
['10010 Source', 'D',0.5],
], columns=['CustomerNr Scope','Value','Percentage'])
CustomerNr Scope | Value | Percentage |
---|---|---|
1001 Target | A | 0.2 |
1001 Source | C | 0.2 |
1004 Target | D | 0.3 |
1004 Source | NaN | 0.3 |
1005 Target | C | 0.4 |
1005 Source | D | 0.4 |
10010 Target | A | 0.5 |
10010 Source | D | 0.5 |
CodePudding user response:
You could use pandas stack
to achieve this:
(df.set_index(["CustomerNr", "Percentage"])
.rename_axis("Scope", axis=1)
.stack(dropna=False)
.rename("Value")
.reset_index()
.assign(CustomerNrScope=lambda df: df[["CustomerNr", "Scope"]].astype(str).apply(" ".join, axis=1)))
Or concatenate the tables for source and target:
df_new = pd.concat([df[["CustomerNr", tscol, "Percentage"]]
.rename(columns={tscol: "Value"})
.assign(Scope=tscol)
for tscol in ["Target", "Source"]])
df_new["CustomerNr Scope"] = df_new.CustomerNr.astype(str) " " df_new.Scope
# result
CustomerNr Value Percentage Scope CustomerNr Scope
0 1001 A 0.2 Target 1001 Target
1 1004 D 0.3 Target 1004 Target
2 1005 C 0.4 Target 1005 Target
3 1010 A 0.5 Target 1010 Target
0 1001 C 0.2 Source 1001 Source
1 1004 NaN 0.3 Source 1004 Source
2 1005 D 0.4 Source 1005 Source
3 1010 D 0.5 Source 1010 Source
Or (building on Zephyrus' answer) use melt with Percentage
as additional id_var
to get the the desired table right away (assuming that Percentage
uniquely depends on CustomerNr
):
pd.melt(df, id_vars=['CustomerNr', "Percentage"],
value_vars=['Target', 'Source'],
var_name='Scope')
CodePudding user response:
You can use pandas melt to unpivot your dataframe:
df_melted = pd.melt(df, id_vars=['CustomerNr'], value_vars=['Target', 'Source'], var_name='Scope')
This doesn't include the 'Percentage'
column, but you can merge that back into the new dataframe:
df_melted = df_melted.merge(df[[ 'CustomerNr', 'Percentage']], left_on='CustomerNr', right_on='CustomerNr' )
If you want your `'CustomerNr'` column and `'Scope`' column together you can easily add them together to one column.