I have 4 columns in a dataframe, they are:
id
external_id
value
description
Exemple table:
ID | external_id | value | description |
---|---|---|---|
1 | 65 | Pendent | Account Onboarding All iN |
1 | 65 | Lais | Gestor |
2 | 93 | Maria | Account Onboarding All iN |
3 | 454 | Renan | Gestor |
4 | 535 | Osiris | Modelo de negocio |
5 | 999 | Togo | Account Onboarding All iN |
6 | 25 | João | Gestor |
7 | 85 | Lima | Account CS SM |
8 | 22 | Teixeira | Account Onboarding SM |
I need to turn the descripton lines into columns and the value would be its respective value.
In general, I need to create a filter that creates the columns below, when it matches description
:
Account Onboarding All iN
Account Onboarding SM
Account CS All iN"
Account CS SM
Account Operacional All iN
Gestor
Natureza
Modelo de negocio
Vertical
Each description above needs to become a column and if it exists the record will come from the value
column.
Expected output:
id | external_id | Account Onboarding All iN | Account Onboarding SM | Account CS All iN | Account CS SM | Account Operacional All iN | Gestor | Natureza | Modelo de negocio | Vertical |
---|---|---|---|---|---|---|---|---|---|---|
1 | 65 | Pendent | None | None | None | None | Lais | None | None | None |
2 | 93 | Maria | None | None | None | None | None | None | None | None |
3 | 454 | None | None | None | None | None | Renan | None | None | None |
4 | 535 | None | None | None | None | None | None | None | Osiris | None |
5 | 999 | Togo | None | None | None | None | None | None | None | None |
6 | 25 | None | None | None | None | None | João | None | None | None |
7 | 85 | None | None | None | Lima | None | None | None | None | None |
8 | 22 | None | Teixeira | None | None | None | None | None | None | None |
One note is that if the ID
is the same, the results should be on the same line.
How can I do this?
CodePudding user response:
With empty strings
dummies = pd.get_dummies(df.description)
df.drop(['value', 'description'], axis=1).join(dummies.mul(df.value, axis=0))
ID external_id Account CS SM Account Onboarding All iN Account Onboarding SM Gestor Modelo de negocio
0 1 65 Pendent
1 1 65 Lais
2 2 93 Maria
3 3 454 Renan
4 4 535 Osiris
5 5 999 Togo
6 6 25 João
7 7 85 Lima
8 8 22 Teixeira
With nan
dummies = pd.get_dummies(df.description).eq(1)
tojoin = dummies.mask(dummies, df.value, axis=0).where(dummies)
df.drop(['value', 'description'], axis=1).join(tojoin)
ID external_id Account CS SM Account Onboarding All iN Account Onboarding SM Gestor Modelo de negocio
0 1 65 NaN Pendent NaN NaN NaN
1 1 65 NaN NaN NaN Lais NaN
2 2 93 NaN Maria NaN NaN NaN
3 3 454 NaN NaN NaN Renan NaN
4 4 535 NaN NaN NaN NaN Osiris
5 5 999 NaN Togo NaN NaN NaN
6 6 25 NaN NaN NaN João NaN
7 7 85 Lima NaN NaN NaN NaN
8 8 22 NaN NaN Teixeira NaN NaN
A little more surgical
j, c = pd.factorize(df.description)
n, m = df.shape[0], c.shape[0]
i = np.arange(n)
a = np.full((n, m), None)
a[i, j] = df.value
tojoin = pd.DataFrame(a, df.index, c)
df.drop(['value', 'description'], axis=1).join(tojoin)
ID external_id Account Onboarding All iN Gestor Modelo de negocio Account CS SM Account Onboarding SM
0 1 65 Pendent None None None None
1 1 65 None Lais None None None
2 2 93 Maria None None None None
3 3 454 None Renan None None None
4 4 535 None None Osiris None None
5 5 999 Togo None None None None
6 6 25 None João None None None
7 7 85 None None None Lima None
8 8 22 None None None None Teixeira
CodePudding user response:
you can use pd.pivot()
for it:
res = (df.pivot(index='external_id',columns='description',values='value')
.reset_index()
.rename_axis(columns=None, index='id')
)
print(res)
external_id Account CS SM Account Onboarding All iN Account Onboarding SM \
id
0 22 NaN NaN Teixeira
1 25 NaN NaN NaN
2 65 NaN Pendent NaN
3 85 Lima NaN NaN
4 93 NaN Maria NaN
5 454 NaN NaN NaN
6 535 NaN NaN NaN
7 999 NaN Togo NaN
Gestor Modelo de negocio
id
0 NaN NaN
1 João NaN
2 Lais NaN
3 NaN NaN
4 NaN NaN
5 Renan NaN
6 NaN Osiris
7 NaN NaN