Pivot column in Pandas-CodePudding

I have 4 columns in a dataframe, they are: id external_id value description

Exemple table:

ID	external_id	value	description
1	65	Pendent	Account Onboarding All iN
1	65	Lais	Gestor
2	93	Maria	Account Onboarding All iN
3	454	Renan	Gestor
4	535	Osiris	Modelo de negocio
5	999	Togo	Account Onboarding All iN
6	25	João	Gestor
7	85	Lima	Account CS SM
8	22	Teixeira	Account Onboarding SM

I need to turn the descripton lines into columns and the value would be its respective value.

In general, I need to create a filter that creates the columns below, when it matches description:

Account Onboarding All iN Account Onboarding SM Account CS All iN" Account CS SM Account Operacional All iN Gestor Natureza Modelo de negocio Vertical

Each description above needs to become a column and if it exists the record will come from the value column.

Expected output:

id	external_id	Account Onboarding All iN	Account Onboarding SM	Account CS All iN	Account CS SM	Account Operacional All iN	Gestor	Natureza	Modelo de negocio	Vertical
1	65	Pendent	None	None	None	None	Lais	None	None	None
2	93	Maria	None	None	None	None	None	None	None	None
3	454	None	None	None	None	None	Renan	None	None	None
4	535	None	None	None	None	None	None	None	Osiris	None
5	999	Togo	None	None	None	None	None	None	None	None
6	25	None	None	None	None	None	João	None	None	None
7	85	None	None	None	Lima	None	None	None	None	None
8	22	None	Teixeira	None	None	None	None	None	None	None

One note is that if the ID is the same, the results should be on the same line.

How can I do this?

CodePudding user response：

With empty strings

dummies = pd.get_dummies(df.description)
df.drop(['value', 'description'], axis=1).join(dummies.mul(df.value, axis=0))

   ID  external_id Account CS SM Account Onboarding All iN Account Onboarding SM Gestor Modelo de negocio
0   1           65                                 Pendent                                               
1   1           65                                                                 Lais                  
2   2           93                                   Maria                                               
3   3          454                                                                Renan                  
4   4          535                                                                                 Osiris
5   5          999                                    Togo                                               
6   6           25                                                                 João                  
7   7           85          Lima                                                                         
8   8           22                                                      Teixeira

With `nan`

dummies = pd.get_dummies(df.description).eq(1)
tojoin = dummies.mask(dummies, df.value, axis=0).where(dummies)
df.drop(['value', 'description'], axis=1).join(tojoin)

   ID  external_id Account CS SM Account Onboarding All iN Account Onboarding SM Gestor Modelo de negocio
0   1           65           NaN                   Pendent                   NaN    NaN               NaN
1   1           65           NaN                       NaN                   NaN   Lais               NaN
2   2           93           NaN                     Maria                   NaN    NaN               NaN
3   3          454           NaN                       NaN                   NaN  Renan               NaN
4   4          535           NaN                       NaN                   NaN    NaN            Osiris
5   5          999           NaN                      Togo                   NaN    NaN               NaN
6   6           25           NaN                       NaN                   NaN   João               NaN
7   7           85          Lima                       NaN                   NaN    NaN               NaN
8   8           22           NaN                       NaN              Teixeira    NaN               NaN

A little more surgical

j, c = pd.factorize(df.description)
n, m = df.shape[0], c.shape[0]
i = np.arange(n)
a = np.full((n, m), None)
a[i, j] = df.value
tojoin = pd.DataFrame(a, df.index, c)
df.drop(['value', 'description'], axis=1).join(tojoin)

   ID  external_id Account Onboarding All iN Gestor Modelo de negocio Account CS SM Account Onboarding SM
0   1           65                   Pendent   None              None          None                  None
1   1           65                      None   Lais              None          None                  None
2   2           93                     Maria   None              None          None                  None
3   3          454                      None  Renan              None          None                  None
4   4          535                      None   None            Osiris          None                  None
5   5          999                      Togo   None              None          None                  None
6   6           25                      None   João              None          None                  None
7   7           85                      None   None              None          Lima                  None
8   8           22                      None   None              None          None              Teixeira

CodePudding user response：

you can use pd.pivot() for it:

res = (df.pivot(index='external_id',columns='description',values='value')
       .reset_index()
       .rename_axis(columns=None, index='id')
      )

print(res)

    external_id Account CS SM Account Onboarding All iN Account Onboarding SM  \
id                                                                              
0            22           NaN                       NaN              Teixeira   
1            25           NaN                       NaN                   NaN   
2            65           NaN                   Pendent                   NaN   
3            85          Lima                       NaN                   NaN   
4            93           NaN                     Maria                   NaN   
5           454           NaN                       NaN                   NaN   
6           535           NaN                       NaN                   NaN   
7           999           NaN                      Togo                   NaN   

   Gestor Modelo de negocio  
id                           
0     NaN               NaN  
1    João               NaN  
2    Lais               NaN  
3     NaN               NaN  
4     NaN               NaN  
5   Renan               NaN  
6     NaN            Osiris  
7     NaN               NaN

With empty strings

With nan

A little more surgical

With `nan`