Home > OS >  Transpose horizontally stored data to multiple rows
Transpose horizontally stored data to multiple rows

Time:10-24

I have a sheet I need to transform from data being stored horizontally to vertically, i.e. from:

Company ID DoB Name DoB Name DoB Name
ID 1 DoB 1 Name 1 DoB 2 Name 2 DoB 3 Name 3
ID 2 DoB 4 Name 4 DoB 5 Name 5

To:

Company ID DoB Name
ID 1 DoB 1 Name 1
ID 1 DoB 2 Name 2
ID 1 DoB 3 Name 3
ID 2 DoB 4 Name 4
ID 2 DoB 5 Name 5

The data is structured such that the DoB/name entities always occur periodically on rows as illustrated above. The number of entities stored horizontally on each row can vary from none to 16.

How would one go about solving this in Python/Pandas (or something else)? This is a one-time thing, so performance is not really an issue.

Thankful for any help!

CodePudding user response:

You can use:

out = (df
   .set_index('Company ID')
   .pipe(lambda d: d.set_axis(
          pd.MultiIndex
            .from_arrays([d.columns,
                          d.groupby(level=0, axis=1).cumcount().add(1)]),
                              axis=1)
        )
   .stack()
   .reset_index().drop(columns='level_1')
   
)

Alternative:

out = pd.concat([g.stack().droplevel(1).rename(k)
                 for k,g in df.set_index('Company ID').groupby(level=0, axis=1)],
                 axis=1).reset_index()

output:

  Company ID    DoB    Name
0       ID 1  DoB 1  Name 1
1       ID 1  DoB 2  Name 2
2       ID 1  DoB 3  Name 3
3       ID 2  DoB 4  Name 4
4       ID 2  DoB 5  Name 5

CodePudding user response:

I would use stack in a for loop here:

res = pd.DataFrame()
for col in df.columns.drop('Company ID').unique():
    new_df = ( df.set_index('Company ID')[col].stack().reset_index().
                  rename({0:col}, axis=1)[['Company ID', col]] )
    res = pd.concat([res, new_df], axis=1)
# drop duplicated columns coming from stack    
res = res.loc[:, ~res.columns.duplicated()]

Output:

Company ID  DoB Name
0   ID 1    DoB 1   Name 1
1   ID 1    DoB 2   Name 2
2   ID 1    DoB 3   Name 3
3   ID 2    DoB 4   Name 4
4   ID 2    DoB 5   Name 5

CodePudding user response:

One option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(
    index = 'Company ID', 
    names_to = '.value', 
    names_pattern= '(. )', 
    sort_by_appearance=True)
.dropna()
) 
  Company ID    DoB    Name
0       ID 1  DoB 1  Name 1
1       ID 1  DoB 2  Name 2
2       ID 1  DoB 3  Name 3
3       ID 2  DoB 4  Name 4
4       ID 2  DoB 5  Name 5
  • Related