Home > Back-end >  Reshape a pandas DataFrame by expanding it horizontally
Reshape a pandas DataFrame by expanding it horizontally

Time:03-11

I have a DataFrame with 4000 rows and 5 columns.

They are information from multiple excel workbooks that I read in to one single sheet. Now I want to rearrange them in a horizontal manner, basically every time the header of the original excel sheet appears in the data, I want to move it horizontally.

Can you advise please.

a b 
. .
. .
. .
. . 
a b 
. .
. .
a b 
. .
. .
. .
. .
a b 
. .
. .  
. .
. .
. .
. .
a b 
. .
. .

and I want to have something like

a b a b a b a b a b
. . . . . . . . . .
. . . . . . . . . .
. .     . . . .
. .     . . . .
            . .
            . .

Amendment:

    symbol  weight      lqdty       date
0   1712    0.007871    7.023737    20210104
1   1726    0.007650    3.221021    20210104
2   1824    0.032955    3.475508    20210104
0   1871    0.006443    4.615002    20210105
1   1887    0.007840    6.678486    20210105
2   1871    0.006443    4.615002    20210105
3   1887    0.007840    6.678486    20210105
0   1871    0.006443    4.615002    20210106
1   1887    0.007840    6.678486    20210106

CodePudding user response:

I assumed "a", "b" are column names here.

Create groups by where the column appear in the data and set_index with it. Then filter column names out and stack the DataFrame. Then we'll have a MultiIndex Series where outer index level is the groups and the inner level is the column names. Then groupby agg(list) DataFrame transpose will fetch us the desired DataFrame.

cols = df.columns.tolist()
s = df.stack().groupby(level=[0,1]).agg(list)
out = pd.DataFrame(s.tolist(), index=s.index.get_level_values(1)).fillna('').T

Output:

   symbol    weight     lqdty        date  symbol   weight     lqdty        date  symbol    weight     lqdty        date  symbol   weight     lqdty        date  
0  1712.0  0.007871  7.023737  20210104.0  1726.0  0.00765  3.221021  20210104.0  1824.0  0.032955  3.475508  20210104.0  1887.0  0.00784  6.678486  20210105.0  
1  1871.0  0.006443  4.615002  20210105.0  1887.0  0.00784  6.678486  20210105.0  1871.0  0.006443  4.615002  20210105.0                    
2  1871.0  0.006443  4.615002  20210106.0  1887.0  0.00784  6.678486  20210106.0                                                            

CodePudding user response:

An alternative, provided that your dataframe looks like

data = {
    'symbol': [1712, 1726, 1824, 1871, 1887, 1871, 1887, 1871, 1887],
    'weight': [0.007871, 0.00765, 0.032955, 0.006443, 0.00784, 0.006443, 0.00784, 0.006443, 0.00784],
    'lqdty': [7.023737, 3.221021, 3.475508, 4.615002, 6.678486, 4.615002, 6.678486, 4.615002, 6.678486],
    'date': [20210104, 20210104, 20210104, 20210105, 20210105, 20210105, 20210105, 20210106, 20210106]
}
index = [0, 1, 2, 0, 1, 2, 3, 0, 1]
df = pd.DataFrame(data, index=index)

would be

groups = pd.Series(df.index).eq(0).cumsum().values
result = pd.concat((sdf for _, sdf in df.groupby(groups)), axis=1)

Result:

   symbol    weight     lqdty        date  symbol    weight     lqdty  \
0  1712.0  0.007871  7.023737  20210104.0    1871  0.006443  4.615002   
1  1726.0  0.007650  3.221021  20210104.0    1887  0.007840  6.678486   
2  1824.0  0.032955  3.475508  20210104.0    1871  0.006443  4.615002   
3     NaN       NaN       NaN         NaN    1887  0.007840  6.678486   

       date  symbol    weight     lqdty        date  
0  20210105  1871.0  0.006443  4.615002  20210106.0  
1  20210105  1887.0  0.007840  6.678486  20210106.0  
2  20210105     NaN       NaN       NaN         NaN  
3  20210105     NaN       NaN       NaN         NaN  
  • Related