I have a DataFrame with 4000 rows and 5 columns.
They are information from multiple excel workbooks that I read in to one single sheet. Now I want to rearrange them in a horizontal manner, basically every time the header of the original excel sheet appears in the data, I want to move it horizontally.
Can you advise please.
a b
. .
. .
. .
. .
a b
. .
. .
a b
. .
. .
. .
. .
a b
. .
. .
. .
. .
. .
. .
a b
. .
. .
and I want to have something like
a b a b a b a b a b
. . . . . . . . . .
. . . . . . . . . .
. . . . . .
. . . . . .
. .
. .
Amendment:
symbol weight lqdty date
0 1712 0.007871 7.023737 20210104
1 1726 0.007650 3.221021 20210104
2 1824 0.032955 3.475508 20210104
0 1871 0.006443 4.615002 20210105
1 1887 0.007840 6.678486 20210105
2 1871 0.006443 4.615002 20210105
3 1887 0.007840 6.678486 20210105
0 1871 0.006443 4.615002 20210106
1 1887 0.007840 6.678486 20210106
CodePudding user response:
I assumed "a", "b" are column names here.
Create groups by where the column appear in the data and set_index
with it. Then filter column names out and stack
the DataFrame. Then we'll have a MultiIndex Series where outer index level is the groups and the inner level is the column names. Then groupby
agg(list)
DataFrame
transpose
will fetch us the desired DataFrame.
cols = df.columns.tolist()
s = df.stack().groupby(level=[0,1]).agg(list)
out = pd.DataFrame(s.tolist(), index=s.index.get_level_values(1)).fillna('').T
Output:
symbol weight lqdty date symbol weight lqdty date symbol weight lqdty date symbol weight lqdty date
0 1712.0 0.007871 7.023737 20210104.0 1726.0 0.00765 3.221021 20210104.0 1824.0 0.032955 3.475508 20210104.0 1887.0 0.00784 6.678486 20210105.0
1 1871.0 0.006443 4.615002 20210105.0 1887.0 0.00784 6.678486 20210105.0 1871.0 0.006443 4.615002 20210105.0
2 1871.0 0.006443 4.615002 20210106.0 1887.0 0.00784 6.678486 20210106.0
CodePudding user response:
An alternative, provided that your dataframe looks like
data = {
'symbol': [1712, 1726, 1824, 1871, 1887, 1871, 1887, 1871, 1887],
'weight': [0.007871, 0.00765, 0.032955, 0.006443, 0.00784, 0.006443, 0.00784, 0.006443, 0.00784],
'lqdty': [7.023737, 3.221021, 3.475508, 4.615002, 6.678486, 4.615002, 6.678486, 4.615002, 6.678486],
'date': [20210104, 20210104, 20210104, 20210105, 20210105, 20210105, 20210105, 20210106, 20210106]
}
index = [0, 1, 2, 0, 1, 2, 3, 0, 1]
df = pd.DataFrame(data, index=index)
would be
groups = pd.Series(df.index).eq(0).cumsum().values
result = pd.concat((sdf for _, sdf in df.groupby(groups)), axis=1)
Result:
symbol weight lqdty date symbol weight lqdty \
0 1712.0 0.007871 7.023737 20210104.0 1871 0.006443 4.615002
1 1726.0 0.007650 3.221021 20210104.0 1887 0.007840 6.678486
2 1824.0 0.032955 3.475508 20210104.0 1871 0.006443 4.615002
3 NaN NaN NaN NaN 1887 0.007840 6.678486
date symbol weight lqdty date
0 20210105 1871.0 0.006443 4.615002 20210106.0
1 20210105 1887.0 0.007840 6.678486 20210106.0
2 20210105 NaN NaN NaN NaN
3 20210105 NaN NaN NaN NaN