I am trying to convert a wide dataframe [2r, 12c] to a long dataframe [4r, 6c].
From this ...
0 1 2 3 4 5 6 7 8 9 10 11
0 1.0 0.9 0.8 0.5 0.4 0.3 0.7 0.9 0.2 0.1 0.4 0.3
1 0.6 0.8 0.5 0.3 0.7 0.1 0.4 0.2 0.8 0.6 0.4 0.1
... to this ...
0 1 2 3 4 5
0 1.0 0.9 0.8 0.5 0.4 0.3
1 0.7 0.9 0.2 0.1 0.4 0.3
2 0.6 0.8 0.5 0.3 0.7 0.1
3 0.4 0.2 0.8 0.6 0.4 0.1
My code is producing unexpected results:
0 1 2 3 4 5 6 7 8 9 10 11
0 1.0 0.9 0.8 0.5 0.4 0.3
1 NaN NaN NaN NaN NaN NaN 0.7 0.9 0.2 0.1 0.4 0.3
etc...
My code:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    np.random.randn(2, 12)
)
slice_sz = 6
x = df.columns
col_num = len(x)
col_iter = int(col_num / slice_sz)
y = df.index
row_num = len(y)
df_list = pd.DataFrame([])
for row in range(row_num):
    # print('Processing row', row)
    i = 0
    j = slice_sz - 1
    for col in range(col_iter):
        df_sliced = df.iloc[row, i:j]
        df_sliced = df_sliced.to_frame()
        df_sliced = df_sliced.transpose()
        df_list = df_list.append(df_sliced)
        df_sliced = []
        i = slice_sz
        j = slice_sz
I know there must be a more elegant way to do this. Thanks for your help.
CodePudding user response:
Use np.hstack and np.split:
pd.DataFrame(np.hstack(np.split(df.T.to_numpy(), np.arange(slice_sz, df.shape[1], slice_sz))).T)
Output:
0 1 2 3 4 5
0 1.0 0.9 0.8 0.5 0.4 0.3
1 0.6 0.8 0.5 0.3 0.7 0.1
2 0.7 0.9 0.2 0.1 0.4 0.3
3 0.4 0.2 0.8 0.6 0.4 0.1
If you want the rows in the order shown in the question (this reindexing assumes two original rows):
newdf = pd.DataFrame(np.hstack(np.split(df.T.to_numpy(), np.arange(slice_sz, df.shape[1], slice_sz))).T)
newdf.iloc[[*newdf.index[::2], *newdf.index[1::2]]].reset_index(drop=True)
Output:
0 1 2 3 4 5
0 1.0 0.9 0.8 0.5 0.4 0.3
1 0.7 0.9 0.2 0.1 0.4 0.3
2 0.6 0.8 0.5 0.3 0.7 0.1
3 0.4 0.2 0.8 0.6 0.4 0.1
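The [::2] / [1::2] indexing above relies on there being exactly two original rows. Here is a minimal sketch of a more general reordering, assuming the same chunk-major layout produced by the np.split / np.hstack expression. The names n_rows, n_chunks, order and result are illustrative and not part of the answer above:

```python
import numpy as np
import pandas as pd

# Sample data from the question (2 rows, 12 columns).
df = pd.DataFrame([
    [1.0, 0.9, 0.8, 0.5, 0.4, 0.3, 0.7, 0.9, 0.2, 0.1, 0.4, 0.3],
    [0.6, 0.8, 0.5, 0.3, 0.7, 0.1, 0.4, 0.2, 0.8, 0.6, 0.4, 0.1],
])
slice_sz = 6

# Same expression as in the answer: rows come out grouped by chunk.
newdf = pd.DataFrame(
    np.hstack(np.split(df.T.to_numpy(), np.arange(slice_sz, df.shape[1], slice_sz))).T
)

n_rows = df.shape[0]                 # original row count (assumed name)
n_chunks = df.shape[1] // slice_sz   # chunks per original row (assumed name)

# Row k of newdf holds chunk k // n_rows of original row k % n_rows.
# Build the positions that visit every chunk of row 0, then row 1, and so on.
order = np.arange(n_rows * n_chunks).reshape(n_chunks, n_rows).T.ravel()
result = newdf.iloc[order].reset_index(drop=True)
print(result)
```

For the 2 x 12 example this yields the same positions as the slicing version, and it should keep working if the frame has more than two rows.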
CodePudding user response:
For the code above, I would use df_list.pivot(). Within .pivot() you will need to specify index = '', columns = '' and values = ''. There is plenty of info around on .pivot() should you require any further clarification.
This function will help with either a wide-to-long or a long-to-wide transpose of your dataframe.
This should do away with the for loop you put together to undertake this task.
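To make the .pivot() suggestion concrete, here is one possible sketch. It is an assumption about how the idea could be applied, not necessarily what this answer had in mind: the frame is first flattened to one record per cell, target coordinates are computed, and .pivot() then lays the values out. The column names row, col, value, new_row and new_col are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(2, 12))
slice_sz = 6
n_chunks = df.shape[1] // slice_sz

# Long form: one record per cell, tagged with its original row/column position.
rows, cols = np.indices(df.shape)
long = pd.DataFrame({
    "row": rows.ravel(),
    "col": cols.ravel(),
    "value": df.to_numpy().ravel(),
})

# Target coordinates (illustrative names): each original row expands into
# n_chunks new rows, and columns wrap around every slice_sz positions.
long["new_row"] = long["row"] * n_chunks + long["col"] // slice_sz
long["new_col"] = long["col"] % slice_sz

out = long.pivot(index="new_row", columns="new_col", values="value")
print(out)
```

This is considerably more work than a plain reshape, so it is mainly useful when the data already lives in long form.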
CodePudding user response:
If you do not care for pandas, I would do something like this:
import numpy as np
data = np.random.randn(2, 12)
# reshape is your friend!
data_transformed = data.reshape(4,6)
Please check out the numpy.reshape documentation here: https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
If you really, really, really want to reshape a dataframe (although I'm not sure why you would want to), you can do this:
import pandas as pd
import numpy as np
df = pd.DataFrame(
np.random.randn(2, 12)
)
df = pd.DataFrame(df.to_numpy().reshape(4,6))
All I did there was convert the dataframe into a numpy array, reshape it, and turn the numpy array back into a dataframe.
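As a quick check against the sample data in the question, here is a small sketch using slice_sz from the question's code. Passing -1 lets NumPy infer the row count, and the divisibility check is an added safeguard rather than something the original code does. reshape reads the values row by row by default, so the rows come out in the order the question asks for:

```python
import numpy as np
import pandas as pd

# Sample data from the question (2 rows, 12 columns).
df = pd.DataFrame([
    [1.0, 0.9, 0.8, 0.5, 0.4, 0.3, 0.7, 0.9, 0.2, 0.1, 0.4, 0.3],
    [0.6, 0.8, 0.5, 0.3, 0.7, 0.1, 0.4, 0.2, 0.8, 0.6, 0.4, 0.1],
])
slice_sz = 6

# Added safeguard: reshape would raise anyway, but this message is clearer.
if df.shape[1] % slice_sz != 0:
    raise ValueError("number of columns must be a multiple of slice_sz")

# -1 tells NumPy to work out the number of rows from the total size.
reshaped = pd.DataFrame(df.to_numpy().reshape(-1, slice_sz))
print(reshaped)
```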