How to convert wide dataframe to long dataframe

Time:11-09

I am trying to convert a wide dataframe (2 rows, 12 columns) into a long dataframe (4 rows, 6 columns).

From this ...
   0    1    2    3    4    5    6    7    8    9    10   11  
0  1.0  0.9  0.8  0.5  0.4  0.3  0.7  0.9  0.2  0.1  0.4  0.3
1  0.6  0.8  0.5  0.3  0.7  0.1  0.4  0.2  0.8  0.6  0.4  0.1

... to this ...
   0    1    2    3    4    5       
0  1.0  0.9  0.8  0.5  0.4  0.3   
1  0.7  0.9  0.2  0.1  0.4  0.3  
2  0.6  0.8  0.5  0.3  0.7  0.1  
3  0.4  0.2  0.8  0.6  0.4  0.1

My code is producing unexpected results:

   0    1    2    3    4    5    6    7    8    9    10   11
0  1.0  0.9  0.8  0.5  0.4  0.3
1  NaN  NaN  NaN  NaN  NaN  NaN  0.7  0.9  0.2  0.1  0.4  0.3
etc...

My code:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.random.randn(2, 12)
)

slice_sz = 6

x = df.columns
col_num = len(x)

col_iter = int(col_num / slice_sz)

y = df.index
row_num = len(y)

df_list = pd.DataFrame([])

for row in range(row_num):
    # print('Processing row', row)
    i = 0
    j = slice_sz - 1

    for col in range(col_iter):
        df_sliced = df.iloc[row, i:j]
        df_sliced = df_sliced.to_frame()
        df_sliced = df_sliced.transpose()
        df_list = df_list.append(df_sliced)  # DataFrame.append was removed in pandas 2.0
        df_sliced = []
        i += slice_sz
        j += slice_sz
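The NaN layout most likely comes from label alignment: each slice keeps its original column labels (0 to 5 for the first half of a row, 6 to 11 for the second), and DataFrame.append, like pd.concat, lines rows up by column label, so each half-row ends up in its own set of columns. Separately, iloc's end bound is exclusive, so `i:j` with `j = slice_sz - 1` returns only five values. Below is a minimal corrected version of the loop, assuming the column count is a multiple of `slice_sz` and using pd.concat because DataFrame.append no longer exists in pandas 2.0. The names `pieces`, `chunk` and `long_df` are illustrative.

```python
import numpy as np
import pandas as pd

slice_sz = 6
df = pd.DataFrame(np.random.randn(2, 12))

pieces = []
for row in range(df.shape[0]):
    for start in range(0, df.shape[1], slice_sz):
        # iloc's end is exclusive, so this takes exactly slice_sz values
        chunk = df.iloc[row, start:start + slice_sz].to_numpy()
        # Wrapping the bare values gives every piece the same labels 0..slice_sz-1,
        # so concat stacks the pieces instead of spreading them over 12 columns
        pieces.append(pd.DataFrame([chunk]))

# ignore_index=True renumbers the stacked rows 0..n-1
long_df = pd.concat(pieces, ignore_index=True)
print(long_df)
```

Collecting the pieces in a list and concatenating once at the end is also cheaper than growing a DataFrame inside the loop.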

I know there must be a more elegant way to do this. Thanks for your help.

Answer:

Use np.hstack and np.split:

pd.DataFrame(np.hstack(np.split(df.T.to_numpy(), np.arange(slice_sz, df.shape[1], slice_sz))).T)

Output:

     0    1    2    3    4    5
0  1.0  0.9  0.8  0.5  0.4  0.3
1  0.6  0.8  0.5  0.3  0.7  0.1
2  0.7  0.9  0.2  0.1  0.4  0.3
3  0.4  0.2  0.8  0.6  0.4  0.1
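To see why the rows come out in this order, here is the same one-liner split into steps. It uses a hypothetical frame holding 0 to 23, so each value shows where it came from; the intermediate names are illustrative.

```python
import numpy as np
import pandas as pd

slice_sz = 6
df = pd.DataFrame(np.arange(24, dtype=float).reshape(2, 12))

cols_first = df.T.to_numpy()                       # shape (12, 2): one column per original row
cuts = np.arange(slice_sz, df.shape[1], slice_sz)  # array([6]): split points along axis 0
blocks = np.split(cols_first, cuts)                # list of two arrays, each of shape (6, 2)
stacked = np.hstack(blocks)                        # shape (6, 4): the blocks placed side by side
result = pd.DataFrame(stacked.T)                   # shape (4, 6): transpose back to rows
print(result)
```

Because the blocks are placed side by side before transposing, both rows' first halves come before both rows' second halves, which is why a reordering step is needed to match the layout in the question.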

If you want the rows in the order from the question:

newdf = pd.DataFrame(np.hstack(np.split(df.T.to_numpy(), np.arange(slice_sz, df.shape[1], slice_sz))).T)
newdf.iloc[[*newdf.index[::2], *newdf.index[1::2]]].reset_index(drop=True)

Output:

     0    1    2    3    4    5
0  1.0  0.9  0.8  0.5  0.4  0.3
1  0.7  0.9  0.2  0.1  0.4  0.3
2  0.6  0.8  0.5  0.3  0.7  0.1
3  0.4  0.2  0.8  0.6  0.4  0.1
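The `[::2]` / `[1::2]` selection assumes df has exactly two rows. For any number of rows, a sketch of a general reordering follows, assuming the column count is a multiple of `slice_sz`. The names `n_rows`, `n_chunks` and `order` are illustrative, and the 3-row frame is made-up test data.

```python
import numpy as np
import pandas as pd

slice_sz = 6
df = pd.DataFrame(np.arange(36, dtype=float).reshape(3, 12))

n_rows = df.shape[0]
n_chunks = df.shape[1] // slice_sz
newdf = pd.DataFrame(np.hstack(np.split(df.T.to_numpy(), np.arange(slice_sz, df.shape[1], slice_sz))).T)

# newdf is chunk-major: position chunk * n_rows + row holds block `chunk` of row `row`
order = [chunk * n_rows + row for row in range(n_rows) for chunk in range(n_chunks)]
ordered = newdf.iloc[order].reset_index(drop=True)
print(ordered)
```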

Answer:

For the code above, I would use df_list.pivot()

Within .pivot() you name the columns to use via index=, columns= and values=: they supply the new row labels, the new column labels and the cell values respectively. There is plenty of information around on .pivot() should you need further clarification.

Note that .pivot() goes from long to wide; the reverse direction, wide to long, is what .melt() does.

This should do away with the for loop you put together to undertake this task.
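One way to apply this, sketched under the assumption that the goal is the layout from the question: .melt() first turns the frame into one row per value, then .pivot() spreads each original row's values back out six at a time. The helper columns `row`, `col`, `val`, `chunk` and `pos` are my own naming, not pandas API, and the sketch assumes pandas 1.1 or later, where pivot accepts a list for `index`.

```python
import numpy as np
import pandas as pd

slice_sz = 6
df = pd.DataFrame(np.arange(24, dtype=float).reshape(2, 12))

# Wide to long: one row per (original row, original column) pair
long = df.rename_axis("row").reset_index().melt(id_vars="row", var_name="col", value_name="val")
long["col"] = long["col"].astype(int)
# Which block of slice_sz columns each value belongs to, and its position inside that block
long["chunk"] = long["col"] // slice_sz
long["pos"] = long["col"] % slice_sz

# Long to wide: one output row per (row, chunk) pair, one column per position
result = (
    long.pivot(index=["row", "chunk"], columns="pos", values="val")
    .reset_index(drop=True)
    .rename_axis(columns=None)
)
print(result)
```

This is more verbose than a plain reshape, but the intermediate long frame is handy if you later need to filter or label individual values.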

Answer:

If you do not care for pandas, I would do something like this:

import numpy as np

data = np.random.randn(2, 12)

# reshape is your friend! 
data_transformed = data.reshape(4,6)

Please check out the numpy.reshape documentation here: https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
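reshape reads and refills values in row-major (C) order by default, so each original row's first six values land directly above its last six, which is exactly the order asked for. Passing -1 for one dimension lets NumPy infer it, so the code does not hard-code the row count. A small demonstration with made-up values 0 to 23:

```python
import numpy as np

data = np.arange(24).reshape(2, 12)  # row 0 holds 0..11, row 1 holds 12..23
rows = data.reshape(-1, 6)           # -1 lets NumPy infer the number of rows (4 here)
print(rows)
```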

Although if you really, really, really want to reshape a dataframe (I'm not sure why you would want to), you can do this:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.random.randn(2, 12)
)

df = pd.DataFrame(df.to_numpy().reshape(4,6))

All I did there was convert the dataframe into a NumPy array, reshape it, and wrap the reshaped array back into a DataFrame.
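If you do stay in pandas, a slightly more general sketch wraps the same reshape in a helper that checks the column count divides evenly and records where each output row came from. The function name `split_rows` and the index level names `row` and `chunk` are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd


def split_rows(df: pd.DataFrame, slice_sz: int) -> pd.DataFrame:
    """Split each row of df into consecutive blocks of slice_sz values, one block per output row."""
    n_rows, n_cols = df.shape
    if n_cols % slice_sz:
        raise ValueError(f"{n_cols} columns cannot be split evenly into blocks of {slice_sz}")
    n_chunks = n_cols // slice_sz
    # Row-major reshape keeps each original row's blocks together and in order
    values = df.to_numpy().reshape(n_rows * n_chunks, slice_sz)
    # Label each output row with (original row label, block number), in the same order
    index = pd.MultiIndex.from_product([df.index, range(n_chunks)], names=["row", "chunk"])
    return pd.DataFrame(values, index=index)


df = pd.DataFrame(np.random.randn(2, 12))
print(split_rows(df, 6))
```

Call `.reset_index(drop=True)` on the result if you prefer a plain 0..n-1 index like the one in the question.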
