Reshaping a pandas dataframe in a specific manner

Consider the code below:

import pandas as pd

d = {'col1': [1, 2, 3 ,4 ,5, 5, 6, 5], 'col2': [3, 4, 3 ,4 , 5, 6 , 6, 5], 'col3': [5, 6, 3 ,4 , 5, 6 ,6, 5], 'col4': [7, 8, 3 , 4 , 5, 4 , 6, 4], }

df = pd.DataFrame(data=d)

df=df.T

This code gives me the following output:

#       0  1  2  3  4  5  6  7
# col1  1  2  3  4  5  5  6  5
# col2  3  4  3  4  5  6  6  5
# col3  5  6  3  4  5  6  6  5
# col4  7  8  3  4  5  4  6  4

I would like to reshape the dataframe in such a way that the columns are rearranged as shown below:

#       0  1  
# col1  1  2  
# col2  3  4  
# col3  5  6  
# col4  7  8  
# col1  3  4  
# col2  3  4  
# col3  3  4  
# col4  3  4  
# col1  5  5 
# col2  5  6  
# col3  5  6  
# col4  5  4  
# col1  6  5
# col2  6  5
# col3  6  5
# col4  6  4

The code should be flexible enough that one can choose two columns per block, as in the example above, or three, four, and so on. Any ideas on how to implement this?

CodePudding user response：

Try this:

import pandas as pd

d = {'col1': [1, 2, 3 ,4 ,5, 5, 6, 5], 'col2': [3, 4, 3 ,4 , 5, 6 , 6, 5], 'col3': [5, 6, 3 ,4 , 5, 6 ,6, 5], 'col4': [7, 8, 3 , 4 , 5, 4 , 6, 4], }

df = pd.DataFrame(data=d)
df = df.T
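number = 2    # number of columns per block; change as needed
df1 = df.iloc[:, :number]    # the first block
# start at the second block, since the first one is already in df1
for x in range(number, len(df.columns), number):
    # relabel each block's columns 0..number-1 so the blocks line up when concatenated
    df1 = pd.concat([df1, df.iloc[:, x:x + number].T.reset_index(drop=True).T])
print(df1)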
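
The same idea can be wrapped in a small function so the block width becomes a parameter. This is only a sketch: the name stack_column_blocks is made up for illustration, and it assumes the frame has already been transposed as in the question. If the number of columns is not a multiple of the width, the last block is narrower and pd.concat fills the missing cells with NaN.

```python
import pandas as pd


def stack_column_blocks(df: pd.DataFrame, width: int) -> pd.DataFrame:
    """Stack consecutive blocks of `width` columns on top of each other.

    Hypothetical helper, not part of pandas.
    """
    blocks = []
    for start in range(0, df.shape[1], width):
        block = df.iloc[:, start:start + width]
        # Relabel the block's columns 0..k-1 so all blocks align on concat.
        block = block.set_axis(list(range(block.shape[1])), axis=1)
        blocks.append(block)
    # A single concat at the end avoids growing a frame inside the loop.
    return pd.concat(blocks)


d = {'col1': [1, 2, 3, 4, 5, 5, 6, 5], 'col2': [3, 4, 3, 4, 5, 6, 6, 5],
     'col3': [5, 6, 3, 4, 5, 6, 6, 5], 'col4': [7, 8, 3, 4, 5, 4, 6, 4]}
df = pd.DataFrame(data=d).T

result = stack_column_blocks(df, 2)
print(result)
```

Calling stack_column_blocks(df, 4) instead would give two blocks of four columns each.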

CodePudding user response：

A much faster way is to use numpy, provided the number of columns is divisible by the number of columns you want per block (here 8 is divisible by 2).

You are reshaping into a 2 column dataframe. A plain np.reshape(df.to_numpy(), (-1, 2)) reads the values row by row, so it would pair up all of col1's values first, then col2's, and so on, which is not the requested order. Instead, split each row into blocks of 2, move the block axis to the front, and then flatten to 2 columns:

import numpy as np

arr = df.to_numpy()    # shape (4, 8)
data = arr.reshape(len(df), -1, 2).swapaxes(0, 1).reshape(-1, 2)

data
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8],
       [3, 4],
       [3, 4],
       [3, 4],
       [3, 4],
       [5, 5],
       [5, 6],
       [5, 6],
       [5, 4],
       [6, 5],
       [6, 5],
       [6, 5],
       [6, 4]])

The current index has length 4; the reshaped frame needs it repeated once per block, that is (number of columns / 2) times, giving 4 * 8 / 2 = 16 labels:

index = np.tile(df.index, df.columns.size // 2)
index
array(['col1', 'col2', 'col3', 'col4', 'col1', 'col2', 'col3', 'col4',
       'col1', 'col2', 'col3', 'col4', 'col1', 'col2', 'col3', 'col4'],
      dtype=object)

All that is left is to create a new dataframe:

pd.DataFrame(data, index = index)

      0  1
col1  1  2
col2  3  4
col3  5  6
col4  7  8
col1  3  4
col2  3  4
col3  3  4
col4  3  4
col1  5  5
col2  5  6
col3  5  6
col4  5  4
col1  6  5
col2  6  5
col3  6  5
col4  6  4
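
To choose three or four columns instead of two, the numpy steps can be parameterised. The following is a minimal sketch, assuming the goal is the block order shown in the question (columns 0-1 for every row label, then columns 2-3, and so on). The function name reshape_blocks is hypothetical. Note that it splits each row into blocks and moves the block axis to the front before flattening, since a bare row-major reshape would instead group all values of col1 first.

```python
import numpy as np
import pandas as pd


def reshape_blocks(df: pd.DataFrame, width: int) -> pd.DataFrame:
    """Stack consecutive column blocks of size `width` (hypothetical helper)."""
    n_rows, n_cols = df.shape
    if n_cols % width != 0:
        raise ValueError("number of columns must be divisible by width")
    n_blocks = n_cols // width
    values = df.to_numpy()
    # (rows, blocks, width) -> (blocks, rows, width) -> (blocks * rows, width)
    data = values.reshape(n_rows, n_blocks, width).swapaxes(0, 1).reshape(-1, width)
    # Repeat the whole index once per block: col1..col4, col1..col4, ...
    index = np.tile(df.index, n_blocks)
    return pd.DataFrame(data, index=index)


d = {'col1': [1, 2, 3, 4, 5, 5, 6, 5], 'col2': [3, 4, 3, 4, 5, 6, 6, 5],
     'col3': [5, 6, 3, 4, 5, 6, 6, 5], 'col4': [7, 8, 3, 4, 5, 4, 6, 4]}
df = pd.DataFrame(data=d).T

two_wide = reshape_blocks(df, 2)
four_wide = reshape_blocks(df, 4)
print(two_wide)
print(four_wide)
```

Unlike the concat-based loop, this sketch refuses widths that do not divide the number of columns rather than padding with NaN; that is a design choice made here, not something the original answer specifies.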

Another option is to treat the columns as alternating even and odd positions and reshape the data with pyjanitor's pivot_longer function, collecting the even (0) and odd (1) columns into separate columns:

# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor

(df.set_axis((df.columns % 2).astype(str), axis=1)
   .pivot_longer(ignore_index=False, 
                 names_to = ['0', '1'], 
                 names_pattern=['0', '1'])
)
      0  1
col1  1  2
col2  3  4
col3  5  6
col4  7  8
col1  3  4
col2  3  4
col3  3  4
col4  3  4
col1  5  5
col2  5  6
col3  5  6
col4  5  4
col1  6  5
col2  6  5
col3  6  5
col4  6  4

Again, the numpy approach is much faster.
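
To check the speed claim on your own data, a rough timing comparison can be set up with the standard timeit module. This is only a sketch: the function names with_concat and with_numpy are made up for illustration, the frame is the tiny example from the question, and the absolute numbers will depend on the machine, the pandas and numpy versions, and the size of the data.

```python
import timeit

import numpy as np
import pandas as pd

d = {'col1': [1, 2, 3, 4, 5, 5, 6, 5], 'col2': [3, 4, 3, 4, 5, 6, 6, 5],
     'col3': [5, 6, 3, 4, 5, 6, 6, 5], 'col4': [7, 8, 3, 4, 5, 4, 6, 4]}
df = pd.DataFrame(data=d).T
width = 2  # columns per block


def with_concat() -> pd.DataFrame:
    # Slice out each block, relabel its columns 0..width-1, and stack them.
    blocks = [
        df.iloc[:, i:i + width].set_axis(list(range(width)), axis=1)
        for i in range(0, df.shape[1], width)
    ]
    return pd.concat(blocks)


def with_numpy() -> pd.DataFrame:
    n_rows, n_cols = df.shape
    n_blocks = n_cols // width
    # (rows, blocks, width) -> (blocks, rows, width) -> (blocks * rows, width)
    data = df.to_numpy().reshape(n_rows, n_blocks, width).swapaxes(0, 1).reshape(-1, width)
    return pd.DataFrame(data, index=np.tile(df.index, n_blocks))


# Sanity check: both approaches should produce the same values.
assert np.array_equal(with_concat().to_numpy(), with_numpy().to_numpy())

concat_seconds = timeit.timeit(with_concat, number=200)
numpy_seconds = timeit.timeit(with_numpy, number=200)
print(f"concat: {concat_seconds:.4f} s for 200 runs")
print(f"numpy:  {numpy_seconds:.4f} s for 200 runs")
```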