Home > Software engineering >  How do I split a column to many colum by row quantity
How do I split a column to many colum by row quantity

Time:01-08

I got a long single column DataFrame as following table:

Column A
Cell 1
Cell 2
Cell 3
Cell 4
Cell 5
Cell 6
Cell 7
Cell 8

I want to split column A in order with specify row quantity and add to others new columns If I give 2 row quantity for each column

Column A Column B Column C Column D
Cell 1 Cell 3 Cell 5 Cell 7
Cell 2 Cell 4 Cell 6 Cell 8

split long a single column to new adding column by given row quantity.

CodePudding user response:

You can use the underlying numpy array to reshape in Fortran order (rows, then columns):

from string import ascii_uppercase

N = 2

out = (pd.DataFrame(df['Column A'].to_numpy().reshape(N, -1, order='F'))
        # the line below is optional, just to have the column names
         .rename(columns=dict(enumerate(ascii_uppercase))).add_prefix('Column ')
      )

Output:

  Column A Column B Column C Column D
0   Cell 1   Cell 3   Cell 5   Cell 7
1   Cell 2   Cell 4   Cell 6   Cell 8

If you want to handle N that are non multiples of len(df), you can add a reindex step to pad the DataFrame with NaNs:

N = 3

out = (pd.DataFrame(df['Column A'].reindex(range(int(np.ceil(len(df)/N)*N)))
                                  .to_numpy().reshape(N, -1, order='F'))
         .rename(columns=dict(enumerate(ascii_uppercase))).add_prefix('Column ')
      )

Output:

  Column A Column B Column C
0   Cell 1   Cell 4   Cell 7
1   Cell 2   Cell 5   Cell 8
2   Cell 3   Cell 6      NaN

CodePudding user response:

You could divide the rows into groups and pivot the groups into columns.

Short Version

number_rows = 2
df['cols'] = np.ceil(df['Column A'].expanding().count()/number_rows)
df.index = pd.Series(range(len(df))) % number_rows
df = df.pivot(columns='cols', values='Column A')
cols     1.0     2.0     3.0     4.0
0     Cell 0  Cell 2  Cell 4  Cell 6
1     Cell 1  Cell 3  Cell 5  Cell 7

The code will also work if your number of rows to split by (number_rows) are not a multiple of length of your DataFrame, and missing values (np.nan) will be added.

Long Version with Explanation of Steps

Create demo data

import pandas as pd
import numpy as np
df = pd.DataFrame({'Column A': [f'Cell {i}' for i in range(4)]})
  Column A
0   Cell 0
1   Cell 1
2   Cell 2
3   Cell 3

Create columns for later columns and rows

number_rows = 3 # 3 instead of 2 to illustrate filling with missing values
df['cols'] = np.ceil(df['Column A'].expanding().count()/number_rows)
df['cols'] = 'Column ' df['cols'].astype(int).astype(str)
df['rows'] = pd.Series(range(len(df))) % number_rows
  Column A      cols  rows
0   Cell 0  Column 1     0
1   Cell 1  Column 1     1
2   Cell 2  Column 1     2
3   Cell 3  Column 2     0

Pivot table

df = df.pivot(columns='cols', index='rows', values='Column A')
cols Column 1 Column 2
rows                  
0      Cell 0   Cell 3
1      Cell 1      NaN
2      Cell 2      NaN

You can remove the column and index names with:

df.columns.name = df.index.name = None

CodePudding user response:

You should Create a new row, with the name of new columns corresponding to each value and also change the correpoding index. Finally you can use df.pivote() to pivot your dataframe to the new format.

n_rows = 2
df['new_col'] = "Column "
df['new_col']=df['new_col'] pd.Series(df.index%n_rows).ne(1).cumsum().astype(str)
df.index=df.index%n_rows
print(df.pivot(columns='new_col', values='Column A'))

new_col Column 1 Column 2 Column 3 Column 4
0         Cell 1   Cell 3   Cell 5   Cell 7
1         Cell 2   Cell 4   Cell 6   Cell 8
  • Related