I got a long single column DataFrame as following table:
Column A |
---|
Cell 1 |
Cell 2 |
Cell 3 |
Cell 4 |
Cell 5 |
Cell 6 |
Cell 7 |
Cell 8 |
I want to split column A in order with specify row quantity and add to others new columns If I give 2 row quantity for each column
Column A | Column B | Column C | Column D |
---|---|---|---|
Cell 1 | Cell 3 | Cell 5 | Cell 7 |
Cell 2 | Cell 4 | Cell 6 | Cell 8 |
split long a single column to new adding column by given row quantity.
CodePudding user response:
You can use the underlying numpy array to reshape
in Fortran order (rows, then columns):
from string import ascii_uppercase
N = 2
out = (pd.DataFrame(df['Column A'].to_numpy().reshape(N, -1, order='F'))
# the line below is optional, just to have the column names
.rename(columns=dict(enumerate(ascii_uppercase))).add_prefix('Column ')
)
Output:
Column A Column B Column C Column D
0 Cell 1 Cell 3 Cell 5 Cell 7
1 Cell 2 Cell 4 Cell 6 Cell 8
If you want to handle N
that are non multiples of len(df)
, you can add a reindex
step to pad the DataFrame with NaNs:
N = 3
out = (pd.DataFrame(df['Column A'].reindex(range(int(np.ceil(len(df)/N)*N)))
.to_numpy().reshape(N, -1, order='F'))
.rename(columns=dict(enumerate(ascii_uppercase))).add_prefix('Column ')
)
Output:
Column A Column B Column C
0 Cell 1 Cell 4 Cell 7
1 Cell 2 Cell 5 Cell 8
2 Cell 3 Cell 6 NaN
CodePudding user response:
You could divide the rows into groups and pivot the groups into columns.
Short Version
number_rows = 2
df['cols'] = np.ceil(df['Column A'].expanding().count()/number_rows)
df.index = pd.Series(range(len(df))) % number_rows
df = df.pivot(columns='cols', values='Column A')
cols 1.0 2.0 3.0 4.0
0 Cell 0 Cell 2 Cell 4 Cell 6
1 Cell 1 Cell 3 Cell 5 Cell 7
The code will also work if your number of rows to split by (number_rows
) are not a multiple of length of your DataFrame, and missing values (np.nan
) will be added.
Long Version with Explanation of Steps
Create demo data
import pandas as pd
import numpy as np
df = pd.DataFrame({'Column A': [f'Cell {i}' for i in range(4)]})
Column A
0 Cell 0
1 Cell 1
2 Cell 2
3 Cell 3
Create columns for later columns and rows
number_rows = 3 # 3 instead of 2 to illustrate filling with missing values
df['cols'] = np.ceil(df['Column A'].expanding().count()/number_rows)
df['cols'] = 'Column ' df['cols'].astype(int).astype(str)
df['rows'] = pd.Series(range(len(df))) % number_rows
Column A cols rows
0 Cell 0 Column 1 0
1 Cell 1 Column 1 1
2 Cell 2 Column 1 2
3 Cell 3 Column 2 0
Pivot table
df = df.pivot(columns='cols', index='rows', values='Column A')
cols Column 1 Column 2
rows
0 Cell 0 Cell 3
1 Cell 1 NaN
2 Cell 2 NaN
You can remove the column and index names with:
df.columns.name = df.index.name = None
CodePudding user response:
You should Create a new row, with the name of new columns corresponding to each value and also change the correpoding index. Finally you can use df.pivote()
to pivot your dataframe to the new format.
n_rows = 2
df['new_col'] = "Column "
df['new_col']=df['new_col'] pd.Series(df.index%n_rows).ne(1).cumsum().astype(str)
df.index=df.index%n_rows
print(df.pivot(columns='new_col', values='Column A'))
new_col Column 1 Column 2 Column 3 Column 4
0 Cell 1 Cell 3 Cell 5 Cell 7
1 Cell 2 Cell 4 Cell 6 Cell 8