Dataframe slice after every nth column in python

My dataframe has 6 rows and 1488 columns, i.e. shape (6, 1488), and I need to slice it so that every slice/chunk has shape (6, 22).

So I want a slice after every 22nd column. Finally, I want to append all these slices one below another, so that I get a final dataframe of size (~405, 22).

Any help will be appreciated.

CodePudding user response：

I'm not sure exactly what your dataframe looks like, but something like this should work.

import numpy as np
import pandas as pd

# create an example dataframe
df = pd.DataFrame(np.random.random((6, 1488)))
df
       0         1         2         3         4         5         6         7         8     ...      1479      1480      1481      1482      1483      1484      1485      1486      1487
0  0.202945  0.764556  0.935441  0.811226  0.813502  0.218969  0.612307  0.501421  0.654886  ...  0.849323  0.179219  0.383729  0.453096  0.515090  0.042625  0.157411  0.738439  0.866627
1  0.284549  0.631829  0.562288  0.122613  0.678792  0.494868  0.896530  0.928943  0.740604  ...  0.212852  0.947779  0.993973  0.394951  0.678237  0.590767  0.690921  0.792253  0.748520
2  0.233059  0.349914  0.966794  0.005431  0.051786  0.002843  0.677197  0.557434  0.858027  ...  0.127492  0.324699  0.793800  0.327186  0.619923  0.871256  0.494916  0.487993  0.368654
3  0.862628  0.114289  0.663868  0.929045  0.796207  0.386012  0.097557  0.700127  0.719978  ...  0.535595  0.400371  0.375005  0.509740  0.412794  0.399939  0.414794  0.769017  0.591004
4  0.719133  0.130646  0.438649  0.921081  0.384160  0.393997  0.338588  0.120220  0.115953  ...  0.060460  0.297115  0.823037  0.299341  0.923836  0.111853  0.256940  0.344354  0.745989
5  0.686776  0.711688  0.232884  0.403817  0.311352  0.581365  0.942824  0.787317  0.212746  ...  0.049652  0.872466  0.437506  0.727937  0.119991  0.707848  0.178063  0.464412  0.587901

# create the 6x22 dataframes we will append together
# renaming is important so each chunk's columns match up with each other
chunks = [
    df.iloc[:, i:i + 22].rename(columns=lambda c: c % 22)
    for i in range(0, 1488, 22)
]
final_df = pd.concat(chunks, ignore_index=True)
final_df
           0         1         2         3         4         5         6         7         8   ...        13        14        15        16        17        18        19        20        21
0    0.202945  0.764556  0.935441  0.811226  0.813502  0.218969  0.612307  0.501421  0.654886  ...  0.683138  0.241730  0.127795  0.290902  0.342813  0.806268  0.739551  0.545052  0.485129
1    0.284549  0.631829  0.562288  0.122613  0.678792  0.494868  0.896530  0.928943  0.740604  ...  0.517114  0.937569  0.028149  0.097362  0.047555  0.755910  0.339539  0.513563  0.861521
2    0.233059  0.349914  0.966794  0.005431  0.051786  0.002843  0.677197  0.557434  0.858027  ...  0.335635  0.256579  0.547100  0.607310  0.925894  0.952812  0.999725  0.687252  0.465104
3    0.862628  0.114289  0.663868  0.929045  0.796207  0.386012  0.097557  0.700127  0.719978  ...  0.670078  0.593592  0.631335  0.917056  0.737024  0.932694  0.547243  0.514497  0.237268
4    0.719133  0.130646  0.438649  0.921081  0.384160  0.393997  0.338588  0.120220  0.115953  ...  0.213295  0.625206  0.570912  0.368144  0.715152  0.024020  0.400959  0.992156  0.328769
..        ...       ...       ...       ...       ...       ...       ...       ...       ...  ...       ...       ...       ...       ...       ...       ...       ...       ...       ...
403  0.662156  0.909833  0.106109  0.630261  0.415084  0.212852  0.947779  0.993973  0.394951  ...  0.748520       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
404  0.280660  0.324690  0.089441  0.695034  0.040087  0.127492  0.324699  0.793800  0.327186  ...  0.368654       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
405  0.299956  0.111437  0.332434  0.312539  0.866787  0.535595  0.400371  0.375005  0.509740  ...  0.591004       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
406  0.801716  0.993745  0.653756  0.415967  0.479453  0.060460  0.297115  0.823037  0.299341  ...  0.745989       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
407  0.937215  0.811213  0.643623  0.686690  0.843001  0.049652  0.872466  0.437506  0.727937  ...  0.587901       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
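
The NaN values in the last rows appear because 1488 is not a multiple of 22, so the final chunk is narrower than the others. As a quick sanity check of the arithmetic (the variable names here are only illustrative):

```python
n_rows, n_cols, width = 6, 1488, 22

# How many full 22-column chunks fit, and how many columns are left over
full_chunks, leftover = divmod(n_cols, width)

# A partial chunk still contributes a block of rows
n_chunks = full_chunks + (1 if leftover else 0)
total_rows = n_chunks * n_rows

print(full_chunks, leftover, n_chunks, total_rows)
```

So the result has 408 rows rather than ~405, and the last 6 rows only have values in columns 0 to 13.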

If your dataframe's column names aren't sequential numbers like in this example, you will need to come up with your own mapper so the columns in each chunk match up. Otherwise, the concat operation will create a dataframe whose columns are the union of all the column names, with NaN wherever a chunk lacks a given column.
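
One possible mapper, if the column names are arbitrary, is to relabel each chunk by position instead of by name. This is only a sketch: `stack_column_chunks` and the small `example` frame are made-up names for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd


def stack_column_chunks(df, width):
    """Stack consecutive blocks of `width` columns on top of one another."""
    chunks = []
    for start in range(0, df.shape[1], width):
        chunk = df.iloc[:, start:start + width].copy()
        # Relabel by position (0, 1, 2, ...) so every chunk shares the same
        # column labels, whatever the original names were
        chunk.columns = range(chunk.shape[1])
        chunks.append(chunk)
    return pd.concat(chunks, ignore_index=True)


# Hypothetical example with non-numeric column names
example = pd.DataFrame(np.arange(12).reshape(2, 6), columns=list("abcdef"))
stacked = stack_column_chunks(example, 4)
print(stacked)
```

For the dataframe in the question this would be called as stack_column_chunks(df, 22). As with the version above, a final chunk that is narrower than `width` is padded with NaN.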