My dataframe has 6 rows and 1488 columns, i.e. shape (6, 1488), and I need to slice it so that all slices/chunks have shape (6, 22) each.
So I want a slice after every 22nd column. Finally, I want to append all these slices one below another, so that I get a final dataframe of shape (~405, 22).
Any help will be appreciated.
CodePudding user response:
I'm not sure exactly what your dataframe looks like, but something like this should work.
import numpy as np
import pandas as pd
# create an example dataframe
df = pd.DataFrame(np.random.random((6, 1488)))
df
0 1 2 3 4 5 6 7 8 ... 1479 1480 1481 1482 1483 1484 1485 1486 1487
0 0.202945 0.764556 0.935441 0.811226 0.813502 0.218969 0.612307 0.501421 0.654886 ... 0.849323 0.179219 0.383729 0.453096 0.515090 0.042625 0.157411 0.738439 0.866627
1 0.284549 0.631829 0.562288 0.122613 0.678792 0.494868 0.896530 0.928943 0.740604 ... 0.212852 0.947779 0.993973 0.394951 0.678237 0.590767 0.690921 0.792253 0.748520
2 0.233059 0.349914 0.966794 0.005431 0.051786 0.002843 0.677197 0.557434 0.858027 ... 0.127492 0.324699 0.793800 0.327186 0.619923 0.871256 0.494916 0.487993 0.368654
3 0.862628 0.114289 0.663868 0.929045 0.796207 0.386012 0.097557 0.700127 0.719978 ... 0.535595 0.400371 0.375005 0.509740 0.412794 0.399939 0.414794 0.769017 0.591004
4 0.719133 0.130646 0.438649 0.921081 0.384160 0.393997 0.338588 0.120220 0.115953 ... 0.060460 0.297115 0.823037 0.299341 0.923836 0.111853 0.256940 0.344354 0.745989
5 0.686776 0.711688 0.232884 0.403817 0.311352 0.581365 0.942824 0.787317 0.212746 ... 0.049652 0.872466 0.437506 0.727937 0.119991 0.707848 0.178063 0.464412 0.587901
# create the 6x22 dataframes we will append together
# renaming is important so each chunk's columns match up with each other
chunks = [
    df.iloc[:, i:i + 22].rename(columns=lambda c: c % 22)
    for i in range(0, 1488, 22)
]
final_df = pd.concat(chunks, ignore_index=True)
final_df
0 1 2 3 4 5 6 7 8 ... 13 14 15 16 17 18 19 20 21
0 0.202945 0.764556 0.935441 0.811226 0.813502 0.218969 0.612307 0.501421 0.654886 ... 0.683138 0.241730 0.127795 0.290902 0.342813 0.806268 0.739551 0.545052 0.485129
1 0.284549 0.631829 0.562288 0.122613 0.678792 0.494868 0.896530 0.928943 0.740604 ... 0.517114 0.937569 0.028149 0.097362 0.047555 0.755910 0.339539 0.513563 0.861521
2 0.233059 0.349914 0.966794 0.005431 0.051786 0.002843 0.677197 0.557434 0.858027 ... 0.335635 0.256579 0.547100 0.607310 0.925894 0.952812 0.999725 0.687252 0.465104
3 0.862628 0.114289 0.663868 0.929045 0.796207 0.386012 0.097557 0.700127 0.719978 ... 0.670078 0.593592 0.631335 0.917056 0.737024 0.932694 0.547243 0.514497 0.237268
4 0.719133 0.130646 0.438649 0.921081 0.384160 0.393997 0.338588 0.120220 0.115953 ... 0.213295 0.625206 0.570912 0.368144 0.715152 0.024020 0.400959 0.992156 0.328769
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
403 0.662156 0.909833 0.106109 0.630261 0.415084 0.212852 0.947779 0.993973 0.394951 ... 0.748520 NaN NaN NaN NaN NaN NaN NaN NaN
404 0.280660 0.324690 0.089441 0.695034 0.040087 0.127492 0.324699 0.793800 0.327186 ... 0.368654 NaN NaN NaN NaN NaN NaN NaN NaN
405 0.299956 0.111437 0.332434 0.312539 0.866787 0.535595 0.400371 0.375005 0.509740 ... 0.591004 NaN NaN NaN NaN NaN NaN NaN NaN
406 0.801716 0.993745 0.653756 0.415967 0.479453 0.060460 0.297115 0.823037 0.299341 ... 0.745989 NaN NaN NaN NaN NaN NaN NaN NaN
407 0.937215 0.811213 0.643623 0.686690 0.843001 0.049652 0.872466 0.437506 0.727937 ... 0.587901 NaN NaN NaN NaN NaN NaN NaN NaN
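In case the NaN values in the last rows look surprising: 1488 is not a multiple of 22, so the final chunk is narrower than the others and concat pads its missing columns with NaN. A quick sketch of the arithmetic (the variable names here are just for illustration):

```python
n_rows, n_cols, width = 6, 1488, 22

# 1488 = 67 * 22 + 14, so there are 67 full chunks plus one 14-column chunk
full_chunks, leftover = divmod(n_cols, width)
n_chunks = full_chunks + (1 if leftover else 0)

# every chunk contributes 6 rows to the stacked result
total_rows = n_rows * n_chunks

print(full_chunks, leftover, n_chunks, total_rows)
```

That is why the result has index labels running up to 407 rather than roughly 405, and why columns 14 to 21 are NaN for the last six rows.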
If your dataframe's column names aren't sequential integers like in this example, `c % 22` won't work and you will need to come up with your own mapper so the columns in each chunk match up. Otherwise, the concat operation will create a dataframe whose columns are the union of all of the chunks' column names, with NaN wherever a chunk doesn't have a given column.
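One possible mapper, as a rough sketch: relabel each chunk's columns by their position within the chunk, which doesn't depend on what the original names are. The `col_0`, `col_1`, ... names and the 50-column width below are made up for the example; substitute your own dataframe.

```python
import numpy as np
import pandas as pd

# hypothetical dataframe with non-numeric column names (6 rows, 50 columns)
df = pd.DataFrame(
    np.arange(6 * 50).reshape(6, 50),
    columns=[f"col_{j}" for j in range(50)],
)

width = 22
chunks = []
for start in range(0, df.shape[1], width):
    chunk = df.iloc[:, start:start + width]
    # relabel columns as 0, 1, 2, ... by position so every chunk lines up
    chunk = chunk.set_axis(list(range(chunk.shape[1])), axis=1)
    chunks.append(chunk)

final_df = pd.concat(chunks, ignore_index=True)
print(final_df.shape)
```

As in the numeric case, the last chunk is narrower whenever the column count isn't a multiple of the chunk width, so its trailing columns come out as NaN.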