Home > database >  How do I make a new column every 4 elements in a row excel file?
How do I make a new column every 4 elements in a row excel file?

Time:06-16

I am trying to take many rows of excel files and parse them. I have multiple rows of 60 numbers. I would like to make a new stack or row every 4 numbers for every single row. I know this is confusing, an example would be:

input from excel 0.282836353 0.287834898 0.284861334 0.284880417 0.242768564 0.241888021    0.240441747 0.244627766 0.245007869 0.244790578 0.243485108 0.244743911 0.267488959 0.264294548 0.267708093 0.268158158

output: 0.282836353 0.287834898 0.284861334 0.284880417 
        0.242768564 0.241888021 0.240441747 0.244627766 
        0.245007869 0.244790578 0.243485108 0.244743911
        0.267488959 0.264294548 0.267708093 0.268158158

This is what I currently have tried:

import openpyxl
excelFile = openpyxl.load_workbook("nameoffile.xlsx")
sheet = excelFile["Sheet1"]
nRows = sheet.max_row
nCols = sheet.max_column
r = 2
groups = 4 # how many to be grouped together
parsedList = []

while r <= nRows:
    rList,rListSum = [],[]
    x = 0
    c = 2
    while c <= nCols:
        if x < groups:
            cell = sheet.cell(row=r,column=c)
            rList.append(cell.value)
        x  = 1
        if x == groups:
            rListSum.append(rList)  
    c  = 1
    parsedList.append(rListSum)
r  = 1

my current output is nothing. something seems to be wrong with my loop.

CodePudding user response:

The phrase "multiple rows of 60 numbers" in the question is not clear.
Does each row have a string of 60 numbers (e.g. "0.1 0.2 0.3 ...") or 60 columns (e.g. A1, B1, ...)?

I assumed the latter one and you can simply do what you want by using NumPy and Pandas.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5,60))
0   1   2   3   4   5   6   7   8   9   ... 50  51  52  53  54  55  56  57  58  59
0   0.609861    0.778792    0.223185    0.945089    0.630081    0.629745    0.893291    0.557107    0.044485    0.267008    ... 0.423123    0.236365    0.216665    0.173979    0.436419    0.206685    0.188097    0.042587    0.316459    0.252943
1   0.451599    0.234676    0.540550    0.576964    0.067314    0.180313    0.952212    0.725722    0.368086    0.351682    ... 0.209218    0.284543    0.504832    0.306373    0.830000    0.878597    0.549021    0.430474    0.162942    0.194625
2   0.346460    0.803657    0.270074    0.375594    0.080673    0.051869    0.119898    0.632888    0.497251    0.689102    ... 0.586911    0.725564    0.662852    0.814569    0.753149    0.363969    0.811467    0.144431    0.523725    0.725145
3   0.846687    0.488381    0.803212    0.405574    0.137206    0.963724    0.200077    0.576446    0.449057    0.125634    ... 0.295565    0.153730    0.259037    0.725119    0.427521    0.750190    0.222516    0.960042    0.786858    0.446018
4   0.563980    0.809230    0.420195    0.776374    0.506152    0.429394    0.802265    0.261096    0.924650    0.778620    ... 0.282234    0.333429    0.714918    0.404162    0.730436    0.939195    0.657210    0.368771    0.096418    0.189417
df_new = pd.DataFrame(df.to_numpy().reshape(-1, 4))
    0   1   2   3
0   0.609861    0.778792    0.223185    0.945089
1   0.630081    0.629745    0.893291    0.557107
2   0.044485    0.267008    0.433519    0.093443
3   0.057903    0.786750    0.474591    0.579831
4   0.491986    0.686017    0.930706    0.144237
... ... ... ... ...
70  0.958695    0.789661    0.835077    0.146602
71  0.936193    0.436480    0.558998    0.770272
72  0.411825    0.388468    0.282234    0.333429
73  0.714918    0.404162    0.730436    0.939195
74  0.657210    0.368771    0.096418    0.189417

CodePudding user response:

Below code has the same output as @J.Choi

pd.DataFrame(np.array(df).reshape(-1, 4))
  • Related