How to extract many groups of cells separated by a specified number of rows in excel using python an-CodePudding

I have a csv file which has around 58 million cells containing numerical data. I want to extract data from every 16 cells which are 49 rows apart. Let me describe it clearly.

The data I need to extract

The above image shows the the first set of data that is to be extracted (rows 23 to 26, columns 92 to 95). This data has to be written in another file csv file (preferably in a row). Then I will move down 49 rows (row 72), then extract 4rows x 4columns. Shown in image below.

Next set of data

Similarly, I need to keep going till I reach the end of the file.

Third set

The next set will be the image shown above. I have to keep going till I reach the end of the file and extract thousands of such data.

I had written a code for this but its not working. I don't know where is the mistake. I will also attach it here.

import pandas as pd
import numpy
df = pd.read_csv('TS_trace31.csv')
# print(numpy.shape(df))
df = pd.read_csv('TS_trace31.csv')
# print(numpy.shape(df))
arrY = []
ex = 0
for i in range(len(df)):

if i == 0:
    for j in range(4):

        l = (df.iloc[j 21 i*(49), 91:95]).tolist()
        arrY.append(l)

else:
    for j in range(4):
        if j 22 i*(49) >= len(df):
            ex = 1
            break
        # print(j)
        l = (df.iloc[j 21 i*(49), 91:95]).tolist()
        arrY.append(l)

if ex == 1:
    break

# print(arrY)


a = []
for i in range(len(arrY) - 3):
    p = arrY[i] arrY[i 1] arrY[i 2] arrY[i 3]
    a.append(p)
print(numpy.shape(a))
numpy.savetxt('myfile.csv', a, delimiter=',')

Using the above code, I didn't get the result I wanted. Please help with this and correct where I have gone wrong. I couldn't attach my csv file here, Please try to use any sample sheet that you have or can create a simple one.

Thanks in advance! Have a great day.

CodePudding user response：

i don't know what exactly you are doing in your code
but i wrote my own

import csv
from itertools import chain

CSV_PATH = 'TS_trace31.csv'
new_data = []

with open(CSV_PATH, 'r') as csvfile:
    reader = csv.reader(csvfile)

    # row_num for storing big jumps e.g. 23, 72, 121 ...
    row_num = 23
    # n for storing the group number 0 - 3
    # with n we can find the 23, 24, 25, 26
    n = 0
    # row_group for storing every 4 group rows
    row_group = []

    # looping over every row in main file
    for row in reader:
        if reader.line_num == row_num   n:
            # for the first time this is going to be 23   0
            # then we add one number to the n
            # so the next cycle will be 24 and so on
            n  = 1
            print(reader.line_num)

            # add each row to it group
            row_group.append(row[91:95])

            # check if we are at the end of the group e.g. 26
            if n == 4:
                # reset the group number
                n = 0
                # add the jump to main row number
                row_num  = 49

                # combine all the row_group to a single row
                new_data.append(list(chain(*row_group)))

                # clear the row_group for next set of rows
                row_group.clear()
                print('='*50)
        else:
            continue


# and finally write all the rows in a new file
with open('myfile.csv', 'w') as new_csvfile:
    writer = csv.writer(new_csvfile)
    writer.writerows(new_data)