Python Pandas split CSV file by rows with headers/columns in every file

I'm pretty new to Python but already having some success.

There is just one small detail missing that I cannot get working.

As the title says, I'm splitting huge CSV files with weather data (up to millions of rows). The splitting works well, but the column header ends up in the first file only.

The data looks like this:

year;month;day;date;Id;Po;P;;T;st;sn;Tx;sn;Tn;e;R1;Rd;nr;S1;ps;mp;mT;mTx;mTn;me;mR;mS
2003;3;1;01.03.2003;1001;10047;10059;1;27;46;0;1;1;52;45;56;3;13;;;0;0;0;0;0;0;
2003;3;1;01.03.2003;1008;9995;10031;1;173;45;1;142;1;211;13;18;3;7;;;0;0;0;0;0;0;
2003;3;1;01.03.2003;1025;10058;10068;1;2;27;0;22;1;25;50;182;6;21;;;0;0;0;0;0;0;
2003;3;1;01.03.2003;1026;9924;10067;1;6;26;0;18;1;28;49;183;6;22;53;47;0;0;0;0;0;0;0
2003;3;1;01.03.2003;1028;9991;10011;1;84;57;1;36;1;128;33;47;5;15;;;0;0;0;0;0;0;
2003;3;1;01.03.2003;1098;10006;10024;1;18;29;0;10;1;46;43;58;5;15;;;0;0;0;0;0;0;
2003;3;1;01.03.2003;1152;10092;10108;0;18;26;0;42;1;2;57;110;5;21;60;53;0;0;0;0;0;0;0
2003;3;1;01.03.2003;1212;10148;10166;0;53;13;0;69;0;38;71;;;;;;0;0;0;0;0;;
2003;3;1;01.03.2003;1238;9030;10192;1;29;42;0;6;1;58;37;5;1;2;;;0;0;0;0;0;0;
2003;3;1;01.03.2003;1241;10148;10159;0;44;24;0;68;0;23;65;55;3;12;;;0;0;0;0;0;0;
2003;3;1;01.03.2003;1271;10143;10167;0;33;29;0;65;0;2;59;39;3;9;;;0;0;0;0;0;0;
2003;3;1;01.03.2003;1317;10152;10197;0;48;13;0;80;0;21;72;95;2;12;;;0;0;0;0;0;0;
2003;3;1;01.03.2003;1384;9955;10208;0;3;37;0;52;1;35;50;21;2;4;;;0;0;0;0;0;0;
2003;3;1;01.03.2003;1389;;;1;6;39;0;57;1;55;50;18;2;3;;;;0;0;0;0;0;
.
.
.
.(the dots just indicate that there is more data below)

I'd like to keep the column header in every CSV file written, not just in the first file created.

The code so far (with some tkinter fields):

def splitFiles():
    file_path = str(CVS_file_source.get())
    new_filename = str(new_filename_entry.get())
    path_destination = str(folder_path_destination.get())
    file_destination = os.path.join(path_destination, new_filename)
    dattype = field_dattype.get()

    #csv file name to be read in
    in_csv = file_path

    #get the number of lines of the csv file to be read
    number_lines = sum(1 for row in (open(in_csv)))

    #size of rows of data to write to the csv,
    #you can change the row size according to your need
    rowsize = rows.get()


    #start looping through data writing it to a new file for each set
    for i in range(0,number_lines,rowsize):

        df = pd.read_csv(in_csv,
              nrows = rowsize,  #number of rows to read at each loop
              skiprows = i)     #skip rows that have been read



        #csv to write data to a new file with indexed name. input_1.csv etc.
        out_csv = file_destination + '_' + str(i) + dattype

        df.to_csv(out_csv,
              index=False,
              header=True,
              mode='a',             #append data to csv file
              chunksize=rowsize)    #size of data to append for each loop

(I used the code from this question: How to split csv file keeping its header in each smaller files in Python?)

What is the code missing? It does not work for me as suggested in the question.

Any help will be great!

CodePudding user response：

If you want to save the file in a split format, then you don't really need a huge function. Let's say you'd like to save every 100k rows:

for i in range(round(len(df)/10**5) + 1):
   df.iloc[i*10**5:(i + 1)*10**5, :].to_csv('path_to_save_file_' + str(i*10**5) + "_" + str(i*10**5) + '.csv')
   print("Saving file with rows from: ", i*10**5, "to", (i + 1)*10**5)

Or you can do it in a single line with a list comprehension:

[df.iloc[i*10**5:(i + 1)*10**5, :].to_csv('path_to_save_file_' + str(i*10**5) + "_" + str(i*10**5) + '.csv') for i in range(round(len(df)/10**5) + 1)]

This writes one CSV with rows 0 to 100000, one with rows 100000 to 200000, and so on, with the row number in each file name so the files are easy to identify. Because to_csv writes the column names by default, every file gets the header. The loop version prints:

Saving file with rows from:  0 100000
Saving file with rows from:  100000 200000
Saving file with rows from:  200000 300000
Saving file with rows from:  300000 400000
Saving file with rows from:  400000 500000
Saving file with rows from:  500000 600000
Saving file with rows from:  600000 700000
Saving file with rows from:  700000 800000
Saving file with rows from:  800000 900000
Saving file with rows from:  900000 1000000
Saving file with rows from:  1000000 1100000
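
If the file is too large to load into a single DataFrame first, a chunked variant may be worth a try. The question's loop loses the header because skiprows=i also skips the header line on every pass after the first, so the first data row of each chunk is taken as the column names. The following is only a sketch under some assumptions: the function name split_csv_with_header and its parameters are made up for illustration, and the separator is assumed to be ';' as in the sample data. It reads the header once and writes it at the top of every part:

```python
import csv
import os

import pandas as pd


def split_csv_with_header(in_csv, out_dir, base_name, rows_per_file,
                          sep=";", ext=".csv"):
    """Hypothetical helper: split in_csv into parts of at most
    rows_per_file data rows, repeating the header in every part."""
    # Read the header line once, exactly as it appears in the source.
    with open(in_csv, newline="") as f:
        header = next(csv.reader(f, delimiter=sep))

    # chunksize makes read_csv return an iterator of DataFrames, so the
    # whole file is never held in memory at once. header=None plus
    # skiprows=1 keeps pandas from renaming blank or duplicate columns;
    # dtype=str and keep_default_na=False leave the values untouched.
    reader = pd.read_csv(in_csv, sep=sep, header=None, skiprows=1,
                         chunksize=rows_per_file, dtype=str,
                         keep_default_na=False)

    written = []
    for n, chunk in enumerate(reader):
        out_csv = os.path.join(out_dir, f"{base_name}_{n * rows_per_file}{ext}")
        # Passing the header list writes the original column names
        # as the first line of each output file.
        chunk.to_csv(out_csv, sep=sep, index=False, header=header)
        written.append(out_csv)
    return written
```

Called as, say, split_csv_with_header("weather.csv", "out", "weather", 100000) (file and folder names are placeholders), this should produce out/weather_0.csv, out/weather_100000.csv and so on, each starting with the header row. Reading everything as strings is a deliberate choice here: it keeps empty fields and the duplicated sn column name as they are instead of letting pandas convert or rename them.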