blank spaces in dataframe

Time:01-01

I need some help figuring out the best way to add a blank line between each table when writing several Pandas data frames to a single CSV file inside a for loop.

# Read a PDF File
df = tabula.read_pdf("out.pdf", pages='all')
with open('out.csv', 'a') as f:
    for x in df:
        x.to_csv('out.csv', mode ='a', sep=',', index=False)
        f.write('\n')

The intended output would be as follows...

1,12,22,33,43,54,64,75,84,95
2,13,23,34,44,55,65,76,85,96
3,14,24,35,45,56,66,77,86,97
4,15,25,36,46,57,67,78,87,98
5,16,26,37,47,58,68,79,88,99
6,17,27,38,48,59,69,80,89,100
7,18,28,39,49,60,70,,90,
8,,29,,50,,71,,91,
9,19,30,40,51,61,72,81,92,101
10,20,31,41,52,62,73,82,93,102
11,21,32,42,53,63,74,83,94,103
1P,2P,3P,4P,5P,6P,7P,8P,9P,10P

104,115,124,135,144,155,165,176,186,197
105,116,125,136,145,156,166,177,187,198
106,117,126,137,146,157,167,178,188,199
107,118,127,138,147,158,168,179,189,200
108,119,128,139,148,159,169,180,190,201
109,120,129,140,149,160,170,181,191,202
110,,130,,150,161,171,182,192,203
111,,131,,151,,172,,193,
112,121,132,141,152,162,173,183,194,204
113,122,133,142,153,163,174,184,195,205
114,123,134,143,154,164,175,185,196,206
11P,12P,13P,14P,15P,16P,17P,18P,19P,20P

Instead, the two newline characters are appended to the end of the file.

I may have a fundamental misunderstanding as to how the append mode works for the to_csv function and would appreciate clarification on why the lines are being added to the end of the file, instead of inline where they are wanted.

A code-based alternative is also appreciated.

Thank you!

CodePudding user response:

This will work for you:

OUTPUT_FILE = 'out.csv'
for i, x in enumerate(df):
    x.to_csv(
        OUTPUT_FILE,
        mode='w' if i == 0 else 'a',
        sep=',',
        index=False
    )
    if i < len(df) - 1:
        with open(OUTPUT_FILE, 'a') as f:
            f.write('\n')
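Another option, sketched here under the assumption that every table is a pandas DataFrame, is to open the file once and pass the open handle to to_csv, so the tables and the blank lines all go through the same buffer in order. The helper name write_tables is hypothetical and not part of pandas or tabula.

```python
def write_tables(tables, path):
    """Write each table to one CSV file, separated by a single blank line.

    Assumes every item in `tables` has a pandas-style to_csv method
    that accepts an open file handle.
    """
    # newline='' leaves line endings to to_csv, as the pandas docs
    # recommend when a file object is passed instead of a path.
    with open(path, "w", newline="") as f:
        for i, table in enumerate(tables):
            if i > 0:
                f.write("\n")  # blank line before every table except the first
            table.to_csv(f, index=False)  # same handle, so the order is preserved
```

With the list from the question this would be called as write_tables(df, 'out.csv'). Because only one handle is ever open, there is no second buffer that could be flushed out of order.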

I suspect what is happening in your example is that two buffers are open on the same file at the same time. x.to_csv('out.csv', mode='a') opens its own handle, writes the table and closes it again, so each table reaches the disk right away. f.write('\n') only adds to the buffer of f, which is not flushed until the with block exits. The tables therefore land in the file first, followed by the two newline characters.
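The effect can be reproduced without pandas. The following is a minimal sketch, assuming default buffering on a regular file, in which a second plain file handle stands in for what to_csv does when it is given a path with mode='a'. The file name demo.csv and the row contents are made up for the illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.csv")

with open(path, "a") as f:            # outer handle: stays open for the whole loop
    for row in ("1,2,3\n", "4,5,6\n"):
        with open(path, "a") as g:    # stand-in for to_csv(path, mode='a')
            g.write(row)              # written to disk when g closes
        f.write("\n")                 # stays in f's buffer for now
# f closes here and flushes both newlines after all the rows

with open(path) as check:
    content = check.read()

print(repr(content))  # '1,2,3\n4,5,6\n\n\n'
```

Both rows come first and the two newlines follow them, which matches the output described in the question.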
