Pandas adding in a second decimal when reading in repeated float values in a single row

Time:12-05

I'm trying to combine several CSV files, 400 in total, into one large file. The files contain varying numbers of rows but the same number of columns. I can get them all into a single file, but in some rows the repeated values pick up a second decimal part that I don't want.

The input data might contain:
..., 0.000, 0.000, 0.000, ...
And after reading it in and writing to the csv it becomes:
..., 0.000, 0.000.1, 0.000.2, ...

It also seems to do this at random as some of the rows are completely okay.

import pandas as pd
import glob
import os

path = "trainingData"
csv_files = glob.glob(os.path.join(path, "*.csv"))

counter = 0

for f in csv_files:
    print("File: {} of 402".format(counter))
    counter += 1

    df = pd.read_csv(f, lineterminator='\n', delimiter=', ')
    print(df.head(5))
    df.to_csv("allData.csv", index=False, na_rep="0", mode='a', lineterminator='\r\n', float_format='%.3f')

Example input row:

1, 5000.0, 0.0, 0.000, 0.000, 0.000, 180.356, -67.467, -167.262, 0.000, 0.000, 0.000, 1068.000, 5000.0, 509.523, -1290.843, -405.013, 0.000, 0.000, 0.000, 0.000, 0

What's written back out to the file:

1, 5000.0, 0.0, 0.000, 0.000.1, 0.000.2, 180.356, -67.467, -167.262, 0.000.3, 0.000.4, 0.000.5, 1068.000, 5000.0.1, 509.523, -1290.843, -405.013, 0.000.6, 0.000.7, 0.000.8, 0.000.9, 0

CodePudding user response:

import pandas as pd
import glob
import os

path = "trainingData"
csv_files = glob.glob(os.path.join(path, "*.csv"))

counter = 0

for f in csv_files:
    print("File: {} of 402".format(counter))
    counter += 1

    df = pd.read_csv(f, lineterminator='\n', delimiter=', ')
    print(df.head(5))
    # Remove the mode='a' flag
    df.to_csv("allData.csv", index=False, na_rep="0", lineterminator='\r\n', float_format='%.3f')

CodePudding user response:

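After trying a bunch of different things, I finally found that pandas was reading the first row of each file in as the header. Column names have to be unique, so pandas was renaming the duplicates by appending .1, .2, and so on. That is also why it looked random: only the rows that happened to be the first line of a file were affected.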
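The renaming can be reproduced without any files. The snippet below is a minimal sketch, assuming a made-up two-row sample in the same ", "-separated layout as the question; the variable names are only for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same layout as the question's data.
sample = "1, 5000.0, 0.000, 0.000, 0.000\n2, 4000.0, 1.500, 1.500, 0.000\n"

# Default behaviour: the first line is taken as the header, and duplicate
# column names are made unique by appending ".1", ".2", ...
with_header = pd.read_csv(io.StringIO(sample), sep=", ", engine="python")
mangled = list(with_header.columns)
print(mangled)

# header=None: every line is data and the columns are labelled 0, 1, 2, ...
no_header = pd.read_csv(io.StringIO(sample), sep=", ", engine="python", header=None)
print(no_header.shape)
```

With the default settings the first line is lost as data and shows up as mangled column labels, which then get written out as a "row" by to_csv.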

import pandas as pd
import glob
import os

path = "trainingData"
csv_files = glob.glob(os.path.join(path, "*.csv"))

counter = 0

for f in csv_files:
    print("File: {} of 402".format(counter))
    counter += 1

    # Specify that the files have no header row
    df = pd.read_csv(f, lineterminator='\n', delimiter=', ', header=None)
    print(df.head(5))
    # header=False keeps the default integer column labels (0, 1, 2, ...) out of the output
    df.to_csv("allData.csv", index=False, header=False, na_rep="0", mode='a', lineterminator='\r\n', float_format='%.3f')
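As an alternative to appending inside the loop, the files could be read into a list and written once. This is only a sketch under a few assumptions: combine_csvs is a hypothetical helper name, none of the files has a header row, and the output file is written outside the input folder so a re-run does not pick it up as input.

```python
import glob
import os

import pandas as pd


def combine_csvs(input_dir, output_path):
    """Read every headerless CSV in input_dir and write them out as one file."""
    csv_files = sorted(glob.glob(os.path.join(input_dir, "*.csv")))
    if not csv_files:
        raise FileNotFoundError("no CSV files found in {}".format(input_dir))

    # header=None stops pandas from treating the first row as column names.
    frames = [
        pd.read_csv(f, sep=", ", engine="python", header=None)
        for f in csv_files
    ]
    combined = pd.concat(frames, ignore_index=True)

    # Write once, without the integer column labels or the index.
    combined.to_csv(
        output_path,
        index=False,
        header=False,
        na_rep="0",
        float_format="%.3f",
    )
    return combined
```

Called as combine_csvs("trainingData", "allData.csv"), this overwrites the output on each run instead of appending to whatever an earlier run left behind.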