Changing Headers in .csv files

Time:07-28

Right now I am trying to read in data that is provided in a messy format. Here is an example:

#MOTOR_FIRMWAREVERSION
#SOFTWARE_VERSIOn
#[DATA]
1,2
3,4
5,6
#[END_OF_FILE]

When working with one or two of these files, I have manually changed the #[DATA] header to x,y and am able to read the data in just fine by skipping the first few rows and not reading the last line.
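That manual approach can be sketched like this (using an in-memory copy of the example file with #[DATA] already changed to x,y; skipfooter drops the #[END_OF_FILE] line):

```python
import io
import pandas as pd

# Simulated file contents after manually changing #[DATA] to x,y
raw = """#MOTOR_FIRMWAREVERSION
#SOFTWARE_VERSIOn
x,y
1,2
3,4
5,6
#[END_OF_FILE]
"""

# Skip the two version lines at the top and drop the trailing
# #[END_OF_FILE] line (skipfooter requires the python engine)
df = pd.read_csv(io.StringIO(raw), skiprows=2, skipfooter=1, engine='python')
```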

However, right now I have 30 files, split between two different folders, and I am trying to figure out the best way to read in the files and change the header of each file from #[DATA] to ['x', 'y'].

Here is what I have right now:

#sets - refers to the set containing the name of each file (i.e. [file1, file2])
#df - the dataframe which you are going to store the data in 
#dataLabels - the headers you want to search for within the .csv file
#skip - the number of rows you want to skip
#newHeader - what you want to change the column headers to be
#pathName - provide path where files are located

import glob
import pandas as pd

def reader(sets, df, dataLabels, skip, newHeader, pathName):
    for i in range(len(sets)):
        # glob.glob returns a list of matches, so take the first one
        path = glob.glob(pathName + sets[i] + ".csv")[0]
        df_temp = pd.read_csv(path, sep=r'\s*,', skiprows=skip, engine='python')[:-1]
        df_temp.columns = newHeader
        for j in range(len(dataLabels)):
            df_temp[dataLabels[j]] = pd.to_numeric(df_temp[dataLabels[j]], errors='coerce')
        df.append(df_temp)
    return df

When I run my code, I run into the error:

No columns to parse from file

I am not quite sure why - I have tried skipping past the [DATA] header and I still receive that error.

Note, for this example I would like the headers to be 'x' and 'y' - I am trying to make a universal function, so that I can change them to something more useful depending on what I am measuring.

CodePudding user response:

If the #[DATA] row is to be replaced regardless, just ignore it. You can just tell pandas to ignore lines that start with # and then specify your own names:

import pandas as pd

df = pd.read_csv('test.csv', comment='#', names=['x', 'y'])

which gives

   x  y
0  1  2
1  3  4
2  5  6

CodePudding user response:

Expanding Kraigolas's answer, to do this with multiple files you can use a list comprehension:

files = [glob.glob(f"{pathName}{set_num}.csv")[0] for set_num in sets]
df = pd.concat([pd.read_csv(file, comment="#", names=["x", "y"]) for file in files])

Note that glob.glob returns a list of matches, so the [0] takes the first match for each name.
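Since the question mentions files split between two folders, a recursive glob can gather them in one pass. A sketch under assumed names (the folders 'data/a' and 'data/b' are stand-ins for the real ones, and the block creates its own example files):

```python
import glob
import os
import pandas as pd

# Set up two folders with one example file each (stand-ins for the
# real folders; 'data/a' and 'data/b' are made-up names)
for sub in ('a', 'b'):
    os.makedirs(f'data/{sub}', exist_ok=True)
    with open(f'data/{sub}/file1.csv', 'w') as f:
        f.write('#[DATA]\n1,2\n3,4\n#[END_OF_FILE]\n')

# '**' with recursive=True matches .csv files in both subfolders
files = glob.glob('data/**/*.csv', recursive=True)
df = pd.concat(
    (pd.read_csv(f, comment='#', names=['x', 'y']) for f in files),
    ignore_index=True,
)
```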