Home > Software design >  How to check and add missing headers to a csv file
How to check and add missing headers to a csv file

Time:01-18

I have a CSV file that has let's say a total of 6 column headers out of which 1st one A1(Location) is fixed always. However, due to dynamic data, I sometimes get only 3 column headers out of the other 5 headers in the CSV file and those could be any columns out of those 5 columns from B1 to F1. The first screenshot shows all the columns that I want to have, 2nd the screenshot shows what I get.

CSV file with full 5 headers

CSV file with missing 2 out 5 headers

Now what I want to do is, somehow should be able to check the header in the file first and if it is already available then ignore it if it does not then add the missing headers to the next columns in the CSV file using python.

Note: Just the headers are enough for me, I would fill blank rows underneath with 0 or blank to match the table.

Kindly help me!

CodePudding user response:

import pandas as pd

file_path='Yourfile.csv' #file name
df = pd.read_csv(file_path) #reading csv file 
df = df.fillna(0) #replace NaN to 0

CodePudding user response:

Unfortunately, I know no clean and direct way to declare more columns than what exists in a csv file.

But once you have got a Pandas DataFrame, it is easy to add the missing columns:

# read a CSV file having missing columns:
tmp = pd.read_csv('file.csv')

#create an empty dataframe with all the expected columns
df = pd.DataFrame(columns=['Location', 'Total', 'Open', 'Checkin', 'Closed', 'Cancelled'])

# just copy the data:
df[tmp.columns] = tmp

That's all. The missing columns will be filled with NaN values.

CodePudding user response:

import pandas as pd

#Read the first excel file content
df1 = pd.read_excel(r'/content/MainFile.xlsx')

#Read the second excel file content where headers are missing
df2 = pd.read_excel(r'/content/ClientFile.xlsx')

Now look up for different columns between two files and extract them on a new data frame:

extracted_col= df1[df1.columns.difference(df2.columns)]

And then join the extracted columns to your missing headers file.

df2=df2.join(extracted_col)
  • Related