Home > database >  Python/Pandas - Remove the first row with Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unn
Python/Pandas - Remove the first row with Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unn

Time:02-18

I am splitting a xlsm file ( with multiple sheets) into a csv with each sheet as a separate csv file. In the process I am removing first 4 lines and keeping the header only which start from row 5. Here is the input file

enter image description here

This is the output I expect -

enter image description here

But I am getting the below output -

enter image description here

Here is the code I am using -

import pandas as pd
xl = pd.ExcelFile('Sample_File.xlsm')
for sheet in xl.sheet_names:
    df = pd.read_excel(xl,sheet_name=sheet)
    df1 = df.iloc[3:]
    df1.to_csv(f"{sheet}.csv",index=False)

How can I remove the first row which is having values from unnamed: 0 to unnamed: 9?

Also how can I have comment1 show up in both (First and second row) and comment2 in (Third and Fourth row) in last column?

CodePudding user response:

You need to use the skiprows argument inside the pd.read_excel function to correctly get the column names in the 5th row.

UPDATE Including the forward filling

import pandas as pd

xl = pd.ExcelFile('Sample_File.xlsm')

for sheet in xl.sheet_names:
    df = pd.read_excel(xl, sheet_name=sheet, skiprows=4) # no more iloc here
    df['Comment'] = df´['Comment'].ffill()
    df.to_csv(f'{sheet}.csv', index=False)
  • Related