I am trying to only extract the column names for each CSV file and then append into one CSV showing only the column name and filenames. Below is the code I am using:
import pandas as pd
import os
import glob2
# use glob to get all the csv files
# in the folder
path = r'G:\serv_agr\Alberta_Billing\MM1077-NewHorizonMall\Meters\Gas_Wtr'
csv_files = glob2.glob(os.path.join(path, "*.csv"))
# loop over the list of csv files
for f in csv_files:
# read the csv file
df = pd.read_csv(f)
# print the location and filename
file_name = f.split("\\")[-1]
print('File Name Original:', file_name)
# print the content
print('Columns:', df.columns)
columns = df.columns
This helps print each column name but import it into one csv.
This is what the output should look like:
Column 1 | Column 2 | File Name |
---|---|---|
Date | Time | CSV1 |
Dates | Time2 | csv2 |
CodePudding user response:
I think that instead of reading the whole file it's better to read only the first line:
import pandas as pd
import csv
from glob import glob
from os import path
data_folder = "."
headers = []
files = []
for f in glob(path.join(data_folder, "*.csv")):
files.append(path.basename(f))
with open(f) as f:
headers.append(next(csv.reader(f)))
df = (
pd.DataFrame(headers)
.rename(columns=lambda x: f'Column {x 1}')
.assign(**{'File name': files})
)
The line pd.DataFrame(headers)
works even if headers
is a jagged list (i.e. files have different number of headers). In case if there are some empty lines in front of headers, we have to rewrite the for-loop to skip them:
...
with open(f) as f:
r = csv.reader(f)
while not (line:=next(r)):
pass # empty lines result in empty lists
headers.append(line)
...
CodePudding user response:
Kindly check this documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
It does what you want.