Extracting column names for each csv file and append it into 1 csv-CodePudding

I am trying to only extract the column names for each CSV file and then append into one CSV showing only the column name and filenames. Below is the code I am using:

import pandas as pd
import os
import glob2

# use glob to get all the csv files
# in the folder
path = r'G:\serv_agr\Alberta_Billing\MM1077-NewHorizonMall\Meters\Gas_Wtr'
csv_files = glob2.glob(os.path.join(path, "*.csv"))

# loop over the list of csv files
for f in csv_files:

    # read the csv file
    df = pd.read_csv(f)

    # print the location and filename
    file_name = f.split("\\")[-1]
    print('File Name Original:', file_name)

    # print the content
    print('Columns:', df.columns)

    columns = df.columns

This helps print each column name but import it into one csv.

This is what the output should look like:

Column 1	Column 2	File Name
Date	Time	CSV1
Dates	Time2	csv2

CodePudding user response：

I think that instead of reading the whole file it's better to read only the first line:

import pandas as pd
import csv
from glob import glob
from os import path

data_folder = "."
headers = []
files = []

for f in glob(path.join(data_folder, "*.csv")):
    files.append(path.basename(f))
    with open(f) as f:
        headers.append(next(csv.reader(f)))

df = (
    pd.DataFrame(headers)
    .rename(columns=lambda x: f'Column {x   1}')
    .assign(**{'File name': files})
)

The line pd.DataFrame(headers) works even if headers is a jagged list (i.e. files have different number of headers). In case if there are some empty lines in front of headers, we have to rewrite the for-loop to skip them:

    ...
    with open(f) as f:
        r = csv.reader(f)
        while not (line:=next(r)):
            pass   # empty lines result in empty lists
        headers.append(line)
    ...

CodePudding user response：

Kindly check this documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html

It does what you want.