The files of both csv and xlsx contain same context, with same header and all. But would like to combine all under one file and then having another column to identify which is csv, which is xlsx. How do I go about doing so?
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}.format(extension))]
combined)csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined)csv.to_csv("combined_csv.csv", index= False, encoding= 'utf-8-sig')
CodePudding user response:
To merge CSV and XLSX files and save them into a new combined file using the code you provided, you can use the pandas library in Python to read the CSV and XLSX files, concatenate them into a single DataFrame, and then write the resulting DataFrame to a new CSV file. Here is an example of how you could modify your code to do this:
import glob
import pandas as pd
# Set the file extension
extension = 'csv'
# Get the list of filenames with the specified extension
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
# Read the CSV and XLSX files using pandas
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_xlsx = pd.read_excel('combined_xlsx.xlsx')
# Concatenate the CSV and XLSX data into a single DataFrame
combined = pd.concat([combined_csv, combined_xlsx])
# Write the combined DataFrame to a new CSV file
combined.to_csv("combined_csv.csv", index=False, encoding='utf-8-sig')
In this example, the code uses the pandas library to read the CSV and XLSX files and concatenate them into a single DataFrame. It then writes the resulting DataFrame to a new CSV file using the to_csv() method. This will create a new CSV file that contains the combined data from the original CSV and XLSX files.
CodePudding user response:
In addition to the answer by aHelpfucoder,
Use the below queries just before you concatenate the combined_csv & combined_xlsx dataframes to create a new column that can tell you whether a row from a csv file or from an xlsx file.
combined_csv['file_type'] = 'CSV'
combined_xlsx['file_type] = 'XLSX'
Next you can concatenate these dataframes,
combined = pd.concat([combined_csv, combined_xlsx])