I am trying to read multiple files with specific names (1.car.csv, 2.car.csv, and so on) from a folder, add a new label at the right-most position of each dataset on every iteration, and merge all of the csv files into one csv file. Since ".car.csv" is constant, I think I can use a for loop with the .format(index) function to run over the csv files. All of the csv files have the same attributes.
Kindly help me!
CodePudding user response:
This can be easily done with a CSV tool like miller:
mlr --csv cat --filename *.car.csv > merged.csv
This will concatenate the files (without repeating the header) and prepend the filename as the first column.
CodePudding user response:
You can use the pandas library this way:
import pandas as pd
import os

# path to folder where the csv files are stored
path = '/path/to/folder'

result = pd.DataFrame()
for i in range(1, n + 1):
    # build the file name: 1.car.csv, 2.car.csv, ...
    filename = "{}.car.csv".format(i)
    file_path = os.path.join(path, filename)
    df = pd.read_csv(file_path)
    # add the new label as the right-most column
    df['new_label'] = i
    result = pd.concat([result, df], ignore_index=True)

result.to_csv('final_result.csv', index=False)
The n in the code above should be replaced with the number of csv files you have in the folder.
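If you don't want to hardcode n, a minimal sketch (assuming every file of interest follows the "<number>.car.csv" naming scheme) is to derive it by counting the matching files:

import glob
import os

path = '/path/to/folder'
# count the files matching "*.car.csv" so n reflects the folder contents
n = len(glob.glob(os.path.join(path, '*.car.csv')))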
If you need any explanation of the code (in case you're new to Python or dataframes), just comment below.
CodePudding user response:
- glob is used to get all files in the folder that match the pattern *.csv
- pd.read_csv is used to read each file as a DataFrame
- index_col=None tells Pandas not to use any of the columns as the index and to create a default index for the DataFrame instead
- header=0 tells Pandas to use the first row of the CSV file as the header row
- pd.concat is used to merge all the DataFrames into a single DataFrame merged_df
- axis=0 means that the concatenation happens along the rows (vertically)
- ignore_index=True discards the original indices of the individual DataFrames and creates a new default index for the resulting DataFrame
import glob
import pandas as pd

path = r'<path to folder containing csv files>'
# collect all csv files in the folder
all_files = glob.glob(path + "/*.csv")

lst = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    lst.append(df)

merged_df = pd.concat(lst, axis=0, ignore_index=True)
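If you also need the merged data written back out as a single csv file (as the question asks), you can finish with to_csv; the output file name here is just an example:

# write the merged DataFrame to disk without the default integer index
merged_df.to_csv('merged.csv', index=False)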
CodePudding user response:
Using pathlib and pandas, you can use .assign() to add the new column and finally pd.concat() to concatenate all the files into one.
from pathlib import Path
import pandas as pd
input_path = Path("path/to/car/files/").glob("*car.csv")
output_path = "path/to/output"
pd.concat(
    (pd.read_csv(x).assign(new_label="new data") for x in input_path),
    ignore_index=True,
).to_csv(f"{output_path}/final.csv", index=False)
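If the label should identify the source file rather than being the same constant string for every row, one possible variant (assuming the numeric prefix of each file name, e.g. 1 in 1.car.csv, is the label you want) is to take it from the path object:

from pathlib import Path
import pandas as pd

input_path = Path("path/to/car/files/").glob("*car.csv")
output_path = "path/to/output"

# for "1.car.csv", p.name is "1.car.csv"; splitting on "." gives the numeric prefix "1"
pd.concat(
    (pd.read_csv(p).assign(new_label=p.name.split(".")[0]) for p in input_path),
    ignore_index=True,
).to_csv(f"{output_path}/final.csv", index=False)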