I am new to python and I'm using Python 3.9.6. I have about 48 different files all with "Cam_Cantera_IDT_output_800K_*.csv" in the name. For example: Cam_Cantera_IDT_output_800K_7401.csv and Cam_Cantera_IDT_output_800K_8012.csv. All the files have a time column t followed by many columns named X_oh, X_ch2, X_h20 etc. I want to write a code that goes to each of these 48 files and takes only the t and X_ch2 columns and saves them in a new file. Since the time column is the same for all 48 files I just want 1 time column t followed by 48 columns of X_ch2, so 49 columns in total.
My first attempt was using append:
#Using .append, this code runs
import pandas as pd
import glob
require_cols = ['t', 'X_ch2']
all_data = pd.DataFrame()
for f in glob.glob("Cam_Cantera_IDT_output_800K_*.csv"):
df = pd.read_csv(f, usecols = require_cols)
all_data = all_data.append(df,ignore_index=True)
all_data.to_csv("new_combined_file.csv")
This code ran but it appended each file one underneath the other, so I had only 2 columns, one for t and one for X_ch2, but many rows. I later read that that's what append does, it adds the files one underneath the other.
I then tried using pd.concat to add the columns vertically next to each other. The code I used can be seen below. I unfortunately got the same result as with append. I got just 2 columns, one for time t and one for X_ch2 and all the files were added underneath each other.
#Attempting using pd.concat, this code runs
import glob
import pandas as pd
file_extension = 'Cam_Cantera_IDT_output_800K_*.csv'
all_filenames = [i for i in glob.glob(f"*{file_extension}")]
require_cols = ['t', 'X_ch2']
combined_csv_data = pd.concat([pd.read_csv(f, usecols = require_cols) for f in all_filenames], axis=0)
print(combined_csv_data)
combined_csv_data.to_csv('combined_csv_data.csv')
My last attempt was using pd.merge, I added on='t' for it to merge on the time column so that I only have 1 time column. I keep getting the error that I am missing 'right'. But when I add it to the line it tells me that right is not defined:
combined_csv_data = pd.merge(right, [pd.read_csv(f, usecols = require_cols) for f in all_filenames], on='t') => gives 'right' is not defined.
I tried using right_on = X_ch2 or right_index=True but nothing seems to work.
The original code I tried, without any 'right' is shown below.
# A merge attempt
#TypeError: merge() missing 1 required positional argument: 'right'
import glob
import pandas as pd
file_extension = 'Cam_Cantera_IDT_output_800K_*.csv'
all_filenames = [i for i in glob.glob(f"*{file_extension}")]
require_cols = ['t', 'X_ch2']
combined_csv_data = pd.merge([pd.read_csv(f, usecols = require_cols) for f in all_filenames], on='t')
print(combined_csv_data)
combined_csv_data.to_csv('combined_csv_data.csv')
Any help would be highly appreciated, thank you.
CodePudding user response:
Use axis=1
in concat
and convert t
column to index
, select column X_ch2
for Series
:
require_cols = ['t', 'X_ch2']
L = [pd.read_csv(f,usecols = require_cols, index_col=['t'])['X_ch2'] for f in all_filenames]
combined_csv_data = pd.concat(L, axis=1)
If need columns names rename by filenames for avoid duplicated 48 columns names:
import os
L = [pd.read_csv(f,
usecols = require_cols,
index_col=['t'])['X_ch2'].rename(os.path.basename(f))
for f in all_filenames]
CodePudding user response:
If installing convtools library is an option, then:
import glob
from convtools import conversion as c
from convtools.contrib.tables import Table
required_cols = ["t", "X_ch2"]
table = None
for number, f in enumerate(glob.glob("Cam_Cantera_IDT_output_800K_*.csv")):
table_ = (
Table.from_csv(f, header=True)
.take(*required_cols)
.rename({"X_ch2": f"X_ch2__{number}"})
)
if table is None:
table = table_
else:
table = table.join(table_, on="t", how="full")
table.into_csv("new_combined_file.csv", include_header=True)