How to deal with NaNs when merging columns from different files into one file

I am new to Python and am using Python 3.9.6. I have code that goes through each file whose name starts with Cam_Cantera_IDT_output_800K_, takes the first column t and the column X_ch2 from each file, and merges them into one table (the code below uses pd.concat). The code does what I want, but many NaNs show up in the output, and when I open the resulting csv file it has many empty cells. I need all of the data in order to perform some calculations later on. Should I be looking into options such as combine instead of merge? Any help would be greatly appreciated, since I really don't know how to tackle this. Thank you.

import glob
import pandas as pd
import os

file_extension = 'Cam_Cantera_IDT_output_800K_*.csv'
all_filenames = glob.glob(f"*{file_extension}")

require_cols = ['t', 'X_ch2']

# Read t and X_ch2 from each file, index by t, and name each series after its source file
L = [pd.read_csv(f, usecols=require_cols, index_col=['t'])['X_ch2'].rename('X_ch2_' + os.path.basename(f)) for f in all_filenames]

combined_csv_data = pd.concat(L, axis=1)

print(combined_csv_data)

combined_csv_data.to_csv('combined_csv_data.csv')

CodePudding user response：

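You can use combined_csv_data = combined_csv_data.dropna() to remove every row that contains a missing value before writing the .csv file. dropna() returns a new DataFrame rather than changing the original, so the result has to be assigned.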
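
A minimal sketch of where the NaNs come from and what dropna() does with them, assuming the cause is that the files do not all share the same t values. The two small series are made up for illustration and stand in for two of the files:

```python
import pandas as pd

# Hypothetical stand-ins for two files whose t columns only partly overlap.
a = pd.Series([0.1, 0.2, 0.3], index=pd.Index([0.0, 1.0, 2.0], name="t"), name="X_ch2_a")
b = pd.Series([0.4, 0.5, 0.6], index=pd.Index([0.0, 1.5, 2.0], name="t"), name="X_ch2_b")

# concat with axis=1 aligns on the index using an outer join, so a t value
# present in only one series leaves a NaN in the other column.
combined = pd.concat([a, b], axis=1)

# dropna() keeps only the rows where every column has a value.
complete_rows = combined.dropna()

# how="all" is less strict: a row is dropped only if every column is NaN.
non_empty_rows = combined.dropna(how="all")

print(combined)
print(complete_rows)
```

Keep in mind that dropna() discards the rows for every t that is missing from at least one file, so it only helps if the later calculations can do without those time points.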

CodePudding user response：

Try removing the NaNs when concatenating the data, for example by keeping only the t values that every file shares:

combined_csv_data = pd.concat([s.dropna() for s in L], axis=1, join='inner')

or try it without concatenating, just flattening and reshaping (this yields a plain NumPy array and discards the t index):

import numpy as np
values = np.concatenate([s.to_numpy() for s in L])  # flatten every X_ch2 value into one array
combined_csv_data = values[~np.isnan(values)].reshape(-1, 2)  # needs an even number of non-NaN values
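
If every row has to be kept, another option (not taken from the answers above) is to fill the gaps by interpolating each column along t. This is only a rough sketch: it assumes t is numeric and that linear interpolation between neighbouring time points is acceptable for the later calculations. The two small series are hypothetical stand-ins for the real files:

```python
import pandas as pd

# Hypothetical stand-ins for two files sampled at different t values.
a = pd.Series([0.0, 1.0, 2.0], index=pd.Index([0.0, 1.0, 2.0], name="t"), name="X_ch2_a")
b = pd.Series([0.0, 3.0, 4.0], index=pd.Index([0.0, 1.5, 2.0], name="t"), name="X_ch2_b")

# Outer-join on t, then sort so that interpolation runs along increasing time.
combined = pd.concat([a, b], axis=1).sort_index()

# method="index" interpolates linearly using the actual t values as spacing,
# rather than treating the rows as equally spaced.
filled = combined.interpolate(method="index")

print(filled)
```

With the default settings, a NaN that comes before a column's first real value is left as it is, so a file that starts later than the others would still need separate handling.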