Ignore the processing errors in pandas data manipulations

I am trying to combine multiple files with pandas and ignore the file-not-found error for any file that is missing, but no data is printed when I run the code.

import pandas as pd
try:
    df_t = pd.read_csv('C:\\...\\aa_kk.CSV',dtype=str)
    df_u = pd.read_csv('C:\\....\\bb_jj.CSV',dtype=str)
    df_t_e = pd.DataFrame(df_t,columns=['CO','MD','PS','PE','PO'])
    df_u_e = pd.DataFrame(df_u,columns=['CO','MD','PS','PE','PO'])
    merge_tu = [df_t_e,df_u_e]
    result_tu = pd.concat(merge_tu)
    print(result_tu)
except Exception:
    print('not found')

I expected the data from df_u to be printed, since no file exists for df_t. But nothing is printed after execution except "not found".

CodePudding user response：

Note that pd.concat does not merge on a key; it stacks the rows of the DataFrames on top of each other.

If a key-based merge is what you want, use DataFrame.merge(). Here's an example:

import pandas as pd

df1 = pd.DataFrame({'df1name': ['foo', 'bar', 'baz', 'foo'], 'value': [1, 2, 3, 5]})

df2 = pd.DataFrame({'df2name': ['foo', 'bar', 'baz', 'foo'], 'value': [5, 6, 7, 8]})

df1.merge(df2, left_on='df1name', right_on='df2name')

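But in your case, both pd.read_csv calls are inside a single try block. Because the first file does not exist, the first call raises an exception, the rest of the block (including reading the second file) is skipped, and only the message "not found" is printed.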
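
To see that control flow in isolation, here is a minimal sketch. The path is hypothetical, and open() stands in for pd.read_csv, which raises the same FileNotFoundError for a missing file:

```python
reached = []  # records which statements actually ran

try:
    reached.append('first read')
    # Hypothetical missing path; open() stands in for pd.read_csv here
    # and raises FileNotFoundError because the file does not exist
    open('no_such_dir_for_demo/aa_kk.csv')
    reached.append('second read')  # never runs: the exception skips it
except FileNotFoundError:
    reached.append('not found')

print(reached)  # ['first read', 'not found']
```

Everything after the failing statement inside the try is skipped, which is why each file needs its own try/except if the others should still be read.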

CodePudding user response：

From a list of files, create a list of DataFrames. When a file can't be found, it is simply skipped and nothing is added to the list. At the end, concat them together.

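import pandas as pd

files = ['C:\\...\\aa_kk.CSV', 'C:\\....\\bb_jj.CSV']
columns = ['CO','MD','PS','PE','PO']
dfs = []
for file in files:
    try:
        # Select the wanted columns after reading, as in the question;
        # passing names= to read_csv would relabel the columns instead
        df = pd.read_csv(file, dtype=str).reindex(columns=columns)
        dfs.append(df)
    except FileNotFoundError:
        print(f'{file} not found')

# pd.concat raises ValueError on an empty list, so fall back to an empty frame
df = pd.concat(dfs) if dfs else pd.DataFrame(columns=columns)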
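
The same idea can be wrapped in a function and tried without touching the real files. This is only a sketch: the function name concat_existing, the temporary file contents, and the choice to return an empty DataFrame when no file is readable are all assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

COLUMNS = ['CO', 'MD', 'PS', 'PE', 'PO']


def concat_existing(files, columns=COLUMNS):
    """Read every file that exists and stack the results; skip missing files."""
    dfs = []
    for file in files:
        try:
            df = pd.read_csv(file, dtype=str)
        except FileNotFoundError:
            print(f'{file} not found')
            continue
        # Keep only the wanted columns; any that are absent become NaN
        dfs.append(df.reindex(columns=columns))
    if not dfs:
        # Assumption: return an empty frame rather than let pd.concat raise
        return pd.DataFrame(columns=columns)
    return pd.concat(dfs, ignore_index=True)


# Demo: one temporary file that exists and one path that does not
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, 'bb_jj.csv')
    with open(present, 'w') as fh:
        fh.write('CO,MD,PS,PE,PO,EXTRA\n1,2,3,4,5,x\n')
    missing = os.path.join(tmp, 'aa_kk.csv')
    result = concat_existing([missing, present])

print(result)
```

The missing file is reported and skipped, and the result contains the single row from the file that does exist, restricted to the five requested columns.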