Python pandas df index error when column empty

For a project, I am importing CSV files into SQLite with the help of a pandas DataFrame.

It's a very big file (914 columns), so I want to split it up by selecting subsets of columns. I do this with a pandas DataFrame.

This works fine, except when a column contains no values; then I get an index error. I don't know in advance whether a column will be empty.

def limit_rubric1(df):
    limit_df = df[[
        "Company",
        "RUB 1",
        "RUB 2",
        "RUB 3",
        "FileBase"]].fillna(value=0)
    # limit_df = limit_df.reset_index(drop=True)
    return limit_df

This is the error I get:

  File "C:\Users\xxx\PycharmProjects\MF\venv\lib\site-packages\pandas\core\indexes\base.py", line 6176, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['RUB 2'] not in index"

CodePudding user response:
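
The KeyError says that the label 'RUB 2' is not among the DataFrame's columns at all, which is different from a column that exists but holds no values. The sketch below uses made-up sample data (the values and file names are assumptions, not taken from the question) to show the two cases side by side:

```python
import numpy as np
import pandas as pd

wanted = ["Company", "RUB 1", "RUB 2", "RUB 3", "FileBase"]

# Case 1: "RUB 2" is present but holds no values, so selection works.
empty_col = pd.DataFrame({
    "Company": ["A", "B"],
    "RUB 1": [1.0, 2.0],
    "RUB 2": [np.nan, np.nan],
    "RUB 3": [3.0, np.nan],
    "FileBase": ["a.csv", "b.csv"],
})
selected = empty_col[wanted].fillna(value=0)

# Case 2: "RUB 2" is absent from the frame, so the same selection fails.
missing_col = empty_col.drop(columns=["RUB 2"])
try:
    missing_col[wanted]
    raised = False
    message = ""
except KeyError as exc:
    raised = True
    message = str(exc)
```

If the second case matches what happens with the real CSV, it is probably worth checking df.columns right after reading the file to see which headers actually arrived.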

It depends on what you need. If RUB 2 does not exist and you want this column in the output, filled with 0, use reindex:

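def limit_rubric1(df):
    # reindex creates any missing column and fills it with 0;
    # fillna then replaces NaN in the columns that already existed
    return df.reindex(columns=[
        "Company",
        "RUB 1",
        "RUB 2",
        "RUB 3",
        "FileBase"], fill_value=0).fillna(value=0)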
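
One detail to be aware of: fill_value in reindex only applies to the columns that reindex has to create. NaN values in columns that already exist stay as they are unless fillna is called as well. A small illustration with invented sample data (the frame and the extra "Other" column are assumptions for the demo):

```python
import numpy as np
import pandas as pd

wanted = ["Company", "RUB 1", "RUB 2", "RUB 3", "FileBase"]

df = pd.DataFrame({
    "Company": ["A", "B"],
    "RUB 1": [1.0, np.nan],        # existing column with a gap
    "RUB 3": [3.0, 4.0],
    "FileBase": ["a.csv", "b.csv"],
    "Other": [9, 9],               # not requested, so reindex drops it
})

# "RUB 2" is absent, so it is created and filled with 0.
reindexed = df.reindex(columns=wanted, fill_value=0)

# The NaN that was already in "RUB 1" is still there at this point;
# fillna takes care of it.
filled = reindexed.fillna(value=0)
```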

Or, if you only need the columns that actually exist:

def limit_rubric1(df):
    # Keep only the requested columns that are present in df
    cols = df.columns.intersection(
        ["Company", "RUB 1", "RUB 2", "RUB 3", "FileBase"], sort=False)
    return df[cols].fillna(value=0).reset_index(drop=True)
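
With the intersection approach, the result can have fewer columns than requested, which may matter if the SQLite table expects a fixed schema. A rough sketch on invented sample data (the frame and the `absent` helper list are assumptions, not part of the question) that also records which requested columns were missing:

```python
import numpy as np
import pandas as pd

wanted = ["Company", "RUB 1", "RUB 2", "RUB 3", "FileBase"]

df = pd.DataFrame({
    "Company": ["A", "B"],
    "RUB 1": [1.0, np.nan],
    "RUB 3": [3.0, 4.0],
    "FileBase": ["a.csv", "b.csv"],
    "Other": [9, 9],
})

# Only the requested labels that exist in df survive the intersection.
cols = df.columns.intersection(wanted, sort=False)
result = df[cols].fillna(value=0).reset_index(drop=True)

# Requested columns the input did not contain (hypothetical helper list,
# e.g. for logging before the insert into SQLite).
absent = [c for c in wanted if c not in df.columns]
```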