For a project I am importing CSV files into SQLite with the help of a pandas DataFrame.
It's a very big file (914 columns), so I want to split it up by selecting subsets of columns. I do this with a pandas DataFrame.
This works fine, except when a column has no values and is therefore missing from the DataFrame; then I get a KeyError. I don't know in advance whether a column will be empty.
def limit_rubric1(df):
    limit_df = df[[
        "Company",
        "RUB 1",
        "RUB 2",
        "RUB 3",
        "FileBase"]].fillna(value=0)
    # limit_df = limit_df.reset_index(drop=True)
    return limit_df
This is the error I get:
  File "C:\Users\xxx\PycharmProjects\MF\venv\lib\site-packages\pandas\core\indexes\base.py", line 6176, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['RUB 2'] not in index"
CodePudding user response:
It depends on what you need. If RUB 2 does not exist and you need this column in the output, filled with 0, use:
def limit_rubric1(df):
    return df.reindex(columns=[
        "Company",
        "RUB 1",
        "RUB 2",
        "RUB 3",
        "FileBase"], fill_value=0)
Or, if you need only the columns that exist:
def limit_rubric1(df):
    cols = df.columns.intersection(
        ["Company", "RUB 1", "RUB 2", "RUB 3", "FileBase"], sort=False)
    return df[cols].fillna(value=0).reset_index(drop=True)
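To compare the two approaches, here is a minimal sketch on a small made-up DataFrame (the sample values and the names `wanted`, `rub_cols`, `with_all`, `only_existing` and `missing` are assumptions for illustration, not part of the original question). One thing worth noting: `fill_value` in `reindex` only fills the columns that `reindex` newly creates. NaN values already present in existing columns stay as they are, so a separate `fillna` is still needed if those should become 0 as well.

```python
import pandas as pd

# Hypothetical sample data: "RUB 2" is absent and "RUB 1" contains one NaN.
df = pd.DataFrame({
    "Company": ["A", "B"],
    "RUB 1": [1.0, None],
    "RUB 3": [3.0, 4.0],
    "FileBase": ["a.csv", "b.csv"],
})

wanted = ["Company", "RUB 1", "RUB 2", "RUB 3", "FileBase"]
rub_cols = ["RUB 1", "RUB 2", "RUB 3"]

# Option 1: reindex adds the absent "RUB 2" column, filled with 0 ...
with_all = df.reindex(columns=wanted, fill_value=0)
# ... but the NaN that was already in "RUB 1" is left untouched.
nan_left = int(with_all["RUB 1"].isna().sum())
# Fill the remaining NaN values in the numeric columns explicitly.
with_all[rub_cols] = with_all[rub_cols].fillna(0)

# Option 2: keep only the requested columns that exist, and note the rest.
existing = df.columns.intersection(wanted, sort=False)
missing = [col for col in wanted if col not in df.columns]
only_existing = df[existing]

print(list(with_all.columns))
print(list(only_existing.columns))
print(missing)
```

Which option fits better probably depends on the SQLite side: if every table should have the same fixed set of columns, the `reindex` variant keeps the schema stable, while the `intersection` variant only stores what the CSV actually contained.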