Applying if conditions in a for loop for Pandas dataframes with different amount of columns

Time:08-26

I have three types of Pandas dataframes: one has three columns ('R', 'B', 'I'), another has only the first two ('R', 'B'), and the third has only 'R'.

I need to adjust the values in all these columns using variables ('b', 'r', 'i') unique to each dataframe and column, so I've set up a for loop that includes an if condition so Pandas can process them without raising an error:

if 'B' and 'I' not in df.columns:
    df['R'] = df['R'] - r

if 'B' and 'R' and not 'I' in df.columns:
    df['B'] = df['B'] - b
    df['R'] = df['R'] - r

else:
    df['B'] = df['B'] - b
    df['R'] = df['R'] - r
    df['I'] = df['I'] - i

The code seems to run fine until it encounters a dataframe with only the 'R' column. It gives the following error:

KeyError                                  Traceback (most recent call last)
Input In [10], in <cell line: 26>()
     91 if 'B' and 'R' and not 'I' in df.columns:
---> 92     df['B'] = df['B'] - b
     93     df['R'] = df['R'] - r

File ~/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py:3505, in DataFrame.__getitem__(self, key)
   3503 if self.columns.nlevels > 1:
   3504     return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
   3506 if is_integer(indexer):
   3507     indexer = [indexer]

File ~/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py:3623, in Index.get_loc(self, key, method, tolerance)
   3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:
-> 3623     raise KeyError(key) from err
   3624 except TypeError:
   3625     # If we have a listlike key, _check_indexing_error will raise
   3626     #  InvalidIndexError. Otherwise we fall through and re-raise
   3627     #  the TypeError.
   3628     self._check_indexing_error(key)

KeyError: 'B'

Why does the dataframe with just the 'R' column skip the first condition (the one it's meant to use)? Is this the simplest way to do what I need?

CodePudding user response:

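I can see two things that went wrong here.
1 - The second if should be elif. Otherwise you start a new, independent condition, so the dataframe with only 'R' is tested against the second if even after the first if has already run.
2 - When you write if 'B' and 'I' not in df.columns, Python evaluates it as 'B' and ('I' not in df.columns). A non-empty string such as 'B' is always truthy, so you are effectively asking if True and 'I' not in df.columns. Rephrase the condition as: if 'B' not in df.columns and 'I' not in df.columns (the same goes for the second condition you wrote). For the same reason, your second condition is also true for the 'R'-only dataframe, which is why df['B'] raises the KeyError.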
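To make the second point concrete, here is a small sketch showing how the original conditions evaluate on a dataframe that has only 'R', next to the explicit versions. The variable names (original_first, fixed_first, and so on) are made up for this illustration.

```python
import pandas as pd

df_r = pd.DataFrame({'R': [7, 8, 9]})

# Parsed as: 'B' and ('I' not in df_r.columns).
# 'B' is a non-empty string (truthy), so only the 'I' test matters.
original_first = 'B' and 'I' not in df_r.columns

# Parsed as: 'B' and 'R' and (not ('I' in df_r.columns)).
# Again only the 'I' test matters, so this is also true for the 'R'-only frame.
original_second = 'B' and 'R' and not 'I' in df_r.columns

# Explicit membership tests for every column give the intended result.
fixed_first = 'B' not in df_r.columns and 'I' not in df_r.columns
fixed_second = ('B' in df_r.columns and 'R' in df_r.columns
                and 'I' not in df_r.columns)

print(original_first, original_second)  # both conditions pass
print(fixed_first, fixed_second)        # only the first one passes
```

Because both original conditions pass, the 'R'-only dataframe enters the second block and fails on df['B'].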

Below is a working example, have fun :)

import pandas as pd

df_bir = pd.DataFrame({
    'B': [1, 2, 3],
    'I': [4, 5, 6],
    'R': [7, 8, 9]
})

df_br = pd.DataFrame({
    'B': [1, 2, 3],
    'R': [7, 8, 9]
})

df_r = pd.DataFrame({
    'R': [7, 8, 9]
})

b = 1
i = 2
r = 3

for df in [df_bir, df_br, df_r]:
    if 'B' not in df.columns and 'I' not in df.columns:
        df['R'] = df['R'] - r

    elif 'B' in df.columns and 'R' in df.columns and 'I' not in df.columns:
        df['B'] = df['B'] - b
        df['R'] = df['R'] - r

    else:
        df['B'] = df['B'] - b
        df['R'] = df['R'] - r
        df['I'] = df['I'] - i
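
As for whether this is the simplest way: one possible alternative, sketched here under the assumption that the per-column values can be collected in a dictionary, is to loop over the columns and subtract only where the column exists. The offsets dictionary and the subtract_offsets helper are hypothetical names for this example; if the values differ per dataframe, you would pass a different dictionary for each one.

```python
import pandas as pd

df_bir = pd.DataFrame({'B': [1, 2, 3], 'I': [4, 5, 6], 'R': [7, 8, 9]})
df_br = pd.DataFrame({'B': [1, 2, 3], 'R': [7, 8, 9]})
df_r = pd.DataFrame({'R': [7, 8, 9]})

# Hypothetical per-column values (same numbers as b, i, r in the example above).
offsets = {'B': 1, 'I': 2, 'R': 3}


def subtract_offsets(df, offsets):
    """Subtract each offset from its column, skipping columns the frame lacks."""
    for col, value in offsets.items():
        if col in df.columns:
            df[col] = df[col] - value
    return df


for df in [df_bir, df_br, df_r]:
    subtract_offsets(df, offsets)
```

This avoids the if/elif/else chain entirely, so adding a fourth column later only means adding one entry to the dictionary.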