I have three types of Pandas dataframes: one has three columns ('R', 'B', 'I'), another has just the first two ('R', 'B'), and the last has only 'R'.
I need to adjust the values in each of these columns by variables ('b', 'r', 'i') that are unique to each dataframe and column, so I've set up a for loop with an if condition so that Pandas only touches the columns that exist and doesn't raise an error:
if 'B' and 'I' not in df.columns:
    df['R'] = df['R'] - r
if 'B' and 'R' and not 'I' in df.columns:
    df['B'] = df['B'] - b
    df['R'] = df['R'] - r
else:
    df['B'] = df['B'] - b
    df['R'] = df['R'] - r
    df['I'] = df['I'] - i
The code seems to run fine until it encounters a dataframe with only the 'R' column. It then raises the following error:
KeyError Traceback (most recent call last)
Input In [10], in <cell line: 26>()
91 if 'B' and 'R' and not 'I' in df.columns:
---> 92 df['B'] = df['B'] - b
93 df['R'] = df['R'] - r
File ~/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py:3505, in DataFrame.__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
File ~/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py:3623, in Index.get_loc(self, key, method, tolerance)
3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
3628 self._check_indexing_error(key)
KeyError: 'B'
Why does the dataframe with just the 'R' column skip the first condition (the one it's meant to use)? Is this the simplest way to do what I need?
Answer:
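I can see two things that went wrong here.
1 - The second if should be elif. Otherwise it starts a new, independent condition, so the dataframe with only 'R' enters the first if and is then tested against the second if as well. In other words, that dataframe does not skip the first condition; it runs it and then goes on to the second one.
2 - When you write if 'B' and 'I' not in df.columns, Python evaluates it as 'B' and ('I' not in df.columns). 'B' is a non-empty string, which is always truthy, so you are effectively asking if True and 'I' not in df.columns, and the check for 'B' never happens. Rephrase the condition as: if 'B' not in df.columns and 'I' not in df.columns. The same goes for the second condition: 'B' and 'R' and not 'I' in df.columns only tests that 'I' is missing, which is also true for the dataframe with only 'R', and that is why df['B'] raises the KeyError.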
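To make point 2 concrete, here is a minimal sketch of how Python evaluates the two forms of the condition. The frame df_r and the variable names are only illustrative; the sketch assumes a dataframe that has just the 'R' column.

```python
import pandas as pd

# Illustrative frame with only the 'R' column (an assumption for this sketch)
df_r = pd.DataFrame({'R': [7, 8, 9]})

# `not in` binds tighter than `and`, so this is: 'B' and ('I' not in df_r.columns).
# 'B' is a non-empty string (truthy), so the result depends only on 'I'.
first_as_written = 'B' and 'I' not in df_r.columns

# Same problem: 'B' and 'R' are truthy strings, only the 'I' test matters.
second_as_written = 'B' and 'R' and not 'I' in df_r.columns

# Each column name needs its own membership test.
second_as_intended = (
    'B' in df_r.columns
    and 'R' in df_r.columns
    and 'I' not in df_r.columns
)

print(first_as_written, second_as_written, second_as_intended)
```

For this frame the first two expressions are both true, so with two separate if statements both branches run, while the explicit version of the second condition is false because 'B' is not a column.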
Below is a working example, have fun :)
import pandas as pd

# Three sample dataframes, one per column layout
df_bir = pd.DataFrame({
    'B': [1, 2, 3],
    'I': [4, 5, 6],
    'R': [7, 8, 9]
})
df_br = pd.DataFrame({
    'B': [1, 2, 3],
    'R': [7, 8, 9]
})
df_r = pd.DataFrame({
    'R': [7, 8, 9]
})

# Example offsets to subtract from each column
b = 1
i = 2
r = 3

for df in [df_bir, df_br, df_r]:
    if 'B' not in df.columns and 'I' not in df.columns:
        # Only 'R' is present
        df['R'] = df['R'] - r
    elif 'B' in df.columns and 'R' in df.columns and 'I' not in df.columns:
        # 'B' and 'R' are present, 'I' is missing
        df['B'] = df['B'] - b
        df['R'] = df['R'] - r
    else:
        # All three columns are present
        df['B'] = df['B'] - b
        df['R'] = df['R'] - r
        df['I'] = df['I'] - i
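As for whether this is the simplest way: one possible alternative, sketched here under assumptions, is to drop the if/elif chain and test each column on its own. The helper subtract_offsets and the offsets dictionary are hypothetical names introduced for this sketch; since your values are unique to each dataframe, you would build one such dictionary per dataframe.

```python
import pandas as pd

def subtract_offsets(df, offsets):
    """Subtract offsets[col] from every listed column that exists in df (in place)."""
    for col, value in offsets.items():
        if col in df.columns:
            df[col] = df[col] - value
    return df

# Sample frames matching the two smaller layouts from the question
df_br = pd.DataFrame({'B': [1, 2, 3], 'R': [7, 8, 9]})
df_r = pd.DataFrame({'R': [7, 8, 9]})

# Hypothetical offsets; in practice there would be one dictionary per dataframe
offsets = {'B': 1, 'I': 2, 'R': 3}

for df in [df_br, df_r]:
    subtract_offsets(df, offsets)

print(df_br)
print(df_r)
```

Because every column is checked independently, no branch can touch a column that is missing, and adding a fourth column later would not require another condition.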