Python error The truth value of a DataFrame is ambiguous

Time:01-10

My pandas DataFrame code is raising the error below:

The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 58, in lambda_handler
    df, charter, charter_filename = clean_df(read_excel_data, file)
  File "/var/task/lambda_function.py", line 159, in clean_df
    if df[df['program_code'] == 'F23-IPS-SD-ENG']:
  File "/opt/python/pandas/core/generic.py", line 1443, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "

I have the following lines of code:

if df[df['program_code'] == 'name_of_the_special_program']:
    print("Special Load")
else:
    print("Regular Load")

I also tried:

array = ['name_of_the_special_program']
if df.loc[df['program_code'].isin(array)]:
    print("Special Load")
else:
    print("Regular Load")

I did my research before posting here, which states:

This error occurs because the if statement requires a truth value, i.e., a statement evaluating to True or False. In the above example, the < operator used against a dataframe will return a boolean series, containing a combination of True and False for its values. Since a series is returned, Python doesn't know which value to use, meaning that the series has an ambiguous truth value.

Instead, we can pass this statement into dataframe brackets to get the desired values

That's why I wrote this:

df[df['program_code'] == 'name_of_the_special_program']

Sample data from the DF:

(image of sample data not included)

CodePudding user response:

I assume you mean you want a way to check if df.program_code contains a value (in this case 'name_of_the_special_program').

In this case, there are many ways of doing this.

One option is to check whether the value is in the values of the Series:

if 'name_of_the_special_program' in df.program_code.values:
    print("Special Load")
else:
    print("Normal Load")

You can also use .to_numpy() instead of .values.
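
The .values (or .to_numpy()) step matters because `in` applied directly to a Series tests the index labels, not the stored values. Here is a minimal sketch of the difference, assuming a small made-up frame with the default integer index (only the column name comes from the question):

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({"program_code": ["name_of_the_special_program", "abc", "asd"]})
target = "name_of_the_special_program"

# `in` on a Series looks at the index labels (0, 1, 2 here), not the values.
in_index = target in df["program_code"]

# Converting to an array first makes `in` test the values themselves.
in_values = target in df["program_code"].values
in_numpy = target in df["program_code"].to_numpy()

print(in_index, in_values, in_numpy)
```

So dropping .values does not raise an error, it just quietly answers a different question.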

Or using the isin(...) and any() methods:

if df.program_code.isin(['name_of_the_special_program']).any():
    print("Special Load")
else:
    print("Normal Load")
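
To see why the .any() call is needed, here is a small sketch on made-up data: isin(...) returns one boolean per row, and it is that Series (not a single True/False) that the if statement refuses to evaluate.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({"program_code": ["name_of_the_special_program", "abc", "asd"]})

# One boolean per row, not a single truth value.
mask = df["program_code"].isin(["name_of_the_special_program"])

# Asking for the truth value of the whole Series raises ValueError.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

# .any() collapses the mask to one boolean: "does at least one row match?"
has_special = bool(mask.any())

print(raised, has_special)
```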

From my very rudimentary timing (using timeit) I found that:

.values and .to_numpy() perform similarly (57s vs 53s)

.isin([...]).any() performed significantly slower (230s)
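
If you want to repeat the comparison on your own data, something like the sketch below should work. The frame size, the contents, and the number of repetitions are all assumptions, so the numbers will not match the ones above and will vary with the pandas version and the column dtype.

```python
import timeit

import pandas as pd

# Hypothetical data: the size and the values are made up for illustration.
codes = ["abc", "asd"] * 50_000 + ["name_of_the_special_program"]
df = pd.DataFrame({"program_code": codes})
target = "name_of_the_special_program"

# The three approaches discussed above, each wrapped in a callable.
candidates = {
    ".values": lambda: target in df.program_code.values,
    ".to_numpy()": lambda: target in df.program_code.to_numpy(),
    ".isin().any()": lambda: df.program_code.isin([target]).any(),
}

# Sanity check: all three should agree before their timings are compared.
results = {name: bool(fn()) for name, fn in candidates.items()}

# Total time in seconds for 20 calls of each approach.
timings = {name: timeit.timeit(fn, number=20) for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f"{name:15s} {seconds:.4f} s for 20 calls")
```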

CodePudding user response:

You can't use a whole DataFrame as a condition that way. You can check row by row, or build a new column from the condition:

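import numpy as np

# Row-by-row check (slow on large frames)
for _, row in df.iterrows():
    if row['program_code'] == 'name_of_the_special_program':
        print("True")
    else:
        print("False")

# Vectorized alternative: label every row in a new column
df['new'] = np.where(df['program_code'] == 'name_of_the_special_program', 'True', 'False')
print(df)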

                  program_code    new
0  name_of_the_special_program   True
1                          abc  False
2                          asd  False

CodePudding user response:

You could use this to check for any row meeting the condition:

if df[df['program_code'] == 'name_of_the_special_program'].empty:
    print("Regular Load")
else:
    print("Special Load")
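
The same check can be wrapped in a small helper so it is easy to try with and without a matching row. This is only a sketch: load_type and the sample data are hypothetical names, not something from the question.

```python
import pandas as pd


def load_type(df, code):
    """Return "Special Load" if any row has the given program_code."""
    # Filtering returns a DataFrame (possibly with zero rows), not a boolean.
    matches = df[df["program_code"] == code]
    # .empty is a plain bool, so it is safe to use in a condition.
    return "Regular Load" if matches.empty else "Special Load"


# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({"program_code": ["name_of_the_special_program", "abc", "asd"]})

with_special = load_type(df, "name_of_the_special_program")
without_special = load_type(df, "not_present")

print(with_special, without_special)
```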