"The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().-CodePudding

I have two dataframe df1 and subdf.

df1 looks something like this:

from pandas import Timestamp, Timedelta
df = pd.DataFrame({'station_key': {1300234: 'CV000011', 1300235: 'CV000011'},'charger_key': {1300234: 'CV00001101', 1300235: 'CV00001101'},'cid': {1300234: '01', 1300235: '01'},'x': {1300234: '33.489125', 1300235: '33.489125'},'y': {1300234: '126.487631', 1300235: '126.487631'},'snm': {1300234: '에코티엘(제주)', 1300235: '에코티엘(제주)'},'addr': {1300234: '제주특별자치도 제주시 연동 251-69외 ',1300235: '제주특별자치도 제주시 연동 251-69외 '},'addr_jibun': {1300234: '-', 1300235: '-'},'started_at': {1300234: Timestamp('2020-11-03 20:56:31'),1300235: Timestamp('2020-11-03 23:10:12')},'ended_at': {1300234: Timestamp('2020-11-03 23:10:12'),1300235: Timestamp('2020-11-03 23:40:12')},'status': {1300234: '2', 1300235: '1'},'day': {1300234: 'Tuesday', 1300235: 'Tuesday'},'time_usage': {1300234: Timedelta('0 days 02:13:41'),1300235: Timedelta('0 days 00:30:00')},'start': {1300234: Timestamp('2020-11-03 00:00:00'),1300235: Timestamp('2020-11-03 00:00:00')},'end': {1300234: Timestamp('2020-11-03 00:00:00'),1300235: Timestamp('2020-11-03 00:00:00')},'start_hour': {1300234: 20, 1300235: 23},'end_hour': {1300234: 23, 1300235: 23},'start_minute': {1300234: 56, 1300235: 10},'end_minute': {1300234: 10, 1300235: 40}})

And the subdf looks something like this:

subdf = pd.DataFrame({'start': {1300234: Timestamp('2020-11-03 00:00:00'),4849001: Timestamp('2020-11-03 00:00:00')},'station_key': {1300234: 'CV000011', 4849001: 'CV000271'},'charger_key': {1300234: 'CV00001101', 4849001: 'CV00027101'},'cid': {1300234: '01', 4849001: '01'},'x': {1300234: '33.489125', 4849001: '33.452903'},'y': {1300234: '126.487631', 4849001: '126.572552'},'snm': {1300234: '에코티엘(제주)', 4849001: '제주첨단과학단지(엘리트빌딩)'},'0_occupation': {1300234: 0, 4849001: 0},'1_occupation': {1300234: 0, 4849001: 0},'2_occupation': {1300234: 0, 4849001: 0},'3_occupation': {1300234: 0, 4849001: 0},'4_occupation': {1300234: 0, 4849001: 0},'5_occupation': {1300234: 0, 4849001: 0},'6_occupation': {1300234: 0, 4849001: 0},'7_occupation': {1300234: 0, 4849001: 0},'8_occupation': {1300234: 0, 4849001: 0},'9_occupation': {1300234: 0, 4849001: 0},'10_occupation': {1300234: 0, 4849001: 0},'11_occupation': {1300234: 0, 4849001: 0},'12_occupation': {1300234: 0, 4849001: 0},'13_occupation': {1300234: 0, 4849001: 0},'14_occupation': {1300234: 0, 4849001: 0},'15_occupation': {1300234: 0, 4849001: 0},'16_occupation': {1300234: 0, 4849001: 0},'17_occupation': {1300234: 0, 4849001: 0},'18_occupation': {1300234: 0, 4849001: 0},'19_occupation': {1300234: 0, 4849001: 0},'20_occupation': {1300234: 0, 4849001: 0},'21_occupation': {1300234: 0, 4849001: 0},'22_occupation': {1300234: 0, 4849001: 0},'23_occupation': {1300234: 0, 4849001: 0}})

The _occupation columns represent the hour so there are 24 columns like that ranging from 0_occupation to 23_occupation

The function that I am trying to apply to df1 is as follows:

def time_add(x):
    s_date = x['start']
    e_date = x['end']
    s_hour = x['start_hour']
    e_hour = x['end_hour']
    s_min = x['start_minute']
    e_min = x['end_minute']
    if(s_date == e_date):
        first_range = list(range(s_hour 1, e_hour))
        subdf.loc[(subdf["charger_key"] == x['charger_key']) & (subdf["start"] == s_date), str(s_hour) "_occupation"] =((60 - s_min)/60)*100
        for i in first_range:
            subdf.loc[(subdf["charger_key"] == x['charger_key']) & (subdf["start"] == s_date), str(i) "_occupation"] = 1
        subdf.loc[(subdf["charger_key"] == x['charger_key']) & (subdf["start"] == s_date), str(e_hour) "_occupation"] =(e_min/60)*100
    else:
        first_range = list(range(s_hour 1, 24))
        subdf.loc[(subdf["charger_key"] == x['charger_key']) & (subdf["start"] == s_date), str(s_hour) "_occupation"] =((60 - s_min)/60)*100
        for i in first_range:
            subdf.loc[(subdf["charger_key"] == x['charger_key']) & (subdf["start"] == s_date), str(i) "_occupation"] = 1
        second_range = list(range(0, e_hour))
        for i in second_range:
            subdf.loc[(subdf["charger_key"] == x['charger_key']) & (subdf["start"] == s_date 1), str(i) "_occupation"] = 1
        subdf.loc[(subdf["charger_key"] == x['charger_key']) & (subdf["start"] == s_date 1), str(e_hour) "_occupation"] =(e_min/60)*100

However when I try to apply this by doing time_add(df1) an error is raised:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I looked this up and this error seems to occur when using and and or instead of & and | but this isn't the case in my function.

The full traceback of the error is as follows

Thank you in advance!!

CodePudding user response：

In this line:

if(s_date == e_date):

The s_date and e_date variables are not single values, but entire columns from your Pandas data frame. What does it mean when you are comparing them? Do you want to check if all values in one column are equal to the other? Or do you want to check if at least one value is equal to the corresponding one? Or do you want to do different things to the rows depending on whether the corresponding values in those two columns are equal?

if (s_date == e_date).all():
    # True if ALL values are equal.

Or,

if (s_date == e_date).any():
    # True if AT LEAST ONE value is equal.

CodePudding user response：

As Dietrich pointed out, s_date and e_date, and all the other variables declared at the top of your function, are Series's. They're whole columns from your dataframe. But I suspect that's not what you what. You're trying to run time_add for each row. But you're not doing that, because you're calling time_add(df1). There it will be executed on the whole dataframe, and indexing it will return a whole column.

Change

time_add(df1)

df = df1.apply(time_add, axis=1)