I have two dataframes. They both contain the same columns.
import datetime
import pandas as pd

data = pd.DataFrame({'old_name': [2, 1], 'new_name': [2, 3], 'start':['2021-05-20', '2008-08-01'], 'end': ['2021-08-08', '2021-05-20']})
old_name  new_name  start       end
2         2         2021-05-20  2021-08-08
1         3         2008-08-01  2022-05-20
base = pd.DataFrame({'old_name': [3], 'new_name': [3], 'start':['2021-05-19'], 'end': ['2022-12-31']})
old_name  new_name  start       end
3         3         2021-05-19  2022-12-31
I am trying to create a new DataFrame that starts from the new names in "base" and goes back in time, finding all the old names and linking them together in descending date order. A link is valid only when the old name's end date is >= the start date of the new name. There can be more than one match between old and new names, and I have to follow the trail for all of them until start < '2008-08-01'.
The final result should be:
old_name  new_name  start       end
3         3         2021-05-19  2022-12-31
1         3         2008-08-01  2022-05-20
data['start'] = pd.to_datetime(data['start'])
base['start'] = pd.to_datetime(base['start'])
data['end'] = pd.to_datetime(data['end'])
base['end'] = pd.to_datetime(base['end'])
begin_date = datetime.datetime(2008, 8, 1)
list = pd.DataFrame(columns=base.columns)
for index, row in base.iterrows():
    start_date = row['start']
    base_name = row['new_name']
    while start_date > begin_date:
        temp = data[(data['new_name'] == base_name) & (data['end'] >= start_date)].copy().reset_index(drop=True)
        start_date = data['start']
        list = pd.concat([list, temp], ignore_index=True, sort=False)
        del temp
I get a "Truth value of a Series is ambiguous" error, but I can't find where to correct my code. The condition evaluates to True, so I'm stuck. Can someone please help me get back on track? Please let me know if my question isn't clear. Thank you!
CodePudding user response:
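Your code is most likely failing at this line:
while start_date > begin_date:
On the first pass start_date is a single timestamp, but inside the loop you reassign it with start_date = data['start'], which is a whole pandas Series. Comparing that Series with the scalar begin_date produces a boolean Series, and while cannot reduce it to a single True or False, hence the error.
One possible fix is to compare a single element instead of the whole Series (this assumes start_date is already a Series when the condition is evaluated):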
while start_date.values[index] > begin_date:
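To make the cause concrete, here is a minimal sketch that reproduces the error outside the loop and shows two ways of reducing a Series to a single value. The names starts, mask, earliest and any_later are made up for illustration and do not come from the question.

```python
import datetime
import pandas as pd

begin_date = datetime.datetime(2008, 8, 1)
# Hypothetical stand-in for data['start'] after pd.to_datetime.
starts = pd.to_datetime(pd.Series(['2021-05-20', '2008-08-01']))

# Comparing a Series with a scalar is fine: it yields one boolean per row.
mask = starts > begin_date

# Using that boolean Series where a single True/False is required raises ValueError.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Reducing to one value makes the condition unambiguous.
earliest = starts.min()        # a single Timestamp
any_later = bool(mask.any())   # a single bool
```

Which reduction is appropriate (min, max, any, all, or a specific element) depends on what the loop is meant to test, so treat these as options rather than the one correct choice.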
CodePudding user response:
In the first line inside the while loop you access DataFrame columns, each of which is a Series, and then compare them with something. For example, data['new_name'] is a Series, so any comparison on it returns a Series rather than a single boolean.
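Putting this together, one way to avoid the ambiguous comparison is to keep start_date a scalar at all times and follow the trail with a work list. This is only a sketch under assumptions: the helper name follow_trail is hypothetical, the data values are taken from the constructor calls in the question, each match is followed through its old_name, and already visited rows are skipped so that a row whose old and new names are equal cannot loop forever.

```python
import datetime
import pandas as pd

data = pd.DataFrame({'old_name': [2, 1], 'new_name': [2, 3],
                     'start': ['2021-05-20', '2008-08-01'],
                     'end': ['2021-08-08', '2021-05-20']})
base = pd.DataFrame({'old_name': [3], 'new_name': [3],
                     'start': ['2021-05-19'], 'end': ['2022-12-31']})
for df in (data, base):
    df['start'] = pd.to_datetime(df['start'])
    df['end'] = pd.to_datetime(df['end'])

begin_date = datetime.datetime(2008, 8, 1)


def follow_trail(base, data, begin_date):
    """Collect base rows plus every predecessor row reachable from them."""
    pieces = [base]
    seen = set()  # index labels of data rows already collected
    # Work list of (name to look up as new_name, start date the match must reach).
    pending = list(zip(base['new_name'], base['start']))
    while pending:
        name, start_date = pending.pop()
        # start_date is always a single Timestamp, so this test is unambiguous.
        if start_date <= begin_date:
            continue
        matches = data[(data['new_name'] == name) & (data['end'] >= start_date)]
        matches = matches[~matches.index.isin(list(seen))]
        seen.update(matches.index)
        if not matches.empty:
            pieces.append(matches)
        # Each match's old name becomes the next name to trace back.
        pending.extend(zip(matches['old_name'], matches['start']))
    result = pd.concat(pieces, ignore_index=True)
    return result.sort_values('start', ascending=False, ignore_index=True)


result = follow_trail(base, data, begin_date)
```

The work list handles the case where one name has several predecessors, since every match is queued separately with its own scalar start date.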