Every time there is an UNKNOWN car_number in df, I would like to take that row's delivery date and check it against the oldest delivery date (grouped by car_part) in df2 to see whether it falls within a 90-day range. If it does, print the date; otherwise, go to the next UNKNOWN.
import pandas as pd

data = {'car_part': ['100009','100093','100071','100033','100033','100043'],
        'car_number': ['UNKNOWN', 'X123-00027C', 'X123-00027C', 'UNKNOWN', 'X123-00148C', 'X123-00148C'],
        'delivery': ['11/20/2004', '12/17/2009', '7/27/2010', '11/1/2004', '9/5/2004', '11/10/2004'],
        'test': ['12/17/2009', '7/27/2010', '7/10/2020', '12/22/2006', '3/26/2007', '12/1/2007']}
data2 = {'delivery': ['11/1/2004', '12/1/2004', '1/1/2005', '7/1/2006', '8/1/2006', '9/2/2006'],
         'car_part': ['100009','100009','100009','100033','100033','100033']}
df = pd.DataFrame(data)
print(df)
df2 = pd.DataFrame(data2)
print(df2)
df['delivery'] = df['delivery'].astype('datetime64[ns]')
# sort_values returns a new DataFrame; the result is not assigned here, so df keeps its original order
df.sort_values(by = ['car_part', 'delivery', 'test'], ascending=[True, True, True])
df2['delivery'] = df2['delivery'].astype('datetime64[ns]')
df2.sort_values(by = ['car_part', 'delivery'], ascending=[True, True])
I've tried doing this:
df["delivery"] = pd.to_datetime(df["delivery"])
df2["delivery"] = pd.to_datetime(df2["delivery"])
for index, row in df.iterrows():
    if row['car_number'] == "UNKNOWN":
        oldest_date = df["car_part"].map(df2.groupby("car_part")["delivery"].min())
        diff = (row['delivery']-oldest_date).days
        if diff<91:
            print(row['delivery'])
but I am getting this error: AttributeError: 'Series' object has no attribute 'days'
CodePudding user response:
Try:
- Use groupby and min to get the earliest delivery date for each car part.
- Find the difference between the delivery in df and the earliest delivery date.
- Keep the oldest date value only when the car number is UNKNOWN and the delivery is within 90 days of the oldest date.
oldest = df["car_part"].map(df2.groupby("car_part")["delivery"].min())
df["oldest"] = oldest.where(df["car_number"].eq("UNKNOWN")&df["delivery"].sub(oldest).abs().dt.days.le(90))
>>> df
  car_part   car_number   delivery        test     oldest
0   100009      UNKNOWN 2004-11-20  12/17/2009 2004-11-01
1   100093  X123-00027C 2009-12-17   7/27/2010        NaT
2   100071  X123-00027C 2010-07-27   7/10/2020        NaT
3   100033      UNKNOWN 2004-11-01  12/22/2006        NaT
4   100033  X123-00148C 2004-09-05   3/26/2007        NaT
5   100043  X123-00148C 2004-11-10   12/1/2007        NaT
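Since the question asks to print the matching dates rather than store them in a column, the same mask can be used to select them directly. This is only a sketch under a few assumptions: it rebuilds the sample data so it runs on its own, and the names within_90 and matches are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question (the 'test' column is not needed here).
df = pd.DataFrame({
    "car_part": ["100009", "100093", "100071", "100033", "100033", "100043"],
    "car_number": ["UNKNOWN", "X123-00027C", "X123-00027C",
                   "UNKNOWN", "X123-00148C", "X123-00148C"],
    "delivery": ["11/20/2004", "12/17/2009", "7/27/2010",
                 "11/1/2004", "9/5/2004", "11/10/2004"],
})
df2 = pd.DataFrame({
    "delivery": ["11/1/2004", "12/1/2004", "1/1/2005",
                 "7/1/2006", "8/1/2006", "9/2/2006"],
    "car_part": ["100009", "100009", "100009", "100033", "100033", "100033"],
})
df["delivery"] = pd.to_datetime(df["delivery"], format="%m/%d/%Y")
df2["delivery"] = pd.to_datetime(df2["delivery"], format="%m/%d/%Y")

# Earliest df2 delivery per car part, aligned to the rows of df
# (NaT where the car part does not appear in df2).
oldest = df["car_part"].map(df2.groupby("car_part")["delivery"].min())

# True where the delivery is at most 90 days away from that earliest date;
# rows with no earliest date compare as False.
within_90 = df["delivery"].sub(oldest).abs().dt.days.le(90)

# Delivery dates of the UNKNOWN rows that pass the check.
matches = df.loc[df["car_number"].eq("UNKNOWN") & within_90, "delivery"]
print(matches.dt.strftime("%Y-%m-%d").tolist())
```

With the sample data only the first row qualifies, which agrees with the oldest column shown above.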
CodePudding user response:
Replace your code with the version below. I don't fully understand the final output you are after, but your map call is wrong: it returns a whole Series instead of a single date for the current row. Since you want to keep the same code structure, the line that computes oldest_date should be something like this:
df["delivery"] = pd.to_datetime(df["delivery"])
df2["delivery"] = pd.to_datetime(df2["delivery"])
for index, row in df.iterrows():
    if row['car_number'] == "UNKNOWN":
        # filter df2 (not df) down to this row's car part, then take its earliest delivery
        oldest_date = df2.loc[df2["car_part"] == row["car_part"], "delivery"].min()
        diff = (row['delivery'] - oldest_date).days
        # abs() so that a delivery long before the oldest date does not count as a match
        if abs(diff) <= 90:
            print(row['delivery'])
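For reference, the AttributeError from the question can be reproduced in isolation. The sketch below uses two made-up values standing in for the per-part oldest dates. Subtracting a whole Series from a single Timestamp yields a Series of timedeltas, and a Series exposes days through the .dt accessor rather than as a direct attribute.

```python
import pandas as pd

# Two illustrative "oldest" dates and one delivery date.
oldest = pd.Series(pd.to_datetime(["2004-11-01", "2006-07-01"]))
delivery = pd.Timestamp("2004-11-20")

# Timestamp minus Series gives a Series of timedeltas, one per element.
diff = delivery - oldest
print(type(diff).__name__)

# A Series has no .days attribute, which is what raised the error.
print(hasattr(diff, "days"))

# Element-wise day counts are available through the .dt accessor.
print(diff.dt.days.tolist())

# Subtracting a single date gives a Timedelta, which does have .days.
single = delivery - oldest.iloc[0]
print(single.days)
```

So the loop needs one scalar date per row, as in the code above, or the vectorised form needs .dt.days.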