How do I compare dates between two columns within a range of days and perform a task?

Time:04-08

Every time there is an UNKNOWN in df, I would like to take that row's delivery date and check it against the oldest delivery date for the same car_part in df2 (grouped by car_part) to see whether it falls within a 90-day range. If it does, print the date; otherwise move on to the next UNKNOWN.

import pandas as pd

data = {'car_part': ['100009','100093','100071','100033','100033','100043'],
        'car_number': ['UNKNOWN', 'X123-00027C', 'X123-00027C', 'UNKNOWN', 'X123-00148C', 'X123-00148C'],
        'delivery': ['11/20/2004', '12/17/2009', '7/27/2010', '11/1/2004', '9/5/2004', '11/10/2004'],
        'test': ['12/17/2009', '7/27/2010', '7/10/2020', '12/22/2006', '3/26/2007', '12/1/2007']}

data2 = {'delivery': ['11/1/2004', '12/1/2004', '1/1/2005', '7/1/2006', '8/1/2006', '9/2/2006'],
         'car_part': ['100009','100009','100009','100033','100033','100033']}

df = pd.DataFrame(data)
print(df)
df2 = pd.DataFrame(data2)
print(df2)

df['delivery'] = df['delivery'].astype('datetime64[ns]')
# sort_values returns a sorted copy; df itself keeps its original order
df.sort_values(by = ['car_part', 'delivery', 'test'], ascending=[True, True, True])

df2['delivery'] = df2['delivery'].astype('datetime64[ns]')
# likewise, this displays a sorted copy of df2 without modifying it
df2.sort_values(by = ['car_part', 'delivery'], ascending=[True, True])

I've tried this:

df["delivery"] = pd.to_datetime(df["delivery"])
df2["delivery"] = pd.to_datetime(df2["delivery"])
for index, row in df.iterrows():
    if row['car_number'] == "UNKNOWN":
        oldest_date = df["car_part"].map(df2.groupby("car_part")["delivery"].min())
        diff = (row['delivery']-oldest_date).days
        if diff<91:
            print(row['delivery']) 

but I'm getting the error AttributeError: 'Series' object has no attribute 'days'
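For reference, the error comes from subtracting a whole Series from a single Timestamp: the result is a Series of Timedelta values, and a Series exposes day counts through the .dt accessor rather than a .days attribute. A minimal sketch with made-up dates (not the question's data):

```python
import pandas as pd

# a single Timestamp minus a Series of dates gives a Series of Timedeltas
delivery = pd.Timestamp("2004-11-20")
oldest = pd.Series(pd.to_datetime(["2004-11-01", "2006-07-01"]))
diff = delivery - oldest

# a Series has no .days attribute; .dt.days gives element-wise day counts
days = diff.dt.days
print(days.tolist())

# a scalar Timestamp minus a scalar Timestamp is a single Timedelta, which does have .days
single = (delivery - oldest.iloc[0]).days
print(single)
```

So the fix is either to compare against one date per row (a scalar) or to stay vectorized and use .dt.days.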

Answer:

Try:

  1. Use groupby and min to get the earliest delivery date in df2 for each car part, and map it onto the rows of df.
  2. Compute the absolute difference in days between each delivery in df and that earliest date.
  3. Keep the oldest date only where the car number is UNKNOWN and the delivery is within 90 days of the oldest date.
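# earliest df2 delivery date per car_part, aligned to each row of df (NaT if the part is not in df2)
oldest = df["car_part"].map(df2.groupby("car_part")["delivery"].min())

# keep that date only for UNKNOWN rows whose delivery is within 90 days of it (either direction)
mask = df["car_number"].eq("UNKNOWN") & df["delivery"].sub(oldest).abs().dt.days.le(90)
df["oldest"] = oldest.where(mask)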

>>> df
  car_part   car_number   delivery        test     oldest
0   100009      UNKNOWN 2004-11-20  12/17/2009 2004-11-01
1   100093  X123-00027C 2009-12-17   7/27/2010        NaT
2   100071  X123-00027C 2010-07-27   7/10/2020        NaT
3   100033      UNKNOWN 2004-11-01  12/22/2006        NaT
4   100033  X123-00148C 2004-09-05   3/26/2007        NaT
5   100043  X123-00148C 2004-11-10   12/1/2007        NaT
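Since the question asks to print the matching delivery dates, the oldest column built above can then serve as a filter. A self-contained sketch that rebuilds the needed columns from the question's data, applies the same where expression, and prints the matches; passing format="%m/%d/%Y" to pd.to_datetime is an added assumption that the dates are month/day/year:

```python
import pandas as pd

# columns needed from the question's data
data = {'car_part': ['100009', '100093', '100071', '100033', '100033', '100043'],
        'car_number': ['UNKNOWN', 'X123-00027C', 'X123-00027C', 'UNKNOWN', 'X123-00148C', 'X123-00148C'],
        'delivery': ['11/20/2004', '12/17/2009', '7/27/2010', '11/1/2004', '9/5/2004', '11/10/2004']}
data2 = {'delivery': ['11/1/2004', '12/1/2004', '1/1/2005', '7/1/2006', '8/1/2006', '9/2/2006'],
         'car_part': ['100009', '100009', '100009', '100033', '100033', '100033']}

df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
# explicit format so month/day order is not left to inference
df["delivery"] = pd.to_datetime(df["delivery"], format="%m/%d/%Y")
df2["delivery"] = pd.to_datetime(df2["delivery"], format="%m/%d/%Y")

# same steps as the answer: earliest df2 date per part, then the 90-day UNKNOWN filter
oldest = df["car_part"].map(df2.groupby("car_part")["delivery"].min())
mask = df["car_number"].eq("UNKNOWN") & df["delivery"].sub(oldest).abs().dt.days.le(90)
df["oldest"] = oldest.where(mask)

# rows with a non-null "oldest" are the UNKNOWN deliveries that matched
matched = df.loc[df["oldest"].notna(), "delivery"]
for d in matched:
    print(d.date())
```

With this data only the first UNKNOWN row (car part 100009, delivered 2004-11-20) matches; the second UNKNOWN row is roughly 600 days away from the oldest 100033 delivery.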

Answer:

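Change your code to the following. I don't fully understand the final output you want, but your map line is wrong: it maps the entire df["car_part"] column, so oldest_date is a Series rather than a single date, and a Series has no .days attribute. Keeping the same code structure, the line that computes oldest_date should look something like this: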

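import pandas as pd

df["delivery"] = pd.to_datetime(df["delivery"])
df2["delivery"] = pd.to_datetime(df2["delivery"])
for _, row in df.iterrows():
    if row['car_number'] == "UNKNOWN":
        # filter df2 by its own car_part column, then take the earliest date (a single Timestamp)
        oldest_date = df2.loc[df2["car_part"] == row["car_part"], "delivery"].min()
        if pd.isna(oldest_date):
            continue  # this car part has no deliveries in df2
        diff = (row['delivery'] - oldest_date).days
        # note: a delivery earlier than oldest_date gives a negative diff, which also passes
        if diff < 91:
            print(row['delivery'])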
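If the matches are needed later rather than just printed, the same row-by-row logic can be wrapped in a function that collects the dates. find_unknown_matches and max_days are hypothetical names used only for illustration. Unlike the loop above, this sketch uses an absolute difference, as in the first answer, so a delivery before the oldest date only matches if it is also within the window, and it looks up each part's earliest date once instead of regrouping on every row:

```python
import pandas as pd


def find_unknown_matches(df, df2, max_days=90):
    """Hypothetical helper: return delivery dates of UNKNOWN rows in df that lie within
    max_days (either direction) of the earliest df2 delivery for the same car_part."""
    # earliest delivery per car_part in df2, computed once
    oldest_by_part = df2.groupby("car_part")["delivery"].min()
    matches = []
    for _, row in df.iterrows():
        if row["car_number"] != "UNKNOWN":
            continue
        oldest_date = oldest_by_part.get(row["car_part"])  # None if the part is missing
        if oldest_date is None:
            continue
        diff = (row["delivery"] - oldest_date).days
        if abs(diff) <= max_days:
            matches.append(row["delivery"])
    return matches


# columns needed from the question's data
df = pd.DataFrame({
    'car_part': ['100009', '100093', '100071', '100033', '100033', '100043'],
    'car_number': ['UNKNOWN', 'X123-00027C', 'X123-00027C', 'UNKNOWN', 'X123-00148C', 'X123-00148C'],
    'delivery': ['11/20/2004', '12/17/2009', '7/27/2010', '11/1/2004', '9/5/2004', '11/10/2004']})
df2 = pd.DataFrame({
    'delivery': ['11/1/2004', '12/1/2004', '1/1/2005', '7/1/2006', '8/1/2006', '9/2/2006'],
    'car_part': ['100009', '100009', '100009', '100033', '100033', '100033']})
df["delivery"] = pd.to_datetime(df["delivery"], format="%m/%d/%Y")
df2["delivery"] = pd.to_datetime(df2["delivery"], format="%m/%d/%Y")

result = find_unknown_matches(df, df2)
for d in result:
    print(d.date())
```

Making the window a parameter keeps the 90-day rule in one place if the range needs to change later.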