I tried to make a code to inputting version of items. There are two dataframes, one with the items, one with the version info.
Here's a dummy data and a code I made:
import pandas as pd
from datetime import datetime
Item = pd.DataFrame({"ID":["A1","A1","A2","A2","A3","B1"],"DATE":["2021-07-05","2021-08-01","2021-02-02","2021-02-03","2021-01-01","2021-10-12"]})
Ver = pd.DataFrame({"ver_date" : ["2021-01-01","2021-07-07","2021-09-09"],"version":["1.1","1.2","1.3"]})
for id,dat in zip(Item["ID"],Item["DATE"]):
dat2 = datetime.strptime(dat,'%Y-%m-%d')
for dtc,ver in zip(Ver["ver_date"],Ver["version"]):
dtc = datetime.strptime(dtc,'%Y-%m-%d')
if dat2 >= dtc:
Item.loc[(Item.ID == id) & (Item.DATE == dat),"VER"] = ver
continue
else:
pass
So what I tried is this code. Actually this code works, but I think it is not efficient since it gives me a result in about 1~2 min. (for 200 items).
CodePudding user response:
You could use merge_asof
. It merges by key distance. By default, it uses a "backward" search, which selects the last row in the Ver
whose "ver_date" is less than or equal to the "Date" of Item
(which is what your if-else condition is checking for).
Also it expects the keys to be merged on to be sorted; so you could first convert the date columns to datetime objects and sort the DataFrames by them:
Item['DATE'] = pd.to_datetime(Item['DATE'])
Ver['ver_date'] = pd.to_datetime(Ver['ver_date'])
out = (pd.merge_asof(Item.sort_values(by='DATE'),
Ver.sort_values(by='ver_date'),
left_on='DATE', right_on='ver_date')
.drop(columns='ver_date')
.sort_values(by='ID')
.rename(columns={'version':'VER'}))
Output:
ID DATE VER
3 A1 2021-07-05 1.1
4 A1 2021-08-01 1.2
1 A2 2021-02-02 1.1
2 A2 2021-02-03 1.1
0 A3 2021-01-01 1.1
5 B1 2021-10-12 1.3