Hi I have 2 tables as such:
product table
product_id | product | url |
---|---|---|
23 | product a | url a |
24 | product b | url b |
25 | product c | url c |
price table
product_id | product | last updated | price |
---|---|---|---|
23 | product a | 25/03/2022 | usd2 |
23 | product a | 24/03/2022 | usd3 |
25 | product c | 25/03/202 | usd 4 |
24 | product b | 23/03/202 | usd 4 |
output expected as list: url b
output would be a list of urls from product table, where the condition is that
uniqueProductIdtoday = priceTable[priceTable['last updated']==datetime.date(datetime.now())]['product_id'].unique()
uniqueProductIdtoday right join product table on product Id and return unique product id.
So I am trying to get the url from product table that has not been updated in price table per datetime specified(take for example 25/03/2022).
What's the most efficient way? Do I turn the product_id that has been last updated into a list / dataframe before checking for the difference in product table by doing a merge?
CodePudding user response:
Use:
product_table = pd.DataFrame({'pid':[1,2,3], 'url':['http://a', 'http://b', 'http://c']})
price_table = pd.DataFrame({'pid':[2,1,1,1,3,2], 'last_updated': ['25/03/2022', '23/03/2022', '22/03/2022', '24/03/2022', '25/03/2022', '24/03/2022']})
ndf = price_table.sort_values('last_updated').drop_duplicates(['pid'], keep = 'last')
keys = ndf[ndf['last_updated']!='25/03/2022']['pid'].unique()
product_table[product_table['pid'].isin(keys)]['url']
Output: