I have a large csv file containing the historic prices of stocks. This is a small sample of it:
data = pd.DataFrame({'sym': {0: 'msft', 1: 'msft', 2: 'tsla', 3: 'tsla', 4: 'bac', 5: 'bac'}, 'date': {0: '12/7/2021', 1: '12/6/2021', 2: '12/7/2021', 3: '12/6/2021', 4: '12/7/2021', 5: '12/6/2021'}, 'high': {0: 11, 1: 13, 2: 898, 3: 900, 4: 12, 5: 13}})
Now on each day there will be an update for this data and I want to append them to the data
above. The updates look like this:
update = pd.DataFrame({'sym': {0: 'msft', 1: 'tsla', 2: 'bac'}, 'date': {0: '12/8/2021', 1: '12/8/2021', 2: '12/8/2021'}, 'high': {0: 16, 1: 1000, 2: 14}})
What I want is the dataframe below:
result = pd.DataFrame({'sym': {0: 'msft', 1: 'msft', 2: 'msft', 3: 'tsla', 4: 'tsla', 5: 'tsla', 6: 'bac', 7: 'bac', 8: 'bac'}, 'date': {0: '12/8/2021', 1: '12/7/2021', 2: '12/6/2021', 3: '12/8/2021', 4: '12/7/2021', 5: '12/6/2021', 6: '12/8/2021', 7: '12/7/2021', 8: '12/6/2021'}, 'high': {0: 16, 1: 11, 2: 13, 3: 1000, 4: 898, 5: 900, 6: 14, 7: 12, 8: 13}})
My current approach is using this code:
data = data.append(update)
data = data.sort_values(by=['sym', 'date'])
By tweaking the above approach I can achieve what I want but since I have million rows in my database, I was wondering if there is a faster way other than using sort_values
.
CodePudding user response:
result=pd.merge_ordered(data,update,on=['date','high'],left_by='sym',fill_method='ffill').drop(['sym_x','sym_y'],axis=1)
CodePudding user response:
IIUC, you want to keep the order of sym
as it appears in data
while sorting data
in descending order. You can do that by converting sym
-column to category and setting its category order by the order it appears in data
. Then simply sort_values
by ['sym','date']
:
sorter = data['sym'].drop_duplicates()
out = data.append(update)
out['sym'] = out['sym'].astype("category").cat.set_categories(sorter)
out = out.sort_values(by=['sym','date'], ascending=['sym',False]).reset_index(drop=True)
Output:
sym date high
0 msft 12/8/2021 16
1 msft 12/7/2021 11
2 msft 12/6/2021 13
3 tsla 12/8/2021 1000
4 tsla 12/7/2021 898
5 tsla 12/6/2021 900
6 bac 12/8/2021 14
7 bac 12/7/2021 12
8 bac 12/6/2021 13