If I have a dataframe of daily data which contain symbols and different dates:
level_0 index date symbol open ... volume_10_day is_downtrending is_downtrending_lookback consolidating_10 consolidating_10_lookback
0 3608 3608 2022-10-26 CIFR 0.8600 ... 3883.2 0 0 0 1
1 11367 11367 2022-09-12 CLVS 1.2800 ... 24749.8 0 0 0 1
2 13031 13031 2022-10-06 CGC 3.0700 ... 3807474.9 0 0 0 1
3 13044 13044 2022-10-25 CGC 2.4000 ... 4213340.1 0 0 0 1
4 13864 13864 2022-09-02 CMCM 4.9100 ... 3560.0 0 0 0 1
.. ... ... ... ... ... ... ... ... ... ... ...
353 684622 684622 2022-10-24 SOBR 3.2500 ... 65830.2 0 0 0 1
354 685045 685045 2022-08-29 SNTG 2.6500 ... 12765.3 0 1 0 1
355 685093 685093 2022-11-04 SNTG 4.6889 ... 17969582.7 0 0 0 0
356 686851 686851 2022-10-11 WNW 0.8700 ... 5172.1 0 0 0 1
357 688103 688103 2022-10-11 BHG 0.8750 ... 1489.5 0 1 0 1
[358 rows x 18 columns]
Sometimes, there are multiplies of the same days but with different symbols. For example, on 2022-10-11 there are two symbols which occur: WNW, BHG.
356 686851 686851 2022-10-11 WNW 0.8700 ... 5172.1 0 0 0 1
357 688103 688103 2022-10-11 BHG 0.8750 ... 1489.5 0 1 0 1
When this happens, I only want the first instance to be returned (all other symbols occurring on the same day should be removed), something like:
level_0 index date symbol open ... volume_10_day is_downtrending is_downtrending_lookback consolidating_10 consolidating_10_lookback
0 3608 3608 2022-10-26 CIFR 0.8600 ... 3883.2 0 0 0 1
1 11367 11367 2022-09-12 CLVS 1.2800 ... 24749.8 0 0 0 1
2 13031 13031 2022-10-06 CGC 3.0700 ... 3807474.9 0 0 0 1
3 13044 13044 2022-10-25 CGC 2.4000 ... 4213340.1 0 0 0 1
4 13864 13864 2022-09-02 CMCM 4.9100 ... 3560.0 0 0 0 1
.. ... ... ... ... ... ... ... ... ... ... ...
353 684622 684622 2022-10-24 SOBR 3.2500 ... 65830.2 0 0 0 1
354 685045 685045 2022-08-29 SNTG 2.6500 ... 12765.3 0 1 0 1
355 685093 685093 2022-11-04 SNTG 4.6889 ... 17969582.7 0 0 0 0
356 686851 686851 2022-10-11 WNW 0.8700 ... 5172.1 0 0 0 1
[357 rows x 18 columns]
Where in the duplicate of WNW, BHG, only the first one (WNW) is returned.
How can I do this? Something like:
df_filtered.drop_duplicates(subset=['date', 'symbol'], inplace=True)
Any help is much appreciated
CodePudding user response:
Per the discussion in the comments, this solution works:
df_filtered.drop_duplicates(subset=['date'], keep='first', inplace=True)