When multiple symbols and days occur, how to only keep the first occurrence of the day and symbol?-CodePudding

If I have a dataframe of daily data which contain symbols and different dates:

     level_0   index        date symbol    open  ...  volume_10_day  is_downtrending  is_downtrending_lookback  consolidating_10  consolidating_10_lookback
0       3608    3608  2022-10-26   CIFR  0.8600  ...         3883.2                0                         0                 0                          1
1      11367   11367  2022-09-12   CLVS  1.2800  ...        24749.8                0                         0                 0                          1
2      13031   13031  2022-10-06    CGC  3.0700  ...      3807474.9                0                         0                 0                          1
3      13044   13044  2022-10-25    CGC  2.4000  ...      4213340.1                0                         0                 0                          1
4      13864   13864  2022-09-02   CMCM  4.9100  ...         3560.0                0                         0                 0                          1
..       ...     ...         ...    ...     ...  ...            ...              ...                       ...               ...                        ...
353   684622  684622  2022-10-24   SOBR  3.2500  ...        65830.2                0                         0                 0                          1
354   685045  685045  2022-08-29   SNTG  2.6500  ...        12765.3                0                         1                 0                          1
355   685093  685093  2022-11-04   SNTG  4.6889  ...     17969582.7                0                         0                 0                          0
356   686851  686851  2022-10-11    WNW  0.8700  ...         5172.1                0                         0                 0                          1
357   688103  688103  2022-10-11    BHG  0.8750  ...         1489.5                0                         1                 0                          1

[358 rows x 18 columns]

Sometimes, there are multiplies of the same days but with different symbols. For example, on 2022-10-11 there are two symbols which occur: WNW, BHG.

356   686851  686851  2022-10-11    WNW  0.8700  ...         5172.1                0                         0                 0                          1
357   688103  688103  2022-10-11    BHG  0.8750  ...         1489.5                0                         1                 0                          1

When this happens, I only want the first instance to be returned (all other symbols occurring on the same day should be removed), something like:

     level_0   index        date symbol    open  ...  volume_10_day  is_downtrending  is_downtrending_lookback  consolidating_10  consolidating_10_lookback
0       3608    3608  2022-10-26   CIFR  0.8600  ...         3883.2                0                         0                 0                          1
1      11367   11367  2022-09-12   CLVS  1.2800  ...        24749.8                0                         0                 0                          1
2      13031   13031  2022-10-06    CGC  3.0700  ...      3807474.9                0                         0                 0                          1
3      13044   13044  2022-10-25    CGC  2.4000  ...      4213340.1                0                         0                 0                          1
4      13864   13864  2022-09-02   CMCM  4.9100  ...         3560.0                0                         0                 0                          1
..       ...     ...         ...    ...     ...  ...            ...              ...                       ...               ...                        ...
353   684622  684622  2022-10-24   SOBR  3.2500  ...        65830.2                0                         0                 0                          1
354   685045  685045  2022-08-29   SNTG  2.6500  ...        12765.3                0                         1                 0                          1
355   685093  685093  2022-11-04   SNTG  4.6889  ...     17969582.7                0                         0                 0                          0
356   686851  686851  2022-10-11    WNW  0.8700  ...         5172.1                0                         0                 0                          1

[357 rows x 18 columns]

Where in the duplicate of WNW, BHG, only the first one (WNW) is returned.

How can I do this? Something like:

df_filtered.drop_duplicates(subset=['date', 'symbol'], inplace=True)

Any help is much appreciated

CodePudding user response：

Per the discussion in the comments, this solution works:

df_filtered.drop_duplicates(subset=['date'], keep='first', inplace=True)