Home > Mobile >  How to drop rows in pandas dataframe, when there is similar values?
How to drop rows in pandas dataframe, when there is similar values?

Time:09-26

I have a python pandas dataframe of stock data, and I'm trying to filter some of those tickers. There are companies that have 2 or more tickers (different types of shares when a share is preferred and the other not). I want to drop the lines of those additional share values, and let just the share with the higher volume. In the dataframe I also have the company name, so maybe there is a way of using it to make some condition and then drop it when comparing the volume of the same company? How can I do this? Dataframe

CodePudding user response:

Use groupby and idxmax:

Suppose this dataframe:

>>> df
  ticker  volume
0  CEBR3     123
1  CEBR5     456
2  CEBR6     789  # <- keep for group CEBR
3  GOAU3      23  # <- keep for group GOAU
4  GOAU4      12
5  CMIN3     135  # <- keep for group CMIN3
>>> df.loc[df.groupby(df['ticker'].str.extract(r'^(.*)\d', expand=False), 
                      sort=False)['volume'].idxmax().tolist()]

  ticker  volume
2  CEBR6     789
3  GOAU3      23
5  CMIN3     135
  • Related