I'm trying to create a pandas dataframe that only has certain values in the ticker
column. If I only had 1 value (for example AAPL
) that I wanted in the ticker column, I would do this:
df = complete_data[complete_data.ticker == 'AAPL']
But in this case, I want the df
to consist of multiple different values in the ticker column. These values are saved in a list. The list looks like:
company_list = ['A', 'AAPL', 'ABB', 'ABBV', 'ABEV', 'ABT', 'ACN', 'ADBE', 'ADP', 'ADSK'] etc.
I would appreciate help to make the dataframe just contain values in the ticker column that are in my list below.
I have tried changing the code above to:
data_df = data_df[data_df.ticker in company_list]
and
data_df = data_df[data_df.ticker if in company_list]
and
data_df = data_df[data_df.ticker == company_list]
But this has not given me any results thus far.
CodePudding user response:
Here's a dummy data I made based on your question.
import pandas as pd
data_df = pd.DataFrame({"ticker": ['A', 'AAPL', 'ABB', 'ABBV', 'ABEV', "BADD", "CAPP"],
"random_col": [1,2,3,4,5,6,7]})
Series.isin is the function you need to use. For series, you extract the concerned column, by calling the column itself or the attribute.
data_df_in = data_df[data_df.ticker.isin(company_list)]
data_df_in = data_df[data_df["ticker"].isin(company_list)]
ticker random_col
0 A 1
1 AAPL 2
2 ABB 3
3 ABBV 4
4 ABEV 5
Not just a list, you can also use a column from other dataframe as well.
other_df = pd.DataFrame({"ticker": ['ABEV', "BADD"],
"random_col": [1,2]})
data_df_in = data_df[data_df.ticker.isin(other_df.ticker)]
ticker random_col
4 ABEV 5
5 BADD 6
Alternatively, you can filter them out by using the tilda sign.
data_df_notin = data_df[~data_df.ticker.isin(company_list)]
ticker random_col
5 BADD 6
6 CAPP 7