Home > other >  How to create pandas dataframe with only certain values (which are in a list) in a specific column?
How to create pandas dataframe with only certain values (which are in a list) in a specific column?

Time:03-04

I'm trying to create a pandas dataframe that only has certain values in the ticker column. If I only had 1 value (for example AAPL) that I wanted in the ticker column, I would do this:

df = complete_data[complete_data.ticker == 'AAPL']

But in this case, I want the df to consist of multiple different values in the ticker column. These values are saved in a list. The list looks like:

company_list = ['A', 'AAPL', 'ABB', 'ABBV', 'ABEV', 'ABT', 'ACN', 'ADBE', 'ADP', 'ADSK'] etc.

I would appreciate help to make the dataframe just contain values in the ticker column that are in my list below.

I have tried changing the code above to:

data_df = data_df[data_df.ticker in company_list]

and

data_df = data_df[data_df.ticker if in company_list]

and

data_df = data_df[data_df.ticker == company_list]

But this has not given me any results thus far.

CodePudding user response:

Here's a dummy data I made based on your question.

import pandas as pd

data_df = pd.DataFrame({"ticker": ['A', 'AAPL', 'ABB', 'ABBV', 'ABEV', "BADD", "CAPP"],
                        "random_col": [1,2,3,4,5,6,7]})

Series.isin is the function you need to use. For series, you extract the concerned column, by calling the column itself or the attribute.

data_df_in = data_df[data_df.ticker.isin(company_list)]
data_df_in = data_df[data_df["ticker"].isin(company_list)]

  ticker  random_col
0      A           1
1   AAPL           2
2    ABB           3
3   ABBV           4
4   ABEV           5

Not just a list, you can also use a column from other dataframe as well.

other_df = pd.DataFrame({"ticker": ['ABEV', "BADD"],
                        "random_col": [1,2]})

data_df_in = data_df[data_df.ticker.isin(other_df.ticker)]

  ticker  random_col
4   ABEV           5
5   BADD           6

Alternatively, you can filter them out by using the tilda sign.

data_df_notin = data_df[~data_df.ticker.isin(company_list)]

  ticker  random_col
5   BADD           6
6   CAPP           7
  • Related