How to eliminate certain rows from a dataframe


Hello, I have a DataFrame built from the following data:

df = pd.DataFrame({'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola'], 'commodity_code': ['55', '55', '55', '55'], 'year': [2000, 2000, 2000, 2000], 'trade_value': [10000, 12000, 'NaN', 105]})

Essentially, this is a long-format DataFrame with many countries from 2000 to 2020 and the trade value for commodity "55" and commodity "73". I need to eliminate the countries that never exported commodity 55 or commodity 73, i.e. those whose trade_value column equals 0 (not NaN) for every year of each commodity.

Thanks.

CodePudding user response:

You can build a mask with isin() and negate it with ~:

import pandas as pd
import numpy as np

data = {
    'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola'],
    'commodity_code': ['55', '55', '55', '55'],
    'year': [2000, 2000, 2000, 2000],
    'trade_value': [10000, 12000, np.nan, 105],  # a real NaN, not the string 'NaN'
}
df = pd.DataFrame(data)
df = df[~df['commodity_code'].isin(['55', '73'])]

Output (every sample row has commodity code '55', so all rows are removed):

Empty DataFrame
Columns: [country_name, commodity_code, year, trade_value]
Index: []
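Note that this removes every row whose commodity code is 55 or 73, regardless of its trade value. To drop only the countries whose trade_value is 0 in every year, one option is a groupby/filter pass per country. A sketch with hypothetical sample data (the country "Liberia" and the years are invented for illustration):

```python
import pandas as pd

# Hypothetical long-format data: Liberia's trade_value is 0 in every
# year, so the whole country should be dropped.
df = pd.DataFrame({
    'country_name': ['Albania', 'Albania', 'Liberia', 'Liberia'],
    'commodity_code': ['55', '55', '55', '55'],
    'year': [2000, 2001, 2000, 2001],
    'trade_value': [10000, 12000, 0, 0],
})

# Keep a country only if at least one of its rows has a non-zero trade value.
result = df.groupby('country_name').filter(
    lambda g: g['trade_value'].ne(0).any()
)
```

With more than one commodity per country, grouping by `['country_name', 'commodity_code']` instead would drop each never-exported commodity separately.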

CodePudding user response:

In [1]: import pandas as pd
   ...:
   ...: data = {
   ...:     'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola'],
   ...:     'commodity_code': ['55', '55', '55', '55'],
   ...:     'year': [2000, 2000, 2000, 2000],
   ...:     'trade_value': [10000, 12000, 'NaN', 0],
   ...:     }
   ...: df = pd.DataFrame(data)

In [2]: cond = df["commodity_code"].isin(["55", "73"]) & df["trade_value"].isin([0, "NaN"])

In [3]: df.drop(df[cond].index)
Out[3]:
  country_name commodity_code  year trade_value
0      Albania             55  2000       10000
1      Algeria             55  2000       12000
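If trade_value held real missing values (np.nan) rather than the string 'NaN', the condition above could use isna() instead of matching the string, and the drop can equivalently be written as boolean indexing. A sketch under that assumption:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'country_name': ['Albania', 'Algeria', 'Andorra', 'Angola'],
    'commodity_code': ['55', '55', '55', '55'],
    'year': [2000, 2000, 2000, 2000],
    'trade_value': [10000, 12000, np.nan, 0],
})

# Rows to drop: commodity 55/73 with a zero or missing trade value.
cond = df['commodity_code'].isin(['55', '73']) & (
    df['trade_value'].isna() | df['trade_value'].eq(0)
)
kept = df[~cond]  # same rows as df.drop(df[cond].index)
```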
