Home > Enterprise >  Select a row if two consecutive columns contain a negative value
Select a row if two consecutive columns contain a negative value

Time:04-12

From the given table of inflation rates below, I want to obtain the countries with negative inflation rates for two consecutive years.

            2017 2018   2019    2020    2021    2022
Country                     
Turkey      NaN  47.0   -7.0    -19.0   38.0    260.0
Argentina   NaN  33.0   56.0    -22.0   15.0    8.0
Suriname    NaN  -68.0  -37.0   695.0   56.0    13.0
Zimbabwe    NaN  106.0  2306.0  118.0   -83.0   -21.0
Lebanon     NaN  2.0    -36.0   2826.0  82.0    39.0
Sudan       NaN  96.0   -19.0   220.0   19.0    34.0
Venezuela   NaN  1482.0 -70.0   -88.0   15.0    -89.0

I have seen some solutions in SO that use list comprehension or loops. I wonder if this task is possible without them.

I attempted to convert the dataframe into 1s and 0s, in which 1.0 indicates a negative inflation.

           2017 2018 2019 2020 2021 2022
Country                     
Turkey      NaN 0.0 1.0 1.0 0.0 0.0
Argentina   NaN 0.0 0.0 1.0 0.0 0.0
Suriname    NaN 1.0 1.0 0.0 0.0 0.0
Zimbabwe    NaN 0.0 0.0 0.0 1.0 1.0
Lebanon     NaN 0.0 1.0 0.0 0.0 0.0
Sudan       NaN 0.0 1.0 0.0 0.0 0.0
Venezuela   NaN 0.0 1.0 1.0 0.0 1.0

However, I am stuck at this point. I tried to use np.prod function but this returns 0 if at least one column as 0.0 data.

Any ideas about how to solve this problem?

CodePudding user response:

You can first set an integer mask for the negative values (1 means negative). Then compute a rolling min on the axis 1, of the min is 1 all values are. This is generalizable to any number of consecutive columns.

N = 2
m1 = df.lt(0).astype(int)
m2 = m.rolling(N, axis=1).min().eq(1).any(1)

df[m2]

Output:

           2017    2018    2019   2020  2021   2022
Country                                            
Turkey      NaN    47.0    -7.0  -19.0  38.0  260.0
Suriname    NaN   -68.0   -37.0  695.0  56.0   13.0
Zimbabwe    NaN   106.0  2306.0  118.0 -83.0  -21.0
Venezuela   NaN  1482.0   -70.0  -88.0  15.0  -89.0

NB. One needs to work with integers as rolling is currently limited to numeric types

Alternative with a single mask for N=2

m = df.lt(0)
df[(m&m.shift(axis=1)).any(1)]

CodePudding user response:

Try this:

match = (df.lt(0) & df.shift(axis=1).lt(0)).any(axis=1)
df[match]

How it works:

  • df.lt(0): current year inflation is less than 0
  • df.shift(axis=1).lt(0): previous year inflation is less than 0
  • .any(axis=1): any such occurrence in the country.

CodePudding user response:

Given your dataframe, this is what would work for me:

  • set the Country as an index so I just have digits in my df values
  • Define new column for check of 'Two sequential negatives' in columns using df.shift(axis=1).

So it would look like:

df.set_index('Country',inplace=True)
df['TwoNegatives'] = ((df.values < 0) & ((df.shift(axis=1)).values <0)).any(axis=1)

CodePudding user response:

Try with rolling

out = df[df.le(0).T.rolling(window=2).sum().ge(2).any()]
Out[15]: 
           2017    2018    2019   2020  2021   2022
Country                                            
Turkey      NaN    47.0    -7.0  -19.0  38.0  260.0
Suriname    NaN   -68.0   -37.0  695.0  56.0   13.0
Zimbabwe    NaN   106.0  2306.0  118.0 -83.0  -21.0
Venezuela   NaN  1482.0   -70.0  -88.0  15.0  -89.0
  • Related