Home > Software design >  Create new columns by comparing the current row's values and previous in Pandas
Create new columns by comparing the current row's values and previous in Pandas

Time:10-14

Given a dummy dataset df as follow:

   year   v1   v2
0  2017  0.3  0.1
1  2018  0.1  0.1
2  2019 -0.2  0.5
3  2020  NaN -0.3
4  2021  0.8  0.0

or:

[{'year': 2017, 'v1': 0.3, 'v2': 0.1},
 {'year': 2018, 'v1': 0.1, 'v2': 0.1},
 {'year': 2019, 'v1': -0.2, 'v2': 0.5},
 {'year': 2020, 'v1': nan, 'v2': -0.3},
 {'year': 2021, 'v1': 0.8, 'v2': 0.0}]

I need to create two more columns trend_v1 and trend_v2 based on v1 and v2 respectively.

The logic to create new columns is this: for each column, if its current value is greater than the previous, the trend value is increase, if its current value is less than the previous, the trend value is decrease, if its current value is equal to the previous, the trend value is equal, if the current or previous value is NaN, the trend also is NaN.

   year   v1   v2  trend_v1  trend_v2
0  2017  0.3  0.1       NaN       NaN
1  2018  0.1  0.1  decrease     equal
2  2019 -0.2  0.5  decrease  increase
3  2020  NaN -0.3       NaN  decrease
4  2021  0.8  0.0       NaN  increase

How could I achieve that in Pandas? Thanks for your help at advance.

CodePudding user response:

You can specify columns for test trend by compare shifted values with filtered missing values:

cols = ['v1','v2']
arr = np.where(df[cols] < df[cols].shift(),'decrease',
      np.where(df[cols] > df[cols].shift(),'increase',
      np.where(df[cols].isna() | df[cols].shift().isna(), None, 'equal')))

df = df.join(pd.DataFrame(arr, columns=cols, index=df.index).add_prefix('trend_'))
print (df)
   year   v1   v2  trend_v1  trend_v2
0  2017  0.3  0.1      None      None
1  2018  0.1  0.1  decrease     equal
2  2019 -0.2  0.5  decrease  increase
3  2020  NaN -0.3      None  decrease
4  2021  0.8  0.0      None  increase

Or:

cols = ['v1','v2']

m1 = df[cols] < df[cols].shift()
m2 = df[cols] > df[cols].shift()
m3 = df[cols].isna() | df[cols].shift().isna()

arr = np.select([m1, m2, m3],['decrease','increase', None], default='equal')

df = df.join(pd.DataFrame(arr, columns=cols, index=df.index).add_prefix('trend_'))
  • Related