Home > Mobile >  Python: Change value in dataframe column w.r.t variable value and other column value as well
Python: Change value in dataframe column w.r.t variable value and other column value as well

Time:10-23

have got input dataframe like below and a variable 'extra'

input_df

Item    Space   refill      Min     Max
Apple   0.25    12.53       0.125   0.25
Lemon   0.25    11.2        0.25    0.375       
Potato  0.375   10.3        0.375   0.75
Melon   0.375   9.2         0.25    0.75

'extra' variable can be negative or positive.

Expected output_df if extra = 0.25 (this extra 0.25 space need to be splitted in the multiples of 0.125 and added to items with higher refill order. In this case, 0.125 0.125 = 0.25 so two items space will be updated. Also should consider Max value here, on adding space it should not exceed Max value, if happens skip that item and move to next.)

Item    Space   refill      Min     Max
Apple   0.25    12.53       0.125   0.25        #item skipped since it exceeds Max limit if extra space added
Lemon   0.375   11.2        0.25    0.375       #0.125 from extra space added here
Potato  0.5     10.3        0.375   0.75        #remaining 0.125 from extra space added here
Melon   0.375   9.2         0.25    0.75        #no more space to be added

Expected output_df if extra = -0.25 (this extra -0.25 space[which is two times of -0.125] need to be reduced from items with higher refill. Also should consider Min Value here, on reducing space it should not go below Min value, if happens skip that item and move to next.)

Item    Space   refill      Min     Max
Apple   0.125   12.53       0.125   0.25        #-0.125 from extra space reduced here
Lemon   0.25    11.2        0.25    0.375       #item skipped since it doesn't satisfy Min limit if reduced
Potato  0.375   10.3        0.375   0.75        #item skipped since it doesn't satisfy Min limit if reduced
Melon   0.25    9.2         0.25    0.75        #rem -0.125 from extra space reduced here

Any help will be appreciated. Thank You!

CodePudding user response:

Assumptions:

  1. Both extra and the values in columns Min and Max should always contain multiples of 0.125. E.g. df.Min % 0.125 should always lead to a series of zeros.
  2. The df index contains unique values only. If this is not yet the case, use df.reset_index(drop=True, inplace=True) to reset it. Or use drop=False, if you want to keep the original index values. You can then reset them afterwards, using df.set_index('index', drop=True, inplace=True).

Given the assumptions, the following should achieve this:

import pandas as pd
import numpy as np

data = {'Item': {0: 'Apple', 1: 'Lemon', 2: 'Potato', 3: 'Melon'}, 
        'Space': {0: 0.25, 1: 0.25, 2: 0.375, 3: 0.375}, 
        'refill': {0: 12.53, 1: 11.2, 2: 10.3, 3: 9.2}, 
        'Min': {0: 0.125, 1: 0.25, 2: 0.375, 3: 0.25}, 
        'Max': {0: 0.25, 1: 0.375, 2: 0.75, 3: 0.75}}

def f(df, extra):
    length = int(abs(extra/0.125))
    repeats = np.repeat([0.125 * -1 if np.signbit(extra) else 0.125], len(df))

    tmp = np.where((pd.Series(repeats).lt(0) & df.Space.gt(df.Min)) | 
             (pd.Series(repeats).gt(0) & df.Space.lt(df.Max)), 
             df.Space.add(repeats), df.Space)
    
    new_series = np.where(df.Space.ne(tmp).cumsum().gt(length), df.Space, tmp)
    return new_series

Function applied:

df = pd.DataFrame(data)
df.Space = f(df, 0.25)

print(df)

     Item  Space  refill    Min    Max
0   Apple  0.250   12.53  0.125  0.250
1   Lemon  0.375   11.20  0.250  0.375
2  Potato  0.500   10.30  0.375  0.750
3   Melon  0.375    9.20  0.250  0.750

df = pd.DataFrame(data)
df.Space = f(df, -0.25)

print(df)

     Item  Space  refill    Min    Max
0   Apple  0.125   12.53  0.125  0.250
1   Lemon  0.250   11.20  0.250  0.375
2  Potato  0.375   10.30  0.375  0.750
3   Melon  0.250    9.20  0.250  0.750

Suppose we use extra= 0.375, then the last item will also change:

df = pd.DataFrame(data)
df.Space = f(df, 0.375)

print(df)

     Item  Space  refill    Min    Max
0   Apple  0.250   12.53  0.125  0.250
1   Lemon  0.375   11.20  0.250  0.375
2  Potato  0.500   10.30  0.375  0.750
3   Melon  0.500    9.20  0.250  0.750 # also changed

Explanation:

  • Inside function f, we first create an np.array named repeats with a length equal to your df, filled either with -0.125 or 0.125, depending on the sign of extra (using np.signbit).
  • Next, we use np.where and check either (values in column Space < 0 and > values in column Min) or (values in column Space > 0 and < values in column Max). For each True we want df.Space.add(repeats), for each False we keep the values from df.Space. We store the result in a variable tmp. See Series.lt and Series.gt.
  • Finally, we apply np.where again to limit the amount of changes to be made. In order to figure this out, we first apply Series.ne to tmp and get True for each mismatch (i.e. a change). Applying Series.cumsum will get us a cumulative sum of all the True values. Once, a value inside this series becomes greater than int(abs(extra/0.125)) we need to stop making changes. So, from that point onwards, we fall back on the original values in column Space.
  • Related