subtract value from next rows until condition then subtract new value-CodePudding

say I have:

df ={'animal' : [1, 1, 1, 1, 1, 1, 1, 2, 2],
     'x':[76.551, 77.529, 78.336,77, 78.02, 79.23, 77.733, 79.249,  76.077],
     'y': [151.933, 152.945, 153.970, 119.369, 120.615, 118.935, 119.115, 152.004, 153.027],
    'time': [0, 1, 2, 0, 3,2,5, 0, 1]}
df = pd.DataFrame(df)

# get distance travelled between points
def get_diff(df):
    dx = (df['x'] - df.groupby('animal')['x'].shift(1))
    dy = (df['y'] - df.groupby('animal')['y'].shift(1))
    df['distance'] = (dx**2   dy**2)**0.5
    return df

# get the start coordinates
def get_start(df):
    for i in range(len(df)):
        df.loc[df['distance'] > 5, 'start'] = 'start'
        df.loc[df['distance'].isnull(), 'start'] = 'start'
        return df

df = get_diff(df)
df = get_start(df)

after some preprocessing, I end up with:

   animal       x        y  time   distance  start
0       1  76.551  151.933     0        NaN  start
1       1  77.529  152.945     1   1.407348    NaN
2       1  78.336  153.970     2   1.304559    NaN
3       1  77.000  119.369     0  34.626783  start
4       1  76.020  120.615     3   1.585218    NaN
5       1  79.230  118.935     2   3.623051    NaN
6       1  77.733  119.115     5   1.507783    NaN
7       2  79.249  152.004     0        NaN  start
8       2  76.077  153.027     1   3.332884    NaN

I want to artificially recenter the start coordinates at (0,0). So if the start column has 'start', then subtract the x,y values from all the following rows until reach the next start index, then subtract the new x,y values, etc.

output should look something like:

   animal       x        y  time   distance  start  newX   newY
0       1  76.551  151.933     0        NaN  start    0      0   #(76.551-76.551, 151.993-151.933)
1       1  77.529  152.945     1   1.407348    NaN  0.978  1.012 #(77.529-76.551, 152.945-151.933)
2       1  78.336  153.970     2   1.304559    NaN  1.785  2.012 #(78.336-76.551, 153.970-151.933)
3       1  77.000  119.369     0  34.626783  start    0      0   #(77-77, 119.369-119.369)
4       1  76.020  120.615     3   1.610253    NaN -0.98   1.246 #(76.020-77, 120.615-119.363)
5       1  79.230  118.935     2   3.623051    NaN  2.23  -0.434 #(..., ...)
6       1  77.733  119.115     5   1.507783    NaN  0.733 -0.254
7       2  79.249  152.004     0        NaN  start    0      0   #(79.249-79.249, 152.004-152.004)
8       2  76.077  153.027     1   3.332884    NaN  -3.172 1.023 #(76.077-79.249,153.027-152.004)

CodePudding user response：

You can create a boolean mask based on start, and then use cumsum to turn that into a perfect grouper. Group by it, and then get the first value of x and y for each group. Subtract x and y from those firsts and you have your new columns:

df[['newX', 'newY']] = df[['x', 'y']] - df.groupby(df['start'].eq('start').cumsum())[['x', 'y']].transform('first')

Output:

   animal       x        y  time   distance  start   newX   newY
0       1  76.551  151.933     0        NaN  start  0.000  0.000
1       1  77.529  152.945     1   1.407348    NaN  0.978  1.012
2       1  78.336  153.970     2   1.304559    NaN  1.785  2.037
3       1  77.000  119.369     0  34.626783  start  0.000  0.000
4       1  76.020  120.615     3   1.585218    NaN -0.980  1.246
5       1  79.230  118.935     2   3.623051    NaN  2.230 -0.434
6       1  77.733  119.115     5   1.507783    NaN  0.733 -0.254
7       2  79.249  152.004     0        NaN  start  0.000  0.000
8       2  76.077  153.027     1   3.332884    NaN -3.172  1.023

CodePudding user response：

You can use diff to compute the difference between previous rows.

df[['new_x', 'new_y']] = \
    df.groupby(df['start'].notna().cumsum())[['x', 'y']].diff().fillna(0)
print(df)

# Output
   animal       x        y  time   distance  start  new_x  new_y
0       1  76.551  151.933     0        NaN  start  0.000  0.000
1       1  77.529  152.945     1   1.407348    NaN  0.978  1.012
2       1  78.336  153.970     2   1.304559    NaN  0.807  1.025
3       1  77.000  119.369     0  34.626783  start  0.000  0.000
4       1  78.020  120.615     3   1.610253    NaN  1.020  1.246
5       1  79.230  118.935     2   2.070386    NaN  1.210 -1.680
6       1  77.733  119.115     5   1.507783    NaN -1.497  0.180
7       2  79.249  152.004     0        NaN  start  0.000  0.000
8       2  76.077  153.027     1   3.332884    NaN -3.172  1.023