Home > OS >  Subtract values from different groups
Subtract values from different groups

Time:09-23

I have the following DataFrame:

    A   X
Time        
1   a   10
2   b   17
3   b   20
4   c   21
5   c   36
6   d   40

given by pd.DataFrame({'Time': [1, 2, 3, 4, 5, 6], 'A': ['a', 'b', 'b', 'c', 'c', 'd'], 'X': [10, 17, 20, 21, 36, 40]}).set_index('Time')

The desired output is:

Time    Difference
0   2   7
1   4   1
2   6   4

The first difference 1 is a result of subtracting 21 from 20: (first "c" value - last "b" value).

I'm open to numPy transformations as well.

CodePudding user response:

Aggregate by GroupBy.agg with GroupBy.first, GroupBy.last and then subtract shifted values for last column with omit first row by positions:

df = df.reset_index()
df1 = df.groupby('A',as_index=False, sort=False).agg(first=('X', 'first'),
                                                     last=('X','last'),
                                                     Time=('Time','first'))

df1['Difference'] = df1['first'].sub(df1['last'].shift(fill_value=0))

df1 = df1[['Time','Difference']].iloc[1:].reset_index(drop=True)
print (df1)
   Time  Difference
0     2           7
1     4           1
2     6           4

CodePudding user response:

IIUC, you can pivot, ffill the columns, and compute the difference:

g = df.reset_index().groupby('A')

(df.assign(col=g.cumcount().values)
   .pivot('A', 'col', 'X')
   .ffill(axis=1)
   .assign(Time=g['Time'].first(),
           diff=lambda d: d[0]-d[1].shift())
   [['Time', 'diff']].iloc[1:]
   .rename_axis(index=None, columns=None)
)

output:

   Time  Difference
b     2         7.0
c     4         1.0
d     6         4.0

Intermediate, pivoted/ffilled dataframe:

col     0     1  Time  Difference
A                                
a    10.0  10.0     1         NaN
b    17.0  20.0     2         7.0
c    21.0  36.0     4         1.0
d    40.0  40.0     6         4.0
  • Related