Subtract values from different groups-CodePudding

I have the following DataFrame:

given by pd.DataFrame({'Time': [1, 2, 3, 4, 5, 6], 'A': ['a', 'b', 'b', 'c', 'c', 'd'], 'X': [10, 17, 20, 21, 36, 40]}).set_index('Time')

The desired output is:

Time    Difference
0   2   7
1   4   1
2   6   4

The first difference 1 is a result of subtracting 21 from 20: (first "c" value - last "b" value).

I'm open to numPy transformations as well.

CodePudding user response：

Aggregate by GroupBy.agg with GroupBy.first, GroupBy.last and then subtract shifted values for last column with omit first row by positions:

df = df.reset_index()
df1 = df.groupby('A',as_index=False, sort=False).agg(first=('X', 'first'),
                                                     last=('X','last'),
                                                     Time=('Time','first'))

df1['Difference'] = df1['first'].sub(df1['last'].shift(fill_value=0))

df1 = df1[['Time','Difference']].iloc[1:].reset_index(drop=True)
print (df1)
   Time  Difference
0     2           7
1     4           1
2     6           4

CodePudding user response：

IIUC, you can pivot, ffill the columns, and compute the difference:

g = df.reset_index().groupby('A')

(df.assign(col=g.cumcount().values)
   .pivot('A', 'col', 'X')
   .ffill(axis=1)
   .assign(Time=g['Time'].first(),
           diff=lambda d: d[0]-d[1].shift())
   [['Time', 'diff']].iloc[1:]
   .rename_axis(index=None, columns=None)
)

output:

   Time  Difference
b     2         7.0
c     4         1.0
d     6         4.0

Intermediate, pivoted/ffilled dataframe:

col     0     1  Time  Difference
A                                
a    10.0  10.0     1         NaN
b    17.0  20.0     2         7.0
c    21.0  36.0     4         1.0
d    40.0  40.0     6         4.0