I have a long array (could be pandas or numpy, as convenient) where some rows have the first two columns identical (x-y position), and the third is unique (time), e.g.:
x y t
0. 0. 10.
0. 0. 11.
0. 0. 12.
0. 1. 13.
0. 1. 14.
1. 1. 15.
Positions are grouped, but there may be 1, 2 or 3 time values listed for each, meaning there may be 1, 2 or 3 rows with identical x and y. The array needs to be reshaped/condensed such that each position has its own row, with the min and max values of time (and inf in place of the max when a position has only one time value), i.e., the target is:
x y t1 t2
0. 0. 10. 12.
0. 1. 13. 14.
1. 1. 15. inf
Is there a simple/elegant way of doing this in pandas or numpy? I've tried loops, but they're messy and terribly inefficient, and I've tried using np.unique:
target_array = np.unique(initial_array[:, 0:2], axis=0)
That yields
x y
0. 0.
0. 1.
1. 1.
which is a good start, but then I'm stuck on generating the last two columns.
Answer:
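If I understand correctly, you can group by position, aggregate the min, max and count of t, and then replace the max with inf wherever a position has only one time value:
import numpy as np
import pandas as pd

# Build the frame from the question's array (columns x, y, t)
df = pd.DataFrame(initial_array, columns=['x', 'y', 't'])

out = (df.groupby(['x', 'y'])['t']
         .agg(t1='min', t2='max', c='count')  # min, max and row count per position
         .reset_index()
         # a position with a single time value gets t2 = inf
         .pipe(lambda d: d.assign(t2=d['t2'].mask(d['c'].eq(1), np.inf)))
         .drop(columns='c')
      )
print(out)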
     x    y    t1    t2
0  0.0  0.0  10.0  12.0
1  0.0  1.0  13.0  14.0
2  1.0  1.0  15.0   inf
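If you would rather stay in NumPy and continue from the np.unique attempt in the question, here is a minimal sketch. It assumes the data is a float array with columns x, y, t; the function name condense is made up for illustration. np.unique can also return the first index and the row count of each position, and np.minimum.reduceat / np.maximum.reduceat then reduce the time column over each group:

```python
import numpy as np

# Sample data from the question: columns are x, y, t
initial_array = np.array([
    [0., 0., 10.],
    [0., 0., 11.],
    [0., 0., 12.],
    [0., 1., 13.],
    [0., 1., 14.],
    [1., 1., 15.],
])

def condense(arr):
    """Return one row per (x, y) with min time, and max time or inf."""
    # Sort by x, then y, so equal positions are contiguous and the
    # group start indices from np.unique come out in increasing order.
    order = np.lexsort((arr[:, 1], arr[:, 0]))
    arr = arr[order]

    # Unique positions, index of each group's first row, and group sizes
    xy, start, counts = np.unique(
        arr[:, :2], axis=0, return_index=True, return_counts=True
    )

    # Reduce the time column over each contiguous group
    t1 = np.minimum.reduceat(arr[:, 2], start)
    t2 = np.maximum.reduceat(arr[:, 2], start)

    # A position with a single time value gets inf as its second time
    t2 = np.where(counts == 1, np.inf, t2)

    return np.column_stack([xy, t1, t2])

target_array = condense(initial_array)
print(target_array)
```

The lexsort step is only there as a safeguard: if the positions are already sorted by x and then y, as in the example, it leaves the row order unchanged.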