Condensing an array where some rows differ only by one column (to one with unique rows but more columns)

I have a long array (pandas or NumPy, whichever is convenient) in which some rows share the first two columns (x-y position) while the third column (time) is unique, e.g.:

x     y     t
0.    0.    10.
0.    0.    11.
0.    0.    12.
0.    1.    13.
0.    1.    14.
1.    1.    15.

Positions are grouped, but there may be 1, 2 or 3 time values listed for each, meaning there may be 1, 2 or 3 rows with identical x and y. The array needs to be reshaped/condensed so that each position has its own row, with the min and max values of time (and inf as the max when only one time is listed), i.e., the target is:

x     y     t1    t2
0.    0.    10.   12.
0.    1.    13.   14.
1.    1.    15.   inf

Is there a simple/elegant way of doing this in pandas or numpy? I've tried loops but they're messy and terribly inefficient, and I've tried using np.unique:

target_array = np.unique(initial_array[:, 0:2], axis=0)

That yields

x     y 
0.    0.
0.    1.
1.    1.

which is a good start, but then I'm stuck on generating the last two columns.

CodePudding user response：

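If I understand correctly, you can group by x and y, take the min and max of t per group, and replace the max with inf wherever a group has only one row: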

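import numpy as np

out = (df.groupby(['x', 'y'])['t']
       .agg(t1='min', t2='max', c='count')  # per-position min, max and row count
       .reset_index()
       # a position with a single time value gets inf as its t2
       .pipe(lambda d: d.assign(t2=d['t2'].mask(d['c'].eq(1), np.inf)))
       .drop(columns='c')
       )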

print(out)

     x    y    t1    t2
0  0.0  0.0  10.0  12.0
1  0.0  1.0  13.0  14.0
2  1.0  1.0  15.0   inf
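
Since the question also allows NumPy and already got as far as np.unique, here is a minimal NumPy-only sketch of the same idea. It assumes the input is a 2-D float array with columns x, y, t; the function name condense is hypothetical and not part of the answer above. It asks np.unique for the group index of every row and the group sizes, then reduces t per group with the ufunc .at methods, so the rows do not need to be pre-sorted.

```python
import numpy as np

def condense(arr):
    """Collapse rows sharing (x, y) into one row: x, y, min(t), max(t).

    The max is set to inf for positions that have only one time value.
    """
    # Unique positions, the group index of every input row, and group sizes
    xy, inverse, counts = np.unique(
        arr[:, :2], axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()  # guard: the inverse may not be 1-D in every NumPy release

    t = arr[:, 2]
    t_min = np.full(len(xy), np.inf)
    t_max = np.full(len(xy), -np.inf)
    np.minimum.at(t_min, inverse, t)  # unbuffered per-group minimum
    np.maximum.at(t_max, inverse, t)  # unbuffered per-group maximum

    t_max[counts == 1] = np.inf  # single-time positions get inf as the max
    return np.column_stack([xy, t_min, t_max])

# Sample data from the question
initial_array = np.array([
    [0., 0., 10.],
    [0., 0., 11.],
    [0., 0., 12.],
    [0., 1., 13.],
    [0., 1., 14.],
    [1., 1., 15.],
])
result = condense(initial_array)
print(result)
```

On the sample data this should give the same three rows as the pandas version. For large inputs the groupby route is probably the more readable choice; the NumPy version mainly avoids building a DataFrame.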