Condensing an array where some rows differ only by one column (to one with unique rows but more colu

Time:05-29

I have a long array (could be pandas or numpy, as convenient) where some rows have identical first two columns (x-y position) and a unique third column (time), e.g.:

x     y     t
0.    0.    10.
0.    0.    11.
0.    0.    12.
0.    1.    13.
0.    1.    14.
1.    1.    15.

Positions are grouped, but there may be 1, 2 or 3 time values listed for each, meaning there may be 1, 2 or 3 rows with identical x and y. The array needs to be reshaped/condensed so that each position has its own row, with the min and max values of time, i.e., the target is:

x     y     t1    t2
0.    0.    10.   12.
0.    1.    13.   14.
1.    1.    15.   inf

Is there a simple/elegant way of doing this in pandas or numpy? I've tried loops but they're messy and terribly inefficient, and I've tried using np.unique:

target_array = np.unique(initial_array[:, 0:2], axis=0) 

That yields

x     y 
0.    0.
0.    1.
1.    1. 

which is a good start, but then I'm stuck on generating the last two columns.

CodePudding user response:

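IIUC, you can group by position, aggregate the min, max and count of t, then replace the max with inf wherever a position has only one time value: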

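import numpy as np

# df is assumed to be a DataFrame with columns 'x', 'y' and 't'
out = (df.groupby(['x', 'y'])['t']
       .agg(t1='min', t2='max', c='count')  # per-position min, max and number of times
       .reset_index()
       # a position with a single time value gets t2 = inf
       .pipe(lambda d: d.assign(t2=d['t2'].mask(d['c'].eq(1), np.inf)))
       .drop(columns='c')
       )
print(out)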

     x    y    t1    t2
0  0.0  0.0  10.0  12.0
1  0.0  1.0  13.0  14.0
2  1.0  1.0  15.0   inf
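
If you would rather stay in NumPy and build on the np.unique attempt from the question, here is a minimal sketch. It is not part of the answer above. The helper name `condense` is hypothetical, and the sketch assumes the x and y columns contain no NaN values. It sorts rows by position first, so the start index of each group returned by np.unique can be passed to `reduceat`.

```python
import numpy as np

initial_array = np.array([
    [0., 0., 10.],
    [0., 0., 11.],
    [0., 0., 12.],
    [0., 1., 13.],
    [0., 1., 14.],
    [1., 1., 15.],
])

def condense(arr):
    """Hypothetical helper: one row per (x, y) with min and max of t."""
    # Sort by x, then y, so rows with the same position are contiguous
    # and appear in the same order np.unique will report them.
    order = np.lexsort((arr[:, 1], arr[:, 0]))
    arr = arr[order]
    # Unique positions, index of the first row of each group, group sizes.
    pos, start, counts = np.unique(
        arr[:, :2], axis=0, return_index=True, return_counts=True
    )
    t = arr[:, 2]
    # reduceat reduces each slice t[start[i]:start[i+1]] (last one runs to the end).
    t1 = np.minimum.reduceat(t, start)
    t2 = np.maximum.reduceat(t, start)
    # A position with a single time value gets inf as its second time.
    t2 = np.where(counts == 1, np.inf, t2)
    return np.column_stack([pos, t1, t2])

result = condense(initial_array)
print(result)
```

The sort costs O(n log n) but makes the function safe even if the positions are not already grouped. If the input is known to be sorted by x and then y, the lexsort step could be dropped.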