Remove variables from a recarray by variable name

I need to concatenate two rec.arrays (the same procedure I use for all the others in my work). The problem is that one of the files I read an array from has 2 extra variables, which I need to remove so that its variables match those of the other array before concatenating. I have tried several things, such as removing them by index, but they all lead to errors.

This is the array

vswhr1
rec.array([('ny20110325s0a06c.001', 2011.23149798,  84.49677, 11.9223, 1.000e+00, 78.923, 11.923, 0.024, 0.024, 77.286, 189.465  ,  1.688, 180.     , 0.0019, 0., 0.00167, 60., 1003.84003, -15.7, 1003.84003, 65.8, -1., 0.    , -1., -1., 9.8765e+35, 9.8765e+35, 5.96541e+21, 2.60898e+19, 8.45080e+21, 7.92632e+19, 8.74633e+21, 8.68890e+19),
           ('ny20110325s0a06c.002', 2011.23150704,  84.50007, 12.0017, 2.000e+00, 78.923, 11.923, 0.024, 0.024, 77.325, 190.686  ,  1.694, 180.     , 0.0019, 0., 0.00167, 60., 1003.83002, -16. , 1003.83002, 68.7, -1., 0.    , -1., -1., 9.8765e+35, 9.8765e+35, 5.93553e+21, 2.54199e+19, 8.43518e+21, 7.75936e+19, 8.72990e+21, 8.60191e+19),
           ('ny20110325s0a06c.003', 2011.23150736,  84.50019, 12.0045, 3.000e+00, 78.923, 11.923, 0.024, 0.024, 77.326, 190.728  ,  1.694, 180.     , 0.0019, 0., 0.00167, 60., 1003.83002, -16.1, 1003.83002, 68.9, -1., 0.    , -1., -1., 9.8765e+35, 9.8765e+35, 5.93643e+21, 2.59443e+19, 8.42675e+21, 8.17653e+19, 8.73537e+21, 8.68880e+19),
           ...,
           ('ny20180919s0i06c.0042', 2018.71887239, 262.38843,  9.3221, 1.234e+03, 78.923, 11.923, 0.024, 0.027, 78.69 , 152.737  , -1.722, 180.00999, 0.0019, 0., 0.00188, 60., 1011.84003,  -2.2, 1011.84003, 77.6, -1., 0.0125, -1., -1., 9.8765e+35, 9.8765e+35, 2.11077e+22, 8.61874e+19, 8.72151e+21, 5.33405e+19, 9.01945e+21, 7.07619e+19),
           ('ny20180920s0i06c.0491', 2018.72160282, 263.38504,  9.2407, 1.235e+03, 78.923, 11.923, 0.024, 0.034, 79.177, 151.62399, -1.735, 180.00999, 0.0019, 0., 0.00188, 60., 1006.65997,   0. , 1006.65997, 62.8, -1., 0.0095, -1., -1., 9.8765e+35, 9.8765e+35, 1.96888e+22, 7.48627e+19, 8.70719e+21, 5.40175e+19, 8.97596e+21, 7.49834e+19),
           ('ny20180920s0i06c.0492', 2018.72161188, 263.38834,  9.3201, 1.236e+03, 78.923, 11.923, 0.024, 0.034, 79.072, 152.83299, -1.729, 180.00999, 0.0019, 0., 0.00188, 60., 1006.65997,  -0.6, 1006.65997, 64.6, -1., 0.0078, -1., -1., 9.8765e+35, 9.8765e+35, 1.94867e+22, 7.83111e+19, 8.71765e+21, 4.97304e+19, 8.97784e+21, 7.23055e+19)],
          dtype=[('spectrum', '<U21'), ('year', '<f8'), ('day', '<f8'), ('hour', '<f8'), ('run', '<f8'), ('lat', '<f8'), ('long', '<f8'), ('zobs', '<f8'), ('zmin', '<f8'), ('solzen', '<f8'), ('azim', '<f8'), ('osds', '<f8'), ('opd', '<f8'), ('fovi', '<f8'), ('amal', '<f8'), ('graw', '<f8'), ('tins', '<f8'), ('pins', '<f8'), ('tout', '<f8'), ('pout', '<f8'), ('hout', '<f8'), ('sia', '<f8'), ('fvsi', '<f8'), ('wspd', '<f8'), ('wdir', '<f8'), ('luft', '<f8'), ('luft_error', '<f8'), ('h2o', '<f8'), ('h2o_error', '<f8'), ('co2', '<f8'), ('co2_error', '<f8'), ('3co2', '<f8'), ('3co2_error', '<f8')])

vswhr1.shape 
(1236,)

(The actual values are irrelevant.)

I need to delete the last 2 variables, ('3co2', '<f8') and ('3co2_error', '<f8').

Thank you

CodePudding user response：

If you are loading these arrays from csv files, then using usecols to select which columns you load may be the easiest way to get two arrays that match in dtype.
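As a rough sketch of the usecols idea, assuming the data is read with np.genfromtxt from a comma-separated file with a header row (the question does not say which loader is used, and the tiny in-memory file and its values below are made up for illustration):

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file that has the two extra columns.
csv_text = """spectrum,year,co2,3co2,3co2_error
a.001,2011.2,8.4e21,8.7e21,8.6e19
a.002,2011.3,8.5e21,8.8e21,8.7e19
"""

# usecols keeps only the first three columns, so the extra
# '3co2' and '3co2_error' columns are never loaded.
arr = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,        # take field names from the header row
    dtype=None,        # infer a dtype per column
    encoding="utf-8",
    usecols=(0, 1, 2),
)

print(arr.dtype.names)
```

With a real file, the io.StringIO object would be replaced by the file name, and usecols would list the positions of all columns shared by both files.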

But it is also possible to select a subset of fields from an existing array.

To illustrate:

In [1]: dt1 = np.dtype('U10,i,f')
In [2]: dt2 = np.dtype('U10,i,f,i,i')
In [3]: x = np.ones(2,dtype=dt1)
In [4]: y = np.zeros(2,dtype=dt2)
In [5]: x
Out[5]: 
array([('1', 1, 1.), ('1', 1, 1.)],
      dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f4')])
In [6]: y
Out[6]: 
array([('', 0, 0., 0, 0), ('', 0, 0., 0, 0)],
      dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f4'), ('f3', '<i4'), ('f4', '<i4')])

A subset of the y fields:

In [7]: y[['f0','f1','f2']]
Out[7]: 
array([('', 0, 0.), ('', 0, 0.)],
      dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})

There are some complications in this view, as evidenced by the offsets parameter in the new dtype: the result keeps the original itemsize of 56, with gaps where the dropped fields were. The structured arrays doc page discusses this. Sometimes it's necessary to make a packed copy using the recfunctions.repack_fields function.
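A minimal sketch of that repacking step, reusing the same toy dtype as the y array above; repack_fields lives in numpy.lib.recfunctions and returns a copy without the padding left by the dropped fields:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

y = np.zeros(2, dtype="U10,i,f,i,i")

# Multi-field indexing gives a view that still spans the full original record.
sub = y[["f0", "f1", "f2"]]

# repack_fields copies the data into a contiguous layout with no gaps.
packed = rfn.repack_fields(sub)

print(sub.dtype.itemsize, packed.dtype.itemsize)
```

The packed copy is what you would typically want before saving the array or viewing it as another dtype.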

But it appears that the view is just fine when used in concatenate:

In [8]: np.concatenate((x,y[['f0','f1','f2']]))
Out[8]: 
array([('1', 1, 1.), ('1', 1, 1.), ('', 0, 0.), ('', 0, 0.)],
      dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})

We could also get the indexing list from the other array's dtype:

In [9]: x.dtype.names
Out[9]: ('f0', 'f1', 'f2')

That's a tuple, which we need to convert to a list:

In [13]: np.concatenate((x,y[list(x.dtype.names)]))
Out[13]: 
array([('1', 1, 1.), ('1', 1, 1.), ('', 0, 0.), ('', 0, 0.)],
      dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})

(Lists and tuples are often interchangeable in Python, but numpy indexing interprets them differently, so the distinction matters here.)
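For the case in the question, where the fields to remove are known by name, an alternative is recfunctions.drop_fields. The following is only a sketch: the two small recarrays and their values are made-up stand-ins for the real data, with just three of the shared fields kept for brevity.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical, shortened versions of the two arrays from the question.
dt_short = [("spectrum", "U21"), ("year", "f8"), ("co2", "f8")]
dt_long = dt_short + [("3co2", "f8"), ("3co2_error", "f8")]

a = np.rec.array([("s1", 2011.2, 8.4e21), ("s2", 2011.3, 8.5e21)], dtype=dt_short)
b = np.rec.array([("s3", 2018.7, 8.7e21, 9.0e21, 7.0e19)], dtype=dt_long)

# drop_fields returns a new array without the named fields;
# usemask=False avoids a masked array, asrecarray=True keeps it a recarray.
b_trimmed = rfn.drop_fields(b, ["3co2", "3co2_error"], usemask=False, asrecarray=True)

# View the result as a recarray so attribute access (combined.year) is available.
combined = np.concatenate((a, b_trimmed)).view(np.recarray)
print(combined.dtype.names)
```

Unlike the multi-field index, drop_fields makes a copy, so the result has a packed dtype and no offsets to worry about.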