I need to concatenate two rec.arrays (the same procedure I use for all the others in my work). The problem is that one of the files I read into an array has 2 extra variables, which I need to remove so that its variables match those of the other array before concatenating. I have tried several things, such as removing them by index, but they all lead to errors.
This is the array
vswhr1
rec.array([('ny20110325s0a06c.001', 2011.23149798, 84.49677, 11.9223, 1.000e+00, 78.923, 11.923, 0.024, 0.024, 77.286, 189.465 , 1.688, 180. , 0.0019, 0., 0.00167, 60., 1003.84003, -15.7, 1003.84003, 65.8, -1., 0. , -1., -1., 9.8765e+35, 9.8765e+35, 5.96541e+21, 2.60898e+19, 8.45080e+21, 7.92632e+19, 8.74633e+21, 8.68890e+19),
('ny20110325s0a06c.002', 2011.23150704, 84.50007, 12.0017, 2.000e+00, 78.923, 11.923, 0.024, 0.024, 77.325, 190.686 , 1.694, 180. , 0.0019, 0., 0.00167, 60., 1003.83002, -16. , 1003.83002, 68.7, -1., 0. , -1., -1., 9.8765e+35, 9.8765e+35, 5.93553e+21, 2.54199e+19, 8.43518e+21, 7.75936e+19, 8.72990e+21, 8.60191e+19),
('ny20110325s0a06c.003', 2011.23150736, 84.50019, 12.0045, 3.000e+00, 78.923, 11.923, 0.024, 0.024, 77.326, 190.728 , 1.694, 180. , 0.0019, 0., 0.00167, 60., 1003.83002, -16.1, 1003.83002, 68.9, -1., 0. , -1., -1., 9.8765e+35, 9.8765e+35, 5.93643e+21, 2.59443e+19, 8.42675e+21, 8.17653e+19, 8.73537e+21, 8.68880e+19),
...,
('ny20180919s0i06c.0042', 2018.71887239, 262.38843, 9.3221, 1.234e+03, 78.923, 11.923, 0.024, 0.027, 78.69 , 152.737 , -1.722, 180.00999, 0.0019, 0., 0.00188, 60., 1011.84003, -2.2, 1011.84003, 77.6, -1., 0.0125, -1., -1., 9.8765e+35, 9.8765e+35, 2.11077e+22, 8.61874e+19, 8.72151e+21, 5.33405e+19, 9.01945e+21, 7.07619e+19),
('ny20180920s0i06c.0491', 2018.72160282, 263.38504, 9.2407, 1.235e+03, 78.923, 11.923, 0.024, 0.034, 79.177, 151.62399, -1.735, 180.00999, 0.0019, 0., 0.00188, 60., 1006.65997, 0. , 1006.65997, 62.8, -1., 0.0095, -1., -1., 9.8765e+35, 9.8765e+35, 1.96888e+22, 7.48627e+19, 8.70719e+21, 5.40175e+19, 8.97596e+21, 7.49834e+19),
('ny20180920s0i06c.0492', 2018.72161188, 263.38834, 9.3201, 1.236e+03, 78.923, 11.923, 0.024, 0.034, 79.072, 152.83299, -1.729, 180.00999, 0.0019, 0., 0.00188, 60., 1006.65997, -0.6, 1006.65997, 64.6, -1., 0.0078, -1., -1., 9.8765e+35, 9.8765e+35, 1.94867e+22, 7.83111e+19, 8.71765e+21, 4.97304e+19, 8.97784e+21, 7.23055e+19)],
dtype=[('spectrum', '<U21'), ('year', '<f8'), ('day', '<f8'), ('hour', '<f8'), ('run', '<f8'), ('lat', '<f8'), ('long', '<f8'), ('zobs', '<f8'), ('zmin', '<f8'), ('solzen', '<f8'), ('azim', '<f8'), ('osds', '<f8'), ('opd', '<f8'), ('fovi', '<f8'), ('amal', '<f8'), ('graw', '<f8'), ('tins', '<f8'), ('pins', '<f8'), ('tout', '<f8'), ('pout', '<f8'), ('hout', '<f8'), ('sia', '<f8'), ('fvsi', '<f8'), ('wspd', '<f8'), ('wdir', '<f8'), ('luft', '<f8'), ('luft_error', '<f8'), ('h2o', '<f8'), ('h2o_error', '<f8'), ('co2', '<f8'), ('co2_error', '<f8'), ('3co2', '<f8'), ('3co2_error', '<f8')])
vswhr1.shape
(1236,)
*The numbers themselves are irrelevant.
I need to delete the last 2 variables: ('3co2', '<f8'), ('3co2_error', '<f8')
Thank you
Answer:
If you are loading these arrays from csv files, then using usecols to select which columns you load may be the easiest way to get two arrays that match in dtype.
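As a rough sketch of the usecols route, assuming the file is read with np.genfromtxt (the question does not say which loader is used) and using made-up CSV content with hypothetical column names:

```python
import io
import numpy as np

# Hypothetical CSV content; the real files and their column order
# are not shown in the question.
csv_text = """spectrum,year,co2,co2_error,extra1,extra2
ny20110325s0a06c.001,2011.23,8.45e21,7.9e19,8.7e21,8.6e19
ny20110325s0a06c.002,2011.24,8.43e21,7.7e19,8.7e21,8.6e19
"""

# usecols keeps only the first four columns, so the two trailing
# columns never become fields of the structured array.
arr = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,        # take field names from the header line
    dtype=None,        # infer a type per column
    encoding="utf-8",
    usecols=(0, 1, 2, 3),
)
print(arr.dtype.names)
```

If the result needs to be a rec.array rather than a plain structured array, arr.view(np.recarray) should give that.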
But it is also possible to select a subset of fields from an existing array.
To illustrate:
In [1]: dt1 = np.dtype('U10,i,f')
In [2]: dt2 = np.dtype('U10,i,f,i,i')
In [3]: x = np.ones(2,dtype=dt1)
In [4]: y = np.zeros(2,dtype=dt2)
In [5]: x
Out[5]:
array([('1', 1, 1.), ('1', 1, 1.)],
dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f4')])
In [6]: y
Out[6]:
array([('', 0, 0., 0, 0), ('', 0, 0., 0, 0)],
dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f4'), ('f3', '<i4'), ('f4', '<i4')])
A subset of the y fields:
In [7]: y[['f0','f1','f2']]
Out[7]:
array([('', 0, 0.), ('', 0, 0.)],
dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})
This view comes with some complications, as evidenced by the offsets parameter in the new dtype. The structured arrays doc page discusses this. Sometimes it's necessary to make a packed copy using the recfunctions.repack_fields function.
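A small sketch of what the packed copy looks like, reusing the y array from the session above. The itemsize comparison is only there to show that the view still spans the dropped fields:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

y = np.zeros(2, dtype="U10,i,f,i,i")

# Multi-field indexing returns a view that keeps the original record
# size and field offsets.
view = y[["f0", "f1", "f2"]]

# repack_fields returns a copy whose dtype holds only the selected
# fields, with no padding left over from the dropped ones.
packed = rfn.repack_fields(view)

print(view.dtype.itemsize, packed.dtype.itemsize)
```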
But it appears that the view is just fine when used in concatenate:
In [8]: np.concatenate((x,y[['f0','f1','f2']]))
Out[8]:
array([('1', 1, 1.), ('1', 1, 1.), ('', 0, 0.), ('', 0, 0.)],
dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})
We could also get the indexing list from the other array's dtype:
In [9]: x.dtype.names
Out[9]: ('f0', 'f1', 'f2')
That's a tuple, which we need to convert to a list:
In [13]: np.concatenate((x,y[list(x.dtype.names)]))
Out[13]:
array([('1', 1, 1.), ('1', 1, 1.), ('', 0, 0.), ('', 0, 0.)],
dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})
(Often in Python lists and tuples are interchangeable, but in numpy indexing they are interpreted in different ways, so the distinction is important.)
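Applied to the question's case, the same idea might look like the sketch below. The arrays here are tiny stand-ins with only a few of the real fields, and the name other for the second array is an assumption, since the question does not name it. The repack_fields call is optional if the plain view concatenates fine for you:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Stand-in dtypes: the real arrays have many more fields.
short_dt = [("spectrum", "U21"), ("co2", "f8"), ("co2_error", "f8")]
long_dt = short_dt + [("3co2", "f8"), ("3co2_error", "f8")]

# 'other' is a hypothetical name for the array without the extra fields.
other = np.array([("a.001", 1.0, 0.1)], dtype=short_dt).view(np.recarray)
vswhr1 = np.array(
    [("b.001", 2.0, 0.2, 3.0, 0.3), ("b.002", 4.0, 0.4, 5.0, 0.5)],
    dtype=long_dt,
).view(np.recarray)

# Keep only the fields that the other array has (list, not tuple).
keep = list(other.dtype.names)
trimmed = rfn.repack_fields(vswhr1[keep])

# View the result as a recarray again so attribute access keeps working.
combined = np.concatenate((other, trimmed)).view(np.recarray)
print(combined.dtype.names)
print(combined.co2)
```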