Replacing values in 2d numpy array based on 1d numpy array or list

Time:05-26

Consider the following 2D NumPy array:

import numpy as np

daily = np.array([
['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname1', 111, 400, 4.4, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname2', 222, 500, 5.5, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname3', 333, 600, 6.6, 'group'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname1', 111, 700, 7.7, 'group'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname2', 222, 800, 8.8, 'group'],  
['2022-01-03', 'AccountName1', 123456789, 'campaignname3', 333, 900, 9.9, 'group'],
], dtype = object)

daily

And here is the other 1D NumPy array (this could be a list if needed):

campaigns = np.array([111, 333], dtype = object)
campaigns

What is the fastest way to replace the last column's values from 'group' to 'new' or 'old', depending on whether the row's campaign ID (column index 4) appears in campaigns? I was able to do it with a Python for loop and if statements, but that is very slow for the final goal. The final goal is to check several billion new/old combinations, so we need something very quick.

%%time
for x in daily:
    if x[4] in campaigns:
        x[7] = 'new'
    else:
        x[7] = 'old'
daily

And here is the expected result:

result = np.array([
['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'new'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'old'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'new'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname1', 111, 400, 4.4, 'new'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname2', 222, 500, 5.5, 'old'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname3', 333, 600, 6.6, 'new'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname1', 111, 700, 7.7, 'new'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname2', 222, 800, 8.8, 'old'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname3', 333, 900, 9.9, 'new']
], dtype=object)

result

CodePudding user response:

The whole column at index 4:

In [58]: daily[:,4]
Out[58]: array([111, 222, 333, 111, 222, 333, 111, 222, 333], dtype=object)

We can test each of its values for membership in campaigns with np.in1d, which returns a boolean mask:

In [60]: np.in1d(daily[:,4],campaigns)
Out[60]: array([ True, False,  True,  True, False,  True,  True, False,  True])

In [62]: mask = np.in1d(daily[:,4],campaigns)

In [63]: daily[mask,7]
Out[63]: array(['group', 'group', 'group', 'group', 'group', 'group'], dtype=object)
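Since the mask selects rows, it can also be used to assign directly, without building an intermediate array of strings. The following is only a minimal sketch of that idea: the small `rows` array is made up for illustration, and it uses np.isin, the name the NumPy documentation recommends over np.in1d in new code.

```python
import numpy as np

# Hypothetical, shortened stand-in for `daily`: [campaign_id, value, label]
rows = np.array([
    [111, 100, 'group'],
    [222, 200, 'group'],
    [333, 300, 'group'],
    [111, 400, 'group'],
], dtype=object)
campaigns = np.array([111, 333], dtype=object)

# Boolean mask: True where the campaign id (column 0) is in `campaigns`
mask = np.isin(rows[:, 0], campaigns)

# Assign in place: matching rows get 'new', all other rows get 'old'
rows[mask, 2] = 'new'
rows[~mask, 2] = 'old'

labels = rows[:, 2].tolist()
```

This takes two assignments instead of one, but it never touches rows outside the selected ones, which can be handy when only one of the two labels needs to change.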

np.where lets us convert that mask to an array of strings:

In [67]: np.where(mask, 'new','old')
Out[67]: 
array(['new', 'old', 'new', 'new', 'old', 'new', 'new', 'old', 'new'],
      dtype='<U3')

Which we can assign to column 7 (in IPython, _ holds the previous output):

In [68]: daily[:,7] = _
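Put together as a plain script (outside IPython, so without `_`), the steps above might look like the sketch below. It uses only the first three rows of the question's data. The cast of the ID column to int64 is an assumption on my part rather than something from the session above: comparisons on a native integer dtype are generally faster than on object dtype, which may matter at the scale mentioned in the question, but it only works if every ID really is an integer.

```python
import numpy as np

# First three rows of the question's `daily` array
daily = np.array([
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'group'],
], dtype=object)
campaigns = np.array([111, 333], dtype=object)

# Assumption: all ids are integers, so both sides can be cast to int64
ids = daily[:, 4].astype(np.int64)
wanted = campaigns.astype(np.int64)

# Membership test for the whole column at once (np.isin is the newer name for np.in1d)
mask = np.isin(ids, wanted)

# Map True/False to 'new'/'old' and write the result into the last column
daily[:, 7] = np.where(mask, 'new', 'old')

labels = daily[:, 7].tolist()
```

If the IDs cannot be cast, dropping the two astype calls leaves the object-dtype version shown in the session above.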

I see lots of pandas questions about using np.where in the same sort of way.
