Retrieve values larger than a threshold from a numpy array optimally

Time:05-30

I'm using Python 3.x. I have a numpy array of shape (29982, 29982) and a list of length 29982. The sample array looks like

array([[1,5,7,2,9...],
       [2,6,4,1,5...],
       [7,9,1,12,4...],
       ...
       ...
       [6,8,13,2,4...]])

The sample list looks like

['John','David','Josua',......,'Martin']

I would like to build a pandas dataframe from this array, using the list as both the row and column labels, that keeps only the array values greater than 5 and sets every other value to 0. The dataframe should look like

        'John'  'David'   'Josua'
'John'    0       0         7
'David'   0       6         0
'Josua'   7       9         0
....
'Martin'  6       8         13

Can you please suggest how I should do it?

CodePudding user response:

Just create the dataframe from the array with pd.DataFrame, passing your list as both index and columns. Then use df.where to keep only the values greater than 5 and replace everything else with 0:

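import pandas as pd

arr = [...]  # the (29982, 29982) numpy array
lst = ['John','David','Josua',...,'Martin']

# Label rows and columns with the same names
df = pd.DataFrame(arr, index=lst, columns=lst)
# Keep values greater than 5; set everything else to 0
df = df.where(df > 5, 0)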
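
To make the df.where step concrete, here is a small runnable sketch. The 3x3 array is an assumption on my part: it is just the top-left corner of the sample from the question, standing in for the real (29982, 29982) data.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in: top-left 3x3 corner of the sample array in the question.
arr = np.array([[1, 5, 7],
                [2, 6, 4],
                [7, 9, 1]])
lst = ['John', 'David', 'Josua']

df = pd.DataFrame(arr, index=lst, columns=lst)

# where() keeps entries where the condition is True
# and substitutes 0 wherever it is False.
filtered = df.where(df > 5, 0)
print(filtered)
```

Note that the comparison is strict, so a value of exactly 5 is replaced by 0, which matches the expected table in the question.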

CodePudding user response:

You can also use numpy.ma.masked_where to do the filtering on the numpy array itself, before building the dataframe:

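import numpy as np
import pandas as pd

arr = np.array([[1, 5, 7, 2],
                [2, 6, 4, 1],
                [7, 9, 1, 12],
                [6, 8, 13, 2]])

lst = ['John', 'David', 'Josua', 'Martin']

# Mask every value <= 5, then fill the masked positions with 0
df = pd.DataFrame(np.ma.masked_where(arr <= 5, arr).filled(0), index=lst, columns=lst)
print(df)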

        John  David  Josua  Martin
John       0      0      7       0
David      0      6      0       0
Josua      7      9      0      12
Martin     6      8     13       0
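
Since the question asks for an optimal approach on a (29982, 29982) array, memory use may matter more than the choice of API. The following is only a sketch under two assumptions that are not stated in the question: that the original array may be overwritten, and that it holds integers. It zeroes the small values in place with boolean-mask assignment, so the only large temporary is the boolean mask rather than a second full-size integer array.

```python
import numpy as np
import pandas as pd

# Small illustrative array; the real one would be (29982, 29982).
arr = np.array([[1, 5, 7, 2],
                [2, 6, 4, 1],
                [7, 9, 1, 12],
                [6, 8, 13, 2]])
lst = ['John', 'David', 'Josua', 'Martin']

# Overwrite values <= 5 in place (assumes the original values
# are no longer needed after this step).
arr[arr <= 5] = 0

df = pd.DataFrame(arr, index=lst, columns=lst)
print(df)
```

If the original values are still needed afterwards, np.where(arr > 5, arr, 0) returns a new array instead of modifying arr.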