I'm using Python 3.x. I have a NumPy array of shape (29982, 29982) and a list of length 29982. The sample array looks like
array([[1,5,7,2,9...],
[2,6,4,1,5...],
[7,9,1,12,4...],
...
...
[6,8,13,2,4...]])
The sample list looks like
['John','David','Josua',......,'Martin']
I would like to build a pandas DataFrame from this array and list, using the list as both the row and column labels and keeping only the array values greater than 5 (all other values should become 0). The dataframe should look like
          'John'  'David'  'Josua'
'John'         0        0        7
'David'        0        6        0
'Josua'        7        9        0
....
'Martin'       6        8       13
Could you please suggest how I should do this?
CodePudding user response:
Just create the dataframe from the array with pd.DataFrame, passing your list as both index and columns. Then use df.where to keep only the values that are greater than 5 and replace the rest with 0:
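import pandas as pd

arr = [...]
lst = ['John','David','Josua',...,'Martin']
# Use the list for both the row labels and the column labels
df = pd.DataFrame(arr, index=lst, columns=lst)
# Keep values greater than 5; replace everything else with 0
df = df.where(df > 5, 0)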
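As a minimal runnable sketch of this approach, here it is on a 4x4 stand-in built from the sample values in the question. The small array and the four-name list are illustrative assumptions, not the full 29982-row data.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real (29982, 29982) array (illustrative values)
arr = np.array([[1, 5, 7, 2],
                [2, 6, 4, 1],
                [7, 9, 1, 12],
                [6, 8, 13, 2]])
lst = ['John', 'David', 'Josua', 'Martin']

df = pd.DataFrame(arr, index=lst, columns=lst)
# where() keeps entries where the condition is True and
# replaces all other entries with the second argument (0 here)
df = df.where(df > 5, 0)
print(df)
```

Note that the comparison is strict, so a value of exactly 5 is replaced with 0.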
CodePudding user response:
You can also try numpy.ma.masked_where to do the filtering on the NumPy array itself, before building the dataframe:
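import numpy as np
import pandas as pd

arr = np.array([[1, 5, 7, 2],
                [2, 6, 4, 1],
                [7, 9, 1, 12],
                [6, 8, 13, 2]])
lst = ['John', 'David', 'Josua', 'Martin']
# Mask every value <= 5, then fill the masked positions with 0
filtered = np.ma.masked_where(arr <= 5, arr).filled(0)
df = pd.DataFrame(filtered, index=lst, columns=lst)
print(df)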
        John  David  Josua  Martin
John       0      0      7       0
David      0      6      0       0
Josua      7      9      0      12
Martin     6      8     13       0
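For comparison, the same filtering can probably be written with plain np.where. This is not part of either answer above; it is a sketch of an alternative, checked against the masked-array result on the same 4x4 sample. The names `via_where` and `via_mask` are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 5, 7, 2],
                [2, 6, 4, 1],
                [7, 9, 1, 12],
                [6, 8, 13, 2]])
lst = ['John', 'David', 'Josua', 'Martin']

# np.where(cond, a, b) takes values from a where cond is True, else from b
via_where = np.where(arr > 5, arr, 0)
# Masked-array route from the answer above
via_mask = np.ma.masked_where(arr <= 5, arr).filled(0)

# Check that both routes agree element by element
same = np.array_equal(via_where, via_mask)
print(same)

df = pd.DataFrame(via_where, index=lst, columns=lst)
print(df)
```

Doing the filtering in NumPy first means the dataframe is built once from the already-filtered array, which may matter for an array as large as the one in the question.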