I was trying to find the maximum value of a column in a dataframe that contains numpy arrays.
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 33, 4],
                   'a': [1, 22, 23, 44],
                   'b': [1, 42, 23, 42]})
df['new'] = df.apply(lambda r: tuple(r), axis=1).apply(np.array)
This is what the dataframe looks like:
   id   a   b           new
0   1   1   1     [1, 1, 1]
1   2  22  42   [2, 22, 42]
2  33  23  23  [33, 23, 23]
3   4  44  42   [4, 44, 42]
Now I want to find the single maximum value across all the arrays in column new. In this case it is 44. Is there a quick and easy way?
CodePudding user response:
Because your new column is actually constructed from the columns id, a, and b, you can take the maximum before you create the new column:
single_max = np.max(df.values)
Or, if your dataframe already contains the new column, you can drop it and then take the max:
single_max = np.max(df.drop('new',axis=1).values)
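For reference, a self-contained version of this idea might look like the sketch below. It selects the source columns by name, so it gives the same result whether or not new has been added yet. The `source_cols` name is my own and not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 33, 4],
                   'a': [1, 22, 23, 44],
                   'b': [1, 42, 23, 42]})
df['new'] = df.apply(lambda r: tuple(r), axis=1).apply(np.array)

# Hypothetical name: the columns that `new` was built from.
source_cols = ['id', 'a', 'b']

# Take the maximum over the underlying 2-D integer array.
single_max = df[source_cols].to_numpy().max()
print(single_max)  # 44
```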
CodePudding user response:
You can apply a lambda to the values that calls each array's max method. This results in a Series, which also has a max method.
df['new'].apply(lambda arr: arr.max()).max()
Just guessing, but this should be faster than .apply(max) because it uses the optimized array method instead of iterating over the NumPy elements one by one in Python.
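If you want to check that guess rather than take it on faith, a rough timing sketch could look like this. The data here is made up (1,000 rows of 100-element arrays), and the numbers will vary by machine and library version, so treat it as a starting point only.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test data: 1,000 rows, each holding a 100-element integer array.
rng = np.random.default_rng(0)
s = pd.Series([rng.integers(0, 1_000, size=100) for _ in range(1_000)])

# Both variants should agree on the result.
builtin_result = s.apply(lambda arr: max(arr)).max()
method_result = s.apply(lambda arr: arr.max()).max()

# Time each variant over a few repetitions.
t_builtin = timeit.timeit(lambda: s.apply(lambda arr: max(arr)).max(), number=10)
t_method = timeit.timeit(lambda: s.apply(lambda arr: arr.max()).max(), number=10)
print(f"builtin max: {t_builtin:.4f}s, ndarray.max: {t_method:.4f}s")
```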
CodePudding user response:
A possible solution:
df.new.explode().max()
Or a faster alternative:
np.max(np.vstack(df.new.values))
Returns 44.
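One caveat worth noting: np.vstack needs every array to have the same length, which holds in the question's data. If the arrays could differ in length, np.concatenate might be a workable substitute, since it just joins 1-D arrays end to end. The data below is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: arrays of different lengths in one column.
ragged = pd.Series([np.array([1, 1, 1]),
                    np.array([2, 22]),
                    np.array([33, 23, 23, 44])])

# np.vstack would raise here because the rows differ in length;
# np.concatenate flattens the 1-D arrays into one array instead.
ragged_max = np.concatenate(ragged.to_numpy()).max()
print(ragged_max)  # 44
```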
CodePudding user response:
Assuming you only want to consider the column "new":
import numpy as np
out = np.max(tuple(df['new'])) # or np.max(df['new'].tolist())
Output: 44