I have a DataFrame in which each cell of one column holds an array of one or more "length x width" pairs. I am trying to find the min and max of each pair.
I can split an array with a single pair and find the min and max from there ([9.75 x 8.00]), but it becomes more complicated when there are multiple pairs in an array ([10 x 14, 10 x 11, 1.5 x 1.75]).
Here is what I have:
df[['Length', 'Width']]=df['Array'].str.split("x",expand=True,)
df['Max'] = df[['Length', 'Width']].values.max(1)
Here is what I am trying to get:
df = pd.DataFrame({
    "Array": ["[9.75 x 8.00]", "[10 x 14, 10 x 11, 1.5 x 1.75]", "[54 x 80, 39 x 75, 78 x 80, 39 x 80, 54 x 75, 60 x 80]"],
    "Max": ["[9.75]", "[14,11,1.75]", "[80,75,80,80,75,80]"],
    "Min": ["[8.00]", "[10,10,1.5]", "[54,39,78,39,54,60]"],
})
CodePudding user response:
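The difficult part is to strip and split your strings. Once every pair sits on its own row, convert the two parts to float so that max and min compare numbers rather than strings:
df1 = df['Array'].str.strip('[]').str.split(', ').explode().str.split(' x ', expand=True).astype(float)
df[['Max', 'Min']] = pd.concat([df1.max(1).groupby(level=0).apply(list),
                                df1.min(1).groupby(level=0).apply(list)], axis=1)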
Output:
Array | Max | Min |
---|---|---|
[9.75 x 8.00] | [9.75] | [8.0] |
[10 x 14, 10 x 11, 1.5 x 1.75] | [14.0, 11.0, 1.75] | [10.0, 10.0, 1.5] |
[54 x 80, 39 x 75, 78 x 80, 39 x 80, 54 x 75, 60 x 80] | [80.0, 75.0, 80.0, 80.0, 75.0, 80.0] | [54.0, 39.0, 78.0, 39.0, 54.0, 60.0] |
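For anyone who wants to see the intermediate steps, here is a minimal, self-contained sketch of the same idea, broken into named stages. The variable names `pairs` and `dims` are only illustrative, and it assumes every cell follows the exact "[a x b, c x d]" layout from the question (", " between pairs and " x " inside a pair).

```python
import pandas as pd

df = pd.DataFrame({"Array": [
    "[9.75 x 8.00]",
    "[10 x 14, 10 x 11, 1.5 x 1.75]",
    "[54 x 80, 39 x 75, 78 x 80, 39 x 80, 54 x 75, 60 x 80]",
]})

# Drop the brackets, split into "a x b" pairs, and put each pair on its own row.
# explode() repeats the original row label, so the pairs stay tied to their source row.
pairs = df["Array"].str.strip("[]").str.split(", ").explode()

# Split each pair into two columns and convert to float.
# Without the conversion the comparison is lexicographic (e.g. "9" > "10").
dims = pairs.str.split(" x ", expand=True).astype(float)

# Row-wise max/min per pair, then collect the values back into one list per original row.
df["Max"] = dims.max(axis=1).groupby(level=0).apply(list)
df["Min"] = dims.min(axis=1).groupby(level=0).apply(list)

print(df)
```

Grouping on `level=0` works here because the exploded rows keep the index of the row they came from; if the frame had a non-unique index to begin with, it would probably need a `reset_index(drop=True)` first.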