Returning a max integer value from a pandas list of string numbers

Time:12-14

I have a df like this where df['value_list'] has dtype object:

                     value_list  
0             ['200000.00', '100.00', '25.00']                 
1                       ['860000.00']                          
2                  ['148000.00', '25.00']                      

I want a new column max_value holding the largest numeric value, as an integer, among the string numbers in each list. Expected results for the data above:

                     value_list                              max_value
0             ['200000.00', '100.00', '25.00']                 200000
1                       ['860000.00']                          860000
2                  ['148000.00', '25.00']                      148000

What I did:

import ast

# convert the stringified lists to Python lists
df['value_list'] = df['value_list'].apply(lambda x: ast.literal_eval(x))
# get max values from list and if list is empty temporarily assign -1
df['max_value'] = df['value_list'].apply(lambda x: max(x) if len(x) >= 1 else -1)
df['max_value'] = df['max_value'].apply(lambda x: round(float(x)) if x != -1 else None)

This runs without errors, but it does not return the actual max values. The results I got:

                     value_list                              max_value
0             ['200000.00', '100.00', '25.00']                  25
1                       ['860000.00']                          860000
2                  ['148000.00', '25.00']                       25

Is there a better and more correct way to do this? Please advise.

Answer:

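The elements of each list are strings, so max compares them lexicographically rather than numerically: '25.00' beats '200000.00' because '5' sorts after '0' at the second character. One approach is to convert each element to float before taking the max: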

import pandas as pd

# input toy data 
df = pd.DataFrame([[['200000.00', '100.00', '25.00']],
                   [['860000.00']],
                   [['148000.00', '25.00']]], columns=["value_list"])

df["max_value"] = df["value_list"].apply(lambda x: max(float(e) for e in x))
print(df)

Output

                   value_list  max_value
0  [200000.00, 100.00, 25.00]   200000.0
1                 [860000.00]   860000.0
2          [148000.00, 25.00]   148000.0
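
To see why the original attempt returned 25, here is a minimal sketch in plain Python using the first row's values. The variable names are illustrative only and are not part of the original question or answer.

```python
values = ['200000.00', '100.00', '25.00']

# max() on strings compares character by character, so '25.00' wins:
# it ties with '200000.00' on '2', then '5' sorts after '0'.
string_max = max(values)

# Converting to float first compares by numeric value.
numeric_max = max(float(v) for v in values)

# key=float also compares numerically but returns the original string.
keyed_max = max(values, key=float)

print(string_max)   # 25.00
print(numeric_max)  # 200000.0
print(keyed_max)    # 200000.00
```

The generator form is the one used in the answer; the key=float form may be handy if the original string needs to be kept.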

If you have empty lists and want to return -1, just do:

df["max_value"] = df["value_list"].apply(lambda x: max(float(e) for e in x) if x else -1)
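
The question asked for integer results and a missing value for empty lists, not floats and -1. One possible way to get that, sketched under some assumptions: the helper name max_as_int is hypothetical, rounding with round() mirrors the question's own code, and pandas' nullable "Int64" dtype is used so that a missing value can sit alongside integers.

```python
import pandas as pd

# toy data with one empty list to exercise the missing-value path
df = pd.DataFrame({"value_list": [['200000.00', '100.00', '25.00'],
                                  [],
                                  ['148000.00', '25.00']]})

def max_as_int(values):
    """Hypothetical helper: largest value rounded to int, or pd.NA if empty."""
    if not values:
        return pd.NA
    return round(max(float(v) for v in values))

# "Int64" (capital I) is the nullable integer dtype, so pd.NA is allowed
df["max_value"] = df["value_list"].apply(max_as_int).astype("Int64")
print(df)
```

Without the cast to "Int64", a column mixing integers and missing values would typically end up as object or float, which is why the question's version produced floats.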