Extract a list of values from a column in a pandas dataframe

Time:01-20

I’m trying to extract a list of values from a column in a dataframe.

For example:

import pandas as pd

# dataframe with "num_fruit" column
fruit_df = pd.DataFrame({"num_fruit": ['1 "Apple"',
                                        '100 "Peach Juice3" 1234 "Not_fruit" 23 "Straw-berry" 2 "Orange"']})
# desired output: a list of values from the "num_fruit" column 
[['1 "Apple"'],
 ['100 "Peach Juice3"', '1234 "Not_fruit"', '23 "Straw-berry"', '2 "Orange"']]

Any suggestions? Thanks a lot.

What I’ve tried:

import re 

def split_fruit_val(val):
    return re.findall(r'(\d+ ".+")', val)

result_list = []
for val in fruit_df['num_fruit']:
    result = split_fruit_val(val)
    result_list.append(result)

print(result_list) 
#output: some values were not split appropriately 
[['1 "Apple"'],
 ['100 "Peach Juice3" 1234 "Not_fruit" 23 "Straw-berry" 2 "Orange"']]

CodePudding user response:

Let's split on each whitespace character that is followed by a number, using a positive lookahead:

fruit_df['num_fruit'].str.split(r'\s(?=\d+)')

0                                          [1 "Apple"]
1    [100 "Peach Juice3", 1234 "Not_fruit", 23 "Str...
Name: num_fruit, dtype: object
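
The result above is a pandas Series, while the question asks for a plain nested list. A minimal sketch of two ways to get there follows. It assumes that a fruit name never contains a space followed by a digit (which would break the lookahead split) and never contains an embedded double quote (which would break the findall pattern). The names split_list, findall_list and pair_pattern are illustrative only and do not come from the original post.

```python
import re

import pandas as pd

fruit_df = pd.DataFrame({"num_fruit": ['1 "Apple"',
                                        '100 "Peach Juice3" 1234 "Not_fruit" 23 "Straw-berry" 2 "Orange"']})

# Option 1: the lookahead split from the answer, converted to a list of lists.
# Assumption: no fruit name contains a space directly followed by a digit.
split_list = fruit_df["num_fruit"].str.split(r'\s(?=\d+)').tolist()

# Option 2: match each number/quoted-name pair directly with re.findall.
# [^"]* stops at the first closing quote, unlike the greedy .+ in the
# question's attempt, which runs on to the last quote in the string.
# Assumption: names contain no embedded double quotes.
pair_pattern = re.compile(r'\d+ "[^"]*"')
findall_list = [pair_pattern.findall(val) for val in fruit_df["num_fruit"]]

print(split_list)
print(findall_list)
```

Both options should produce the desired output from the question for this sample data. The second one also explains why the original attempt failed: the greedy `.+` matched everything up to the final quote, so the longer row came back as a single item.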