I’m trying to extract a list of values from a column in a dataframe.
For example:
import pandas as pd

# dataframe with "num_fruit" column
fruit_df = pd.DataFrame({"num_fruit": ['1 "Apple"',
    '100 "Peach Juice3" 1234 "Not_fruit" 23 "Straw-berry" 2 "Orange"']})
# desired output: a list of values from the "num_fruit" column
[['1 "Apple"'],
['100 "Peach Juice3"', '1234 "Not_fruit"', '23 "Straw-berry"', '2 "Orange"']]
Any suggestions? Thanks a lot.
What I’ve tried:
import re

def split_fruit_val(val):
    return re.findall(r'(\d+ ".+")', val)

result_list = []
for val in fruit_df['num_fruit']:
    result = split_fruit_val(val)
    result_list.append(result)
print(result_list)
#output: some values were not split appropriately
[['1 "Apple"'],
['100 "Peach Juice3" 1234 "Not_fruit" 23 "Straw-berry" 2 "Orange"']]
Answer:
Let's split on the whitespace that precedes each number, using a positive lookahead for digits:
fruit_df['num_fruit'].str.split(r'\s(?=\d+)')
0 [1 "Apple"]
1 [100 "Peach Juice3", 1234 "Not_fruit", 23 "Str...
Name: num_fruit, dtype: object
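To get the plain list of lists asked for in the question, the split result can be converted with .tolist(). Here is a minimal sketch that assumes the same fruit_df as in the question. The second approach is not part of the answer above: it is an alternative that keeps re.findall but replaces the greedy ".+" with a pattern that stops at the closing quote. The names split_list, findall_list and PAIR are only illustrative.

```python
import re
import pandas as pd

fruit_df = pd.DataFrame({"num_fruit": [
    '1 "Apple"',
    '100 "Peach Juice3" 1234 "Not_fruit" 23 "Straw-berry" 2 "Orange"',
]})

# Approach 1: split on whitespace that is followed by a digit,
# then turn the resulting Series into a plain Python list of lists.
split_list = fruit_df["num_fruit"].str.split(r"\s(?=\d)").tolist()

# Approach 2 (alternative, not from the answer above): match each
# number followed by a quoted string. [^"]* cannot run past the
# closing quote, unlike the greedy ".+" in the original attempt.
PAIR = re.compile(r'\d+ "[^"]*"')
findall_list = [PAIR.findall(val) for val in fruit_df["num_fruit"]]

print(split_list)
print(findall_list)
```

Both give the desired output on the sample data. One difference to keep in mind: the split would also break inside a quoted name if it contained a space followed by a digit (say "Juice 3"), while the findall pattern would keep that name intact.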