I have a Series in a DataFrame with object dtype. I want to loop over each row and add the values to a single flat list, not a list of lists.
df['Fruits']
index Fruits
0 ['banana']
1 ['apple','grapes(imported,us)','mango']
2 ['apple']
3 ['mango','grapes(imported,US)','pears(imported,NZ)']
4 ['mango']
dtype: object
fruits_list = []
for i in df['Fruits']:
    fruits_list.append(i)
Expected Output:
fruits_list = ['banana', 'apple','grapes(imported,us)','mango', 'apple', 'mango','grapes(imported,US)','pears(imported,NZ)', 'mango']
CodePudding user response:
Why is your data in this format to begin with? The values are strings, not lists of strings:
In [2]: for item in df["Fruits"]:
   ...:     print(type(item), item)
   ...:
<class 'str'> ['banana']
<class 'str'> ['apple','grapes(imported,us)','mango']
<class 'str'> ['apple']
<class 'str'> ['mango','grapes(imported,US)','pears(imported,NZ)']
<class 'str'> ['mango']
So you can use ast.literal_eval() to convert these strings to lists of strings, and then use list.extend() to obtain a flattened list of all the items:
In [3]: import ast

In [4]: fruits = []
   ...: for item in df["Fruits"]:
   ...:     fruits.extend(ast.literal_eval(item))
   ...:
In [5]: fruits
Out[5]:
['banana',
'apple',
'grapes(imported,us)',
'mango',
'apple',
'mango',
'grapes(imported,US)',
'pears(imported,NZ)',
'mango']
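If you are not sure whether every cell is a string or already a real list, a small helper can handle both. This is only a sketch: the name flatten_fruits is made up for illustration, and a plain Python list stands in for df["Fruits"] so the snippet runs without pandas. Passing the Series itself should work the same way, since the helper only iterates over its values.

```python
import ast


def flatten_fruits(values):
    """Flatten an iterable whose items are lists or string representations of lists."""
    flat = []
    for item in values:
        # Parse only when the cell is a string such as "['apple','mango']".
        parsed = ast.literal_eval(item) if isinstance(item, str) else item
        flat.extend(parsed)
    return flat


# Hypothetical stand-in for df["Fruits"]: two stringified lists and one real list.
column = [
    "['banana']",
    "['apple','grapes(imported,us)','mango']",
    ["apple"],
]
fruits = flatten_fruits(column)
print(fruits)
```

ast.literal_eval() only evaluates Python literals, so it is a safer choice than eval() for cells that come from a file or another untrusted source.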
CodePudding user response:
Using and summarizing the following answer: How to make a flat list of lists.
Setup:
import pandas as pd
df = pd.DataFrame({"Fruits": [['banana'], ['apple','grapes(imported,us)','mango'], ['apple'], ['mango','grapes(imported,US)','pears(imported,NZ)'], ['mango']]})
I suggest two options:
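Option 1, for small data sets (around 10 elements):
sum(df['Fruits'], [])
Option 2, for larger data sets (more than 10 elements). Note that chain.from_iterable() returns an iterator, so wrap it in list() to get a list:
from itertools import chain
list(chain.from_iterable(df['Fruits']))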
Output:
['banana',
'apple',
'grapes(imported,us)',
'mango',
'apple',
'mango',
'grapes(imported,US)',
'pears(imported,NZ)',
'mango']
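To see that both options give the same result, here is a minimal sketch. It assumes the cells are real lists (as in the setup above) and uses a shortened plain list named nested in place of df['Fruits'] so it runs without pandas.

```python
from itertools import chain

# Shortened stand-in for df['Fruits'] holding real lists.
nested = [['banana'], ['apple', 'grapes(imported,us)', 'mango'], ['apple']]

# Option 1: each addition creates a new list, so the cost grows quickly with size.
flat_sum = sum(nested, [])

# Option 2: chain.from_iterable() is lazy; list() consumes it in a single pass.
lazy = chain.from_iterable(nested)
flat_chain = list(lazy)

print(flat_sum == flat_chain)
```

The repeated copying in sum() is why it is only suggested for small inputs, while chain.from_iterable() walks each inner list once.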