I have the following pandas DataFrame and I am trying to group the animals by their class. I know I can use groupby to get the result faster. However, I was wondering whether there is a way to replicate what groupby does by iterating over the rows.
import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=('class', 'order', 'max_speed'))
I have been trying to use the following code but it doesn't work, and I can't find another method.
birds = []
mammal = []
for i, columnclass in df.iterrows():
    if i == 'bird':
        birds.append(i)
    else:
        mammal.append(i)
print(birds)
print(mammal)
CodePudding user response:
You don't really need a loop for any of this. First get the unique values of the class column:
classes = df['class'].unique()
Now you can build a dictionary (or whatever structure you want) from it. This one maps each class to a boolean mask over the rows:
data = {cls: df['class'] == cls for cls in classes}
Or the one-liner:
data = {cls: df['class'] == cls for cls in df['class'].unique()}
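Note that the dictionary above holds boolean masks rather than the grouped rows themselves. As a minimal sketch of one possible follow-up step, assuming the DataFrame from the question, each mask can be used to select its rows and collect their index labels. The names masks, groups, and members are made up for this illustration.

```python
import numpy as np
import pandas as pd

# DataFrame from the question
df = pd.DataFrame(
    [('bird', 'Falconiformes', 389.0),
     ('bird', 'Psittaciformes', 24.0),
     ('mammal', 'Carnivora', 80.2),
     ('mammal', 'Primates', np.nan),
     ('mammal', 'Carnivora', 58)],
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
    columns=['class', 'order', 'max_speed'])

# One boolean mask per class, as in the answer
masks = {cls: df['class'] == cls for cls in df['class'].unique()}

# Illustrative step: apply each mask to get the matching sub-DataFrame
groups = {cls: df[mask] for cls, mask in masks.items()}

# Index labels of the rows in each group
members = {cls: list(sub.index) for cls, sub in groups.items()}
print(members)
# {'bird': ['falcon', 'parrot'], 'mammal': ['lion', 'monkey', 'leopard']}
```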
But why do something like this when you can just use groupby?
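For comparison, here is a short sketch of the groupby route, assuming the DataFrame from the question. Iterating over the result of df.groupby('class') yields (key, sub-DataFrame) pairs, which is enough to collect the animals in each class. The name by_class is just for illustration.

```python
import numpy as np
import pandas as pd

# DataFrame from the question
df = pd.DataFrame(
    [('bird', 'Falconiformes', 389.0),
     ('bird', 'Psittaciformes', 24.0),
     ('mammal', 'Carnivora', 80.2),
     ('mammal', 'Primates', np.nan),
     ('mammal', 'Carnivora', 58)],
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
    columns=['class', 'order', 'max_speed'])

# Each iteration gives the group key and the rows belonging to it
by_class = {cls: list(sub.index) for cls, sub in df.groupby('class')}
print(by_class)
# {'bird': ['falcon', 'parrot'], 'mammal': ['lion', 'monkey', 'leopard']}
```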
CodePudding user response:
The iterrows method of a DataFrame yields 2-tuples of (index, Series), where the Series holds the row's data indexed by the column names. To quote the pandas documentation:
DataFrame.iterrows()
Iterate over DataFrame rows as (index, Series) pairs.
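To make the quoted description concrete, here is a small sketch, assuming the DataFrame from the question, that prints what each iteration yields. The names label, row, and pairs are chosen only for this example.

```python
import numpy as np
import pandas as pd

# DataFrame from the question
df = pd.DataFrame(
    [('bird', 'Falconiformes', 389.0),
     ('bird', 'Psittaciformes', 24.0),
     ('mammal', 'Carnivora', 80.2),
     ('mammal', 'Primates', np.nan),
     ('mammal', 'Carnivora', 58)],
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
    columns=['class', 'order', 'max_speed'])

pairs = []
for label, row in df.iterrows():
    # label is the index value (e.g. 'falcon');
    # row is a Series whose index is the column names
    pairs.append((label, row['class']))

print(pairs)
# [('falcon', 'bird'), ('parrot', 'bird'), ('lion', 'mammal'),
#  ('monkey', 'mammal'), ('leopard', 'mammal')]
```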
In your loop, i is the index label (for example 'falcon'), so it never equals 'bird'. You need to access the class column of each row instead. You can do that with:
birds = []
mammal = []
for i, (columnclass, _, _) in df.iterrows():
    if columnclass == "bird":
        birds.append(i)
    else:
        mammal.append(i)
print(birds)
print(mammal)
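The loop above hard-codes two classes. If you want something closer to what groupby does for any number of classes, one possible variation (a sketch, not the only way) is to collect the index labels in a dictionary keyed by class and to read the column by name with row['class']. It assumes the DataFrame from the question, and the names groups, label, and row are illustrative.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# DataFrame from the question
df = pd.DataFrame(
    [('bird', 'Falconiformes', 389.0),
     ('bird', 'Psittaciformes', 24.0),
     ('mammal', 'Carnivora', 80.2),
     ('mammal', 'Primates', np.nan),
     ('mammal', 'Carnivora', 58)],
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
    columns=['class', 'order', 'max_speed'])

# Maps each class value to the list of index labels seen for it
groups = defaultdict(list)
for label, row in df.iterrows():
    # Look the column up by name rather than by position
    groups[row['class']].append(label)

print(dict(groups))
# {'bird': ['falcon', 'parrot'], 'mammal': ['lion', 'monkey', 'leopard']}
```

Looking the column up by name keeps the loop working if columns are added or reordered, whereas the tuple unpacking above depends on there being exactly three columns.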