How to group rows in pandas without groupby?

I have the following pandas DataFrame and I am trying to group animals according to their class. I know that groupby gives a faster result, but I was wondering whether its behavior can be replicated by iterating over the rows.

import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=('class', 'order', 'max_speed'))

I have been trying to use the following code but it doesn't work, and I can't find another method.

birds = []
mammal = []
for i, columnclass in df.iterrows():
  if i == 'bird':
    birds.append(i)
  else:
    mammal.append(i) 
print(birds)
print(mammal)

CodePudding user response：

You don't really need a loop for any of this. First get the unique values of the class column:

classes = df['class'].unique()

Now you can build a dictionary (or whatever you want) out of it. Here each class maps to a boolean mask that marks the rows belonging to it:

data = {cls: df['class'] == cls for cls in classes}

Or the one-liner:

data = {cls: df['class'] == cls for cls in df['class'].unique()}
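
The dictionary above holds boolean masks rather than the rows themselves. A minimal sketch of how those masks could be turned into per-class sub-frames, assuming the same df as in the question (the names masks and groups are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=['class', 'order', 'max_speed'])

# One boolean mask per class, built the same way as in the answer
masks = {cls: df['class'] == cls for cls in df['class'].unique()}

# Indexing df with a mask keeps only the rows where the mask is True
groups = {cls: df[mask] for cls, mask in masks.items()}

print(list(groups['bird'].index))    # ['falcon', 'parrot']
print(list(groups['mammal'].index))  # ['lion', 'monkey', 'leopard']
```

Each value in groups is an ordinary DataFrame, so per-class statistics such as groups['mammal']['max_speed'].mean() work as usual.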

But why do something like this when you can just use groupby?
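
For comparison, here is a rough sketch of the groupby route, assuming the same df as in the question. It uses the groups attribute of the GroupBy object, which maps each class to the index labels of its rows; the name by_class is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=['class', 'order', 'max_speed'])

# .groups maps each group key to an Index of row labels;
# convert each Index to a plain list for easy printing
by_class = {cls: list(labels)
            for cls, labels in df.groupby('class').groups.items()}

print(by_class)
# {'bird': ['falcon', 'parrot'], 'mammal': ['lion', 'monkey', 'leopard']}
```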

CodePudding user response：

The iterrows method of a DataFrame yields 2-tuples of (index, Series), where the Series holds the row's data indexed by the column names. This is a quote from the pandas documentation:

DataFrame.iterrows()

Iterate over DataFrame rows as (index, Series) pairs.
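
To make the quoted description concrete, this small sketch looks at the first pair that iterrows produces for the df from the question. The names label and row are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=['class', 'order', 'max_speed'])

# Take only the first (index, Series) pair from the iterator
label, row = next(df.iterrows())

print(label)               # falcon  (the index label, not the class)
print(row['class'])        # bird    (the class lives inside the row Series)
print(type(row).__name__)  # Series
```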

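In your loop, i is the row's index label (for example 'falcon'), so comparing it with 'bird' is never true and every animal ends up in mammal. You need to access the class column of each row instead. You can do that with: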

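birds = []
mammal = []
# i is the index label; the row Series is unpacked into its three column values
for i, (columnclass, _, _) in df.iterrows():
    if columnclass == "bird":
        birds.append(i)
    else:
        mammal.append(i)
print(birds)   # ['falcon', 'parrot']
print(mammal)  # ['lion', 'monkey', 'leopard']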
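
The unpacking in the loop above relies on the frame having exactly three columns, and the if/else only handles two classes. A slightly more general sketch, again assuming the df from the question, looks up the class by column name and collects the labels in a dictionary. The use of defaultdict and the name groups are choices made for this example, not part of the original answer.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=['class', 'order', 'max_speed'])

# Maps each class value to the list of index labels seen for it
groups = defaultdict(list)
for label, row in df.iterrows():
    # Look the class up by column name, so extra columns do not break the loop
    groups[row['class']].append(label)

print(dict(groups))
# {'bird': ['falcon', 'parrot'], 'mammal': ['lion', 'monkey', 'leopard']}
```

This works for any number of classes, but it remains a row-by-row Python loop, so on large frames groupby will generally be much faster.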