I'm trying to consolidate this down into a for loop. Trying to have a separate dataframe for each age, sort each dataframe, calculate the mean of one of the columns, and save the dataframe to an excel file

df21=df[~df['Age'].isin(['22','23','24','25'])]
df21=df21.sort_values(by='FPS/G',ascending=False)
mean21=df21["FPS/G"].mean()
df21.to_excel(writer, 'Age 21')

df22=df[~df['Age'].isin(['21','23','24','25'])]
df22=df22.sort_values(by='FPS/G',ascending=False)
mean22=df22["FPS/G"].mean()
df22.to_excel(writer, 'Age 22')

df23=df[~df['Age'].isin(['21','22','24','25'])]
df23=df23.sort_values(by='FPS/G',ascending=False)
mean23=df23["FPS/G"].mean()
df23.to_excel(writer, 'Age 23')

df24=df[~df['Age'].isin(['21','22','23','25'])]
df24=df24.sort_values(by='FPS/G',ascending=False)
mean24=df24["FPS/G"].mean()
df24.to_excel(writer, 'Age 24')

df25=df[~df['Age'].isin(['21','22','23','24'])]
df25=df25.sort_values(by='FPS/G',ascending=False)
mean25=df25["FPS/G"].mean()
df25.to_excel(writer, 'Age 25')

CodePudding user response：

How to convert repetitive code into a loop:

What are we iterating (looping) over?

Each time through our process, we handle a different possible age value. These are apparently strings, chosen from '21', '22', '23', '24' and '25'.

What is different each time through the process?

The variable names for the temporary dataframe; but that does not matter - they are temporary; we do not care about the value of df21 etc. after the process, we only care about writing the files. So we can reuse a variable for this.
The variable name for the mean. We do not want to try to create a different variable each time through the loop - instead, we will collect the data in some kind of data structure. Here, a dict is the natural choice.
The label used for the to_excel method call.
The ages filtered out using .isin.

What is the same each time?

Write out the shell of that, using comments and temporary variables to mark the parts from step 1. We will use more generic variable names for the ones that we said did not matter. We will also write the dict logic now.

That gives us:

# First, we set up an empty dictionary for the means:
means = {}

# Each time through the loop, we will do something like:
age = # <-- we will need something that changes each time, representing
# the age we are currently processing.
other_ages = # ??? we need to base this off the `age` somehow.
age_df = df[~df['Age'].isin(other_ages)]
age_df = age_df.sort_values(by='FPS/G',ascending=False)
means[age] = age_df["FPS/G"].mean()
label = # ??? we need to base this off the `age` somehow.
age_df.to_excel(writer, label)

What is the missing logic?

age is just our iteration variable, so we don't need to assign it - we are going to use a for loop, and it will come from that loop automatically. We will put the possible values into a tuple (or list), which is easy to iterate over. Let's call that ages.
The other_ages needs to be a list that looks like the complete list of ages, but with age removed. We can get that for example by making a list copy of ages and .removeing the value; or we can use a list comprehension to make a filtered copy (the approach I will show here).
The label needs to be text that starts with 'Age ' and ends with the age string. We can easily create this with string formatting.

Inserting those bits of logic, we get the final result:

means = {}
ages = ('21', '22', '23', '24', '25')
for age in ages:
    other_ages = [a for a in ages if a != age]
    age_df = df[~df['Age'].isin(other_ages)]
    age_df = age_df.sort_values(by='FPS/G',ascending=False)
    means[age] = age_df["FPS/G"].mean()
    label = f'Age {age}'
    age_df.to_excel(writer, label)

And of course we can put the new temporary calculations in-line if we prefer:

means = {}
ages = ('21', '22', '23', '24', '25')
for age in ages:
    age_df = df[~df['Age'].isin([a for a in ages if a != age])]
    age_df = age_df.sort_values(by='FPS/G',ascending=False)
    means[age] = age_df["FPS/G"].mean()
    age_df.to_excel(writer, f'Age {age}')

CodePudding user response：

Loop over a list of age strings. Use that when filtering to create the df for a specific age that you write to the Excel file. And put the means into a dictionary keyed by age, rather than separate variables.

ages = ['21', '22', '23', '24', '25']
means = {}

for age in ages:
    age_df = df[df['Age'] == age].sort_values(by='FPS/G',ascending=False)
    means[age] = age_df["FPS/G"].mean()
    age_df.to_excel(writer, f'Age {age}')