I'm trying to consolidate this down into a for loop. Trying to have a separate dataframe for each age, sort each dataframe, calculate the mean of one of the columns, and save the dataframe to an excel file
df21=df[~df['Age'].isin(['22','23','24','25'])]
df21=df21.sort_values(by='FPS/G',ascending=False)
mean21=df21["FPS/G"].mean()
df21.to_excel(writer, 'Age 21')
df22=df[~df['Age'].isin(['21','23','24','25'])]
df22=df22.sort_values(by='FPS/G',ascending=False)
mean22=df22["FPS/G"].mean()
df22.to_excel(writer, 'Age 22')
df23=df[~df['Age'].isin(['21','22','24','25'])]
df23=df23.sort_values(by='FPS/G',ascending=False)
mean23=df23["FPS/G"].mean()
df23.to_excel(writer, 'Age 23')
df24=df[~df['Age'].isin(['21','22','23','25'])]
df24=df24.sort_values(by='FPS/G',ascending=False)
mean24=df24["FPS/G"].mean()
df24.to_excel(writer, 'Age 24')
df25=df[~df['Age'].isin(['21','22','23','24'])]
df25=df25.sort_values(by='FPS/G',ascending=False)
mean25=df25["FPS/G"].mean()
df25.to_excel(writer, 'Age 25')
CodePudding user response:
How to convert repetitive code into a loop:
What are we iterating (looping) over?
Each time through our process, we handle a different possible age value. These are apparently strings, chosen from '21'
, '22'
, '23'
, '24'
and '25'
.
What is different each time through the process?
The variable names for the temporary dataframe; but that does not matter - they are temporary; we do not care about the value of
df21
etc. after the process, we only care about writing the files. So we can reuse a variable for this.The variable name for the mean. We do not want to try to create a different variable each time through the loop - instead, we will collect the data in some kind of data structure. Here, a
dict
is the natural choice.The label used for the
to_excel
method call.The ages filtered out using
.isin
.
What is the same each time?
Write out the shell of that, using comments and temporary variables to mark the parts from step 1. We will use more generic variable names for the ones that we said did not matter. We will also write the dict
logic now.
That gives us:
# First, we set up an empty dictionary for the means:
means = {}
# Each time through the loop, we will do something like:
age = # <-- we will need something that changes each time, representing
# the age we are currently processing.
other_ages = # ??? we need to base this off the `age` somehow.
age_df = df[~df['Age'].isin(other_ages)]
age_df = age_df.sort_values(by='FPS/G',ascending=False)
means[age] = age_df["FPS/G"].mean()
label = # ??? we need to base this off the `age` somehow.
age_df.to_excel(writer, label)
What is the missing logic?
age
is just our iteration variable, so we don't need to assign it - we are going to use afor
loop, and it will come from that loop automatically. We will put the possible values into a tuple (or list), which is easy to iterate over. Let's call thatages
.The
other_ages
needs to be a list that looks like the complete list of ages, but withage
removed. We can get that for example by making a list copy ofages
and.remove
ing the value; or we can use a list comprehension to make a filtered copy (the approach I will show here).The
label
needs to be text that starts with'Age '
and ends with the age string. We can easily create this with string formatting.
Inserting those bits of logic, we get the final result:
means = {}
ages = ('21', '22', '23', '24', '25')
for age in ages:
other_ages = [a for a in ages if a != age]
age_df = df[~df['Age'].isin(other_ages)]
age_df = age_df.sort_values(by='FPS/G',ascending=False)
means[age] = age_df["FPS/G"].mean()
label = f'Age {age}'
age_df.to_excel(writer, label)
And of course we can put the new temporary calculations in-line if we prefer:
means = {}
ages = ('21', '22', '23', '24', '25')
for age in ages:
age_df = df[~df['Age'].isin([a for a in ages if a != age])]
age_df = age_df.sort_values(by='FPS/G',ascending=False)
means[age] = age_df["FPS/G"].mean()
age_df.to_excel(writer, f'Age {age}')
CodePudding user response:
Loop over a list of age strings. Use that when filtering to create the df for a specific age that you write to the Excel file. And put the means into a dictionary keyed by age, rather than separate variables.
ages = ['21', '22', '23', '24', '25']
means = {}
for age in ages:
age_df = df[df['Age'] == age].sort_values(by='FPS/G',ascending=False)
means[age] = age_df["FPS/G"].mean()
age_df.to_excel(writer, f'Age {age}')