Retrieve column with condition in a pandas table

I am a student in high school and I have an exercise to hand in that I have been stuck on for several hours already.

Here are the instructions:

Once you have a new DataFrame with only the data from 1952 and 2007, keep only the columns of interest (year and lifeExp), group the data by year, and calculate the average life expectancy. Once this is done, use the function df.plot.bar().

Help: you can use several .loc[] in a row, once to select the years, and once to select the columns of interest. Look at the example below to help you.

And here is my code:

ax = (df
      .loc[df['year'] == '2007', ['lifeExp']]
      .groupby(df['year'] == '2007')
      .mean('lifeExp')
      .plot.bar(rot=45, figsize=(16, 8))
      )

ax.set_title('Espérance de vie par année', fontsize=14);
ax.set_ylabel('Espérance de vie', fontsize=12);

I have to keep the code similar to this; I am not allowed to use a different approach.

Could you help me find a solution please?

PS: I'm new to the forum, so feel free to tell me if I did something wrong.

CodePudding user response:

Your code has several incorrect parts:

ax = (df
      .loc[df['year'] == '2007', ['lifeExp']]  # keeps only 2007 and drops the 'year' column
      .groupby(df['year'] == '2007')           # groups by a boolean (2007 vs. not 2007), not by year
      .mean('lifeExp')                         # 'lifeExp' is not a valid argument for mean()
      .plot.bar(rot=45, figsize=(16, 8))
      )
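To see the first two problems in isolation, here is a minimal sketch on made-up data. It assumes the year column holds integers (the question does not show the dtype); the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "year": [1952, 1952, 2007, 2007],
    "lifeExp": [40.0, 50.0, 70.0, 80.0],
})

# Assumption: 'year' is an integer column. Comparing it with the string
# '2007' then matches no rows at all.
string_matches = int((df["year"] == "2007").sum())
int_matches = int((df["year"] == 2007).sum())
print(string_matches, int_matches)

# Grouping by a boolean condition yields groups labelled False/True,
# not one group per year.
by_mask = df.groupby(df["year"] == 2007)["lifeExp"].mean()
print(by_mask)
```

If the year column is stored as strings instead, the comparison with '2007' does match, but the grouping problem remains.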

This should be:

ax = (df
      .loc[df['year'].isin([1952, 2007]), ['year', 'lifeExp']]  # keep only 1952 and 2007
      .groupby('year')
      .mean()
      .plot.bar(rot=45, figsize=(16, 8))
      )

Or, if the index values are unique, you can select only lifeExp and group by the original year column:

ax = (df
      .loc[df['year'].isin([1952, 2007]), ['lifeExp']]  # keep only 1952 and 2007
      .groupby(df['year'])  # aligned on the index, hence the uniqueness requirement
      .mean()
      .plot.bar(rot=45, figsize=(16, 8))
      )
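The exercise asks for the data from 1952 and 2007 only, so the row filter should test membership rather than a range. The sketch below uses made-up data (the countries and values are hypothetical, and the year column is assumed to hold integers) to compare between(), which also keeps the years in the middle, with isin(), which keeps exactly the listed years.

```python
import pandas as pd

# Hypothetical sample data standing in for the exercise's DataFrame.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [1952, 1977, 2007, 1952, 1977, 2007],
    "lifeExp": [40.0, 55.0, 70.0, 50.0, 65.0, 80.0],
})

# between() is a range test, so 1977 is kept as well.
years_between = sorted(df.loc[df["year"].between(1952, 2007), "year"].unique())

# isin() keeps only the listed years; then group by year and average.
mean_life_exp = (df
                 .loc[df["year"].isin([1952, 2007]), ["year", "lifeExp"]]
                 .groupby("year")
                 .mean()
                 )

print(years_between)
print(mean_life_exp)
```

Calling mean_life_exp.plot.bar(rot=45, figsize=(16, 8)) on the result should then draw one bar per year; that step needs matplotlib installed, so it is left out of the sketch.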