How do I average row values using Python and create a new data frame?

I'm currently working with a data frame.

The year goes all the way to 2022. What I'm trying to do is create a new dataframe that has two columns.

1) Year
2) Average of the 'extent' column for each year

To illustrate:

Year	Average Extent
1978	12.487
1979	12.320

and so on until 2022.

I tried the pandas .groupby method and some other approaches, such as

new_df = df[['year','Extent...']].copy()

but I'm unsure how to group the rows by year and compute the average 'extent' value for each year.

I hope this makes sense.

Any advice/tips will be appreciated. Thanks a lot!


CodePudding user response：

This should run out of the box:

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/Shambok97/Sample-Data-/main/data.csv")

# Removing whitespace at the ends of the column names for convenience
df.columns = df.columns.str.strip()

out = df.groupby("Year")["Extent (10^6 sq km)"].mean()

out:

Year
1978    12.48700
1979    12.31956
Name: Extent (10^6 sq km), dtype: float64
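
Note that the result above is a Series indexed by Year, while the question asks for a two-column data frame. One possible way to get that is sketched below. It assumes a small made-up sample in place of the real CSV (the values are illustrative only), and the name out_df is not from the original answer.

```python
import pandas as pd

# Hypothetical sample standing in for the real CSV; the values are made up
df = pd.DataFrame({
    "Year": [1978, 1978, 1979, 1979],
    "Extent (10^6 sq km)": [12.0, 13.0, 11.0, 12.0],
})

# as_index=False keeps "Year" as a regular column, so the result is a DataFrame
out_df = (
    df.groupby("Year", as_index=False)["Extent (10^6 sq km)"]
    .mean()
    .rename(columns={"Extent (10^6 sq km)": "Average Extent"})
)
print(out_df)
```

With the real data, the same chain should apply once the column names have been stripped as shown earlier.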

CodePudding user response：

Try this piece of code for the desired output:

import pandas as pd

df = pd.read_csv(r'your_data_path/data.csv')
# Strip stray whitespace from the column names so the lookups below match
df.columns = df.columns.str.strip()
# Group by year and get the average extent for each year
df_grouped = df.groupby('Year')['Extent (10^6 sq km)'].mean()
# Convert the grouped Series to a new dataframe
new_df = pd.DataFrame(df_grouped).reset_index()
# Rename the columns
new_df.columns = ['Year', 'Average Extent']
final_result = new_df
final_result  # see the desired output

Alternatively, you can also use the agg function to get the Average Extent for each year and then use the reset_index() function to convert the result to a dataframe:

final_result = df.groupby('Year').agg({'Extent (10^6 sq km)':'mean'}).reset_index()
final_result.columns = ['Year', 'Average Extent']

This will give you the same output as above.
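
A third variation, sketched here under assumptions, uses pandas named aggregation so the output column gets its final name in the same step, and rounds to three decimals to match the layout shown in the question. The sample data and the name result are made up for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the real CSV; the values are made up
df = pd.DataFrame({
    "Year": [1978, 1978, 1979, 1979],
    "Extent (10^6 sq km)": [12.4871, 12.4873, 12.0, 12.5],
})

# Named aggregation: output column name -> (input column, aggregation function).
# The dict is unpacked with ** because the output name contains a space.
result = (
    df.groupby("Year")
    .agg(**{"Average Extent": ("Extent (10^6 sq km)", "mean")})
    .reset_index()
    .round(3)  # round the averages to three decimal places
)
print(result)
```

This avoids the separate column-renaming step; whether it reads better than the two-step version is a matter of taste.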