I'm currently working with a data frame.
The year goes all the way to 2022. What I'm trying to do is create a new dataframe that has two columns.
1) Year 2) Average of the 'extent' column for the corresponding year.
To illustrate:
Year | Average Extent |
---|---|
1978 | 12.487 |
1979 | 12.320 |
and so on until 2022.
I tried the pandas .groupby method and a few other approaches, such as
new_df = df[['year','Extent...']].copy()
but I'm unsure how to group the rows for each year and give the average yearly 'extent' value for those corresponding years.
I hope this makes sense.
Any advice/tips will be appreciated. Thanks a lot!
CodePudding user response:
This should run out of the box:
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Shambok97/Sample-Data-/main/data.csv")
# Removing whitespace at the ends of the column names for convenience
df.columns = df.columns.str.strip()
out = df.groupby("Year")["Extent (10^6 sq km)"].mean()
out:
Year
1978 12.48700
1979 12.31956
Name: Extent (10^6 sq km), dtype: float64
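Note that the result above is a Series indexed by Year, not the two-column dataframe asked for in the question. One possible way to get that shape is sketched below, using a small made-up frame in place of the real CSV (the values are illustrative only, and the name `yearly` is just a placeholder):

```python
import pandas as pd

# Made-up sample standing in for the real CSV (values are illustrative only)
df = pd.DataFrame({
    "Year": [1978, 1978, 1979, 1979],
    "Extent (10^6 sq km)": [12.0, 13.0, 11.5, 12.5],
})

# as_index=False keeps Year as a regular column instead of the index,
# so the result is already a two-column DataFrame
yearly = (
    df.groupby("Year", as_index=False)["Extent (10^6 sq km)"]
      .mean()
      .rename(columns={"Extent (10^6 sq km)": "Average Extent"})
)
print(yearly)
```

With the real data, the same chain should work after the `df.columns.str.strip()` step shown above.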
CodePudding user response:
Based on what you described in the question, try this code to get the desired output:
import pandas as pd
df = pd.read_csv(r'your_data_path/data.csv')
# group by year and get the average extent for each year
df_grouped = df.groupby('Year')['Extent (10^6 sq km)'].mean()
# convert the grouped dataframe to a new dataframe
new_df = pd.DataFrame(df_grouped).reset_index()
# Rename the column
new_df.columns = ['Year', 'Average Extent']
final_result = new_df
final_result  # see the desired output
Alternatively, you can use the agg function to get the average extent for each year and then use reset_index() to convert the result to a dataframe:
final_result = df.groupby('Year').agg({'Extent (10^6 sq km)':'mean'}).reset_index()
final_result.columns = ['Year', 'Average Extent']
This will give you the same output as above.
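If you prefer to name the output column in the same step, pandas named aggregation is another option. The sketch below uses a small made-up frame (illustrative values only). Its column names are deliberately padded with spaces, on the assumption that the real CSV headers carry stray whitespace, so the strip step may be needed before grouping by 'Year':

```python
import pandas as pd

# Made-up sample; the padded column names are an assumption about the real CSV headers
df = pd.DataFrame({
    " Year": [1978, 1978, 1979],
    " Extent (10^6 sq km)": [12.0, 13.0, 12.25],
})

# Strip surrounding whitespace so 'Year' and the extent column can be referenced directly
df.columns = df.columns.str.strip()

# Named aggregation: output column name -> (input column, aggregation function).
# The dict is unpacked with ** because 'Average Extent' contains a space.
final_result = (
    df.groupby("Year")
      .agg(**{"Average Extent": ("Extent (10^6 sq km)", "mean")})
      .reset_index()
)
print(final_result)
```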