I'm not able to figure out the error - AttributeError: 'numpy.int64' object has no attribute 'startswith' generated from the line - "year_indices = [x for x in sum_series if x.startswith("emissions")]"
import pandas as pd
URL = "https://www.epa.gov/sites/default/files/2021-03/state_tier1_caps.xlsx"
df = pd.read_excel(URL, sheet_name="State_Trends", header=1)
def minmaxyear(subdf):
sum_series = subdf.sum()
year_indices = [x for x in sum_series if x.startswith("emissions")]
minyear = sum_series[year_indices].astype(float).idxmin()
maxyear = sum_series[year_indices].astype(float).idxmax()
return pd.Series({"min year": minyear[-2:], "max year": maxyear[-2:]})
df_years = df[df["Pollutant"]=="CO"].groupby("State").apply(minmaxyear)
print(df_years)
CodePudding user response:
Since x is an integer, in order to do a use a string function on it (startswith
), you should first convert it to str
:
year_indices = [x for x in sum_series if str(x).startswith("emissions")]
CodePudding user response:
@lytseeker's answer does solve the error you were getting but it returns an empty series. Meaning the lines getting the minyear
and maxyear
would throw an error.
This is because it iterates through the values of sum_series
and not the indices.
So do this instead.
year_indices = [x for x in sum_series.index if x.startswith("emissions")]
After making this change I was able to get this output -
min year max year
State
AK 07 02
AL 21 90
AR 21 90
AZ 21 90
CA 21 90
CO 16 90
CT 21 90
DC 17 90
DE 21 90
FL 17 01
GA 17 90
CodePudding user response:
You can pass mask for compare indices of Series
:
def minmaxyear(subdf):
sum_series = subdf.sum()
year = sum_series[sum_series.index.str.startswith("emissions")].astype(float)
minyear = year.idxmin()
maxyear = year.idxmax()
return pd.Series({"min year": minyear[-2:], "max year": maxyear[-2:]})
df_years = df[df["Pollutant"]=="CO"].groupby("State").apply(minmaxyear)
print(df_years.head())
min year max year
State
AK 07 02
AL 21 90
AR 21 90
AZ 21 90
CA 21 90