I am practicing on the manufacturing data set by the St. Louis Fed. Here, I would like to find out how many months it took for the peak in 2008 to be reached again. To do that, I have written the following code:
# Set DATE as index and convert to datetime
df.set_index("DATE", inplace = True)
df.index = pd.to_datetime(df.index)
# Locate the date of the peak in 2008 and find out how high the peak was
maxdate = df.loc["2008-01-01":"2008-12-31"].idxmax()
maxvalue = df.loc[maxdate]["UMTMVS"]
#Create new data frame that encompasses the records after maxdate
afterpeak = df.loc[maxdate:]
# Create new data frame that encompasses all records in which the daily value was larger than the maxvalue of 2008
df2 = afterpeak[afterpeak>= maxvalue].dropna()
# Create new data frame that has the second instant in which the daily value was higher than maxvalue of 2008 (first value is maxdate itself)
samelevel = df[1]
# Count number of months between maxdate and second instant in which the daily value was higher than maxvalue of 2008
len(df2.loc[maxdate:samelevel])
While maxdate and maxvalue work perfectly fine, I am having trouble with the next lines. I seem to be unable to parse in maxdate into df.loc[maxdate:] even though parsing in maxdate worked perfectly fine to generated maxvalue. However, df.loc[maxdate:] results in the error message "cannot do slice indexing on DatetimeIndex with these indexers [UMTMVS 2008-06-01 dtype: datetime64[ns]] of type Series"
I researched a bit here on stackoverflow and tried using
maxdate_str = maxdate.index.strftime('%Y-%m-%d')
afterpeak = df.loc[maxdate_str:]
but this, too, generates an error ('Index' object has no attribute 'strftime').
Can someone please help me figuring out what the issue here is?
CodePudding user response:
To do this, you need to extract the value, since maxdate is a series.
print(maxdate)
Output
UMTMVS 2008-06-01
Get the value:
print(maxdate[0])
Output
2008-06-01 00:00:00
get the desired segment:
afterpeak = df.loc[maxdate[0]:]
print(afterpeak)
And note that loc takes the slice inclusively. That is, in this case, maxvalue in 2008-06-01 and when slicing df.loc[maxdate[0]:] this value will also be extracted. Therefore, for verification, we use iloc, in which you can use indexes implicitly. In this example, we need to skip the first value since it's the maxvalue.
print(afterpeak[afterpeak.iloc[1:] >= maxvalue[0]].dropna())