Get the first and the last day of a month from the df-CodePudding

This is how my dataframe looks like:

datetime      open      high     low       close    
2006-01-02    4566.95   4601.35  4542.00   4556.25
2006-01-03    4531.45   4605.45  4531.45   4600.25  
2006-01-04    4619.55   4707.60  4616.05   4694.14
.
.
.

Need to calculate the Monthly Returns in %

Formula: (Month Closing Price - Month Open Price) / Month Open Price

I can't seem to get the open price and closing price of a month, because in my df most months dont have a log for the 1st of the month. So having trouble calculating it.

Any help would be very much appreciated!

CodePudding user response：

You need to use groupby and agg function in order to get the first and last value of each column in each month:

import pandas as pd
df = pd.read_csv("dt.txt")
df["datetime"] = pd.to_datetime(df["datetime"])
df.set_index("datetime", inplace=True)
resultDf = df.groupby([df.index.year, df.index.month]).agg(["first", "last"])
resultDf["new_column"] = (resultDf[("close", "last")] - resultDf[("open", "first")])/resultDf[("open", "first")]
resultDf.index.rename(["year", "month"], inplace=True)
resultDf.reset_index(inplace=True)
resultDf

The code above will result in a dataframe that has multiindex column. So, if you want to get, for example, rows with year of 2010, you can do something like:

resultDf[resultDf["year"] == 2010]

CodePudding user response：

You can create a custom grouper such as follow :

import pandas as pd
import numpy as np
from io import StringIO

csvfile = StringIO(
"""datetime\topen\thigh\tlow\tclose
2006-01-02\t4566.95\t4601.35\t4542.00\t4556.25
2006-01-03\t4531.45\t4605.45\t4531.45\t4600.25  
2006-01-04\t4619.55\t4707.60\t4616.05\t4694.14""")

df = pd.read_csv(csvfile, sep = '\t', engine='python')

df.datetime = pd.to_datetime(df.datetime, format = "%Y-%m-%d")

dg = df.groupby(pd.Grouper(key='datetime', axis=0, freq='M'))

Then each group of dg is separate by month, and since we convert datetime as pandas.datetime we can use classic arithmetic on it :

def monthly_return(datetime, close_value, open_value):
    index_start = np.argmin(datetime)
    index_end = np.argmax(datetime)
    return (close_value[index_end] - open_value[index_start]) / open_value[index_start]

dg.apply(lambda x : monthly_return(x.datetime, x.close, x.open))
Out[97]: 
datetime
2006-01-31    0.02785
Freq: M, dtype: float64

Of course a pure functional approach is possible instead of using monthly_return function