I have a dataframe with datetime index that I am analysing. I have a column with prices.
Example
2022-01-01 00:00:00 | 32.21
2022-01-01 01:00:00 | 10.20
2022-01-01 02:00:00 | 42.12
2022-01-01 03:00:00 | 01.05
I am looking to make another column that lists how many hours it has been since the price was last under a certain constant value. With the above example and a threshold of 30, it would look like:
2022-01-01 00:00:00 | 32.21 | 4
2022-01-01 01:00:00 | 10.20 | 0
2022-01-01 02:00:00 | 42.12 | 1
2022-01-01 03:00:00 | 01.05 | 0
How can I do this? I thought about putting the index and price into a list of lists or tuples, calculating, then putting the result back, but I assume there is a better way in Pandas?
Thanks,
Gregersdk
CodePudding user response:
Not sure if I understood what you want correctly, but this might be what you're looking for:
import datetime
df["constant_since"] = df["date"].apply(lambda x: datetime.datetime.now() - datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S"))
This adds a new column holding the time elapsed between each row's date and now. You can of course replace `datetime.datetime.now()` with another reference value if you want.
CodePudding user response:
With the following in data.csv:
2022-01-01 00:00:00,32.21
2022-01-01 01:00:00,10.20
2022-01-01 02:00:00,42.12
2022-01-01 03:00:00,01.05
Assuming you have one entry every hour, you could try this:
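import pandas as pd

df = pd.read_csv("data.csv", header=None, index_col=0, names=["value"])
# flag the rows where the price is above the threshold
df["above_30"] = df.value > 30

res = []
for i, above in enumerate(df.above_30):
    if i == 0:
        res.append(4)  # for first row
    elif above:
        # still above the threshold: one more hour since the last low price
        res.append(res[-1] + 1)
    else:
        # price is under the threshold: reset the counter
        res.append(0)
df["result"] = res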
The result should be:
>>> df
                     value  above_30  result
2022-01-01 00:00:00  32.21      True       4
2022-01-01 01:00:00  10.20     False       0
2022-01-01 02:00:00  42.12      True       1
2022-01-01 03:00:00   1.05     False       0
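The same counting can probably be done without an explicit loop. Here is a minimal sketch, assuming one row per hour as above: every row under the threshold starts a new run, and `groupby(...).cumcount()` numbers the rows inside each run. The column name `hours_since_below` is just a name chosen for this example. The 4 for the first row cannot be derived from the four rows shown, so rows before the first low price are left as NaN here.

```python
import pandas as pd

# Same four rows as in data.csv above
df = pd.DataFrame(
    {"value": [32.21, 10.20, 42.12, 1.05]},
    index=pd.to_datetime([
        "2022-01-01 00:00:00",
        "2022-01-01 01:00:00",
        "2022-01-01 02:00:00",
        "2022-01-01 03:00:00",
    ]),
)

below = df["value"] < 30

# Each row under the threshold starts a new run; cumsum gives every run an id.
run_id = below.cumsum()

# cumcount numbers the rows of each run from 0, so the low-price row itself
# gets 0 and every following row counts up by one.
counts = df.groupby(run_id).cumcount()

# Run 0 holds the rows before the first low price: the answer is unknown there.
df["hours_since_below"] = counts.where(run_id > 0)
```

With hourly data the row count equals the number of hours; if rows can be missing, the count would be in rows, not hours.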
CodePudding user response:
If I'm understanding correctly, here is an example of how you could use a for loop for this task.
import pandas as pd

# example dataframe
df = pd.DataFrame({
    'date': ['1/1/2022 00:00:00', '1/1/2022 01:00:00', '1/1/2022 02:00:00', '1/1/2022 03:00:00'],
    'value': [30, 10, 40, 10]
})
df.date = pd.to_datetime(df.date)
df.set_index('date', inplace=True)

# empty list to be populated
l = []
# counter variable
count = 0
for i in range(df.shape[0]):
    # increase the counter at each iteration
    count = count + 1
    # reset the counter if the condition is met
    # (.iloc because i is a position, not a label of the datetime index)
    if df.value.iloc[i] >= 30:
        count = 0
    # append the counter at each iteration to the list "l"
    l.append(count)

# add a new column "count" using the list "l"
df['count'] = l
# output
df.head()
Output:
                     value  count
date
2022-01-01 00:00:00     30      0
2022-01-01 01:00:00     10      1
2022-01-01 02:00:00     40      0
2022-01-01 03:00:00     10      1
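Counting rows only gives hours when there is exactly one row per hour. If that is not guaranteed, one possible sketch is to compute the elapsed time from the datetime index itself: keep the timestamp of each row that is under the threshold, forward-fill it, and subtract. The sample rows below are made up for illustration (note the gap before 05:00), and `hours_since_below` is a name chosen for this example.

```python
import pandas as pd

# Hypothetical sample with an irregular gap between 02:00 and 05:00
df = pd.DataFrame(
    {"value": [30, 10, 40, 40]},
    index=pd.to_datetime([
        "2022-01-01 00:00:00",
        "2022-01-01 01:00:00",
        "2022-01-01 02:00:00",
        "2022-01-01 05:00:00",
    ]),
)

below = df["value"] < 30
times = df.index.to_series()

# Timestamp of the most recent row under the threshold (NaT before the first one)
last_below = times.where(below).ffill()

# Elapsed time since that row, expressed in hours as a float
df["hours_since_below"] = (times - last_below) / pd.Timedelta(hours=1)
```

Rows before the first low price come out as NaN, since there is nothing earlier to measure from.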