Pandas timeseries with dateindex, how do I find hours between two where a value is < than a number

Time:12-06

I have a dataframe with datetime index that I am analysing. I have a column with prices.

Example

2022-01-01 00:00:00  |  32.21
2022-01-01 01:00:00  |  10.20
2022-01-01 02:00:00  |  42.12
2022-01-01 03:00:00  |  01.05

I am looking to make another column that lists how many hours it has been since the price was under a certain constant value. With the above example and a threshold of 30, it would look like:

2022-01-01 00:00:00  |  32.21 | 4
2022-01-01 01:00:00  |  10.20 | 0
2022-01-01 02:00:00  |  42.12 | 1
2022-01-01 03:00:00  |  01.05 | 0

How can I do this? I thought about putting the index and price into a list of lists or tuple, calculate, then put it back, but I assume there is a better way in Pandas?

Thanks,

Gregersdk

CodePudding user response:

Not sure if I understood what you want correctly, but this might be what you're looking for:

import datetime
df["constant_since"] = df["date"].apply(lambda x: datetime.datetime.now() - datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S"))

It adds a new column holding how long ago each timestamp was relative to now (this assumes the timestamps are strings in a "date" column). Of course, you can change `datetime.datetime.now()` to another arbitrary reference time if you want.
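
For what it's worth, the same subtraction can probably be done without apply by parsing the whole column at once. This is only a rough sketch: the two-row frame and the `reference` timestamp are made up for illustration, and a fixed reference is used instead of "now" so the result is reproducible.

```python
import pandas as pd

# Illustrative data: timestamps stored as strings in a "date" column (an assumption).
df = pd.DataFrame({"date": ["2022-01-01 00:00:00", "2022-01-01 01:00:00"]})

# Hypothetical fixed reference time, standing in for datetime.datetime.now().
reference = pd.Timestamp("2022-01-01 03:00:00")

# Parse the whole column in one call, then subtract it from the reference.
df["constant_since"] = reference - pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S")
# The new column holds Timedelta values: 3 hours and 2 hours for these two rows.
```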

CodePudding user response:

With the following in data.csv:

2022-01-01 00:00:00,32.21
2022-01-01 01:00:00,10.20
2022-01-01 02:00:00,42.12
2022-01-01 03:00:00,01.05

Assuming you have one entry every hour, you could try this:

import pandas as pd

df = pd.read_csv("data.csv", header=None, index_col=0, names=["value"])
df["above_30"] = df.value > 30

res = []
for i, above in enumerate(df.above_30):
    if i == 0:
        res.append(4) # first row: no earlier data, so hard-coded to match the example
    elif above:
        res.append(res[-1] + 1)
    else:
        res.append(0)

df["result"] = res

The result should be:

>>> df
                     value  above_30  result
2022-01-01 00:00:00  32.21      True       4
2022-01-01 01:00:00  10.20     False       0
2022-01-01 02:00:00  42.12      True       1
2022-01-01 03:00:00   1.05     False       0
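
If you would rather avoid the explicit loop, one possible vectorized sketch is below. It keeps the same assumption of one row per hour. The frame is built inline instead of read from data.csv so that it runs on its own, and `threshold`, `below` and `block` are just names picked for this example. Rows that come before the first below-threshold value are left as NaN rather than given a hard-coded 4.

```python
import pandas as pd

# Same sample values as in the question, built inline instead of read from data.csv.
df = pd.DataFrame(
    {"value": [32.21, 10.20, 42.12, 1.05]},
    index=pd.to_datetime([
        "2022-01-01 00:00:00",
        "2022-01-01 01:00:00",
        "2022-01-01 02:00:00",
        "2022-01-01 03:00:00",
    ]),
)

threshold = 30  # the constant from the question

below = df["value"] < threshold
# Every below-threshold row starts a new block; cumsum gives each block a label.
block = below.cumsum()
# cumcount numbers the rows inside each block 0, 1, 2, ...
# so a below-threshold row gets 0 and each following row counts up from it.
df["result"] = df.groupby(block).cumcount()
# Block 0 holds rows before any below-threshold value: no answer exists yet.
df["result"] = df["result"].where(block > 0)
# result is now NaN, 0, 1, 0
```

With hourly rows the row count equals the number of hours, which is why this only works under that assumption.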

CodePudding user response:

If I'm understanding correctly, here is an example of how you could use a for loop for this task.

import pandas as pd

# example dataframe
df = pd.DataFrame({
    'date':['1/1/2022 00:00:00','1/1/2022 01:00:00','1/1/2022 02:00:00','1/1/2022 03:00:00'],
    'value' : [30,10,40,10]
})
df.date = pd.to_datetime(df.date)
df.set_index('date',inplace=True)


# empty list to be populated
l = []
# counter variable
count = 0

for i in range(df.shape[0]):
    
    # increase the counter at each iteration
    count = count + 1
    
    # reset the counter when the value is at or above 30
    # (use `< 30` instead to count hours since the price was below 30)
    if df.value.iloc[i] >= 30:
        count = 0
    
    # append the counter at each iteration to the list "l"
    l.append(count)

# add a new column "count" using the list "l"
df['count'] = l

# output
df.head()

Output:

                     value  count
date
2022-01-01 00:00:00     30      0
2022-01-01 01:00:00     10      1
2022-01-01 02:00:00     40      0
2022-01-01 03:00:00     10      1
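
A counter like this counts rows, so it only equals hours when there is exactly one row per hour. If the spacing might be irregular, a sketch that works from the timestamps in the index could look like the following. It uses the question's sample data; `threshold`, `stamps`, `last_below` and the column name `hours_since_below` are names made up for this example.

```python
import pandas as pd

# Sample data from the question, with the timestamps as a DatetimeIndex.
df = pd.DataFrame(
    {"value": [32.21, 10.20, 42.12, 1.05]},
    index=pd.to_datetime([
        "2022-01-01 00:00:00",
        "2022-01-01 01:00:00",
        "2022-01-01 02:00:00",
        "2022-01-01 03:00:00",
    ]),
)

threshold = 30  # the constant from the question

# The index as a Series, so each row carries its own timestamp as a value.
stamps = df.index.to_series()
# Keep the timestamp only where the price is below the threshold (NaT elsewhere),
# then carry it forward so every row knows the most recent "below" timestamp.
last_below = stamps.where(df["value"] < threshold).ffill()
# Elapsed time since that timestamp, converted to hours.
df["hours_since_below"] = (stamps - last_below).dt.total_seconds() / 3600
# hours_since_below is now NaN, 0.0, 1.0, 0.0
```

The first row stays NaN because there is no earlier below-threshold price to measure from.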