I have a dataframe with a column x, whose values go up and down in cycles, and a column cycle, which marks the turning points of each cycle.
A cycle starts at the lowest point of x, from where x increases to the highest point, after which x decreases to the lowest point again and a new cycle starts. So this indicates two cycles: lowest x --> highest x --> lowest x --> highest x --> lowest x ... repeat.
The highest point of x in a cycle is indicated with a 1 in the cycle column, and the lowest point of x is indicated with a 0 in the cycle column. See the example data below. Note that the highest and lowest points fall at different values of x in each cycle.
import pandas as pd
import numpy as np
x = [1,2,3,4,5,4,3, # cycle 1
2,3,4,5,4,3,2, # cycle 2
1,2,3,4,5,6,5, # cycle 3
4] # cycle 4
points = [0, np.nan, np.nan, np.nan, 1, np.nan, np.nan, # cycle 1
0, np.nan, np.nan, 1, np.nan, np.nan, np.nan, # cycle 2
0, np.nan, np.nan, np.nan, np.nan, 1, np.nan, # cycle 3
0] # cycle 4
df = pd.DataFrame({'x':x, 'cycle':points})
I need to make two new columns indicating the low and the high of the cycle each row is part of. These values should reset every time a new cycle starts, that is, at each 0 value in the cycle column.
The desired end result would be a df looking like this:
low = [1,1,1,1,1,1,1, # cycle 1
2,2,2,2,2,2,2, # cycle 2
1,1,1,1,1,1,1, # cycle 3
4] # cycle 4
high = [5,5,5,5,5,5,5, # cycle 1
5,5,5,5,5,5,5, # cycle 2
6,6,6,6,6,6,6, # cycle 3
4] # cycle 4
new_df = pd.DataFrame({'x':x, 'cycle':points, 'low':low, 'high':high})
Note that the last cycle, which starts at index 21, has a high value equal to its low value, because this cycle consists of only one data point.
Does anyone have an idea how to generate these columns automatically? (My actual data set has the same structure but many more rows and more of these cycles.)
Answer:
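Here is one way to do it using the pandas ffill and bfill methods (the equivalent fillna(method=...) and interpolate("pad") calls are deprecated in recent pandas versions):
# Add 'low' column: take x at each cycle start (cycle == 0), then forward-fill
df.loc[df["cycle"] == 0, "low"] = df.loc[df["cycle"] == 0, "x"]
df["low"] = df["low"].ffill().astype(int)
# Add 'high' column: take x at each peak (cycle == 1)
df["high"] = df["cycle"].replace(0, np.nan)
df.loc[df["high"] == 1, "high"] = df.loc[df["high"] == 1, "x"]
# Among the marked rows only, back-fill each peak onto the cycle start before it
marked = ~df["cycle"].isna()
df.loc[marked, "high"] = df.loc[marked, "high"].bfill()
# Forward-fill the peak value through the rest of each cycle
df["high"] = df["high"].ffill().astype(int)
# Deal with last row: a cycle start with no peak after it gets high == low
# (label-based lookup, so this assumes the default 0..n-1 index)
if df.loc[df.shape[0] - 1, "cycle"] == 0:
    df.loc[df.shape[0] - 1, "high"] = df.loc[df.shape[0] - 1, "low"]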
Then:
print(df)
# Output
    x  cycle  low  high
0   1    0.0    1     5
1   2    NaN    1     5
2   3    NaN    1     5
3   4    NaN    1     5
4   5    1.0    1     5
5   4    NaN    1     5
6   3    NaN    1     5
7   2    0.0    2     5
8   3    NaN    2     5
9   4    NaN    2     5
10  5    1.0    2     5
11  4    NaN    2     5
12  3    NaN    2     5
13  2    NaN    2     5
14  1    0.0    1     6
15  2    NaN    1     6
16  3    NaN    1     6
17  4    NaN    1     6
18  5    NaN    1     6
19  6    1.0    1     6
20  5    NaN    1     6
21  4    0.0    4     4
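As a possible alternative (not part of the answer above), the same columns can probably be built with a groupby. This is a minimal sketch: it assumes the first row of the frame is a cycle start (cycle == 0) and that each cycle has at most one row marked 1. The names cycle_id and peaks are hypothetical helpers introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
x = [1, 2, 3, 4, 5, 4, 3,   # cycle 1
     2, 3, 4, 5, 4, 3, 2,   # cycle 2
     1, 2, 3, 4, 5, 6, 5,   # cycle 3
     4]                     # cycle 4
points = [0, np.nan, np.nan, np.nan, 1, np.nan, np.nan,
          0, np.nan, np.nan, 1, np.nan, np.nan, np.nan,
          0, np.nan, np.nan, np.nan, np.nan, 1, np.nan,
          0]
df = pd.DataFrame({"x": x, "cycle": points})

# Hypothetical helper: running count of cycle starts, so every row
# gets the number of the cycle it belongs to (1, 2, 3, ...).
cycle_id = df["cycle"].eq(0).cumsum().rename("cycle_id")

# Low: x at the first row of each cycle, which is the row marked 0
# (assumes the frame begins with a cycle start).
df["low"] = df.groupby(cycle_id)["x"].transform("first")

# High: x at the row marked 1, broadcast over the whole cycle.
# Rows that are not peaks are NaN, so "max" picks the marked value.
peaks = df["x"].where(df["cycle"].eq(1))
df["high"] = peaks.groupby(cycle_id).transform("max")

# A cycle with no marked peak (such as a trailing single-row cycle)
# has no high yet, so fall back to its low.
df["high"] = df["high"].fillna(df["low"]).astype(int)

print(df)
```

Because the fallback applies to any cycle without a marked peak, no separate last-row step is needed. Whether that fallback is the right behaviour for unmarked cycles in the middle of real data is an assumption worth checking.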