Home > database >  Groupby range of numbers in Pandas and extract start and end values
Groupby range of numbers in Pandas and extract start and end values

Time:07-04

Question

Is it possible to groupby a range of numbers (int) in Pandas as per example below? If not, how would I achieve the desired output?

Data

df = pd.DataFrame(
    {"price": [9, 8, 9, 10, 11, 6, 7, 8, 9, 9, 9, 9, 10, 11, 5]},
    index=pd.date_range("19/3/2020", periods=15, freq="H"),
)

df["higher"] = np.where(df.price > df.price.shift(), 1, 0)
df["higher_count"] = df["higher"] * (
    df["higher"].groupby((df["higher"] != df["higher"].shift()).cumsum()).cumcount()   1
)
df = df.drop("higher", axis=1)

Dataframe with the first group highlighted

Dataframe

The groups can be extracted as follows:

from operator import itemgetter
from itertools import groupby

data = df["higher_count"]
for key, group in groupby(enumerate(data), lambda i: i[0] - i[1]):         
    group = list(map(itemgetter(1), group))
    if len(group) > 1:
        print(f"{key}:{group}")


1:[0, 1, 2, 3]
5:[0, 1, 2, 3]
11:[0, 1, 2]

Desired output

For each group generate the following columns:

  • start date
  • price at start date
  • end date
  • price at end date

so for the group with key 1 the output would be as follows:

enter image description here

CodePudding user response:

IIUC you can use diff and cumsum to group, then check if the group has more than 1 element:

df["group"] = df["higher_count"].diff().ne(1).cumsum()
print (df.loc[df.groupby("group")["higher_count"].transform(len)>1]
         .rename_axis("date")
         .reset_index()
         .groupby("group")[["date", "price"]].agg(["first", "last"]))

                     date                     price
                    first                last first last
group
2     2020-03-19 01:00:00 2020-03-19 04:00:00     8   11
3     2020-03-19 05:00:00 2020-03-19 08:00:00     6    9
6     2020-03-19 11:00:00 2020-03-19 13:00:00     9   11
  • Related