I have a dataframe that looks like the following
id value
11 0
11 3
11 1
11 2
4 3
4 1
4 8
4 2
5 0
5 3
5 1
5 2
4 2
4 0
4 1
4 3
12 2
12 1
12 0
12 3
I'm hoping to get the mean of groups with ids in a range... i.e. mean of the first value of the first 0->5 group, and the first value of the 2nd 0->5 group, etc.
id value
0-5 1.66
0-5 1.33
0-5 3.33
0-5 2.33
....
e.g. for 0 < id <= 5:
[3 0 2, 1 3 0, 8 1 1, 2 2 3]/3 = [1.66, 1.33, 3.33, 2.33]
CodePudding user response:
You could cut
the ids into specific bins before groupby
:
df["category"] = pd.cut(df["id"], [0,5,10,15], labels=["0-5","5-10","10-15"]).astype(str)
df["subgroup"] = df.groupby(df["id"].ne(df["id"].shift()).cumsum()).cumcount()
output = df.groupby(["category", "subgroup"],as_index=False)["value"].mean()
>>> output
category subgroup value
0 0-5 0 1.666667
1 0-5 1 1.333333
2 0-5 2 3.333333
3 0-5 3 2.333333
4 10-15 0 1.000000
5 10-15 1 2.000000
6 10-15 2 0.500000
7 10-15 3 2.500000
CodePudding user response:
N = 5
df2 = pd.DataFrame(df.groupby("id").agg({'value': list}).apply(lambda x: [x.name//(N 1), x.value], axis=1).values.tolist()).groupby(0).agg({1:list})
df2["mini"] = df2.apply(lambda x: min([len(y) for y in x[1]]), axis=1)
df2 = df2.apply(lambda x: np.array([item for sublist in x[1] for item in sublist]).reshape(-1,x["mini"]).T.mean(axis=1), axis=1).explode()
df2 = pd.DataFrame(df2.values, index=df2.index)
df2.index = df2.apply(lambda x: str(x.name * N) "-" str((x.name 1) * N), axis=1)
df2.rename(columns={0: 'value'}, inplace=True)
df2
output:
value
0-5 1.666667
0-5 1.333333
0-5 3.333333
0-5 2.333333
5-10 0.0
5-10 3.0
5-10 1.0
5-10 2.0
10-15 2.0
10-15 1.0
10-15 0.0
10-15 3.0