Here is my dataframe:
import pandas as pd
data = [[1, 1, 1, 0],
[1, 2, 1, 1],
[1, 2, 2, 2],
[2, 1, 1, 3]]
df = pd.DataFrame(data, columns=["idx 1", "idx 2", "idx 3", "value"])
>>> df
idx 1 idx 2 idx 3 value
0 1 1 1 0
1 1 2 1 1
2 1 2 2 2
3 2 1 1 3
I have created a "sandwich" function, which copies the first and last row and inserts them in the first and last position:
def sandwich(df):
    return pd.concat([df[:1], df, df[-1:]])
>>> df.apply(sandwich)
idx 1 idx 2 idx 3 value
0 1 1 1 0 # new
0 1 1 1 0
1 1 2 1 1
2 1 2 2 2
3 2 1 1 3
3 2 1 1 3 # new
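As a side note, df.apply(sandwich) calls the function once per column, so the frame above is assembled from four sandwiched Series. A minimal sketch, assuming the same data as above, of calling the function on the whole frame instead (the parameter name frame is only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 0], [1, 2, 1, 1], [1, 2, 2, 2], [2, 1, 1, 3]],
    columns=["idx 1", "idx 2", "idx 3", "value"],
)


def sandwich(frame):
    # Repeat the first and last row around the original rows.
    return pd.concat([frame[:1], frame, frame[-1:]])


# Call the function on the whole frame rather than column by column.
out = sandwich(df)
print(out)
```

The duplicated rows keep their original index labels, which is why the first and last labels appear twice.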
The trouble arises when I want to group the dataframe first and then apply the sandwich function to each group.
>>> df.groupby(["idx 1", "idx 2"]).apply(sandwich)
               idx 1  idx 2  idx 3  value
idx 1 idx 2
1     1     0      1      1      1      0
            0      1      1      1      0
            0      1      1      1      0
      2     1      1      2      1      1
            1      1      2      1      1
            2      1      2      2      2
            2      1      2      2      2
2     1     3      2      1      1      3
            3      2      1      1      3
            3      2      1      1      3
The result is correct, but the group keys show up twice: pandas adds "idx 1" and "idx 2" as index levels and also keeps them as columns. I can drop the index afterwards to get the desired result:
>>> df.groupby(["idx 1", "idx 2"]).apply(sandwich).reset_index(drop=True)
idx 1 idx 2 idx 3 value
0 1 1 1 0
1 1 1 1 0
2 1 1 1 0
3 1 2 1 1
4 1 2 1 1
5 1 2 2 2
6 1 2 2 2
7 2 1 1 3
8 2 1 1 3
9 2 1 1 3
However, creating an index only to destroy it immediately afterwards doesn't seem like the best approach. Is there a better way?
CodePudding user response:
Use group_keys=False in groupby:
df.groupby(["idx 1", "idx 2"], group_keys=False).apply(sandwich)
This keeps the original index labels that sandwich produces:
idx 1 idx 2 idx 3 value
0 1 1 1 0
0 1 1 1 0
0 1 1 1 0
1 1 2 1 1
1 1 2 1 1
2 1 2 2 2
2 1 2 2 2
3 2 1 1 3
3 2 1 1 3
3 2 1 1 3
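Note that the labels above still repeat (0, 0, 0, 1, ...), because each duplicated row keeps the label of the row it was copied from. If a fresh 0..n-1 index is wanted without building the group index at all, one possible alternative (a sketch, not the approach used above) is to iterate over the groups and concatenate the pieces with ignore_index=True. The names pieces and result are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 0], [1, 2, 1, 1], [1, 2, 2, 2], [2, 1, 1, 3]],
    columns=["idx 1", "idx 2", "idx 3", "value"],
)


def sandwich(frame):
    # Repeat the first and last row around the original rows.
    return pd.concat([frame[:1], frame, frame[-1:]])


# Iterating over a groupby yields (key, sub-frame) pairs; each sub-frame
# still contains every column, including the grouping columns.
pieces = [sandwich(group) for _, group in df.groupby(["idx 1", "idx 2"])]

# ignore_index=True discards the repeated labels and renumbers the rows.
result = pd.concat(pieces, ignore_index=True)
print(result)
```

This avoids groupby(...).apply entirely, so no group index is created and then dropped. For many small groups the Python-level loop may be slower than apply, so it is worth timing on real data.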