I have a pandas dataframe df:
Car | Open | Time |
---|---|---|
Audi A5 | 0 | 0 |
Audi A5 | 0 | 1 |
Audi A5 | 0 | 2 |
Audi A5 | 1 | 3 |
Audi A5 | 1 | 4 |
Audi A5 | 0 | 5 |
Audi A5 | 0 | 6 |
Audi A5 | 0 | 7 |
Audi A5 | 1 | 8 |
Audi A5 | 1 | 9 |
Mercedes Class A | 1 | 0 |
Mercedes Class A | 1 | 1 |
Mercedes Class A | 1 | 2 |
Mercedes Class A | 0 | 3 |
Mercedes Class A | 0 | 4 |
Mercedes Class A | 1 | 5 |
Mercedes Class A | 1 | 6 |
Mercedes Class A | 0 | 7 |
Mercedes Class A | 0 | 8 |
Mercedes Class A | 1 | 9 |
I want to enlarge the active parts of the binary series Open by n units, but only after grouping the dataframe by Car.
An active part is a run of consecutive 1s that is surrounded by 0s, preceded only by 0s, or followed only by 0s. The case where the series contains only 1s is ignored.
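To make the definition concrete, here is a minimal sketch that marks where each run of 1s starts within a car. The names prev_open, run_start and run_id are made up for this illustration; it is one way to label the runs, not the approach used in the code further down.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
        "Time": list(range(10)) + list(range(10)),
        "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    }
)

# Previous value of Open within the same car (NaN on the first row of each car).
prev_open = df.groupby("Car")["Open"].shift()

# A run starts where Open is 1 and the previous row of that car is not 1.
run_start = df["Open"].eq(1) & prev_open.ne(1)

# Number the runs per car; rows where Open is 0 get NaN.
run_id = (
    run_start.astype(int)
    .groupby(df["Car"])
    .cumsum()
    .where(df["Open"].eq(1))
)

print(df.assign(run_start=run_start, run_id=run_id))
```

Note that a run starting on the first row of a car (as for Mercedes Class A) is flagged too, although there is nothing before it to enlarge.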
If n = 1, I want to get the following dataframe:
Car | Open | Time |
---|---|---|
Audi A5 | 0 | 0 |
Audi A5 | 0 | 1 |
Audi A5 | 1 | 2 |
Audi A5 | 1 | 3 |
Audi A5 | 1 | 4 |
Audi A5 | 0 | 5 |
Audi A5 | 0 | 6 |
Audi A5 | 1 | 7 |
Audi A5 | 1 | 8 |
Audi A5 | 1 | 9 |
Mercedes Class A | 1 | 0 |
Mercedes Class A | 1 | 1 |
Mercedes Class A | 1 | 2 |
Mercedes Class A | 0 | 3 |
Mercedes Class A | 1 | 4 |
Mercedes Class A | 1 | 5 |
Mercedes Class A | 1 | 6 |
Mercedes Class A | 0 | 7 |
Mercedes Class A | 1 | 8 |
Mercedes Class A | 1 | 9 |
I can get the index of all active parts using the following code:
import pandas as pd

df = pd.DataFrame(
    {
        "Car": ["Audi A5"]*10 + ["Mercedes Class A"]*10,
        "Time": list(range(10)) + list(range(10)),
        "Open": [0,0,0,1,1,0,0,0,1,1,1,1,1,0,0,1,1,0,0,1]
    }
)
def enlarge(dataframe : pd.DataFrame, sensor : str, n : int = 1) -> pd.DataFrame:
    get_group_indexes = (
        lambda x: x.index[0]
        if x.index[-1] - x.index[0] >= 1
        else None
    )
    groups = (
        dataframe[sensor]
        .eq(0)
        .cumsum()[dataframe[sensor].ne(0)]
        .to_frame()
        .groupby(sensor)
        .apply(get_group_indexes)
        .dropna()
    )
    if groups.empty:
        return dataframe
    for index in groups:
        dataframe.loc[index-n:index, sensor] = 1
    return dataframe
It works when I don't have to group by Car, but I want to group by this column before performing this transformation. Does someone have an idea how to achieve this efficiently using pandas tricks? Thanks.
CodePudding user response:
IIUC, you can bfill per group with a limit after masking the non-1 values:
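n = 1
df['Open2'] = (df['Open']
   .where(df['Open'].eq(1))            # keep the 1s, turn everything else into NaN
   .groupby(df['Car']).bfill(limit=n)  # back-fill at most n rows, per car
   .fillna(df['Open'])                 # restore the original values elsewhere
   .astype(df['Open'].dtype)           # back to integers (downcast='infer' is deprecated in recent pandas)
)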
output (as new column "Open2" for clarity):
                 Car  Time  Open  Open2
0            Audi A5     0     0      0
1            Audi A5     1     0      0
2            Audi A5     2     0      1
3            Audi A5     3     1      1
4            Audi A5     4     1      1
5            Audi A5     5     0      0
6            Audi A5     6     0      0
7            Audi A5     7     0      1
8            Audi A5     8     1      1
9            Audi A5     9     1      1
10  Mercedes Class A     0     1      1
11  Mercedes Class A     1     1      1
12  Mercedes Class A     2     1      1
13  Mercedes Class A     3     0      0
14  Mercedes Class A     4     0      1
15  Mercedes Class A     5     1      1
16  Mercedes Class A     6     1      1
17  Mercedes Class A     7     0      0
18  Mercedes Class A     8     0      1
19  Mercedes Class A     9     1      1
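
If you need this for several columns or values of n, the same idea can be wrapped in a small helper. This is only a sketch: the function name enlarge_backward and its parameters group_col and value_col are made up here, and it assumes the rows of each car are already sorted by Time and that the value column holds only 0 and 1.

```python
import pandas as pd


def enlarge_backward(df, n=1, group_col="Car", value_col="Open"):
    """Return value_col with each run of 1s extended n rows backward, per group."""
    values = df[value_col]
    # Keep the 1s and turn everything else into NaN so it can be back-filled.
    masked = values.where(values.eq(1))
    # Back-fill at most n rows; grouping stops the fill at group boundaries.
    filled = masked.groupby(df[group_col]).bfill(limit=n)
    # Rows that were not reached by the fill get their original value back.
    return filled.fillna(values).astype(values.dtype)


df = pd.DataFrame(
    {
        "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
        "Time": list(range(10)) + list(range(10)),
        "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    }
)

result_n1 = enlarge_backward(df, n=1)
result_n2 = enlarge_backward(df, n=2)
print(df.assign(Open_n1=result_n1, Open_n2=result_n2))
```

The helper returns a new Series and leaves df untouched, so you can assign it back to "Open" or to a new column as you prefer.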