Most efficient way to enlarge the active area of a binary series pandas?

Time:08-09

I have a pandas dataframe df:

Car               Open  Time
Audi A5              0     0
Audi A5              0     1
Audi A5              0     2
Audi A5              1     3
Audi A5              1     4
Audi A5              0     5
Audi A5              0     6
Audi A5              0     7
Audi A5              1     8
Audi A5              1     9
Mercedes Class A     1     0
Mercedes Class A     1     1
Mercedes Class A     1     2
Mercedes Class A     0     3
Mercedes Class A     0     4
Mercedes Class A     1     5
Mercedes Class A     1     6
Mercedes Class A     0     7
Mercedes Class A     0     8
Mercedes Class A     1     9

I want to enlarge each active part of the binary series Open by n units, separately within each Car group. In the expected output below, each active part is extended backward, toward earlier Time values.

An active part is a run of consecutive 1s that is either surrounded by 0s, or preceded only by 0s, or followed only by 0s. The case where the series contains only 1s is ignored.
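
To make the definition concrete, here is a minimal sketch that lists the active parts per Car on the sample data. It is only one possible reading of the definition: the names `prev`, `run_id` and `active` are made up for illustration, and the "series contains only 1s" case is not treated specially.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
        "Time": list(range(10)) + list(range(10)),
        "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    }
)

# Previous value of Open within the same Car (NaN on the first row of each Car).
prev = df.groupby("Car")["Open"].shift()

# A new run starts whenever Open differs from the previous row of the same Car,
# so every run of identical values gets its own id.
run_id = df["Open"].ne(prev).cumsum()

# Keep only the runs of 1s and summarise each one by its first and last Time.
active = (
    df.assign(run=run_id)
    .loc[df["Open"].eq(1)]
    .groupby("run")
    .agg(Car=("Car", "first"), start=("Time", "min"), end=("Time", "max"))
    .reset_index(drop=True)
)
print(active)
```

On the sample data this yields two active parts for Audi A5 (Time 3 to 4 and 8 to 9) and three for Mercedes Class A (0 to 2, 5 to 6 and 9 to 9).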

If n = 1, I want to get the following dataframe:

Car               Open  Time
Audi A5              0     0
Audi A5              0     1
Audi A5              1     2
Audi A5              1     3
Audi A5              1     4
Audi A5              0     5
Audi A5              0     6
Audi A5              1     7
Audi A5              1     8
Audi A5              1     9
Mercedes Class A     1     0
Mercedes Class A     1     1
Mercedes Class A     1     2
Mercedes Class A     0     3
Mercedes Class A     1     4
Mercedes Class A     1     5
Mercedes Class A     1     6
Mercedes Class A     0     7
Mercedes Class A     1     8
Mercedes Class A     1     9

I can get the index of all active parts using the following code:

import pandas as pd

df = pd.DataFrame(
   {
      "Car": ["Audi A5"]*10 + ["Mercedes Class A"]*10,
      "Time" : list(range(10)) + list(range(10)),
      "Open" : [0,0,0,1,1,0,0,0,1,1,1,1,1,0,0,1,1,0,0,1]
   }
)

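def enlarge(dataframe : pd.DataFrame, sensor : str, n : int = 1) -> pd.DataFrame:

    # Return the first index label of a run, but only if the run spans
    # at least two rows; otherwise return None.
    get_group_indexes = (
        lambda x: x.index[0]
        if x.index[-1] - x.index[0] >= 1
        else None
    )

    # The cumulative count of 0s is constant along a run of 1s, so it serves
    # as a run id once the rows equal to 0 are filtered out.
    groups = (
        dataframe[sensor]
        .eq(0)
        .cumsum()[dataframe[sensor].ne(0)]
        .to_frame()
        .groupby(sensor)
        .apply(get_group_indexes)
        .dropna()
    )

    if groups.empty:
        return dataframe

    # Set the n rows before the start of each run to 1 (modifies dataframe in place).
    for index in groups:
        dataframe.loc[index-n:index, sensor] = 1

    return dataframe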

It works when I don't have to group by Car, but I want to group by this column before performing this transformation. Does someone have an idea how to achieve this efficiently using pandas tricks? Thanks.

CodePudding user response:

IIUC, you can bfill per group with a limit after masking the non-1 values:

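n = 1
df['Open2'] = (df['Open']
               .where(df['Open'].eq(1))            # keep the 1s, mask everything else as NaN
               .groupby(df['Car']).bfill(limit=n)  # back-fill at most n rows within each Car
               .fillna(df['Open'])                 # restore the remaining original values
               .astype(df['Open'].dtype)           # back to the original integer dtype
              )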

output (as new column "Open2" for clarity):

                 Car  Time  Open  Open2
0            Audi A5     0     0      0
1            Audi A5     1     0      0
2            Audi A5     2     0      1
3            Audi A5     3     1      1
4            Audi A5     4     1      1
5            Audi A5     5     0      0
6            Audi A5     6     0      0
7            Audi A5     7     0      1
8            Audi A5     8     1      1
9            Audi A5     9     1      1
10  Mercedes Class A     0     1      1
11  Mercedes Class A     1     1      1
12  Mercedes Class A     2     1      1
13  Mercedes Class A     3     0      0
14  Mercedes Class A     4     0      1
15  Mercedes Class A     5     1      1
16  Mercedes Class A     6     1      1
17  Mercedes Class A     7     0      0
18  Mercedes Class A     8     0      1
19  Mercedes Class A     9     1      1
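
For anyone who wants to reuse this, below is a minimal self-contained sketch of the same masking and per-group bfill idea wrapped in a function. The function name `enlarge_backward` and its parameters are hypothetical, not part of pandas, and it assumes the sensor column holds only 0s and 1s with no missing values.

```python
import pandas as pd


def enlarge_backward(df: pd.DataFrame, sensor: str, group: str, n: int = 1) -> pd.Series:
    """Hypothetical helper: extend every run of 1s backward by n rows, per group."""
    s = df[sensor]
    ones_only = s.where(s.eq(1))                            # 1 stays 1, everything else becomes NaN
    filled = ones_only.groupby(df[group]).bfill(limit=n)    # pull each 1 back by at most n rows
    return filled.fillna(s).astype(s.dtype)                 # restore untouched 0s and the dtype


df = pd.DataFrame(
    {
        "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
        "Time": list(range(10)) + list(range(10)),
        "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    }
)

df["Open2"] = enlarge_backward(df, "Open", "Car", n=1)
print(df["Open2"].tolist())
# [0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
```

Because the back-fill runs inside each Car group, a 1 at the start of one car never leaks into the trailing 0s of the previous car. The function returns a new Series and leaves the input dataframe unchanged.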