delete consecutive elements in a pandas dataFrame given a certain rule?

Time:07-28

I have a variable with zeros and ones. Each sequence of ones represents "a phase" I want to observe; each sequence of zeros represents the space/distance that occurs between these phases.

It may happen that a phase carries a sort of "impulse response", for example the echo of a voice: in this case we will have 1,1,1,1,0,0,1,1,1,0,0,0 as an output, where the first sequence of ones is the shout we made, while the second one is just the echo caused by the shout.

  • So I wrote a function that ignores the echoes/responses of the main shout/action and converts the echo/response sequences of ones into zeros.
  • (1) If the sequence of zeros is greater than or equal to the input threshold nearby_thr, the function recognizes the following sequence of ones as an independent phase and does not delete or change anything.
  • (2) If the sequence of zeros (between two sequences of ones) is shorter than the input threshold nearby_thr, the function recognizes an "impulse response/echo" and ignores it: it converts those ones into zeros.
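To make the two rules concrete, here is a minimal sketch (not the function shown further down) that lists the runs of the shout/echo example with itertools.groupby and checks the gap between the two runs of ones. The variable names and the threshold value of 3 are assumptions chosen for illustration.

```python
from itertools import groupby

signal = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0]  # shout, short gap, echo, silence
nearby_thr = 3  # assumed threshold for this illustration

# Collapse the signal into (value, run_length) pairs.
runs = [(value, len(list(group))) for value, group in groupby(signal)]
print(runs)  # [(1, 4), (0, 2), (1, 3), (0, 3)]

# The zero run between the two runs of ones is the gap to compare with the threshold.
gap = runs[1][1]
is_echo = gap < nearby_thr  # rule (2): a gap shorter than the threshold means "echo"
print(is_echo)  # True
```

With a gap of 2 and a threshold of 3, rule (2) applies, so the second run of ones would be converted to zeros.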

I wrote a naive function that accomplishes this, but I was wondering whether pandas already has a function like that, or whether it can be done in a few lines without writing a "C-like" function.

Here's my code:

import pandas as pd
import matplotlib.pyplot as plt
# import utili_funzioni.util00 as ut0

x1 = pd.DataFrame([0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,1,1])
x2 = pd.DataFrame([0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,1,1,0])


# rule = x1==1        ## counting number of consecutive ones
# cumsum_ones = rule.cumsum() - rule.cumsum().where(~rule).ffill().fillna(0).astype(int)




def detect_nearby_el_2(df, nearby_thr):
    global el2del
    # df = consecut_zeros
    # i = 0
    print("")
    print("")
    j = 0
    enterOnce_if = 1
    reset_count_0s = 0
    start2detect = False
    count0s = 0  # init
    start2_getidxs = False  # if this is not true, it won't store idxs to delete
    el2del = []  # store idxs to delete elements

    for i in range(df.shape[0]):
        print("")
        print("i: ", i)
        x_i = df.iloc[i, 0]

        if x_i == 1 and j==0:  # first phase (ones) has been detected
            start2detect = True  # first phase (ones) has been detected
            # j  = 1

        print("count0s:",count0s)
        if start2detect == True:        # first phase, seen/detected, --> (wait) has ended..
            if x_i == 0:  # 1st phase detected and ended with "a zero"

                if reset_count_0s == 1:
                    count0s = 0
                    reset_count_0s = 0

                count0s += 1

                if enterOnce_if == 1:
                    start2_getidxs=True    # avoiding to delete first phase
                    enterOnce_0 = 0


        if start2_getidxs==True:   # avoiding to delete first phase
            if x_i == 1 and count0s < nearby_thr:
                print("this is NOT a new phase!")
                el2del = [*el2del, i]   # idxs to delete
                reset_count_0s = 1      # reset counter

            if x_i == 1 and count0s >= nearby_thr:
                print("this is a new phase!")   # nothing to delete
                reset_count_0s = 1      # reset counter

    return el2del

def convert_nearby_el_into_zeros(df,idx):

    df0 = df + 0    # work on a new DataFrame; otherwise the original dataframe is modified!
    if len(idx) > 0:
        # df.drop(df.index[idx]) # to delete completely
        df0.iloc[idx] = 0
    else:
        print("no elements nearby to delete!!")

    return df0

######
print("")
x1_2del = detect_nearby_el_2(df=x1,nearby_thr=3)
x2_2del = detect_nearby_el_2(df=x2,nearby_thr=3)

## deleting nearby elements
x1_a = convert_nearby_el_into_zeros(df=x1,idx=x1_2del)
x2_a = convert_nearby_el_into_zeros(df=x2,idx=x2_2del)


## PLOTTING
# ut0.grayplt()

fig1 = plt.figure()
fig1.suptitle("x1",fontsize=20)
ax1 = fig1.add_subplot(1,2,1)
ax2 = fig1.add_subplot(1,2,2,sharey=ax1)
ax1.title.set_text("PRE-detect")
ax2.title.set_text("POST-detect")
line1, = ax1.plot(x1)
line2, = ax2.plot(x1_a)

fig2 = plt.figure()
fig2.suptitle("x2",fontsize=20)
ax1 = fig2.add_subplot(1,2,1)
ax2 = fig2.add_subplot(1,2,2,sharey=ax1)
ax1.title.set_text("PRE-detect")
ax2.title.set_text("POST-detect")
line1, = ax1.plot(x2)
line2, = ax2.plot(x2_a)

You can see that x1 has two "responses/echoes" that I do not want to take into account, while x2 has none; in fact, nothing changed in x2.


  • My question is: how can this be accomplished in a few lines using pandas?

Thank You

CodePudding user response:

Interesting problem, and I'm sure there's a more elegant solution out there, but here is my attempt - it's at least fairly performant:

x1 = pd.Series([0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,1,1])
x2 = pd.Series([0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,1,1,0])

def remove_echos(series, threshold):
    starting_points = (series==1) & (series.shift()==0)
    # parentheses are needed because & binds more tightly than ==
    echo_starting_points = starting_points & (series.shift(threshold) == 1)
    
    echo_starting_points = series[echo_starting_points].index
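    # sentinel one past the last label (assumes the default integer index)
    change_points = series[starting_points].index.to_list() + [series.index[-1] + 1]

    for (start, end) in zip(change_points, change_points[1:]):
        if start in echo_starting_points:
            # .loc slicing includes both endpoints, so stop one label before the next starting point
            series.loc[start:end - 1] = 0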
    return series

x1 = remove_echos(x1, 3)
x2 = remove_echos(x2, 3)

(I changed x1 and x2 to be Series instead of DataFrame, it's easy to adapt this code to work with a df if you need to.)

Explanation: we define the "starting point" of each phase as a 1 preceded by a 0. Among those, a starting point counts as an "echo" starting point if the value threshold positions before it is a 1. (The assumption is that no phase is shorter than threshold.) For each echo starting point, we set everything from it up to the next starting point, or to the end of the Series, to zero.
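If the assumption about phase length is a concern, a loop-free variant can measure each zero gap directly, following rules (1) and (2) from the question. This is only a sketch of a different technique (run labelling with cumsum plus groupby), not the code above; the function name zero_short_gap_runs is made up for illustration.

```python
import pandas as pd

def zero_short_gap_runs(series: pd.Series, nearby_thr: int) -> pd.Series:
    """Return a copy where runs of ones that follow a short zero gap are zeroed."""
    is_one = series.eq(1)
    # Give every run of identical values its own integer label (1, 2, 3, ...).
    run_id = is_one.ne(is_one.shift()).cumsum()

    # One row per run: whether it is a run of ones, and how long it is.
    grouped = is_one.groupby(run_id)
    runs = pd.DataFrame({"is_one": grouped.first(), "length": grouped.size()})

    # Runs alternate between zeros and ones, so the run just before a run of
    # ones is the zero gap that precedes it (NaN for the very first run).
    prev_gap = runs["length"].shift()
    # True once at least one earlier run of ones exists, so the first phase is kept.
    seen_ones = runs["is_one"].shift(fill_value=False).cumsum().gt(0)
    runs["echo"] = runs["is_one"] & seen_ones & prev_gap.lt(nearby_thr)

    # Broadcast the per-run decision back to every element and zero the echoes.
    return series.mask(run_id.map(runs["echo"]), 0)

x1 = pd.Series([0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,1,1])
cleaned = zero_short_gap_runs(x1, nearby_thr=3)
print(cleaned.tolist())
```

On x1 this zeroes the runs at positions 14-16 and 31-33 and keeps the phases at 7-11 and 25-28. Unlike remove_echos, it returns a new Series and leaves the input untouched.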
