Check periodicity of value in a Pandas Data Frame

Time:06-17

I have a Pandas DataFrame with three columns, like this:

Time Code Id
10:10:00 Rx 11
10:10:01 Tx 11
10:10:02 Rx 12
10:10:04 Tx 12
10:10:06 Rx 13
10:10:07 Tx 13
10:10:08 Rx 11
10:10:10 Rx 11

For each Rx code, I want to check that a Tx code follows immediately after and that the Rx and Tx share the same Id. If an Rx is duplicated, I want to get the duplicate row.

In my example, I want to flag the 10:10:10 Rx because it is a duplicate.

I managed to do this with a for loop, but I shouldn't use a for loop with a DataFrame:

    old_cell = None
    for index, row in pdo_df.iterrows():
        # Compare each row with the previous one; the first row has no predecessor.
        if old_cell is not None and row['Function_code'] == old_cell['Function_code']:
            print("----------------")
            print("Error :")
            print(old_cell)
            print(row)
            print("----------------")
        old_cell = row

CodePudding user response:

The shift() method lets you look at the value in the previous row. This code then selects all the duplicates, i.e. every row whose Code and Id both equal those of the row above it:

df[
    (df["Code"] == df["Code"].shift())
    & (df["Id"] == df["Id"].shift())
]

Following the same logic, negating that condition gives you the dataframe without those duplicates:

df[
    ~(
        (df["Code"] == df["Code"].shift())
        & (df["Id"] == df["Id"].shift())
    )
]
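
To try the shift() comparison end to end, here is a minimal sketch that rebuilds the sample data from the question. The variable names is_repeat, duplicates and cleaned are only illustrative, and the Time column is kept as plain strings, which is an assumption about the real data:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Time": ["10:10:00", "10:10:01", "10:10:02", "10:10:04",
             "10:10:06", "10:10:07", "10:10:08", "10:10:10"],
    "Code": ["Rx", "Tx", "Rx", "Tx", "Rx", "Tx", "Rx", "Rx"],
    "Id": [11, 11, 12, 12, 13, 13, 11, 11],
})

# True where a row repeats both the Code and the Id of the row above it.
# shift() leaves NaN in the first row, so row 0 can never match.
is_repeat = (df["Code"] == df["Code"].shift()) & (df["Id"] == df["Id"].shift())

duplicates = df[is_repeat]   # rows to report
cleaned = df[~is_repeat]     # everything else

print(duplicates)
```

On this sample the only row selected is index 7 (10:10:10 Rx 11). Note that this only compares a row with its immediate predecessor, so it relies on the rows already being in time order.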

CodePudding user response:

If this is based on the Code column, we can do this in two steps: first create a cumulative count over the Code column, then sum that count over each consecutive pair of rows (s.index // 2) and flag the pairs whose sum is odd.

s = df.groupby('Code').cumcount()

s1 = (s.groupby(s.index // 2).transform('sum') % 2 > 0)

df1 = pd.concat([df[~s1], df[s1].drop_duplicates(subset='Code',keep='first')])


print(df1)

       Time Code  Id
0  10:10:00   Rx  11
1  10:10:01   Tx  11
2  10:10:02   Rx  12
3  10:10:04   Tx  12
4  10:10:06   Rx  13
5  10:10:07   Tx  13
6  10:10:08   Rx  11

Digging into the steps:

print(s)

0    0
1    0
2    1
3    1
4    2
5    2
6    3
7    4 

print(s1)

0    False
1    False
2    False
3    False
4    False
5    False
6     True # <-- non-matching duplicate one
7     True # <-- non-matching duplicate two 
dtype: bool
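
One thing to watch: s.index // 2 assumes the default 0..n-1 index. As a hedged variation (not part of the answer above), the pair labels can be built from row positions instead, so the same check still works after filtering or re-indexing. The name pair and the use of NumPy are assumptions made for this sketch:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Time": ["10:10:00", "10:10:01", "10:10:02", "10:10:04",
             "10:10:06", "10:10:07", "10:10:08", "10:10:10"],
    "Code": ["Rx", "Tx", "Rx", "Tx", "Rx", "Tx", "Rx", "Rx"],
    "Id": [11, 11, 12, 12, 13, 13, 11, 11],
})

# Position-based pair labels (0, 0, 1, 1, ...), independent of the index values.
pair = np.arange(len(df)) // 2

# Running count of each Code; a matched Rx/Tx pair carries the same count twice.
s = df.groupby("Code").cumcount()

# An odd pair sum means the two rows of that pair carry different counts.
s1 = s.groupby(pair).transform("sum") % 2 > 0

print(df[s1])
```

On the sample data this flags the same two rows as above (indexes 6 and 7).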