Using df.apply() on a time column to mark rows at every 2 seconds in pandas

I am new to data science and am trying to understand some basic pandas examples. I have a pandas data frame to which I would like to add a new column with conditional values: yes at every 2 seconds, and no otherwise. Here is an example. This is my original data frame:

    id  name    time
0   1   name1   260.123
1   2   name2   261.323
2   3   name3   261.342
3   4   name4   261.567
4   5   name5   262.123
...

The new data frame will be like this:

    id  name    time     time_delta
0   1   name1   260.123  yes
1   2   name2   261.323  no
2   3   name3   261.342  no
3   4   name4   261.567  no
4   5   name5   262.123  yes
5   6   name6   262.345  yes
6   7   name7   264.876  yes
7   8   name8   265.234  no
8   9   name9   266.234  yes
9   10  name10  267.234  no
...

The code that I was using is:

df['time_delta'] = df['time'].apply(apply_test)

And the actual code of the function:

def apply_test(num):
    prev = num
    if round(num) != prev + 2:
        prev = prev
        return "no"
    else:
        prev = num
        return "yes"

Please note that the time column has decimals and follows no regular pattern.

The result came out as all no, since prev is reassigned to the current number on every call. This was the approach I came up with, and I am not sure whether there are better ways. I would appreciate any help.
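To illustrate why every row comes back as no, here is a small sketch of the same function called on a few of the sample times. The list of inputs is only an illustration; the point is that prev is a local variable, so it is rebuilt from num on each call and the function never remembers an earlier row.

```python
def apply_test(num):
    prev = num  # local variable: recreated from num on every call
    # round(num) is within 0.5 of num, so it can never equal num + 2
    if round(num) != prev + 2:
        return "no"
    return "yes"

# a few times taken from the sample data frame
results = [apply_test(t) for t in [260.123, 261.323, 262.123]]
print(results)  # ['no', 'no', 'no']
```

Since apply() passes one value at a time, any rule that depends on earlier rows needs either state kept outside the function or a vectorized formulation.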

UPDATE:

Please note that the time column has decimals, and the decimal part is ignored in this case. For instance, time=234.xxx will be considered as 234 seconds. Therefore, the next 2-second point is 236.
The data frame can have multiple rows with the same second value once the times are rounded down. In that case, all of them have to be marked as yes. Please refer to the updated result data frame as an example.

CodePudding user response：

You can use:

import numpy as np
import pandas as pd

N = 2 # time step

# define bins every N seconds
bins = np.arange(np.floor(df['time'].min()), df['time'].max() + N, N)
# get the index of the first row per group (observed=True skips empty bins)
idx = df.groupby(pd.cut(df['time'], bins), observed=True)['time'].idxmin()

# assign "yes" to the first row of each bin, else "no"
df['time_delta'] = np.where(df.index.isin(idx), 'yes', 'no')

Output:

   id   name     time time_delta
0   1  name1  260.123        yes
1   2  name2  260.323         no
2   3  name3  261.342         no
3   4  name4  261.567         no
4   5  name5  262.123        yes
5   6  name6  263.345         no
6   7  name7  264.876        yes
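
Note that the output above marks only the first row in each 2-second bin. The question's UPDATE instead asks for every row whose rounded-down second falls on a 2-second mark to be yes. A minimal sketch of that rule follows, assuming the marks are counted from the first row's second and using the rows from the question's expected result as sample data:

```python
import numpy as np
import pandas as pd

# sample times copied from the question's expected result
df = pd.DataFrame({"time": [260.123, 261.323, 261.342, 261.567, 262.123,
                            262.345, 264.876, 265.234, 266.234, 267.234]})

N = 2  # time step in seconds

# drop the decimal part: 260.123 -> 260
seconds = np.floor(df["time"]).astype(int)
# whole seconds elapsed since the first row (assumed to be the starting mark)
offset = seconds - seconds.iloc[0]
# "yes" for every row sitting on a multiple of N seconds, else "no"
df["time_delta"] = np.where(offset % N == 0, "yes", "no")
print(df)
```

On this sample the column matches the expected result in the question, including the two rows at second 262 that are both yes.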

CodePudding user response：

You can check when the cumulative sum of the diffs, floor-divided by 2, changes value; that is when the time enters a new segment of length 2:

remaining = (df['time'].diff().cumsum() // 2).fillna(0)
df['time_delta'] = np.where((~remaining.duplicated()), 'yes', 'no')
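
To make the intermediate values visible: up to floating-point rounding, diff().cumsum() is the time elapsed since the first row, so the segments here are measured from the first timestamp rather than from whole seconds. The sketch below writes that out directly on hypothetical sample times (not from the question) and assumes the times are sorted in ascending order.

```python
import numpy as np
import pandas as pd

# hypothetical sample times, sorted ascending
df = pd.DataFrame({"time": [260.1, 260.3, 261.3, 261.6, 262.4, 263.3, 264.9]})

# time elapsed since the first row
elapsed = df["time"] - df["time"].iloc[0]
# segment number of each row, in steps of 2 seconds
segment = (elapsed // 2).astype(int)
# the first row of each segment gets "yes", the rest "no"
df["time_delta"] = np.where(~segment.duplicated(), "yes", "no")
print(df.assign(segment=segment))
```

A row that lies almost exactly 2 seconds after the first one may land on either side of the boundary because of floating-point rounding, so this variant is best suited to data where that ambiguity does not matter.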