Get the difference between two dates when string value changes-CodePudding

I want to get the number of days between the change of string values (ie., the symbol column) in one column, grouped by their respective id. I want a separate column for datediff like the one below.

id  date         symbol  datediff
1   2022-08-26   a       0
1   2022-08-27   a       0
1   2022-08-28   a       0
2   2022-08-26   a       0
2   2022-08-27   a       0
2   2022-08-28   a       0
2   2022-08-29   b       3
3   2022-08-29   c       0
3   2022-08-30   b       1

For id = 1, datediff = 0 since symbol stayed as a. For id = 2, datediff = 3 since symbol changed after 3 days from a to b. Hence, what I'm looking for is a code that computes the difference in which the id changes it's symbol.

I am currently using this code:

df['date'] = pd.to_datetime(df['date'])

diff = ['0 days 00:00:00']

for st,i in zip(df['symbol'],df.index):
    if i > 0:#cannot evaluate previous from index 0
        if df['symbol'][i] != df['symbol'][i-1]:
            diff.append(df['date'][i] - df['data_ts'][i-1])
        else:   
            diff.append('0 days 00:00:00')

The output becomes:

id  date         symbol  datediff
1   2022-08-26   a       0
1   2022-08-27   a       0
1   2022-08-28   a       0
2   2022-08-26   a       0
2   2022-08-27   a       0
2   2022-08-28   a       0
2   2022-08-29   b       1
3   2022-08-29   c       0
3   2022-08-30   b       1

It also computes the difference between two different ids. But I want the computation to be separate from different ids.

I only see questions about difference of dates when values changes, but not when string changes. Thank you!

CodePudding user response：

IIUC: my solution works with the assumption that the symbols within one id ends with a single changing symbol, if there is any (as in the example given in the question).

First use df.groupby on id and symbol and get the minimum date for each combination. Then, find the difference between the dates within each id. This gives the datediff. Finally, merge the findings with the original dataframe.

df1 = df.groupby(['id', 'symbol'], sort=False).agg({'date': np.min}).reset_index()

df1['datediff'] = abs(df1.groupby('id')['date'].diff().dt.days.fillna(0))
df1 = df1.drop(columns='date')

df_merge = pd.merge(df, df1, on=['id', 'symbol'])