I have a DataFrame that looks like this:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})
df['ts'] = datetime.utcnow()  # every row gets the same timestamp
The table looks like this:
     State  a   b                  ts
0    Texas  4   6 2022-09-06 15:33:31
1    Texas  5  10 2022-09-06 15:33:31
2  Florida  1   3 2022-09-06 15:33:31
3  Florida  3  11 2022-09-06 15:33:31
What I want is for 'ts' to be unique within each group, so I want to increment each subsequent value in a group by 1 second, so that the output DataFrame looks like this:
     State  a   b                  ts
0    Texas  4   6 2022-09-06 15:33:31
1    Texas  5  10 2022-09-06 15:33:32
2  Florida  1   3 2022-09-06 15:33:31
3  Florida  3  11 2022-09-06 15:33:32
With groupby and transform I am able to get the series, but I can't get any further:
df['ts'] = df['ts'].groupby(df['State']).transform(lambda x: increment_ms(x))
How can I achieve the above output?
CodePudding user response:
You can use groupby().cumcount() with pd.to_timedelta:
df['ts'] += pd.to_timedelta(df.groupby('State').cumcount(), unit='s')
Output:
     State  a   b                         ts
0    Texas  4   6 2022-09-06 15:40:46.429416
1    Texas  5  10 2022-09-06 15:40:47.429416
2  Florida  1   3 2022-09-06 15:40:46.429416
3  Florida  3  11 2022-09-06 15:40:47.429416
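For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same idea. It swaps datetime.utcnow() for a fixed pd.Timestamp (an assumption, made only so the result is repeatable) and uses a hypothetical intermediate name, offsets, to show what cumcount() contributes:

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["Texas", "Texas", "Florida", "Florida"],
    "a": [4, 5, 1, 3],
    "b": [6, 10, 3, 11],
})
# Assumption: a fixed timestamp instead of datetime.utcnow(), for repeatability.
df["ts"] = pd.Timestamp("2022-09-06 15:33:31")

# cumcount() numbers the rows inside each State as 0, 1, 2, ...
# to_timedelta(..., unit="s") turns those counters into 0 s, 1 s, 2 s, ...
offsets = pd.to_timedelta(df.groupby("State").cumcount(), unit="s")

# The first row of each group gets +0 s, the second +1 s, and so on.
df["ts"] = df["ts"] + offsets
print(df)
```

Because the offset is just the row's position within its group, this should also work for groups with more than two rows: the third row gets +2 seconds, the fourth +3 seconds, and so on.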