I have a dataframe that looks like this:
Part | Date |
---|---|
1 | 9/1/2021 |
1 | 9/8/2021 |
1 | 9/15/2021 |
2 | 9/1/2020 |
2 | 9/12/2020 |
2 | 9/14/2020 |
The dataframe is already sorted by part, then by date.
I need to calculate the number of days between each date and the date in the previous row.
The date diff calculation has to restart each time a new part
is encountered.
So the desired output would be:
Part | Date | Diff |
---|---|---|
1 | 9/1/2021 | |
1 | 9/8/2021 | 7 |
1 | 9/15/2021 | 7 |
2 | 9/1/2020 | |
2 | 9/12/2020 | 11 |
2 | 9/14/2020 | 2 |
How would you go about processing this data to achieve the desired output?
Any assistance on this would be greatly appreciated!
Thank you
CodePudding user response:
Use groupby with diff:
df.groupby('Part').Date.diff()
0 NaT
1 7 days
2 7 days
3 NaT
4 11 days
5 2 days
Name: Date, dtype: timedelta64[ns]
If the Date column is not already a timestamp, you can use df.Date = pd.to_datetime(df.Date) to convert it first.
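To get the Diff column as plain day counts, as in the desired output, rather than timedeltas, something along these lines should work. This is a minimal sketch that rebuilds the sample data from the question. It assumes the dates are strings in month/day/year format, and the nullable Int64 dtype is just one way to keep the first row of each part empty.

```python
import pandas as pd

# Sample data from the question, already sorted by Part and then Date.
df = pd.DataFrame({
    "Part": [1, 1, 1, 2, 2, 2],
    "Date": ["9/1/2021", "9/8/2021", "9/15/2021",
             "9/1/2020", "9/12/2020", "9/14/2020"],
})

# Assumption: dates are month/day/year strings, so parse them explicitly.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# diff() within each Part restarts at every new part (first row is NaT).
# .dt.days turns the timedeltas into day counts; Int64 keeps the gaps as <NA>.
df["Diff"] = df.groupby("Part")["Date"].diff().dt.days.astype("Int64")

print(df)
```

If the frame were not already sorted, you would probably want df.sort_values(["Part", "Date"]) before taking the diff.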