I am relatively new to Python and pandas. I have a dataset with a date column in which each date is followed by many rows of timestamps until the next date appears. Here is some sample data:
7/12/2022
3:47:57AM
3:47:58AM
3:51:27AM
3:52:06AM
7/13/2022
4:18:20AM
4:19:05AM
4:25:51AM
4:27:50AM
I want to replace each timestamp with the most recent date above it, so that every row takes that date until the next date appears, and from then on takes the new date. I want the output to look like this:
7/12/2022
7/12/2022
7/12/2022
7/12/2022
7/12/2022
7/13/2022
7/13/2022
7/13/2022
7/13/2022
7/13/2022
I am trying to do this in pandas. Here is the code to create the DataFrame:
import pandas as pd
df = pd.DataFrame(["7/12/2022","3:47:57AM"," 3:47:58AM","3:51:27AM","3:52:06AM","7/13/2022","4:18:20AM","4:19:05AM","4:25:51AM","4:27:50AM"], columns=['Date'])
Any help would be much appreciated!
Answer:
Use mask to set the values ending in "M" (which covers both AM and PM) to null (np.nan), then forward-fill them with ffill:
import numpy as np

# mask() replaces values with NaN (its default) wherever the condition is True,
# then ffill() fills each NaN with the previous non-null value in the column
df['Date'] = df['Date'].mask(df['Date'].str.endswith('M'), np.nan).ffill()
df
Or, since mask() already uses NaN as its default replacement, the np.nan can be left out:
df['Date'] = df['Date'].mask(df['Date'].str.endswith('M')).ffill()
df
Date
0 7/12/2022
1 7/12/2022
2 7/12/2022
3 7/12/2022
4 7/12/2022
5 7/13/2022
6 7/13/2022
7 7/13/2022
8 7/13/2022
9 7/13/2022
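A minimal sketch of the same mask-then-ffill idea, with the two steps split out so the intermediate result is visible. It swaps in a different test for which rows are dates: instead of checking for a trailing "M", it assumes the date rows are in M/D/YYYY form and uses pd.to_datetime(..., format='%m/%d/%Y', errors='coerce'), so anything that does not parse as such a date (the timestamps) becomes NaT. The helper name fill_dates is hypothetical. As with the approach above, a timestamp that appears before the first date has nothing to fill from and stays NaN.

```python
import pandas as pd

# Sample data from the question (note the stray leading space in " 3:47:58AM")
df = pd.DataFrame(["7/12/2022", "3:47:57AM", " 3:47:58AM", "3:51:27AM", "3:52:06AM",
                   "7/13/2022", "4:18:20AM", "4:19:05AM", "4:25:51AM", "4:27:50AM"],
                  columns=['Date'])


def fill_dates(s: pd.Series) -> pd.Series:
    """Hypothetical helper: replace each non-date row with the nearest date above it.

    Assumes date rows look like M/D/YYYY; any other value is treated as a timestamp.
    """
    # True where the stripped value parses as an M/D/YYYY date, False otherwise
    is_date = pd.to_datetime(s.str.strip(), format='%m/%d/%Y', errors='coerce').notna()
    # Step 1: keep only the date rows; mask() sets every other row to NaN
    only_dates = s.mask(~is_date)
    # Step 2: forward-fill so each NaN takes the last date above it
    return only_dates.ffill()


df['Date'] = fill_dates(df['Date'])
print(df)
```

The design difference is that this checks what a date looks like rather than what a time looks like, so any other non-date value in the column is also replaced. If the real data uses a different date format, the format string would need to change to match.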