I would like to remove the ".0" from all values in a column in a dataframe.
attached is my structure
data = {'Date':['2020-02-05 12:54:49.0', '2021-02-06 16:27:09.0', '2022-02-07 16:57:15.0'],'User':[28,34,29]}
df = pd.DataFrame(data)
print(df)
CodePudding user response:
.0
is the microsecond part, it is represented by %f
. You can convert the column to datetime type and convert the datetime type back to string format without %f
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%d %H:%M:%S')
print(df)
Date User
0 2020-02-05 12:54:49 28
1 2021-02-06 16:27:09 34
2 2022-02-07 16:57:15 29
If you just want to remove .0
and leave .123
etc. You can use .str.rstrip
df['Date'] = df['Date'].str.rstrip('.0')
CodePudding user response:
Use:
df["Date"] = df["Date"].str.replace("\.0$", "", regex=True)
print(df)
Output
Date User
0 2020-02-05 12:54:49 28
1 2021-02-06 16:27:09 34
2 2022-02-07 16:57:15 29
The pattern ".0$" will only match a dot (.
) followed by 0
at the end of the string.
CodePudding user response:
Just jettison 2 last characters, that is
import pandas as pd
data = {'Date':['2020-02-05 12:54:49.0', '2021-02-06 16:27:09.0', '2022-02-07 16:57:15.0'],'User':[28,34,29]}
df = pd.DataFrame(data)
df['Date'] = df.Date.str[:-2]
print(df)
output
Date User
0 2020-02-05 12:54:49 28
1 2021-02-06 16:27:09 34
2 2022-02-07 16:57:15 29
CodePudding user response:
I believe the best way to do the job involves thinking about what you want to do with the data after you remove the trailing milliseconds from the 'Date' column.
Do you need to do a lot of date manipulation later?
If that's your case, you can simply use pandas.to_datetime
like this.
Since all the dates end with ".0", when you convert the date to a string, the milliseconds part will be removed:
>>> df.Date = pd.to_datetime(df.Date)
>>> print(df.Date)
0 2020-02-05 12:54:49
1 2021-02-06 16:27:09
2 2022-02-07 16:57:15
>>> print(df.Date.dtype)
datetime64[ns]
When it is a good idea to do this? when you have to do a lot of date manipulations.
If the date in your dataframe is a string, when you do something like df.Date > "2021-01-01"
, it will trigger a conversion from string to date for every row (and it won't be cached).
I just need to store the string to display somewhere else
Case 1: IF you are sure all dates end with '.0', you can just strip the last 2 characters from the string:
df.Date = df.Date.apply(lambda x: x[:-2])
This is very quick. But if the milliseconds part has more than one character it will fail ('2021-01-01 01:02:03.0456' -> '2021-01-01 01:02:03.04'
Case 2: Split the string after the dot (there are no dots in the middle of dates).
Just like this:
df.Date = df.Date.apply(lambda x: x.split('.')[0])