Home > Software engineering >  Pandas cutting off values of a column
Pandas cutting off values of a column

Time:07-25

I would like to remove the ".0" from all values in a column in a dataframe.

attached is my structure

data = {'Date':['2020-02-05 12:54:49.0', '2021-02-06 16:27:09.0', '2022-02-07 16:57:15.0'],'User':[28,34,29]}
df = pd.DataFrame(data)

print(df)

CodePudding user response:

.0 is the microsecond part, it is represented by %f. You can convert the column to datetime type and convert the datetime type back to string format without %f

df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%d %H:%M:%S')
print(df)

                  Date  User
0  2020-02-05 12:54:49    28
1  2021-02-06 16:27:09    34
2  2022-02-07 16:57:15    29

If you just want to remove .0 and leave .123 etc. You can use .str.rstrip

df['Date'] = df['Date'].str.rstrip('.0')

CodePudding user response:

Use:

df["Date"] = df["Date"].str.replace("\.0$", "", regex=True)
print(df)

Output

                  Date  User
0  2020-02-05 12:54:49    28
1  2021-02-06 16:27:09    34
2  2022-02-07 16:57:15    29

The pattern ".0$" will only match a dot (.) followed by 0 at the end of the string.

CodePudding user response:

Just jettison 2 last characters, that is

import pandas as pd
data = {'Date':['2020-02-05 12:54:49.0', '2021-02-06 16:27:09.0', '2022-02-07 16:57:15.0'],'User':[28,34,29]}
df = pd.DataFrame(data)
df['Date'] = df.Date.str[:-2]
print(df)

output

                  Date  User
0  2020-02-05 12:54:49    28
1  2021-02-06 16:27:09    34
2  2022-02-07 16:57:15    29

CodePudding user response:

I believe the best way to do the job involves thinking about what you want to do with the data after you remove the trailing milliseconds from the 'Date' column.

Do you need to do a lot of date manipulation later?

If that's your case, you can simply use pandas.to_datetime like this.

Since all the dates end with ".0", when you convert the date to a string, the milliseconds part will be removed:

>>> df.Date = pd.to_datetime(df.Date)
>>> print(df.Date)
0   2020-02-05 12:54:49
1   2021-02-06 16:27:09
2   2022-02-07 16:57:15
>>> print(df.Date.dtype)
datetime64[ns]

When it is a good idea to do this? when you have to do a lot of date manipulations.

If the date in your dataframe is a string, when you do something like df.Date > "2021-01-01", it will trigger a conversion from string to date for every row (and it won't be cached).

I just need to store the string to display somewhere else

Case 1: IF you are sure all dates end with '.0', you can just strip the last 2 characters from the string:

df.Date = df.Date.apply(lambda x: x[:-2])

This is very quick. But if the milliseconds part has more than one character it will fail ('2021-01-01 01:02:03.0456' -> '2021-01-01 01:02:03.04'

Case 2: Split the string after the dot (there are no dots in the middle of dates).

Just like this:

df.Date =  df.Date.apply(lambda x: x.split('.')[0])
  • Related