When I extract the date column from my dataframe, it looks like this:
10/30/2016
10/30/2016
10/30/2016
10/30/2016
9/4/2017 1
9/4/2017 1
9/4/2017 1
9/4/2017 1
9/4/2017 1
I need all the dates in the same zero-padded format, without the extra 1 on the right. The format should be:
09/04/2017
Any help is appreciated.
CodePudding user response:
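We can do the following with a regex capture group:
import re
import datetime
bad_date = "9/4/2017 1"
# Keep the m/d/YYYY part and drop everything after the first whitespace.
new_date = re.sub(r"(\d{1,2}/\d{1,2}/\d{4})\s.*", r"\1", bad_date)
Finally, parse it with datetime (the data is month/day/year, so the format is "%m/%d/%Y"):
print(datetime.datetime.strptime(new_date, "%m/%d/%Y"))
2017-09-04 00:00:00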
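To get all the way to the requested string format, the parsed value still has to be written back out with strftime. The following is a minimal sketch using only the standard library; the helper name normalize_date is hypothetical, and it assumes every value starts with a month/day/year date.

```python
import re
from datetime import datetime


def normalize_date(raw: str) -> str:
    """Return the leading m/d/YYYY date of `raw` as zero-padded MM/DD/YYYY."""
    # Match only the date at the start; anything after it (e.g. " 1") is ignored.
    match = re.match(r"\s*(\d{1,2}/\d{1,2}/\d{4})", raw)
    if match is None:
        raise ValueError(f"no m/d/YYYY date found in {raw!r}")
    # Parsing validates the date; formatting adds the zero padding.
    parsed = datetime.strptime(match.group(1), "%m/%d/%Y")
    return parsed.strftime("%m/%d/%Y")


print(normalize_date("9/4/2017 1"))  # 09/04/2017
print(normalize_date("10/30/2016"))  # 10/30/2016
```

Going through strptime rather than padding the digits by hand also means an impossible date such as 13/40/2017 raises an error instead of passing through silently.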
CodePudding user response:
If your dataframe df looks like
date
0 10/30/2016
1 10/30/2016
2 10/30/2016
3 10/30/2016
4 9/4/2017 1
5 9/4/2017 1
6 9/4/2017 1
7 9/4/2017 1
8 9/4/2017 1
then you could do
import pandas as pd
df["date"] = (
    pd.to_datetime(df["date"].str.split(expand=True)[0])
    .dt.strftime("%m/%d/%Y")
)
to get
date
0 10/30/2016
1 10/30/2016
2 10/30/2016
3 10/30/2016
4 09/04/2017
5 09/04/2017
6 09/04/2017
7 09/04/2017
8 09/04/2017
- First, .str.split() the column date (at whitespace) and only use the first part (column 0).
- Then use pd.to_datetime() to convert the series to datetime.
- Finally, convert it back to strings in the required format ("%m/%d/%Y").
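The three steps can also be written out one at a time, which makes each intermediate result easy to inspect. This is a sketch that builds a small sample frame itself; the variable names first_token and parsed are illustrative, and the explicit format argument is an assumption that all dates are month/day/year.

```python
import pandas as pd

# Small sample mirroring the data in the question.
df = pd.DataFrame(
    {"date": ["10/30/2016", "10/30/2016", "9/4/2017 1", "9/4/2017 1"]}
)

# Step 1: split at whitespace and keep only the first piece (column 0).
first_token = df["date"].str.split(expand=True)[0]

# Step 2: parse the cleaned strings; the explicit format avoids any
# day/month guessing.
parsed = pd.to_datetime(first_token, format="%m/%d/%Y")

# Step 3: write the dates back as zero-padded strings.
df["date"] = parsed.dt.strftime("%m/%d/%Y")

print(df)
```

After this the column holds strings again, not datetimes; if later code needs to sort or filter by date, keeping the parsed series may be the better choice.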
CodePudding user response:
Given:
date
0 10/30/2016
1 10/30/2016
2 10/30/2016
3 10/30/2016
4 9/4/2017 1
5 9/4/2017 1
6 9/4/2017 1
7 9/4/2017 1
8 9/4/2017 1
Doing:
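- Simplifying your problem to a string method, rstrip(' 1'). Note that rstrip removes any trailing run of the given characters (spaces and 1s), so it works for the data shown but would also clip a year ending in 1, such as 2011.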
df.date = pd.to_datetime(df.date.str.rstrip(' 1')).dt.strftime("%m/%d/%Y")
print(df)
Output:
date
0 10/30/2016
1 10/30/2016
2 10/30/2016
3 10/30/2016
4 09/04/2017
5 09/04/2017
6 09/04/2017
7 09/04/2017
8 09/04/2017
Or, without ever acknowledging it's a date:
df.date = (df.date.str.rstrip(' 1')
           .str.split('/')
           .map(lambda row: '/'.join(x.zfill(2) for x in row)))
# Same output as above
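If the real data might contain years that end in 1, a variant that cuts at the first whitespace instead of stripping characters is safer. The sketch below uses plain Python strings so the difference is easy to see; the helper names drop_trailing_token and pad_date are hypothetical.

```python
import re


def drop_trailing_token(raw: str) -> str:
    # Remove the first run of whitespace and everything after it.
    return re.sub(r"\s+.*$", "", raw.strip())


def pad_date(raw: str) -> str:
    # Zero-pad each "/"-separated part to at least two digits.
    parts = drop_trailing_token(raw).split("/")
    return "/".join(part.zfill(2) for part in parts)


print("1/1/2011".rstrip(" 1"))  # 1/1/20  (rstrip eats the year's trailing 1s)
print(pad_date("1/1/2011"))     # 01/01/2011
print(pad_date("9/4/2017 1"))   # 09/04/2017
```

On a pandas column, the same function could be applied with df.date.map(pad_date).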