I have a column which has data as :
Date |
---|
'2021-01-01' |
'2021-01-10' |
'2021-01-09' |
'2021-01-11' |
I need only the year and month in a single column, stored as an integer rather than a string. For example, '2021-01-01' should be saved as 202101 (I don't need the day part).
My attempt at cleaning the data works, but it drops the leading zeroes:
df['period'] = df['Date'].str[:4] + df['Date'].str[6:7]
This gives me:
Date |
---|
20211 |
202110 |
20219 |
202111 |
As you can see, for the months January to September it returns only 1 to 9 instead of 01 to 09, which creates a discrepancy. If I add a zero manually as part of the merge, '2021-10' becomes 2021010. I simply want the year and month without the hyphen, keeping the leading zero of the month. This is how I would like the new column to look:
Date |
---|
202101 |
202110 |
202109 |
202111 |
I can do it with a loop, but that's not efficient. Is there a better way to do it in Python?
CodePudding user response:
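The leading zeros are dropped because of the slice bounds. Python slices exclude the end index, so [6:7] selects only the character at index 6, which is the second digit of the month.
Try changing your code to:
df['period'] = df['Date'].str[:4] + df['Date'].str[5:7]
Note the change from [6:7] to [5:7], which selects both month digits. The result is still a string; to get an integer, wrap the expression in parentheses and call .astype(int) on it.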
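To make the slice bounds concrete, here is a minimal sketch. It assumes the Date values are plain 'YYYY-MM-DD' strings without extra quote characters; the sample frame and the names one_digit and two_digits are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, assuming plain 'YYYY-MM-DD' strings
df = pd.DataFrame({"Date": ["2021-01-01", "2021-10-10", "2021-09-09", "2021-11-11"]})

# Slices are half-open: [6:7] covers only index 6, the second month digit
one_digit = df["Date"].str[6:7]

# [5:7] covers indices 5 and 6, i.e. both month digits
two_digits = df["Date"].str[5:7]

# Concatenate year and month as strings, then convert to integer
df["period"] = (df["Date"].str[:4] + two_digits).astype(int)
print(df["period"].tolist())
```

Because the concatenation happens on strings before the integer conversion, the month keeps its leading zero inside the six-digit number.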
CodePudding user response:
Strip the quote characters, parse the column as datetime, format it as year and month, and convert the result to an integer. Code below:
df['Date_edited']=pd.to_datetime(df['Date'].str.strip("''")).dt.strftime('%Y%m').astype(int)
Date Date_edited
0 '2021-01-01' 202101
1 '2021-01-10' 202101
2 '2021-01-09' 202101
3 '2021-01-11' 202101
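As a variation on the same idea, the integer can also be built arithmetically from the parsed date instead of going through strftime. This is only a sketch: it assumes the values carry literal single quotes, as the strip call above implies, and the sample frame and the name parsed are hypothetical.

```python
import pandas as pd

# Hypothetical sample: values include literal single quotes
df = pd.DataFrame({"Date": ["'2021-01-01'", "'2021-10-10'", "'2021-09-09'"]})

# Remove the surrounding quotes, then parse with an explicit format
parsed = pd.to_datetime(df["Date"].str.strip("'"), format="%Y-%m-%d")

# year * 100 + month yields YYYYMM as an integer, so no string padding is involved
df["period"] = (parsed.dt.year * 100 + parsed.dt.month).astype(int)
print(df["period"].tolist())
```

Parsing first also has the side effect of raising an error on malformed dates, which plain string slicing would silently pass through.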