How to keep leading zeroes in a pandas column after an operation?

Time:11-23

I have a column with data like this:

Date
'2021-01-01'
'2021-01-10'
'2021-01-09'
'2021-01-11'

I need only the year and month as one column, stored as an integer instead of a string. For example, '2021-01-01' should be saved as 202101. (I don't need the day part.)

When I try to clean the data I am able to do it but it removes the leading zeroes.

df['period'] = df['Date'].str[:4] + df['Date'].str[6:7]

This gives me:

Date
20211
202110
20219
202111

As you can see, for the months Jan to Sept it returns only 1 to 9 instead of 01 to 09, which creates a discrepancy. If I add a zero manually as part of the concatenation, '2021-10' becomes 2021010. I simply want the year and month without the hyphen, keeping the leading zero for the month. Below is how I want the new column to look.

Date
202101
202110
202109
202111

I can do it using a loop, but that's not efficient. Is there a better way to do it in Python?

CodePudding user response:

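The leading zeros are being dropped because of how slice notation works in Python: the end index is exclusive, so [6:7] selects only the single character at index 6 (the second digit of the month). To get both month digits you need the characters at indices 5 and 6.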

Try changing your code to:

df['period'] = df['Date'].str[:4] + df['Date'].str[5:7]

Note the change from [6:7] to [5:7].
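
As a rough, self-contained sketch of the slicing fix: the sample frame below is an assumption that reuses the dates from the question without the surrounding quote characters, and the final astype(int) step is added here because the question asks for an integer column.

```python
import pandas as pd

# Sample data assumed from the question (no surrounding quote characters).
df = pd.DataFrame({"Date": ["2021-01-01", "2021-01-10", "2021-01-09", "2021-01-11"]})

# [6:7] is a one-character slice: only the second digit of the month.
one_digit = df["Date"].str[6:7]

# [5:7] covers indices 5 and 6, i.e. both month digits (end index is exclusive).
period_str = df["Date"].str[:4] + df["Date"].str[5:7]

# The leading zero sits in the middle of the string, so casting to int keeps it.
df["period"] = period_str.astype(int)
print(df)
```

Since the year always comes first, the month's leading zero survives the integer cast. It would only be lost if the month were stored as a number on its own.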

CodePudding user response:

Strip the quotation marks, convert the column to datetime, format it as year and month, and cast the result to integer. Code below:

df['Date_edited']=pd.to_datetime(df['Date'].str.strip("''")).dt.strftime('%Y%m').astype(int)



      Date         Date_edited
0  '2021-01-01'       202101
1  '2021-01-10'       202101
2  '2021-01-09'       202101
3  '2021-01-11'       202101
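
A possible variation on the datetime approach, shown only as a sketch: it skips the string formatting and builds the integer arithmetically from the year and month. The sample frame is an assumption that the strings really do contain the quote characters shown in the question, and the column name period is illustrative.

```python
import pandas as pd

# Sample data assumed from the question, with literal quote characters in each string.
df = pd.DataFrame({"Date": ["'2021-01-01'", "'2021-01-10'", "'2021-01-09'", "'2021-01-11'"]})

# Remove the surrounding quotes, then parse the remaining text as dates.
dates = pd.to_datetime(df["Date"].str.strip("'"))

# year * 100 + month gives YYYYMM directly as an integer, with no string step.
df["period"] = dates.dt.year * 100 + dates.dt.month
print(df)
```

Both variants give the same values. The arithmetic form just avoids the round trip through strftime and astype(int).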