How to get first 4 digits from an integer column in python

Time:09-17

I am trying to get the year and month from a column that is stored as an integer:

dvd['yy'] = str(dvd['CalendarYearMonth'])[:3]
dvd['mon'] = str(dvd['CalendarYearMonth'])[4:6]

but I am getting the following output:

   CalendarYearMonth CountryCode  Dividends   yy mon
0             202108          CN      196.0  0     2
1             202109          CN      380.0  0     2
2             202108          IN        NaN  0     2
3             202109          IN      115.0  0     2

Could anyone help me get the correct output? dvd is the input DataFrame.

CodePudding user response:

Try this instead:

dvd['yy'] = dvd['CalendarYearMonth'].astype(str).str[:4]
dvd['mon'] = dvd['CalendarYearMonth'].astype(str).str[4:6]
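
For context on why the original attempt fails: calling str() on a whole Series returns one string for the entire Series (index included), so slicing it gives the same fragment for every row. The sketch below contrasts that with the element-wise conversion. The sample values and the helper name split_year_month are assumptions made for illustration, not part of the question.

```python
import pandas as pd

def split_year_month(frame, column="CalendarYearMonth"):
    """Return a copy of frame with string 'yy' and 'mon' columns (hypothetical helper)."""
    out = frame.copy()
    as_text = out[column].astype(str)  # converts each element, not the whole Series
    out["yy"] = as_text.str[:4]        # first four characters: the year
    out["mon"] = as_text.str[4:6]      # next two characters: the month
    return out

# Sample data assumed for illustration
dvd = pd.DataFrame({"CalendarYearMonth": [202108, 202109]})

whole = str(dvd["CalendarYearMonth"])  # a single string describing the entire Series
result = split_year_month(dvd)
print(result)
```

Note that both new columns hold strings here, so the month keeps its leading zero.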

CodePudding user response:

Try this:

import pandas as pd

dvd = pd.DataFrame({
    'CalendarYearMonth': [202108.0, 202109.0, 202108.0, 202109.0],
})
# Convert each value to a string, then slice out the year and month
dvd['yy'] = dvd['CalendarYearMonth'].apply(lambda x: str(x)[:4])
dvd['mon'] = dvd['CalendarYearMonth'].apply(lambda x: str(x)[4:6])
print(dvd)

Output:

   CalendarYearMonth    yy mon
0           202108.0  2021  08
1           202109.0  2021  09
2           202108.0  2021  08
3           202109.0  2021  09

CodePudding user response:

If the column is already an integer, take advantage of that and use integer arithmetic:

dvd['yy'] = dvd['CalendarYearMonth'] // 100
dvd['mon'] = dvd['CalendarYearMonth'] - dvd['yy'] * 100

Output:

   CalendarYearMonth CountryCode  Dividends    yy  mon
0             202108          CN      196.0  2021    8
1             202109          CN      380.0  2021    9
2             202108          IN        NaN  2021    8
3             202109          IN      115.0  2021    9
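
The subtraction above could also be written with the modulo operator, which takes the last two digits directly. A minimal sketch, with sample values assumed for illustration:

```python
import pandas as pd

# Sample data assumed for illustration
dvd = pd.DataFrame({"CalendarYearMonth": [202108, 202109, 202112]})

dvd["yy"] = dvd["CalendarYearMonth"] // 100  # floor division drops the last two digits
dvd["mon"] = dvd["CalendarYearMonth"] % 100  # remainder keeps the last two digits
print(dvd)
```

Both columns stay numeric with this approach, so the month has no leading zero.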

CodePudding user response:

To bring up one more option, you can convert the column with dates to datetime objects and then extract the year and month information:

import pandas as pd

dvd = pd.DataFrame({
    'CalendarYearMonth': [201908, 202001, 202103, 202107],
})

dates = pd.to_datetime(dvd['CalendarYearMonth'], format='%Y%m')
dvd['yy'] = dates.dt.year
dvd['mon'] = dates.dt.month

It gives:

   CalendarYearMonth    yy  mon
0             201908  2019    8
1             202001  2020    1
2             202103  2021    3
3             202107  2021    7
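
If zero-padded month strings such as 08 are wanted instead of integers, the datetime route can probably be combined with dt.strftime. A sketch under that assumption, with sample values chosen for illustration; the column is cast to str before parsing to keep the format match explicit:

```python
import pandas as pd

# Sample data assumed for illustration
dvd = pd.DataFrame({"CalendarYearMonth": [201908, 202001]})

# Parse the YYYYMM values as dates (the day defaults to the first of the month)
dates = pd.to_datetime(dvd["CalendarYearMonth"].astype(str), format="%Y%m")

dvd["yy"] = dates.dt.strftime("%Y")   # four-digit year as a string
dvd["mon"] = dates.dt.strftime("%m")  # two-digit, zero-padded month as a string
print(dvd)
```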