Home > OS >  Separation of year, month and day in an 8-digit number
Separation of year, month and day in an 8-digit number

Time:11-22

I am new to Python/pandas. I have a data frame that looks like this

x = pd.DataFrame([20210901,20210902, 20210903, 20210904])

[out]:

          0
0  20210901
1  20210902
2  20210903
3  20210904

I want to separate each row as follows: For example

year = 2021
month = 9
day = 1

or I have a list for each row like this:

[2021,9,1]

CodePudding user response:

You can use pd.to_datetime to convert the entire column to datetime type.

>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'Col': [20210917,20210918, 20210919, 20210920]})
>>>
>>> df.Col = pd.to_datetime(df.Col, format='%Y%m%d')
>>> df
         Col
0 2021-09-17
1 2021-09-18
2 2021-09-19
3 2021-09-20
>>> df['Year'] = df.Col.dt.year
>>> df['Month'] = df.Col.dt.month
>>> df['Day'] = df.Col.dt.day
>>>
>>> df
         Col  Year  Month  Day
0 2021-09-17  2021      9   17
1 2021-09-18  2021      9   18
2 2021-09-19  2021      9   19
3 2021-09-20  2021      9   20

If you want the result as list, you can use list comprehension along with zip function.

>>> [(year, month, day) for year, month, day in zip(df.Year, df.Month, df.Day)]
[(2021, 9, 17), (2021, 9, 18), (2021, 9, 19), (2021, 9, 20)]

CodePudding user response:

Separate each row as follows three new columns: year, month, day

import pandas as pd

df = pd.DataFrame({"date":[20210901,20210902, 20210903, 20210904]})

# split date field to three fields: year month day 
df["year"] = df["date"].apply(lambda x: str(x)[:4])
df["month"] = df["date"].apply(lambda x: str(x)[4:6])
df["day"] = df["date"].apply(lambda x: str(x)[6:])

print(df)

result is as following

# result is:
 date  year  month  day
0  20210901  2021    09  01
1  20210902  2021    09  02
2  20210903  2021    09  03
3  20210904  2021    09  04

CodePudding user response:

I created a simple function which will return a zipped list in the format you wanted

def convert_to_dates(df):
         dates = []
         for i,v in x.iterrows():
             dates.append(v.values[0])
         for i in range(len(dates)):
             dates[i] = str(dates[i])
         years = []
         months = []
         days = []
         for i in range(len(dates)):
             years.append(dates[i][0:4])
             months.append(dates[i][4:6])
             days.append(dates[i][6:8])
         return list(zip(years, months, days))

Call it by using convert_to_dates(x)

Output:

In [4]: convert_to_dates(x)
Out[4]:
[('2021', '09', '01'),
 ('2021', '09', '02'),
 ('2021', '09', '03'),
 ('2021', '09', '04')]

CodePudding user response:

x = pd.DataFrame([20210917, 20210918, 20210919, 20210920])
x[0] = x[0].astype(str).astype('datetime64')
x = x.apply(lambda a: [a[0].year, a[0].month, a[0].day],axis=1)
print(x)

Output:

               0
0  [2021, 9, 17]
1  [2021, 9, 18]
2  [2021, 9, 19]
3  [2021, 9, 20]

  • Related