How can I replace the 'year' value in a datetime column for each row?

Time:04-07

Within my dataframe I have two columns: 'release_date' and 'release_year'

I am trying to replace the year value in each 'release_date' instance with the corresponding value in 'release_year'

I have tried the following

df.loc[:, 'release_date'] = df['release_date'].apply(lambda x: x.replace(x.year == df['release_year']))

However, I am getting the error: 'value must be an integer, received <class 'pandas.core.series.Series'> for year'

Having checked the dtype, the release_date column is stored as datetime64[ns]

[Figure: excerpt from dataframe]

CodePudding user response:

You need to use pandas.DataFrame.apply here rather than pandas.Series.apply, because you need data from another column. Consider the following simple example:

import datetime
import pandas as pd
df = pd.DataFrame({'release_date':[datetime.date(1901,1,1),datetime.date(1902,1,1),datetime.date(1903,1,1)],'release_year':[2001,2002,2003]})
df['changed_date'] = df.apply(lambda x:x.release_date.replace(year=x.release_year),axis=1)
print(df)

output

  release_date  release_year changed_date
0   1901-01-01          2001   2001-01-01
1   1902-01-01          2002   2002-01-01
2   1903-01-01          2003   2003-01-01

Note axis=1, which means the function is applied to each row, and the function receives that row (a pandas.Series) as its argument.
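
The column in the question is datetime64[ns] rather than datetime.date objects. As a minimal sketch of the same row-wise approach on such a column: the sample values below are made up (only the column names come from the question), and the int() cast is a defensive assumption rather than something the example above needed.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "release_date": pd.to_datetime(["1901-05-17", "1902-11-03", "1903-01-01"]),
    "release_year": [2001, 2002, 2003],
})

# axis=1 passes each row to the lambda as a Series. release_date arrives as a
# pandas.Timestamp, whose replace() takes the new year as a keyword argument.
df["changed_date"] = df.apply(
    lambda row: row["release_date"].replace(year=int(row["release_year"])),
    axis=1,
)

print(df)
```

Be aware that replace() raises a ValueError when the target date does not exist, for example when 29 February is moved into a non-leap year.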

CodePudding user response:

Casting to string and then parsing to datetime is more efficient here, and also more readable in my opinion. Example:

import datetime
import pandas as pd

N = 100000

df = pd.DataFrame({'release_date':[datetime.date(1901,1,1),datetime.date(1902,1,1),datetime.date(1903,1,1)]*N,
                   'release_year':[2001,2002,2003]*N})

df['changed_date'] = pd.to_datetime(
        df['release_year'].astype(str) + df['release_date'].astype(str).str[5:],
        format="%Y%m-%d"
    )

df['changed_date']
Out[176]: 
0        2001-01-01
1        2002-01-01
2        2003-01-01
3        2001-01-01
4        2002-01-01
            ...
299995   2002-01-01
299996   2003-01-01
299997   2001-01-01
299998   2002-01-01
299999   2003-01-01
Name: changed_date, Length: 300000, dtype: datetime64[ns]
>>> %timeit df['changed_date'] = df.apply(lambda x:x.release_date.replace(year=x.release_year),axis=1)
6.73 s ± 542 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit df['changed_date'] = pd.to_datetime(df['release_year'].astype(str) + df['release_date'].astype(str).str[5:], format="%Y%m-%d")
651 ms ± 78.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
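
Another vectorized option, which avoids the string round trip, is to assemble the dates from numeric year/month/day components. This is only a sketch: it is not one of the approaches timed above and has not been benchmarked here, the sample values are made up, and errors="coerce" is an assumption about how impossible dates should be handled (they become NaT instead of raising).

```python
import pandas as pd

# Hypothetical sample data, including a leap day moved into a non-leap year.
df = pd.DataFrame({
    "release_date": pd.to_datetime(["1901-03-15", "1904-02-29", "1903-12-31"]),
    "release_year": [2001, 2002, 2004],
})

# pd.to_datetime can build datetimes from a DataFrame whose columns are
# named year, month and day.
parts = pd.DataFrame({
    "year": df["release_year"],
    "month": df["release_date"].dt.month,
    "day": df["release_date"].dt.day,
})

# errors="coerce" turns dates that do not exist (2002-02-29 here) into NaT.
df["changed_date"] = pd.to_datetime(parts, errors="coerce")

print(df)
```

This drops any time-of-day information in release_date, since only the year, month and day are passed through.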