How to replace NaT in pandas dataframe with a date using information from other columns

Time:11-08

noob here.

I have a dataframe that looks like this:

start  end  start_year
NaT    NaT  2020
NaT    NaT  2021

and I want to fill in the NaT values with the first and last day of the year listed in the start_year column, so that it looks like this:

start       end         start_year
2020-01-01  2020-12-31  2020
2021-01-01  2021-12-31  2021

I tried to fill in the NaTs in the 'end' column like this:

df2.loc[df2['start'].isnull() 
                & df2['end'].isnull()
                & df2['start_year'].notnull()
       , "end"] = dt.date(df2["start_year"], 12, 31)

but I get this error:

TypeError: cannot convert the series to <class 'int'>

When I look at just the start year column it says this:

Name: start_year, Length: 4213, dtype: int64

I also tried using

df2["start_year"].values

but that didn't help.

Apologies if I'm just being an idiot. I searched around on here and google but couldn't find an answer.

CodePudding user response:

For both the start and end columns, keep the existing value if it is set; otherwise fill it with the first day (for start) or the last day (for end) of the year in start_year:

 import pandas as pd
 # Keep existing dates; build a Timestamp from start_year only where the value is missing
 df['start'] = df.apply(lambda x: x['start'] if pd.notna(x['start']) else pd.Timestamp(year=int(x['start_year']), month=1, day=1), axis=1)
 df['end'] = df.apply(lambda x: x['end'] if pd.notna(x['end']) else pd.Timestamp(year=int(x['start_year']), month=12, day=31), axis=1)
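As for the TypeError in the question: dt.date expects a single integer for the year, so passing the whole start_year Series fails. One way around it, sketched below, is to assemble whole columns of dates with pd.to_datetime from year/month/day components and then assign only the masked rows. The sample frame (including the third row with existing dates) is made up for illustration and is not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's layout;
# the third row already has dates and should be left untouched.
df2 = pd.DataFrame({
    "start": pd.to_datetime([None, None, "2019-03-01"]),
    "end": pd.to_datetime([None, None, "2019-06-30"]),
    "start_year": [2020, 2021, 2019],
})

# Rows where both dates are missing but the year is known.
mask = df2["start"].isna() & df2["end"].isna() & df2["start_year"].notna()

# pd.to_datetime can assemble dates from a frame with year/month/day columns.
year_start = pd.to_datetime(
    pd.DataFrame({"year": df2["start_year"], "month": 1, "day": 1})
)
year_end = pd.to_datetime(
    pd.DataFrame({"year": df2["start_year"], "month": 12, "day": 31})
)

# Assign only where the mask is True.
df2.loc[mask, "start"] = year_start[mask]
df2.loc[mask, "end"] = year_end[mask]

print(df2)
```

This keeps the same .loc mask as the original attempt and avoids a row-by-row apply, which may matter on larger frames.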

CodePudding user response:

Use:

# If the columns are not datetimes yet, convert them first:
# df['start'] = pd.to_datetime(df['start'])
# df['end'] = pd.to_datetime(df['end'])

# Replace missing values with the first and last day of the year
df['start'] = df['start'].fillna(pd.to_datetime(df['start_year'], format='%Y'))
df['end'] = df['end'].fillna(pd.to_datetime(df['start_year'].add(1), format='%Y')
                             - pd.Timedelta('1 day'))

print(df)
       start        end  start_year
0 2020-01-01 2020-12-31        2020
1 2021-01-01 2021-12-31        2021
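For anyone who wants to run this end to end, here is a small self-contained sketch of the same fillna idea. The sample frame is an assumption (the second row has made-up existing dates to show that they are preserved), and the names first_day and last_day are only for illustration.

```python
import pandas as pd

# Hypothetical sample: row 0 is missing both dates, row 1 already has them.
df = pd.DataFrame({
    "start": pd.to_datetime([None, "2021-02-15"]),
    "end": pd.to_datetime([None, "2021-11-30"]),
    "start_year": [2020, 2021],
})

# Parsing only a year with format="%Y" yields January 1 of that year.
first_day = pd.to_datetime(df["start_year"], format="%Y")

# January 1 of the following year minus one day is December 31.
last_day = pd.to_datetime(df["start_year"].add(1), format="%Y") - pd.Timedelta("1 day")

# fillna aligns on the index, so only the missing entries are replaced.
df["start"] = df["start"].fillna(first_day)
df["end"] = df["end"].fillna(last_day)

print(df)
```

Because fillna only touches missing entries, no explicit mask is needed here.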

CodePudding user response:

df['start_year'].apply(pd.Period).dt.to_timestamp('A')

output:

0   2020-12-31
1   2021-12-31
Name: start_year, dtype: datetime64[ns]
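The line above only produces the year-end dates and does not assign them back. A rough sketch of the same period-based idea covering both columns follows. It makes some assumptions: the sample frame is made up, the years are converted to strings before building the PeriodIndex so they are parsed as calendar years, and the "Y" alias is used because, as far as I know, newer pandas releases warn about "A". Check it against your pandas version.

```python
import pandas as pd

# Hypothetical sample with only the year column.
df = pd.DataFrame({"start_year": [2020, 2021]})

# Parse the years as annual periods (strings avoid any ambiguity about
# how plain integers are interpreted).
years = pd.PeriodIndex(df["start_year"].astype(str), freq="Y")

# Start of each period is January 1.
df["start"] = years.to_timestamp(how="start")

# The period end can carry a time-of-day component; normalize() resets it
# to midnight so only the date December 31 remains.
df["end"] = years.to_timestamp(how="end").normalize()

print(df)
```

To fill only the missing entries, these computed columns could be passed to fillna instead of being assigned directly.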