Home > Back-end >  I am working with a Dataset, where few values in the Date Column are like ' 2 Months Ago',
I am working with a Dataset, where few values in the Date Column are like ' 2 Months Ago',

Time:10-28

This is the Column, how can we get it the same way as some other Columns are in the format of Month-Year

I am thinking of trying if condition, but is there any library or method which I don't know about can solve this?

CodePudding user response:

You can subtract actual month-year period with months from values with decimals with month(s) and assign back to DataFrames, for days convert values to timedeltas and subtract actual datetime by Series.rsub for subtract from right side:

print (df)
             col
0    28 days ago
1   4 months ago
2  11 months ago
3      Oct, 2021

now = pd.Timestamp('now')
per = now.to_period('m')
date = now.floor('d')

s = df['col'].str.extract('(\d )\s*month', expand=False).astype(float)
s1 = df['col'].str.extract('(\d )\s*day', expand=False).astype(float)

mask, mask1 = s.notna(), s1.notna()
df.loc[mask, 'col'] = s[mask].astype(int).rsub(per).dt.strftime('%b, %Y')
df.loc[mask1, 'col'] = pd.to_timedelta(s1[mask1], unit='d').rsub(date).dt.strftime('%b, %Y')
print (df)
         col
0  Sep, 2022
1  Jun, 2022
2  Nov, 2021
3  Oct, 2021

CodePudding user response:

Assuming this input:

            col
0  4 months ago
1     Oct, 2021
2  9 months ago

You can use:

# try to get a date:
s = pd.to_datetime(df['col'], errors='coerce')

# extract the month offset
offset = (df['col']
          .str.extract(r'(\d ) months? ago', expand=False)
          .fillna(0).astype(int)
         )

# if the date it NaT, replace by today - n months
df['date'] = s.fillna(pd.Timestamp('today').normalize()
                    - offset*pd.DateOffset(months=1))

If you want a Mon, Year format:

df['date2'] = df['col'].where(offset.eq(0),
                             (pd.Timestamp('today').normalize()
                             -offset*pd.DateOffset(months=1)
                             ).dt.strftime('%b, %Y')
                             )

output:

            col       date      date2
0  4 months ago 2022-06-28  Jun, 2022
1     Oct, 2021 2021-10-01  Oct, 2021
2  9 months ago 2022-01-28  Jan, 2022
  • Related