I have a dataframe with date of births as
pl.DataFrame({'idx':[1,2,3,4,5,6],
'date_of_birth':['03/06/1990','3/06/1990','11/12/2000','01/02/2021','1/02/2021','3/06/1990']})
Here I would like to compare date of birth(Format: Month/Day/Year) of each row and tag yes if the months are equal such as 03 - 3, 01 -1.
There are dates as 03/06/1900, 3/06/1990, they are generally same. but here they are treated as different. How to figure out these kind scenarios ?
The expected output as:
CodePudding user response:
Let's convert date_of_birth
to datetime then check if the month is satisfied
import numpy as np
import pandas as pd
import polars as pl
s = np.where(pd.to_datetime(df['date_of_birth']).month.isin([3, 1]), 'Yes', 'No')
out = df.with_columns([
pl.lit(s).alias('flag')
])
print(out)
┌─────┬───────────────┬──────┐
│ idx ┆ date_of_birth ┆ flag │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═══════════════╪══════╡
│ 1 ┆ 03/06/1990 ┆ Yes │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ 3/06/1990 ┆ Yes │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ 11/12/2000 ┆ No │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 4 ┆ 01/02/2021 ┆ Yes │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 5 ┆ 1/02/2021 ┆ Yes │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 6 ┆ 3/06/1990 ┆ Yes │
└─────┴───────────────┴──────┘
CodePudding user response:
You can try out with transform
function to filter in specific column of df
to get suitable output
df['match']=df['date_of_birth'].transform(lambda x : 'yes' if x.split("/")[0] in ['03','3','1','01'] else 'no')
Output:
idx date_of_birth match
0 1 03/06/1990 yes
1 2 3/06/1990 yes
2 3 11/12/2000 no
3 4 01/02/2021 yes
4 5 1/02/2021 yes
5 6 3/06/1990 yes