Hello, I have a DataFrame and want to check whether each row's date matches any of the module 1-5 dates (module1_date to module5_date). If it does, it should return the module number; if not, it should return 0, like the highlighted column 'new' in the [screenshot](https://i.stack.imgur.com/aaik3.png). How can I do this?
I tried the code below, but it compares each date against the module dates of every row, including rows with different ids. I only want each row to be compared with the row that has the same id.
def function(x):
    if x in df.module1_date.values:
        return 1
    if x in df.module2_date.values:
        return 2
    if x in df.module3_date.values:
        return 3
    if x in df.module4_date.values:
        return 4
    if x in df.module5_date.values:
        return 5
    else:
        return 0
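The attempt above tests membership in df.module1_date.values, i.e. the whole column, which is why rows with other ids can match. A minimal sketch of the row-wise version, assuming the columns are named module1_date to module5_date and each row carries the module dates for its own id (the id column and the sample values are hypothetical, modeled on the data used in the answers). Passing each row to the function with axis=1 means only that row's module dates are checked:

```python
import pandas as pd

# Hypothetical sample data: an id column plus date and five module date columns.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "date": ["2022-05-16", "2022-05-18", "2022-05-19", "2022-05-22"],
    "module1_date": ["2022-04-28"] * 4,
    "module2_date": ["2022-05-11"] * 4,
    "module3_date": ["2022-05-16"] * 4,
    "module4_date": ["2022-05-18"] * 4,
    "module5_date": ["2022-05-19"] * 4,
})

def module_for_row(row):
    """Return the first module whose date equals this row's date, else 0."""
    for i in range(1, 6):
        if row[f"module{i}_date"] == row["date"]:
            return i
    return 0

# axis=1 hands the function one row at a time, so each date is compared
# only with the module dates stored on that same row.
df["new"] = df.apply(module_for_row, axis=1)
print(df["new"].tolist())  # [3, 4, 5, 0]
```

The loop keeps the question's original order, so if a date matched several modules the lowest module number would win.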
Answer:
You can use a conditional expression (if/else) inside a lambda applied row by row:
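import re
import pandas as pd

# Row-wise: keep only the cells equal to this row's date. Assuming 'date' comes
# before the module columns, the first hit is 'date' itself, so a second hit
# (index[1]) is the matching module column; pull its digits out as an int.
df["new"] = df.apply(
    lambda x: int(re.findall(r"\d+", x[x == x["date"]].index[1])[0])
    if len(x[x == x["date"]]) > 1 else 0,
    axis=1,
)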
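An alternative that avoids both the regex and the per-row apply (this is not what the answer above uses) is numpy.select, which takes the choice for the first condition that is True in each row and falls back to a default otherwise. A sketch, assuming the module columns are named module1_date to module5_date and using hypothetical sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the question's layout.
df = pd.DataFrame({
    "date": ["2022-05-16", "2022-05-18", "2022-05-19", "2022-05-22"],
    "module1_date": ["2022-04-28"] * 4,
    "module2_date": ["2022-05-11"] * 4,
    "module3_date": ["2022-05-16"] * 4,
    "module4_date": ["2022-05-18"] * 4,
    "module5_date": ["2022-05-19"] * 4,
})

# One boolean Series per module: True where that module's date equals the row's date.
conditions = [df[f"module{i}_date"].eq(df["date"]) for i in range(1, 6)]

# np.select picks the choice of the first True condition per row (module order),
# and uses default=0 where no module date matches.
df["new"] = np.select(conditions, [1, 2, 3, 4, 5], default=0)
print(df["new"].tolist())  # [3, 4, 5, 0]
```

Because each condition compares columns element-wise within the same row, no row is ever checked against another row's module dates.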
Answer:
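You can apply a lambda to each module column (every column after date). Because date is the first column, a module column's position in df.columns equals its module number, so df.columns.get_loc(x.name) is the value to return on a match; the row-wise max then picks it out, giving 0 when nothing matches.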
import numpy as np
import pandas as pd

data = {'date': ['2022-05-16', '2022-05-18', '2022-05-19', '2022-05-22'],
        'module1_date': ['2022-04-28', '2022-04-28', '2022-04-28', '2022-04-28'],
        'module2_date': ['2022-05-11', '2022-05-11', '2022-05-11', '2022-05-11'],
        'module3_date': ['2022-05-16', '2022-05-16', '2022-05-16', '2022-05-16'],
        'module4_date': ['2022-05-18', '2022-05-18', '2022-05-18', '2022-05-18'],
        'module5_date': ['2022-05-19', '2022-05-19', '2022-05-19', '2022-05-19']
        }
df = pd.DataFrame(data)
# For each module column, rows whose date matches get that column's position
# in df (module1_date is at position 1, and so on), other rows get 0;
# the row-wise max then yields the matching module number.
df['new'] = (
    df.iloc[:, 1:]
    .apply(lambda x: np.where(df['date'] == x, df.columns.get_loc(x.name), 0))
    .max(axis=1)
)
Output:
date module1_date module2_date ... module4_date module5_date new
0 2022-05-16 2022-04-28 2022-05-11 ... 2022-05-18 2022-05-19 3
1 2022-05-18 2022-04-28 2022-05-11 ... 2022-05-18 2022-05-19 4
2 2022-05-19 2022-04-28 2022-05-11 ... 2022-05-18 2022-05-19 5
3 2022-05-22 2022-04-28 2022-05-11 ... 2022-05-18 2022-05-19 0
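The same positional idea can be vectorized with NumPy broadcasting instead of apply. A sketch, assuming the five module columns are listed in order as module1_date to module5_date (sample data copied from the answer above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": ["2022-05-16", "2022-05-18", "2022-05-19", "2022-05-22"],
    "module1_date": ["2022-04-28"] * 4,
    "module2_date": ["2022-05-11"] * 4,
    "module3_date": ["2022-05-16"] * 4,
    "module4_date": ["2022-05-18"] * 4,
    "module5_date": ["2022-05-19"] * 4,
})

# Assumed column names, in module order.
module_cols = [f"module{i}_date" for i in range(1, 6)]

# Broadcast the date column (shape (n, 1)) against the module block (shape (n, 5))
# to get an (n, 5) boolean matrix of per-row matches.
matches = df[module_cols].to_numpy() == df["date"].to_numpy()[:, None]

# argmax gives the 0-based index of the first True per row, so +1 is the module
# number; rows without any True fall back to 0.
df["new"] = np.where(matches.any(axis=1), matches.argmax(axis=1) + 1, 0)
print(df["new"].tolist())  # [3, 4, 5, 0]
```

One behavioral difference: if a date matched more than one module, argmax keeps the lowest module number, whereas the .max(axis=1) in the answer keeps the highest.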