Python: compare values with same ids in the data frame and create a new column to indicate which val

Time:01-10

Hello, I have a data frame. I want to find out whether the date matches any of the values in the module 1-5 date columns. If it does, it should return the module number; if not, it should return 0, like the highlighted column 'new' in this screenshot: https://i.stack.imgur.com/aaik3.png. How can I do this?

I tried it this way, but it compares values across different ids. I only want each row to be compared with the rows that have the same id.

def function(x):
    if x in df.module1_date.values:
        return 1
    if x in df.module2_date.values:
        return 2
    if x in df.module3_date.values:
        return 3
    if x in df.module4_date.values:
        return 4
    if x in df.module5_date.values:
        return 5
    else:
        return 0

CodePudding user response:

You can use if/else in a lambda:

import re

import pandas as pd

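# For each row, keep the entries equal to that row's date. The "date" entry
# always matches itself, so (with "date" placed before the module columns)
# index[1] is the first matching module column; \d+ extracts its number.
df["new"] = df.apply(
    lambda x: int(re.findall(r"\d+", x[x == x["date"]].index[1])[0])
    if len(x[x == x["date"]]) > 1 else 0, axis=1
)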
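
To make the row-wise logic easier to follow, here is a minimal self-contained sketch of the same idea written as a named function. The two-row sample frame and the helper name `matching_module` are assumptions for illustration only; the sketch also assumes the module columns are named `module<N>_date` and that dates are stored as strings in the same format.

```python
import re

import pandas as pd

# Hypothetical sample data, not taken from the question's screenshot.
df = pd.DataFrame({
    "date":         ["2022-05-16", "2022-05-22"],
    "module1_date": ["2022-04-28", "2022-04-28"],
    "module2_date": ["2022-05-16", "2022-05-11"],
})


def matching_module(row):
    """Return the number of the first module column equal to row['date'], else 0."""
    for col in row.index:
        if col != "date" and row[col] == row["date"]:
            # Pull the digits out of a name such as "module2_date".
            return int(re.findall(r"\d+", col)[0])
    return 0


# axis=1 passes one row at a time, so values are only compared within a row.
df["new"] = df.apply(matching_module, axis=1)
print(df["new"].tolist())
```

Because the function only ever sees one row, a date is never compared against module dates that belong to another row, which is the problem with the `x in df.moduleN_date.values` checks in the question.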

CodePudding user response:

You can use a lambda function to iterate over the columns. You can grab the module's number based on its location in the dataframe after excluding date.

import numpy as np
import pandas as pd

data = {'date':         ['2022-05-16', '2022-05-18', '2022-05-19', '2022-05-22'],
        'module1_date': ['2022-04-28', '2022-04-28', '2022-04-28', '2022-04-28'],
        'module2_date': ['2022-05-11', '2022-05-11', '2022-05-11', '2022-05-11'],
        'module3_date': ['2022-05-16', '2022-05-16', '2022-05-16', '2022-05-16'],
        'module4_date': ['2022-05-18', '2022-05-18', '2022-05-18', '2022-05-18'],
        'module5_date': ['2022-05-19', '2022-05-19', '2022-05-19', '2022-05-19']
       }

df = pd.DataFrame(data)

df['new'] = df.iloc[:, 1:].apply(lambda x: np.where(df['date'] == x, df.columns.get_loc(x.name), 0) ).max(axis=1)

Output:

         date module1_date module2_date  ... module4_date module5_date new
0  2022-05-16   2022-04-28   2022-05-11  ...   2022-05-18   2022-05-19   3
1  2022-05-18   2022-04-28   2022-05-11  ...   2022-05-18   2022-05-19   4
2  2022-05-19   2022-04-28   2022-05-11  ...   2022-05-18   2022-05-19   5
3  2022-05-22   2022-04-28   2022-05-11  ...   2022-05-18   2022-05-19   0
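
The approach above takes the module number from the column position, which works when `date` is the first column and the module columns follow in order. If the frame has extra columns (for example an id column), positions may no longer equal module numbers. As a hedged alternative, the sketch below reads the number from the column name instead; the `module<N>_date` naming pattern and the variable names `module_cols`, `matches` and `numbers` are assumptions for illustration.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date":         ["2022-05-16", "2022-05-18", "2022-05-19", "2022-05-22"],
    "module1_date": ["2022-04-28"] * 4,
    "module2_date": ["2022-05-11"] * 4,
    "module3_date": ["2022-05-16"] * 4,
    "module4_date": ["2022-05-18"] * 4,
    "module5_date": ["2022-05-19"] * 4,
})

# Select the module columns by name rather than by position.
module_cols = [c for c in df.columns if c.startswith("module")]

# Boolean frame: True where a module date equals the date in the same row.
matches = df[module_cols].eq(df["date"], axis=0)

# Module number parsed from each column name, e.g. "module3_date" -> 3.
numbers = np.array([int(re.search(r"\d+", c).group()) for c in module_cols])

# True * n == n and False * n == 0, so the row maximum is the matching
# module number, or 0 when nothing matches. If several modules match,
# the highest number wins, as with .max(axis=1) in the answer above.
df["new"] = (matches.to_numpy() * numbers).max(axis=1)
print(df["new"].tolist())
```

The comparison stays within each row because `eq(..., axis=0)` aligns the `date` Series with the frame by row index.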