Loop through unique values in a column to check another column and create another column

I would like to create a function to go through each unique value in a column and check if another column contains a value and then create a column that shows the result.

For example: for each unique ID in df, check if the stats is A then create a result column:

df:

ID	Status
1	A
1	B
2	B
2	C

new_df:

ID	Status	Result A?
1	A	YES
1	B	NO

I started off with


def function(df):
  id = list(df['ID'].unique())
  status = ['A']
  for i in id:

If anyone can share some thoughts I would much appreciate.

CodePudding user response：

I am not sure if I understand the rules correctly. Should I always take the first occurrence of the ID? Then the second row in your expected output is wrong.

You can use numpy.where

df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 2, 3: 2}, 'Status': {0: 'A', 1: 'B', 2: 'B', 3: 'C'}})
new_df = df.drop_duplicates(subset=["ID"]).copy()
new_df["Result A?"] = np.where(new_df.Status == "A", "YES", "NO")

to get this:

   ID Status Result A?
0   1      A       YES
2   2      B        NO

CodePudding user response：

df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5:3}, 'Status': {0: 'A', 1: 'B', 2: 'B', 3: 'C', 4: 'A', 5:'B'}})

index	ID	Status
0	1	A
1	1	B
2	2	B
3	2	C
4	2	A
5	3	B

df['Result A?'] = df['Status'].apply(lambda x: x=='A')
df['Result A?'] = df['ID'].map(df.groupby('ID')['Result A?'].any())
df['Result A?'] = df['Result A?'].map({True:'YES', False:'NO'})

index	ID	Status	Result A?
0	1	A	YES
1	1	B	YES
2	2	B	YES
3	2	C	YES
4	2	A	YES
5	3	B	NO