I would like to create a function to go through each unique value in a column and check if another column contains a value and then create a column that shows the result.
For example: for each unique ID in df, check if the stats is A then create a result column:
df:
ID | Status |
---|---|
1 | A |
1 | B |
2 | B |
2 | C |
new_df:
ID | Status | Result A? |
---|---|---|
1 | A | YES |
1 | B | NO |
I started off with
def function(df):
id = list(df['ID'].unique())
status = ['A']
for i in id:
If anyone can share some thoughts I would much appreciate.
CodePudding user response:
I am not sure if I understand the rules correctly. Should I always take the first occurrence of the ID? Then the second row in your expected output is wrong.
You can use numpy.where
df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 2, 3: 2}, 'Status': {0: 'A', 1: 'B', 2: 'B', 3: 'C'}})
new_df = df.drop_duplicates(subset=["ID"]).copy()
new_df["Result A?"] = np.where(new_df.Status == "A", "YES", "NO")
to get this:
ID Status Result A?
0 1 A YES
2 2 B NO
CodePudding user response:
df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5:3}, 'Status': {0: 'A', 1: 'B', 2: 'B', 3: 'C', 4: 'A', 5:'B'}})
index | ID | Status |
---|---|---|
0 | 1 | A |
1 | 1 | B |
2 | 2 | B |
3 | 2 | C |
4 | 2 | A |
5 | 3 | B |
df['Result A?'] = df['Status'].apply(lambda x: x=='A')
df['Result A?'] = df['ID'].map(df.groupby('ID')['Result A?'].any())
df['Result A?'] = df['Result A?'].map({True:'YES', False:'NO'})
index | ID | Status | Result A? |
---|---|---|---|
0 | 1 | A | YES |
1 | 1 | B | YES |
2 | 2 | B | YES |
3 | 2 | C | YES |
4 | 2 | A | YES |
5 | 3 | B | NO |