Loop through unique values in a column to check another column and create another column - pandas

Time:10-15

I would like to write a function that goes through each unique value in one column, checks whether another column contains a given value, and then creates a column that shows the result.

For example: for each unique ID in df, check whether the Status is A, then create a result column:

df:

ID Status
1 A
1 B
2 B
2 C

new_df:

ID Status Result A?
1 A YES
1 B NO

I started off with


def function(df):
  id = list(df['ID'].unique())
  status = ['A']
  for i in id:

If anyone can share some thoughts, I would much appreciate it.

CodePudding user response:

I am not sure if I understand the rules correctly. Should I always take the first occurrence of the ID? Then the second row in your expected output is wrong.

You can use numpy.where

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 2, 3: 2}, 'Status': {0: 'A', 1: 'B', 2: 'B', 3: 'C'}})
# keep only the first row for each ID, then flag it
new_df = df.drop_duplicates(subset=["ID"]).copy()
new_df["Result A?"] = np.where(new_df.Status == "A", "YES", "NO")

to get this:

   ID Status Result A?
0   1      A       YES
2   2      B        NO
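
If the intent is instead what the expected output in the question shows (keep only the IDs that have at least one A, then flag each remaining row individually), a minimal sketch could look like this. This reading of the rules is an assumption, and the variable name `has_a` is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2], "Status": ["A", "B", "B", "C"]})

# True for every row whose ID has at least one row with Status "A"
has_a = df["Status"].eq("A").groupby(df["ID"]).transform("any")

# keep only those IDs, then flag each remaining row on its own Status
new_df = df[has_a].copy()
new_df["Result A?"] = np.where(new_df["Status"] == "A", "YES", "NO")
print(new_df)
```

With the sample data, only the two rows for ID 1 remain, flagged YES and NO as in the question.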

CodePudding user response:

import pandas as pd
df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 3}, 'Status': {0: 'A', 1: 'B', 2: 'B', 3: 'C', 4: 'A', 5: 'B'}})

   ID Status
0   1      A
1   1      B
2   2      B
3   2      C
4   2      A
5   3      B

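# True for rows whose Status is 'A'
df['Result A?'] = df['Status'] == 'A'
# True for every row of an ID that has at least one 'A'
df['Result A?'] = df['ID'].map(df.groupby('ID')['Result A?'].any())
# convert the booleans to YES/NO labels
df['Result A?'] = df['Result A?'].map({True: 'YES', False: 'NO'})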

   ID Status Result A?
0   1      A       YES
1   1      B       YES
2   2      B       YES
3   2      C       YES
4   2      A       YES
5   3      B        NO
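
Since the question asked for a function, the three steps above can be wrapped into one. This is only a sketch: the function name `flag_ids_with_status` and its parameters are hypothetical, and it uses `groupby(...).transform("any")` in place of the `map` over the grouped result, which should give the same per-row flags.

```python
import pandas as pd


def flag_ids_with_status(df, status="A", id_col="ID",
                         status_col="Status", result_col="Result A?"):
    """Return a copy of df with a YES/NO column that is YES for every row
    whose ID has at least one row with the given status."""
    out = df.copy()
    # True for rows whose status matches
    matches = out[status_col].eq(status)
    # broadcast "any match within this ID" back onto every row of the ID
    any_match = matches.groupby(out[id_col]).transform("any")
    out[result_col] = any_match.map({True: "YES", False: "NO"})
    return out


df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3],
                   "Status": ["A", "B", "B", "C", "A", "B"]})
result = flag_ids_with_status(df)
print(result)
```

Working on a copy leaves the original df untouched, and passing a different `status` value checks for something other than A.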