I have a data frame and a key, key = [1, 2, 3, 4]:
Animal Arm
1 2
1 4
1 3
1 3
1 1
1 1
I want to create a new column called Response. If the Arm value is in the key, Response should be 1; otherwise it should be 0. The catch is that only the first occurrence of each Arm value should get 1, and any repetition of that Arm value should get 0. Like this:
Animal Arm Response
1 2 1
1 4 1
1 3 1
1 3 0
1 1 1
1 1 0
There can be at most 4 rows with Response equal to 1.
This is what I tried:
resp = []
for i in range(len(df)):
    for j in key:
        if df['Arm'][i] == j:
            resp.append(1)
            break
    else:
        # for-else: runs only when no key value matched
        resp.append(0)
df['Response'] = resp
but I don't know how to set only the first occurrence of each key value to 1 and any repetition of it to 0.
Can someone help?
CodePudding user response:
Use Series.isin with DataFrame.duplicated on both columns, so that duplicates are tested per Animal and Arm. In other words, duplicated values of Arm are tested within each Animal group. I inferred this logic from the group-by tag.
df['Response'] = (df['Arm'].isin(key) & ~df.duplicated(['Animal','Arm'])).astype(int)
print (df)
Animal Arm Response
0 1 2 1
1 1 4 1
2 1 3 1
3 1 3 0
4 1 1 1
5 1 1 0
Add more data to see the difference:
key = [1,2,3,4]
df['Response'] = (df['Arm'].isin(key) & ~df.duplicated(['Animal','Arm'])).astype(int)
print (df)
Animal Arm Response
0 1 2 1
1 1 4 1
2 1 3 1
3 1 3 0
4 1 1 1
5 1 1 0
6 2 2 1
7 2 4 1
8 2 3 1
9 2 3 0
10 2 1 1
11 2 1 0
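For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the printed table, and the intermediate names in_key and first_seen are only illustrative, not part of the original answer.

```python
import pandas as pd

key = [1, 2, 3, 4]

# Sample data rebuilt from the printed table: two animals, same arm sequence.
df = pd.DataFrame({
    "Animal": [1] * 6 + [2] * 6,
    "Arm": [2, 4, 3, 3, 1, 1] * 2,
})

# True where the arm is one of the key values.
in_key = df["Arm"].isin(key)

# True for the first occurrence of each (Animal, Arm) pair.
first_seen = ~df.duplicated(["Animal", "Arm"])

df["Response"] = (in_key & first_seen).astype(int)
print(df)
```

Splitting the expression into two named masks makes it easier to inspect each condition on its own if the result is not what you expect.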
CodePudding user response:
You can use isin combined with duplicated:
df['Response'] = (df['Arm'].isin(key)&~df['Arm'].duplicated()).astype(int)
Or, with NumPy (this needs import numpy as np):
df['Response'] = np.where(df['Arm'].isin(key) & ~df['Arm'].duplicated(), 1, 0)
Output:
Animal Arm Response
0 1 2 1
1 1 4 1
2 1 3 1
3 1 3 0
4 1 1 1
5 1 1 0
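Note that df['Arm'].duplicated() looks at the whole Arm column, so it only matches the expected output when there is a single Animal. A small sketch of the difference, assuming a second animal with the same arm sequence (the names whole_column and per_animal are just for illustration):

```python
import pandas as pd

key = [1, 2, 3, 4]
df = pd.DataFrame({
    "Animal": [1] * 6 + [2] * 6,
    "Arm": [2, 4, 3, 3, 1, 1] * 2,
})

# Duplicates checked across the whole Arm column, ignoring Animal.
whole_column = (df["Arm"].isin(key) & ~df["Arm"].duplicated()).astype(int)

# Duplicates checked per (Animal, Arm) pair.
per_animal = (df["Arm"].isin(key) & ~df.duplicated(["Animal", "Arm"])).astype(int)

print(pd.DataFrame({"whole_column": whole_column, "per_animal": per_animal}))
```

With the whole-column check, every row of the second animal gets 0 because its arm values were already seen for the first animal. Which behaviour is wanted depends on whether the "first occurrence" rule applies per animal or globally.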
CodePudding user response:
resp = []
respDone = []
for i in range(len(df)):
    for j in key:
        if df['Arm'][i] == j and df['Arm'][i] not in respDone:
            resp.append(1)
            respDone.append(df['Arm'][i])
            break
    else:
        resp.append(0)
df['Response'] = resp
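If you prefer to keep an explicit loop, the same idea can be sketched with sets, which avoids the inner loop over key. This is only a variant of the loop above, assuming a single animal as in the question; the name seen is illustrative.

```python
import pandas as pd

key = {1, 2, 3, 4}
df = pd.DataFrame({"Animal": [1] * 6, "Arm": [2, 4, 3, 3, 1, 1]})

seen = set()  # arm values that already received a 1
resp = []
for arm in df["Arm"]:
    if arm in key and arm not in seen:
        resp.append(1)
        seen.add(arm)
    else:
        resp.append(0)

df["Response"] = resp
print(df)
```

Like the loop above, this tracks repeats across the whole column, so it would need a separate seen set per Animal if the rule should apply within each animal.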