I have a table
AppName | Values |
---|---|
JBX | Yes |
JBX | No |
JBX | Yes |
ABD | No |
ABD | No |
CBR | Yes |
LBA | No |
LBA | Yes |
And I want to update this table so that for each occurrence of AppName where at least one of the values in the "Values" column is 'Yes' then we update all of the values for that specific AppName to "Yes".
Updated Table:
AppName | Values |
---|---|
JBX | Yes |
JBX | Yes |
JBX | Yes |
ABD | No |
ABD | No |
CBR | Yes |
LBA | Yes |
LBA | Yes |
I tried
if df2['Values'] == 'Yes':
df2['Values'] == 'Yes' ```
But this throws a ```ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().``` for ```if df2['Values'] == 'Yes':```
CodePudding user response:
Approach using groupby.transform, placing results into a new column for easy comparison with original Values
column:
df['app_has_any_yes'] = (df.groupby('AppName')['Values']
.transform(lambda x: (x == 'Yes').any())
.replace({True: 'Yes', False: 'No'})
Result:
AppName | Values | app_has_any_yes |
---|---|---|
JBX | Yes | Yes |
JBX | No | Yes |
JBX | Yes | Yes |
ABD | No | No |
ABD | No | No |
CBR | Yes | Yes |
LBA | No | Yes |
LBA | Yes | Yes |
CodePudding user response:
You can try groupby
then apply
out = df.groupby('AppName').apply(lambda g: g.assign(Values=['Yes']*len(g) if g['Values'].eq('Yes').any() else ['No']*len(g)))
print(out)
AppName Values
0 JBX Yes
1 JBX Yes
2 JBX Yes
3 ABD No
4 ABD No
5 CBR Yes
6 LBA Yes
7 LBA Yes
CodePudding user response:
Conveniently, "Yes" is lexicographically sorted after "No", so Yes>No and you can do:
df['Values'] = df.groupby('AppName')['Values'].transform('max')
NB. this is working for this particular example, but you can use the same logic with other words, for example in German "Ja" (Yes) is before "Nein" (No), so you would use transform('min')
in this case
output:
AppName Values
0 JBX Yes
1 JBX Yes
2 JBX Yes
3 ABD No
4 ABD No
5 CBR Yes
6 LBA Yes
7 LBA Yes