I have a dataframe in which individuals have scores. The idea is to highlight the reference individual (check) in red and the individuals with a lower score in green. Following a similar problem on StackOverflow (Adding labels in x y scatter plot with seaborn), I was able to highlight the check in red. However, I failed to highlight in green the two individuals (id_11, id_17) with a lower score. I got the error "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." Please find my code below. Thank you in advance for your help.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(
{'Individual Name': ['id_1', 'check', 'id_3', 'id_4', 'id_5', 'id_6', 'id_7', 'id_8', 'id_9', 'id_10', 'id_11', 'id_12', 'id_13', 'id_14', 'id_15', 'id_16', 'id_17', 'id_18', 'id_19', 'id_20', 'id_21', 'id_22', 'id_23', 'id_24', 'id_25', 'id_26', 'id_27', 'id_28', 'id_29', 'id_30'],
'feature': [0.508723818, 0.438733637, 0.718100026, 0.506722786, 0.520924985, 0.69302915, 0.659499198, 0.547989555, 0.714309067, 0.617602669, 0.35364303, 0.534064345, 0.59011931, 0.488031738, 0.511025466, 0.655582175, 0.32029745, 0.594929278, 0.562511802, 0.571763799, 0.681324482, 0.40444921, 0.628999099, 0.497668065, 0.690914914, 0.530561335, 0.798924312, 0.671025127, 0.71243462, 0.539980784],
'score': [91.5, 89.75, 94.25, 91.75, 91.75, 93.5, 93.25, 92.25, 94.0, 93.0, 89.25, 92.0, 92.5, 91.5, 91.5, 93.5, 88.5, 92.25, 92.0, 93.25, 93.25, 90.25, 92.75, 90.75, 94.0, 92.0, 95.75, 93.75, 94.5, 92.0]})
fig, ax = plt.subplots()
sns.scatterplot(data=df, x='score', y='feature')
plt.text(x=df['score'][df['Individual Name'] == 'check'], y=df['feature'][df['Individual Name'] == 'check'], s='check', color='red')
score_of_check = df['score'][df['Individual Name'] == 'check']  # reference value for highlighting individuals that have a lower score
print(score_of_check)
# label points if score is lower than score_of_check
for x in df['score']:
    if x < score_of_check:
        print(x)  # even print generates the error
        plt.text(x=df['score'], y=df['feature'], s=df['Individual Name'],
                 color='green')  # ultimately I would like to label the 2 individuals, id_11 and id_17, in green
plt.show()
plt.close()
CodePudding user response:
Further to JohanC's comment, here is some code that makes it work. The key is to loop over an index based on the number of rows in your dataframe. The if statement was not comparing compatible data types: score_of_check is a Series and needs to be converted to a single value for the comparison. You also need to use the index to supply single-element coordinates and labels to the plt.text function; otherwise you pass the entire column on every iteration.
for ind in range(len(df)):
    # score_of_check.values[0] extracts the scalar from the one-element Series
    if df['score'][ind] < score_of_check.values[0]:
        print(ind)  # row positions of the individuals with a lower score
        plt.text(x=df['score'][ind], y=df['feature'][ind], s=df['Individual Name'][ind],
                 color='green')  # labels id_11 and id_17 in green
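As an alternative to looping over row positions, the same selection can be sketched with a boolean mask, which does not rely on the dataframe having a default integer index. This is only a minimal sketch under some assumptions: it uses a reduced copy of the question's dataframe, draws the points with ax.scatter instead of sns.scatterplot to keep dependencies small, and selects the Agg backend so it runs without a display. The name lower is introduced here for illustration and is not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Reduced stand-in for the question's dataframe (a subset of its rows).
df = pd.DataFrame({
    'Individual Name': ['id_1', 'check', 'id_3', 'id_11', 'id_17'],
    'feature': [0.508723818, 0.438733637, 0.718100026, 0.35364303, 0.32029745],
    'score': [91.5, 89.75, 94.25, 89.25, 88.5],
})

# Extract the reference score as a plain float rather than a one-element Series,
# so that comparisons against it yield a single True/False per row.
score_of_check = df.loc[df['Individual Name'] == 'check', 'score'].iloc[0]

# Boolean mask: keep only the rows whose score is below the reference.
lower = df[df['score'] < score_of_check]

fig, ax = plt.subplots()
ax.scatter(df['score'], df['feature'])

# One ax.text call per selected row, each with scalar coordinates and label.
for x, y, name in zip(lower['score'], lower['feature'], lower['Individual Name']):
    ax.text(x, y, name, color='green')
```

With the full dataframe from the question, the sns.scatterplot call can presumably stay as it is; only the selection and labelling steps would change.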