let user input a letter, find the names starring with that letter in the data frame, then find the m-CodePudding

df_pm = dataset[["names","pop_mig"]].copy()
starring_letter = str(input("starring_letter:"))

-df_pm is the data frame. I want to list the names that starring with starring_letter, then find which one of them have the highest pop_mig value. pop_mig is a column that includes integers.

trythis = df_pm[df_pm["names"]== starring_letter in df_pm["names"]][df_pm[df_pm["pop_mig"]==df_pm["pop_mig"].max()]]

it returned the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

example from part of a data frame:

                names     pop_mig
0            Afghanistan  38991266
1                Albania   2891797
2                Algeria  43861044
3                 Angola  32859859
4    Antigua and Barbuda     97929
..                   ...       ...
196              Vietnam  97418579
197       Western Sahara    591757

expected output:

starring_letter = C
output1 = China
output2 = China's pop_mig value

CodePudding user response：

This should work

df = df[df.names.str.startswith(input_letter.title())].nlargest(n=1,columns = ['pop_mig'])

CodePudding user response：

If you plan is to eventually get any letter, compute a DataFrame of all the max values per letter:

df2 = (df
 .sort_values(by='pop_mig', ascending=False)
 .groupby(df['names'].str[0].rename('letter'))
 .first()
)

output:

                 names   pop_mig
letter                          
A              Algeria  43861044
V              Vietnam  97418579
W       Western Sahara    591757

or, a bit more efficient:

df2 = df['pop_mig'].groupby(df['names'].str[0]).idxmax().rename_axis('letter').reset_index(name='idx')
df2 = df2.merge(df, left_on='idx', right_index=True).drop(columns='idx')

output:

  letter           names   pop_mig
0      A         Algeria  43861044
1      V         Vietnam  97418579
2      W  Western Sahara    591757

Else, for a computation per demand:

starting_letter = 'A'
idx = df.loc[df['names'].str.startswith(starting_letter), 'pop_mig'].idxmax()
out = df.loc[idx]

output:

names       Algeria
pop_mig    43861044