df_pm = dataset[["names","pop_mig"]].copy()
starring_letter = str(input("starring_letter:"))
-df_pm is the data frame.
I want to list the names that starring with starring_letter, then find which one of them have the highest pop_mig
value. pop_mig
is a column that includes integers.
trythis = df_pm[df_pm["names"]== starring_letter in df_pm["names"]][df_pm[df_pm["pop_mig"]==df_pm["pop_mig"].max()]]
it returned the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
example from part of a data frame:
names pop_mig
0 Afghanistan 38991266
1 Albania 2891797
2 Algeria 43861044
3 Angola 32859859
4 Antigua and Barbuda 97929
.. ... ...
196 Vietnam 97418579
197 Western Sahara 591757
expected output:
starring_letter = C
output1 = China
output2 = China's pop_mig value
CodePudding user response:
This should work
df = df[df.names.str.startswith(input_letter.title())].nlargest(n=1,columns = ['pop_mig'])
CodePudding user response:
If you plan is to eventually get any letter, compute a DataFrame of all the max values per letter:
df2 = (df
.sort_values(by='pop_mig', ascending=False)
.groupby(df['names'].str[0].rename('letter'))
.first()
)
output:
names pop_mig
letter
A Algeria 43861044
V Vietnam 97418579
W Western Sahara 591757
or, a bit more efficient:
df2 = df['pop_mig'].groupby(df['names'].str[0]).idxmax().rename_axis('letter').reset_index(name='idx')
df2 = df2.merge(df, left_on='idx', right_index=True).drop(columns='idx')
output:
letter names pop_mig
0 A Algeria 43861044
1 V Vietnam 97418579
2 W Western Sahara 591757
Else, for a computation per demand:
starting_letter = 'A'
idx = df.loc[df['names'].str.startswith(starting_letter), 'pop_mig'].idxmax()
out = df.loc[idx]
output:
names Algeria
pop_mig 43861044