I have created the following pandas DataFrame, which is a shortened version of the one I'm working with:
import pandas as pd

data = {'U.S. Custom Ports': ['Aberdeen, WA', 'Baltimore, MD'],
        'Year': ['2017', '2018'],
        'ImportTons': ['172,000.00', '180,000.00'],
        'ExportTons': ['10,000.00', 'second_value'],
        'CoastalName': ['Pacific', 'Atlantic'],
        'City': ['Aberdeen', 'Baltimore'],
        'State': ['WA', 'MD'],
        'Difference': ['0.00', '73,000.00'],
        }
df = pd.DataFrame(data)
df
which outputs this dataframe:
My objective is then to write a function that takes two arguments (City, State) and behaves as follows:
- If the City does not appear in the DataFrame, the function should display the error message 'City does not exist.'
- If the City appears but not in combination with that State, the function should display the error message 'Invalid Input.'
- If the City and State combination appears in the DataFrame, the function should display the City, State, Year, ImportTons, ExportTons and Difference columns, sorted by Year with the oldest year first.
My code:
def BestYears(City,State):
    for index,row in df.iterrows():
        combo = (row['City'],row['State'])
        if (City != combo[0]):
            print('City does not exist')
        if (City == combo[0] and State != combo[1]):
            print('Invalid Input')
        else:
            output_df = df[["City","State","Year","ExportTons","Difference"]].sort_values(by='Year')
        return output_df
BestYears('Baltimore','jake')
BestYears('Baltimore','MD')
The output it produces:
As you can see, in the first call I pass 'Baltimore' with the incorrect state 'jake', which I hoped would print 'Invalid Input'. Instead, it prints 'City does not exist' and also returns the DataFrame from the last branch of the function. In the second call I pass the correct arguments, 'Baltimore' and 'MD', which both appear in the DataFrame, but the output is the same as for the first call. Could I get some direction on what I need to change to make the function work?
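The loop evaluates the checks once per row, so a comparison like `City != combo[0]` is true for every row whose City differs from the argument, even when the city does exist elsewhere in the DataFrame. A minimal sketch contrasting that per-row test with a single column-wide test, assuming a cut-down version of the same `df` with only the City and State columns:

```python
import pandas as pd

# Cut-down copy of the question's data (only the columns the check needs).
df = pd.DataFrame({
    "City": ["Aberdeen", "Baltimore"],
    "State": ["WA", "MD"],
})

city = "Baltimore"

# Per-row check, as in the loop from the question: it is True for the
# Aberdeen row even though Baltimore appears further down.
per_row_hits = [row["City"] != city for _, row in df.iterrows()]
print(per_row_hits)

# Column-wide check: ask once whether the city appears in any row.
city_exists = (df["City"] == city).any()
print(city_exists)
```

The per-row list comes out as `[True, False]`, so a message printed inside the loop fires on the Aberdeen row, while the column-wide check correctly reports that Baltimore exists.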
Answer:
Use boolean masks over the whole columns instead of looping row by row:
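def BestYears(city, state):
    m1 = df['City'] == city    # True where the city matches
    m2 = df['State'] == state  # True where the state matches
    if not m1.any():           # the city appears in no row
        print(f"City '{city}' does not exist")
    elif not (m1 & m2).any():  # city exists, but never with this state
        print(f"Invalid Input: '{state}'")
    else:
        print(f"Found: '{city}, {state}'")
        # do stuff here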
Usage:
>>> BestYears('Boston', 'MA')
City 'Boston' does not exist
>>> BestYears('Baltimore', 'jake')
Invalid Input: 'jake'
>>> BestYears('Baltimore', 'MD')
Found: 'Baltimore, MD'
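To fill in the `# do stuff here` branch according to the requirements in the question, one possible sketch selects the matching rows with the combined mask and sorts them by Year. The name `best_years`, its extra `frame` parameter, the `COLUMNS` list and returning `None` after an error message are illustrative choices, not part of the original answer; converting Year to `int` inside the sort key assumes every Year value is a plain integer string.

```python
import pandas as pd

data = {
    "U.S. Custom Ports": ["Aberdeen, WA", "Baltimore, MD"],
    "Year": ["2017", "2018"],
    "ImportTons": ["172,000.00", "180,000.00"],
    "ExportTons": ["10,000.00", "second_value"],
    "CoastalName": ["Pacific", "Atlantic"],
    "City": ["Aberdeen", "Baltimore"],
    "State": ["WA", "MD"],
    "Difference": ["0.00", "73,000.00"],
}
df = pd.DataFrame(data)

# Hypothetical list of the columns the question asks to display.
COLUMNS = ["City", "State", "Year", "ImportTons", "ExportTons", "Difference"]


def best_years(frame, city, state):
    """Hypothetical helper: return matching rows sorted by Year (oldest first),
    or print an error message and return None."""
    m1 = frame["City"] == city
    m2 = frame["State"] == state
    if not m1.any():
        print("City does not exist.")
        return None
    if not (m1 & m2).any():
        print("Invalid Input.")
        return None
    result = frame.loc[m1 & m2, COLUMNS]
    # Year is stored as text; the key converts it so the sort is numeric.
    return result.sort_values(by="Year", key=lambda s: s.astype(int))


print(best_years(df, "Baltimore", "MD"))
```

Passing the DataFrame in explicitly avoids relying on a global `df`. The `key` argument of `sort_values` requires pandas 1.1 or later; with four-digit years stored as strings, a plain lexicographic sort would give the same order.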