is there a way to write a function that will evaluate if the values the function arguments accepts a

Time:03-07

I have created the following pandas dataframe which is a shortened version of the dataframe I'm working with:

import pandas as pd

data = {'U.S. Custom Ports':  ['Aberdeen, WA', 'Baltimore, MD'],
        'Year': ['2017', '2018'],
        'ImportTons': ['172,000.00', '180,000.00'],
        'ExportTons': ['10,000.00', 'second_value'],
        'CoastalName': ['Pacific', 'Atlantic'],
        'City': ['Aberdeen', 'Baltimore'],
        'State': ['WA', 'MD'],
        'Difference': ['0.00', '73,000.00'],
        }

df = pd.DataFrame(data)

df

which outputs this dataframe:

[image: the rendered dataframe]

My objective is to write a function that takes two arguments (City, State) and behaves as follows:

If the City does not appear in the dataframe, then the function should display the error message 'City does not exist.'

If the City appears but not with that State in the combo, then the function should display the error message 'Invalid Input.'

If the City and State combo appears in the dataframe, then the function should display the City, State, Year, ImportTons, ExportTons and Difference sorted by Year with the oldest year showing first.

My code:

def BestYears(City,State):
    for index,row in df.iterrows():
        combo = (row['City'],row['State'])
        if (City != combo[0]):
            print('City does not exist')
        if (City == combo[0] and State != combo[1]):
            print('Invalid Input')
        else:
            output_df = df[["City","State","Year","ExportTons","Difference"]].sort_values(by='Year')
            return output_df

BestYears('Baltimore','jake')
BestYears('Baltimore','MD')

The output it produces:

[image: output of the two function calls]

As you can see, in the first call I pass Baltimore with the incorrect state "jake", which I had hoped would return "Invalid Input". Instead, it returns "City does not exist" along with the dataframe, which is the last condition in the function. In the second call I pass the correct arguments, 'Baltimore' and 'MD', which are both present in the dataframe, but the output is the same as for the first call. Could I get direction on what I need to consider to get my function to work?

Answer:

Use boolean masks instead of looping over the rows:

def BestYears(city, state):
    m1 = df['City'] == city
    m2 = df['State'] == state

    if not m1.any():
        print(f"City '{city}' does not exist")
    elif not (m1 & m2).any():
        print(f"Invalid Input: '{state}'")
    else:
        print(f"Found: '{city}, {state}'")
        # do stuff here

Usage:

>>> BestYears('Boston', 'MA')
City 'Boston' does not exist

>>> BestYears('Baltimore', 'jake')
Invalid Input: 'jake'

>>> BestYears('Baltimore', 'MD')
Found: 'Baltimore, MD'