Search get request parameter in multiple columns of a csv file

Time:12-12

I have a Flask API in which the user can make a GET request with a name they input. I want to be able to search for that name in two different columns, but I am not sure how to do that. The following does not work, and Flask reports 'cannot index with multidimensional key':

data = self.data.loc[self.data[['name-english','name_greek']] == name_cap].to_dict()

This is the part I'm talking about:

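import pandas as pd
from flask import jsonify
from flask_restful import Resource  # assumed; the question does not show its imports

class Search(Resource):
    def __init__(self):
        self.data = pd.read_csv('datacsv')

    def get(self, name):
        name_cap = name.capitalize()
        data = self.data.loc[self.data['name-english'] == name_cap].to_dict()
        # return data found in csv
        return jsonify({'message': data})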

So I want to search in both those columns instead of just one.

Answer:

It seems the problem is in your pandas DataFrame syntax, not in Flask itself. You are probably getting this error from pandas:

ValueError: Cannot index with multidimensional key

According to pandas documentation:

.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

  • A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).

  • A list or array of labels, e.g. ['a', 'b', 'c'].

  • A slice object with labels, e.g. 'a':'f'.

  • A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

  • An alignable boolean Series. The index of the key will be aligned before masking.

  • An alignable Index. The Index of the returned selection will be the input.

  • A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)
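To make the list above concrete, here is a minimal sketch that exercises several of these input forms on a small DataFrame. The data and the string index labels are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy data; string index labels make "label" vs "position" obvious
df = pd.DataFrame(
    {"name-english": ["Athens", "Sparta", "Thebes"],
     "name_greek": ["Athina", "Sparti", "Thiva"]},
    index=["a", "b", "c"],
)

single = df.loc["a"]                     # single label -> one row as a Series
several = df.loc[["a", "c"]]             # list of labels -> DataFrame
sliced = df.loc["a":"b"]                 # label slice, both ends included
by_array = df.loc[[True, False, True]]   # boolean list, same length as the index
by_series = df.loc[df["name-english"] == "Sparta"]  # alignable boolean Series
# callable that receives df and returns a boolean Series
by_callable = df.loc[lambda d: d["name_greek"].str.startswith("Th")]

print(several.index.tolist())      # ['a', 'c']
print(sliced.index.tolist())       # ['a', 'b']
print(by_series.index.tolist())    # ['b']
print(by_callable.index.tolist())  # ['c']
```

Every one of these keys is either a label, a list of labels, or something one-dimensional; none of them is a 2-D grid.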

In your example you are passing self.data[['name-english','name_greek']] == name_cap to loc. That comparison returns another DataFrame (a 2-D grid of booleans, one per row and column), not a 1-D boolean array or boolean Series, so loc rejects it as a multidimensional key.
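A small sketch reproducing the problem, using made-up rows with the two column names from the question. It shows that the comparison yields a 2-D DataFrame and that passing it to .loc raises a ValueError.

```python
import pandas as pd

# Hypothetical data using the column names from the question
df = pd.DataFrame({
    "name-english": ["Athens", "Sparta"],
    "name_greek": ["Athina", "Sparti"],
})

mask = df[["name-english", "name_greek"]] == "Athens"
print(type(mask).__name__)  # DataFrame
print(mask.shape)           # (2, 2): one boolean per row *and* per column

try:
    df.loc[mask]
    error = None
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```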

To filter your DataFrame on multiple columns, you can build one boolean Series per column and combine them with the element-wise operators & (and) and | (or), wrapping each comparison in parentheses:

df.loc[(df["A"] == 1) | (df["B"] == 1)]
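Applied to the question, a minimal sketch of this approach might look like the following. search_both_columns is a hypothetical helper name, and the sample rows only stand in for the CSV.

```python
import pandas as pd

def search_both_columns(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return rows where `name` exactly matches either name column."""
    name_cap = name.capitalize()  # same normalisation as the question's get()
    # Each comparison yields a 1-D boolean Series; | combines them row by row.
    # The parentheses are required because | binds tighter than ==.
    mask = (df["name-english"] == name_cap) | (df["name_greek"] == name_cap)
    return df.loc[mask]

# Hypothetical sample data standing in for the CSV
df = pd.DataFrame({
    "name-english": ["Athens", "Sparta", "Thebes"],
    "name_greek": ["Athina", "Sparti", "Thiva"],
})

print(search_both_columns(df, "sparta").index.tolist())  # [1]
print(search_both_columns(df, "thiva").index.tolist())   # [2]
```

Inside the Flask resource, the same mask expression could be built from self.data and passed to self.data.loc[...] before calling .to_dict(), as the original code already does.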

Or use the built-in method isin():

Whether each element in the DataFrame is contained in values.

Returns: DataFrame. A DataFrame of booleans showing whether each element in the DataFrame is contained in values.

together with any():

Return whether any element is True, potentially over an axis.

Returns: Series or DataFrame. If level is specified, then, DataFrame is returned; otherwise, Series is returned.

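This way you get a boolean Series that you can pass to .loc, for example:

df.loc[df.isin([1]).any(axis=1)]

Here any(axis=1) reduces each row to a single True/False. Recent pandas versions (2.x) only accept axis as a keyword argument, so the positional form any(1) fails there.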
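The same isin()/any() idea, restricted to the two name columns from the question, as a sketch with made-up data. Selecting the columns first means a match in some other column does not count, and axis is passed by keyword.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV
df = pd.DataFrame({
    "name-english": ["Athens", "Sparta", "Thebes"],
    "name_greek": ["Athina", "Sparti", "Thiva"],
})

name_cap = "athina".capitalize()  # "Athina"
cols = ["name-english", "name_greek"]

# isin() gives a 2-D boolean DataFrame (one cell per row/column pair);
# any(axis=1) collapses each row to a single bool, producing a 1-D Series
mask = df[cols].isin([name_cap]).any(axis=1)
result = df.loc[mask]

print(mask.tolist())           # [True, False, False]
print(result.index.tolist())   # [0]
```

Note that isin() expects a list-like of values, which is why name_cap is wrapped in a list.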

Also, something that always helps me when dealing with DataFrames is testing things in Jupyter first. I find it faster, and you can experiment more freely with the DataFrame to discover new ways to do what you need.
