Python dataframes - grouping series

Time:03-29

I'm trying to apply a filter in Python, but I'm stuck at the last step, where I need to group the result.

I have a json, which is this one: https://api.jsonbin.io/b/62300664a703bb67492bd3fc/3

What I'm trying to do with it is filter "ApiFamily" for "payments-ted" or "payments-doc". If I find a match, I must then verify that the "ApiEndpoints" column has at least two endpoints in it.

My ultimate goal is to collect both "ApiFamily" values in one row and all the "ApiEndpoints" in another row. Something like this:

  "ApiFamily": [
    "payments-ted",
    "payments-doc"
  ],
  "ApiEndpoints": [
    "/ted",
    "/electronic-ted",
    "/phone-ted",
    "/banking-ted",
    "/shared-automated-teller-machines-ted",
    "/doc",
    "/electronic-doc",
    "/phone-doc",
    "/banking-doc",
    "/shared-automated-teller-machines-doc"
  ]

I have managed to achieve partial success, searching for a single condition:

#ApiFilter = df[(df['ApiFamily'] == 'payments-pix') & (rolesFilter['ApiEndpoints'].apply(lambda x: len(x)) >= 2)]

This obviously extracts only the payments-pix rows that contain two or more ApiEndpoints.

I can check both conditions if I try this:

#ApiFilter = df[((df['ApiFamily'] == 'payments-ted') | (df['ApiFamily'] == 'payments-doc')) & (df['ApiEndpoints'].apply(lambda x: len(x)) >= 2)]

I will get the correct rows, but it will obviously list the brand twice.

When I try to groupby the result, all I get is this:

TypeError: unhashable type: 'Series'

My question is: how do I avoid this error? I assume I must do some sort of conversion of the columns that hold multiple items inside a row, but what is the best method?

CodePudding user response:

I have tried this solution. It is somewhat roundabout, but it gets the final result you want.

First get the data into a dictionary object

>>> import requests
>>> url = 'https://api.jsonbin.io/b/62300664a703bb67492bd3fc/3'
>>> response = requests.get(url)
>>> d = response.json()

We only need ApiFamily and ApiEndpoints, collected into a new dictionary

>>> dNew = {}
>>> for item in d['data']:
...     if item['ApiFamily'] in ['payments-ted', 'payments-doc']:
...         dNew[item['ApiFamily']] = item['ApiEndpoints']
...

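The loop above does not check the "at least two endpoints" requirement from the question. Here is a minimal sketch of that extra check, assuming each item in d['data'] is a dict with 'ApiFamily' and 'ApiEndpoints' keys, as the loop implies. The sample records and the names d_sample and dFiltered are made up for illustration and are not taken from the real JSON.

```python
# Hypothetical sample standing in for response.json(); the real payload may differ.
d_sample = {
    "data": [
        {"ApiFamily": "payments-ted", "ApiEndpoints": ["/ted", "/electronic-ted"]},
        {"ApiFamily": "payments-doc", "ApiEndpoints": ["/doc"]},
        {"ApiFamily": "payments-pix", "ApiEndpoints": ["/pix", "/electronic-pix"]},
    ]
}

wanted = {"payments-ted", "payments-doc"}

# Keep only the wanted families that list at least two endpoints.
dFiltered = {
    item["ApiFamily"]: item["ApiEndpoints"]
    for item in d_sample["data"]
    if item["ApiFamily"] in wanted and len(item["ApiEndpoints"]) >= 2
}

print(dFiltered)
```

In this made-up sample, payments-doc is dropped because it lists only one endpoint, and payments-pix is dropped because it is not one of the wanted families.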
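Convert dNew into a dataframe, wrap each value in single quotes, and transpose it. (On pandas 2.1 or later, DataFrame.map is the non-deprecated name for applymap.)

>>> import pandas as pd
>>> df1 = pd.DataFrame(dNew)
>>> df1 = df1.applymap(lambda x: '\'' + x + '\'')
>>> df2 = df1.transpose()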

At this stage df2 looks like this -

>>> print(df2)

                   0                  1             2               3  \
payments-ted  '/ted'  '/electronic-ted'  '/phone-ted'  '/banking-ted'   
payments-doc  '/doc'  '/electronic-doc'  '/phone-doc'  '/banking-doc'   

                                                    4  
payments-ted  '/shared-automated-teller-machines-ted'  
payments-doc  '/shared-automated-teller-machines-doc'  

Now join all the columns of each row into one comma-separated string

>>> df2['final'] = df2.apply(','.join, axis=1)

Finally

>>> df2 = df2[['final']]
>>> print(df2)

              final
payments-ted  '/ted','/electronic-ted','/phone-ted','/bankin...
payments-doc  '/doc','/electronic-doc','/phone-doc','/bankin...
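
As an alternative that stays in pandas and produces the two flat lists the question asks for, something along these lines may work. It is only a sketch: the sample dataframe is hypothetical, and it assumes "ApiEndpoints" holds a Python list in every row. It never calls groupby, so the list values do not have to be hashed.

```python
import pandas as pd

# Hypothetical sample rows; the real data has more columns and families.
df = pd.DataFrame(
    {
        "ApiFamily": ["payments-ted", "payments-doc", "payments-pix"],
        "ApiEndpoints": [
            ["/ted", "/electronic-ted"],
            ["/doc", "/electronic-doc"],
            ["/pix"],
        ],
    }
)

# isin replaces the chain of == ... | == ... comparisons.
# The length test keeps its own parentheses because & binds tighter than >=.
mask = df["ApiFamily"].isin(["payments-ted", "payments-doc"]) & (
    df["ApiEndpoints"].apply(len) >= 2
)
matched = df[mask]

# Collapse the matching rows into one record made of two flat lists.
result = {
    "ApiFamily": matched["ApiFamily"].tolist(),
    "ApiEndpoints": matched["ApiEndpoints"].explode().tolist(),
}
print(result)
```

Series.explode turns each list element into its own row, so tolist() afterwards gives a single flat list of endpoints in the original row order.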