Extraction of versions in paths pandas column

Time:11-23

I have a dataframe column that looks like this:

                                             paths                    
0      ['/api/v2/clouds', '/api/v2/clouds/{cloud}']                      
1      ['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists]                
2      ['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}'....]                
3      ['/v3/attachments/{attachmentId}', '/v3/attachments]                
4      '/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents]      

I want to extract the versions from each row of the column. My desired output is:

                                          paths                    Path_Version 
0      ['/api/v2/clouds', '/api/v2/clouds/{cloud}']                      v2   
1      ['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists]             v0.1   
2      ['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}'....]              v1  
3      ['/v3/attachments/{attachmentId}', '/v3/attachments]              v3  
4      ['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents]      v0.1/v0.2/v0.3 

I have tried this:

keywords = ['v1', 'v2', 'v3', 'v4', 'v1.0', 'v1.2', 'v1.1', 'v0.1', 'v0.2','v1.3', 'v1.4', 'v3.1', 'v3.2', '0.1.0', '3.1', 'v0.0.2', 'v0.0.3', 'v0.0.4', '1.0.0']
final_api['Path_Version'] = final_api['paths'].str.findall('(' + '|'.join(keywords) + ')')

But this yields no result. I have looked at other code as well, but none of it gives me the desired output. I am struggling to figure this out; any help will be appreciated.

CodePudding user response:

No need for keywords, just use pandas.Series.str.findall as you started to do:

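# findall returns every version-like token per row; set() drops duplicates
# (set order is arbitrary, so the joined order may vary)
df["Path_Version"] = (
    df["paths"].str.findall(r"(v\d\.?\d?)")
               .apply(lambda x: "/".join(set(x)))
)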

# Output :

print(df.to_string())
                                                          paths    Path_Version
0                  ['/api/v2/clouds', '/api/v2/clouds/{cloud}']              v2
1         ['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists]            v0.1
2          ['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}'....]              v1
3          ['/v3/attachments/{attachmentId}', '/v3/attachments]              v3
4  '/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents]  v0.2/v0.3/v0.1
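
The set in the snippet above has no guaranteed order, which is why row 4 comes out as v0.2/v0.3/v0.1 rather than v0.1/v0.2/v0.3. Here is a minimal sketch of one way to get a stable order, assuming sorted output is wanted and that a version always follows a slash. The tighter pattern and the helper name join_versions are my own assumptions, not part of the answer above, and the two sample rows are cut down from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "paths": [
        "['/api/v2/clouds', '/api/v2/clouds/{cloud}']",
        "['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents']",
    ]
})

# Assumed pattern: "v" directly after a slash, then dot-separated digit groups,
# so v10 or v0.0.2 would also match, while the "v2" in "dev2" would not.
pattern = r"(?<=/)v\d+(?:\.\d+)*"

def join_versions(found):
    # Hypothetical helper: drop duplicates, then sort for a repeatable order.
    return "/".join(sorted(set(found)))

df["Path_Version"] = df["paths"].str.findall(pattern).apply(join_versions)
print(df["Path_Version"].tolist())
```

Note that sorted() compares strings, so v10 would sort before v2; a numeric sort key would be needed if that matters.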

CodePudding user response:

This seems like a good candidate for a regex:

import pandas as pd
import re

data = [
      [['/api/v2/clouds', '/api/v2/clouds/{cloud}']],
      [['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists']],
      [['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}']],
      [['/v3/attachments/{attachmentId}', '/v3/attachments']],
      [['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents']]
]

df = pd.DataFrame(data, columns=['paths'])

# A version is a path segment such as /v2/ or /v0.1/
ver = re.compile(r'/(v\d(\.\d)?)/')

def getver(row):
    vsets = set()
    for p in row:
        chk = ver.search(p)
        if chk:  # skip paths with no version segment
            vsets.add(chk.group(1))
    return '/'.join(vsets)

df['Version'] = df.paths.apply(getver)
print(df)

Output:

                                               paths         Version
0           [/api/v2/clouds, /api/v2/clouds/{cloud}]              v2
1  [/v0.1/book-lists/{type}/{date}, /v0.1/book-li...            v0.1
2       [/v1/Video/Rooms, /v1/Video/Rooms/{RoomSid}]              v1
3  [/v3/attachments/{attachmentId}, /v3/attachments]              v3
4  [/v0.1/patrons, /v0.2/patrons, /v0.3/patrons/d...  v0.2/v0.3/v0.1
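
If the order of the joined versions matters, a variant along these lines may help. It is only a sketch: the pattern (which also accepts multi-digit versions and a version as the last path segment) and the name get_versions are assumptions of mine, not part of the answer above. It keeps versions in first-seen order by using a dict instead of a set:

```python
import re

# Assumed pattern: a whole path segment of the form v<digits>[.<digits>...],
# followed by either another slash or the end of the path.
VERSION = re.compile(r"/(v\d+(?:\.\d+)*)(?=/|$)")

def get_versions(paths):
    # Hypothetical helper: dict keys keep insertion order, so versions come
    # out in the order they first appear; paths with no version are skipped.
    seen = {}
    for p in paths:
        m = VERSION.search(p)
        if m:
            seen[m.group(1)] = None
    return "/".join(seen)

print(get_versions(['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents']))
print(get_versions(['/health', '/api/v2/clouds']))
```

With a list-valued column like the one built above, this would be applied as df['Version'] = df.paths.apply(get_versions).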