I have a dataframe column that looks like this:
paths
0 ['/api/v2/clouds', '/api/v2/clouds/{cloud}']
1 ['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists]
2 ['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}'....]
3 ['/v3/attachments/{attachmentId}', '/v3/attachments]
4 ['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents]
I want to extract the versions from the column. My desired output is:
paths Path_Version
0 ['/api/v2/clouds', '/api/v2/clouds/{cloud}'] v2
1 ['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists] v0.1
2 ['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}'....] v1
3 ['/v3/attachments/{attachmentId}', '/v3/attachments] v3
4 ['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents] v0.1/v0.2/v0.3
I have tried this:
keywords = ['v1', 'v2', 'v3', 'v4', 'v1.0', 'v1.2', 'v1.1', 'v0.1', 'v0.2','v1.3', 'v1.4', 'v3.1', 'v3.2', '0.1.0', '3.1', 'v0.0.2', 'v0.0.3', 'v0.0.4', '1.0.0']
final_api['Path_Version'] = final_api['paths'].str.findall('(' + '|'.join(keywords) + ')')
But this yields no result. I have looked at other solutions as well, but none of them gives me the desired output. I am struggling to figure this out, so any help will be appreciated.
CodePudding user response:
No need for keywords; just use pandas.Series.str.findall, as you started to do:
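# This relies on each cell of "paths" being a string.
# set() removes duplicates, but its order is not fixed.
df["Path_Version"] = (
    df["paths"].str.findall(r"(v\d\.?\d?)")
    .apply(lambda x: "/".join(set(x)))
)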
# Output :
print(df.to_string())
paths Path_Version
0 ['/api/v2/clouds', '/api/v2/clouds/{cloud}'] v2
1 ['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists] v0.1
2 ['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}'....] v1
3 ['/v3/attachments/{attachmentId}', '/v3/attachments] v3
4 ['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents] v0.2/v0.3/v0.1
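If the arbitrary ordering of the set is a problem (the question asks for v0.1/v0.2/v0.3), the same idea can be sketched with a small helper that sorts the unique matches. This is only a sketch: `extract_versions` and `VERSION_RE` are names made up here, and the pattern is an assumption that a version is "v" plus dot-separated integers directly after a slash, which also covers multi-digit and three-part versions such as v0.0.2. It accepts either a string or a list of paths, since the question does not make clear which one the column holds.

```python
import re

# Assumed pattern: "v" plus dot-separated integers, right after a "/",
# e.g. v2, v0.1, v0.0.2.
VERSION_RE = re.compile(r"(?<=/)v\d+(?:\.\d+)*")

def extract_versions(paths):
    """Return the unique versions found in `paths`, joined with '/'."""
    # Accept a single string or a list of path strings.
    text = paths if isinstance(paths, str) else " ".join(paths)
    found = set(VERSION_RE.findall(text))
    # Sort numerically so that v2 comes before v10.
    ordered = sorted(found, key=lambda v: [int(n) for n in v[1:].split(".")])
    return "/".join(ordered)

print(extract_versions(["/v0.3/patrons/dependents", "/v0.1/patrons", "/v0.2/patrons"]))
# prints: v0.1/v0.2/v0.3
```

With a DataFrame, the helper could be passed to `df["paths"].apply(extract_versions)`.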
CodePudding user response:
This seems like a good candidate for a regex:
import pandas as pd
import re

data = [
    [['/api/v2/clouds', '/api/v2/clouds/{cloud}']],
    [['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists']],
    [['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}']],
    [['/v3/attachments/{attachmentId}', '/v3/attachments']],
    [['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents']]
]
df = pd.DataFrame(data, columns=['paths'])

# Match a version segment such as /v2/ or /v0.1/
ver = re.compile(r'/(v\d(\.\d)?)/')

def getver(row):
    # Collect the distinct versions found in one list of paths
    vsets = set()
    for p in row:
        chk = ver.search(p)
        if chk:  # skip paths that contain no version segment
            vsets.add(chk.group(1))
    return '/'.join(vsets)

df['Version'] = df.paths.apply(getver)
print(df)
Output:
paths Version
0 [/api/v2/clouds, /api/v2/clouds/{cloud}] v2
1 [/v0.1/book-lists/{type}/{date}, /v0.1/book-li... v0.1
2 [/v1/Video/Rooms, /v1/Video/Rooms/{RoomSid}] v1
3 [/v3/attachments/{attachmentId}, /v3/attachments] v3
4 [/v0.1/patrons, /v0.2/patrons, /v0.3/patrons/d... v0.2/v0.3/v0.1
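Note that the pattern above needs a slash after the version, so a path that ends in the version (for example a hypothetical `/v3`) would not be matched. A possible variant, sketched under the assumption that a version is always a whole path segment, uses a lookahead instead and keeps the versions in order of first appearance. `get_versions` and `SEGMENT_RE` are names invented for this sketch.

```python
import re

# Assumption: a version is a full path segment, followed by "/" or the end of the path.
SEGMENT_RE = re.compile(r"/(v\d+(?:\.\d+)?)(?=/|$)")

def get_versions(paths):
    """Join the distinct versions in a list of paths, in order of first appearance."""
    seen = {}  # dicts preserve insertion order, so this de-duplicates stably
    for path in paths:
        for version in SEGMENT_RE.findall(path):
            seen[version] = None
    return "/".join(seen)

print(get_versions(["/v0.1/patrons", "/v0.2/patrons", "/v0.3"]))
# prints: v0.1/v0.2/v0.3
```

It could be applied the same way, with `df.paths.apply(get_versions)`; paths without a version segment simply contribute nothing.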