The DataFrame has 1,050,000 rows.
Input (a pandas DataFrame column):
UserImage
https://play-lh.googleusercontent.com/a/AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ=mo
https://play-lh.googleusercontent.com/EGemoI2NTXmTsBVtJqk8jxF9rh8ApRWfsIMQSt2uE4OcpQqbFu7f7NbTK05lx80nuSijCz7sc3a277R67g
https://play-lh.googleusercontent.com/a-/AFdZucpr-V6JJAWHdTjxYVPa15fmQC7pWl5Xd5StFt1E'
Output:
UserIDs
AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ
EGemoI2NTXmTsBVtJqk8jxF9rh8ApRWfsIMQSt2uE4OcpQqbFu7f7NbTK05lx80nuSijCz7sc3a277R67g
AFdZucpr-V6JJAWHdTjxYVPa15fmQC7pWl5Xd5StFt1E
CodePudding user response:
This looks like a perfect use case for a regex:
df['UserIDs'] = df['UserImage'].str.extract(r'^.*/([^/=]+)[^/]*$')
Or, if you want to keep only alphanumeric characters, underscores, and hyphens:
df['UserIDs'] = df['UserImage'].str.extract(r'^.*/([-\w]+)[^/]*$')
output:
UserImage \
0 https://play-lh.googleusercontent.com/a/AItbvm...
1 https://play-lh.googleusercontent.com/EGemoI2N...
2 https://play-lh.googleusercontent.com/a-/AFdZu...
UserIDs
0 AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ
1 EGemoI2NTXmTsBVtJqk8jxF9rh8ApRWfsIMQSt2uE4OcpQ...
2 AFdZucpr-V6JJAWHdTjxYVPa15fmQC7pWl5Xd5StFt1E
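For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the three sample URLs from the question (including the trailing quote on the third one) and assumes the capture group is meant to match one or more characters, i.e. a + quantifier. The expand=False argument is my addition, not part of the answer above; it makes str.extract return a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Sample data copied from the question; the third URL ends with a stray quote.
df = pd.DataFrame({
    "UserImage": [
        "https://play-lh.googleusercontent.com/a/AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ=mo",
        "https://play-lh.googleusercontent.com/EGemoI2NTXmTsBVtJqk8jxF9rh8ApRWfsIMQSt2uE4OcpQqbFu7f7NbTK05lx80nuSijCz7sc3a277R67g",
        "https://play-lh.googleusercontent.com/a-/AFdZucpr-V6JJAWHdTjxYVPa15fmQC7pWl5Xd5StFt1E'",
    ]
})

# '^.*/'     greedily consumes everything up to the last slash
# '([-\w]+)' captures the run of word characters and hyphens after it
# '[^/]*$'   swallows whatever trails the ID (e.g. '=mo' or the stray quote)
pattern = r"^.*/([-\w]+)[^/]*$"

# expand=False (an assumption on my side) yields a Series for a single group.
df["UserIDs"] = df["UserImage"].str.extract(pattern, expand=False)

print(df["UserIDs"].tolist())
```

Rows that do not match the pattern at all come back as NaN, so it may be worth checking df["UserIDs"].isna().sum() on the full data.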
CodePudding user response:
IIUC use:
df['UserImage'] = df['UserImage'].str.split('/').str[-1].str.split('=').str[0]
print(df)
UserImage
0 AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ
1 EGemoI2NTXmTsBVtJqk8jxF9rh8ApRWfsIMQSt2uE4OcpQ...
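A possible variation on the split approach, offered as a sketch rather than a benchmarked recommendation: since only the last path segment matters, rsplit with n=1 cuts each URL once from the right instead of at every slash, which may do less work on roughly a million rows. Writing to a new UserIDs column (instead of overwriting UserImage as above) is my choice here, so the original URLs stay available.

```python
import pandas as pd

# First two sample URLs from the question.
df = pd.DataFrame({
    "UserImage": [
        "https://play-lh.googleusercontent.com/a/AItbvmkI4RoZOTFftgRqwJ0QVl-OqLw0PXFRQsQmzPwayQ=mo",
        "https://play-lh.googleusercontent.com/EGemoI2NTXmTsBVtJqk8jxF9rh8ApRWfsIMQSt2uE4OcpQqbFu7f7NbTK05lx80nuSijCz7sc3a277R67g",
    ]
})

# Split once from the right and keep the part after the last slash.
last_segment = df["UserImage"].str.rsplit("/", n=1).str[-1]

# Drop anything from the first '=' onward; values without '=' are unchanged.
df["UserIDs"] = last_segment.str.split("=", n=1).str[0]

print(df["UserIDs"].tolist())
```

Unlike the character-class regex, this keeps any other trailing characters (such as a stray quote) in the result, so the input may need cleaning first.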