How to extract text from a string with Python using indexing and slicing

I am trying to extract certain characters from a variable using Python indexing and slicing. I have the following variable:

myvar = 'https://mystorageacct.blob.core.windows.net/testcontainer/29112013 FDD_Exec Summary 29 Nov 2013.pdf'

I am trying to extract

29112013 FDD_Exec Summary 29 Nov 2013

I have tried indexing such as

grab = myvar[:10]

but the result is not 29112013 FDD_Exec Summary 29 Nov 2013.

Any thoughts?

CodePudding user response：

I might suggest using str.rpartition so that you don't need to separately go through the work of figuring out which indices to slice on:

>>> myvar = 'https://mystorageacct.blob.core.windows.net/testcontainer/29112013 FDD_Exec Summary 29 Nov 2013.pdf'
>>> myvar.rpartition("/")[2]
'29112013 FDD_Exec Summary 29 Nov 2013.pdf'
>>> myvar.rpartition("/")[2].rpartition(".")[0]
'29112013 FDD_Exec Summary 29 Nov 2013'
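Since the question asks specifically about indexing and slicing, here is a minimal sketch of the same extraction done with explicit indices. It assumes the part you want always sits between the last "/" and the last "."; the names start and end are just illustrative.

```python
myvar = 'https://mystorageacct.blob.core.windows.net/testcontainer/29112013 FDD_Exec Summary 29 Nov 2013.pdf'

# str.rfind returns the index of the last occurrence, or -1 if there is none.
start = myvar.rfind('/') + 1   # first character after the final slash
end = myvar.rfind('.')         # the dot that begins the extension

grab = myvar[start:end]
print(grab)  # 29112013 FDD_Exec Summary 29 Nov 2013
```

If the last path segment has no extension, rfind('.') would land on a dot in the host name and the slice would come back empty, so this only fits strings shaped like the one in the question.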

CodePudding user response：

I recommend using the built-in pathlib for anything having to do with file paths. It seems to work fine for URLs too:

import pathlib

filename = pathlib.Path(myvar).name

Output (.name keeps the extension; pathlib.Path(myvar).stem drops it):

'29112013 FDD_Exec Summary 29 Nov 2013.pdf'
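A slightly more defensive variant, as a sketch: split the URL first with urllib.parse.urlsplit, then hand only the path portion to PurePosixPath, which always treats "/" as the separator regardless of operating system. The variable names url_path and stem are illustrative, and this assumes the URL is not percent-encoded (if it is, urllib.parse.unquote can decode the path first).

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

myvar = 'https://mystorageacct.blob.core.windows.net/testcontainer/29112013 FDD_Exec Summary 29 Nov 2013.pdf'

# urlsplit separates the scheme and host from the path part of the URL.
url_path = urlsplit(myvar).path

# .stem is the final path component without its last suffix.
stem = PurePosixPath(url_path).stem
print(stem)  # 29112013 FDD_Exec Summary 29 Nov 2013
```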

CodePudding user response：

If you want the last 10 characters, you could use:

myvar[-10:]

That is, start at the tenth-from-last character and go to the end of the string.

If this needs to be more general, look at your string's structure and split it, e.g. by spaces, and then take the correct value.
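To see why fixed counts don't fit this particular string, here is a quick sketch of what the two ten-character slices actually return. The names first_ten and last_ten are only for illustration.

```python
myvar = 'https://mystorageacct.blob.core.windows.net/testcontainer/29112013 FDD_Exec Summary 29 Nov 2013.pdf'

first_ten = myvar[:10]    # indices 0 through 9
last_ten = myvar[-10:]    # tenth-from-last character through the end

print(first_ten)  # https://my
print(last_ten)   # v 2013.pdf
```

Neither slice lines up with the file name, because the wanted text is delimited by "/" and "." rather than by a fixed number of characters.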

CodePudding user response：

myvar = 'https://mystorageacct.blob.core.windows.net/testcontainer/29112013 FDD_Exec Summary 29 Nov 2013.pdf'

# Split the string by '/' and you get a list of 
# ['https:', '', 'mystorageacct.blob.core.windows.net', 'testcontainer','29112013 FDD_Exec Summary 29 Nov 2013.pdf']

# [-1] index is to pick the last one
# .replace('.pdf','') is to remove the '.pdf'
extract = myvar.split('/')[-1].replace('.pdf','')
print(extract)

# Output: 29112013 FDD_Exec Summary 29 Nov 2013
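One caveat with .replace('.pdf', '') is that it removes every occurrence of '.pdf', not just the one at the end. A small sketch of the difference, using os.path.splitext from the standard library as an alternative; the file name below is made up purely for illustration.

```python
import os.path

name = 'report.pdf.notes.pdf'  # hypothetical file name, not from the question

via_replace = name.replace('.pdf', '')       # strips both '.pdf' occurrences
via_splitext = os.path.splitext(name)[0]     # strips only the final extension

print(via_replace)   # report.notes
print(via_splitext)  # report.pdf.notes
```

For the string in the question both approaches give the same result, since '.pdf' appears only once.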