I am trying to automate the renaming of PDFs of scientific papers from one name pattern to another using Python.
The PDFs are currently named like this:
Cresswell, K., Worth, A., & Sheikh, A. (2011). Implementing and adopting electronic health record systems. Clinical governance- an international journal.
i.e. "LastName1, FirstLetterGivenName1., LastName2, FirstLetterGivenName2., [...]. (Year). Title. Journal."
This example should be renamed to:
Cresswell_K_2011_Implementing and adopting
i.e. "LastName1_FirstLetterGivenName1_Year_First3WordsTitle"
Sadly, I was unable to adapt the solutions to similar problems to this specific one, as I am just starting to code.
CodePudding user response:
You can use a regular expression, for example:
import re

s = "Cresswell, K., Worth, A., & Sheikh, A. (2011). Implementing and adopting electronic health record systems. Clinical governance- an international journal."
# Named groups: first author's last name and initial, the year in parentheses, and the first three title words
p = re.compile(r'(?P<LastName1>[A-Za-z]+),\s+(?P<GivenName1>[A-Za-z]+)\.?,.+\((?P<Year>\d+)\)\.\s+(?P<Title1>\w+)\s(?P<Title2>\w+)\s(?P<Title3>\w+)')
m = p.search(s)
if m is not None:
    d = m.groupdict()
    # Join the captured parts into the new name
    result = d['LastName1'] + '_' + d['GivenName1'][0] + '_' + d['Year'] + '_' + d['Title1'] + ' ' + d['Title2'] + ' ' + d['Title3']
    print(result)
This gives the output:
Cresswell_K_2011_Implementing and adopting
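To apply this to the actual files, something like the following might work. It is only a sketch under a few assumptions: all PDFs sit in one folder, each file name is the citation followed by ".pdf", and the helper names new_stem and rename_pdfs are made up for this example. The pattern is the same as above, so names that do not fit it (for instance a single-author paper with no comma after the initial) are skipped rather than renamed.

```python
import re
from pathlib import Path

# Same pattern as above: first author, year in parentheses, first three title words.
PATTERN = re.compile(
    r'(?P<LastName1>[A-Za-z]+),\s+(?P<GivenName1>[A-Za-z]+)\.?,.+'
    r'\((?P<Year>\d+)\)\.\s+'
    r'(?P<Title1>\w+)\s(?P<Title2>\w+)\s(?P<Title3>\w+)'
)


def new_stem(old_stem):
    """Return the short name for one file stem, or None if it does not match."""
    m = PATTERN.search(old_stem)
    if m is None:
        return None
    d = m.groupdict()
    title = ' '.join([d['Title1'], d['Title2'], d['Title3']])
    return '_'.join([d['LastName1'], d['GivenName1'][0], d['Year'], title])


def rename_pdfs(folder, dry_run=True):
    """Rename every matching PDF in folder; with dry_run=True only print the plan."""
    for pdf in sorted(Path(folder).glob('*.pdf')):
        stem = new_stem(pdf.stem)
        if stem is None:
            print(f'skipped (no match): {pdf.name}')
            continue
        target = pdf.with_name(stem + '.pdf')
        if target.exists():
            # Never overwrite an existing file
            print(f'skipped (target exists): {pdf.name}')
            continue
        print(f'{pdf.name} -> {target.name}')
        if not dry_run:
            pdf.rename(target)
```

Calling rename_pdfs('papers') first (where 'papers' stands in for your own folder) only prints what would happen; once the list looks right, rename_pdfs('papers', dry_run=False) performs the renames.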