I have a list of texts like this:
Something at the beginning
References
1. Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia.
2. Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions.
3. Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field.
Other References
1. Tarelli, E. (2003), “How to transfer responsibilities from expatriates to local nationals”.
2. Riusala, K. and Suutari, V. (2004), “International knowledge transfers through expatriates”.
3. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something at the end
12. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something else at the end
The 'Other References' part is present in some texts but not in others. Similar strings can also appear anywhere in the texts.
I need a regex to use with re.findall that returns all the strings after 'References' as a list of strings, like this:
['Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia.', 'Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions.', 'Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field.']
But ONLY the ones right after 'References', and NOT similar strings anywhere earlier or later in the text.
I have tried this regex:
r = r'References\s*(\d+[.].*[.])'
But it returns only the first occurrence, and I need all of them.
Could anybody please suggest a better regex pattern?
CodePudding user response:
You could use re.findall twice. The strategy below is to first match all reference blocks as separate strings. We then join all such strings together, and then use re.findall again to find the individual references.
import re

inp = """Something at the beginning
References
1. Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia.
2. Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions.
3. Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field.
Other References
1. Tarelli, E. (2003), “How to transfer responsibilities from expatriates to local nationals”.
2. Riusala, K. and Suutari, V. (2004), “International knowledge transfers through expatriates”.
3. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something at the end
12. Wallace, J. (2001), “The benefits of mentoring for female lawyers”.
Something else at the end"""
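# Match the run of numbered lines that directly follows a line reading exactly "References".
refs = re.findall(r'^References\n((?:\d+\.\s*.*?\n)+)', inp, flags=re.M)
# Join all matched blocks into one string.
data = ''.join(refs)
# Capture the text of each entry, dropping the leading number and dot.
output = re.findall(r'\d+\.\s*(.*?)\n', data)
print(output)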
This prints:
[
'Ryff, C.D. (2014) Psychological Well-Being Revisited: Advances in the Science and Practice of Eudaimonia. ',
'Deci, E.L. & Ryan, R.M. (2002) Self-determination research: reflections and future directions. ',
'Acedo, F. J., & Casillas, J. C. (2005). Current paradigms in the international management field.'
]
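Since the question mentions a list of texts, the same two-step idea could be wrapped in a small helper and applied to each text. This is only a sketch: the name extract_references is made up for illustration, and it assumes the heading sits on its own line as exactly "References" (so "Other References" is skipped). It also lets the last entry end at the end of the text and strips trailing whitespace from each entry.

```python
import re

# Hypothetical helper name; not part of the original answer.
def extract_references(text):
    """Return the numbered entries directly under a line reading exactly 'References'."""
    # Step 1: capture the run of numbered lines that follows the heading.
    # (?:\n|\Z) lets the final entry end at the end of the text as well.
    blocks = re.findall(
        r'^References[ \t]*\n((?:\d+\.[^\n]*(?:\n|\Z))+)',
        text,
        flags=re.M,
    )
    entries = []
    for block in blocks:
        # Step 2: one entry per line; drop the leading "N." and strip whitespace.
        for entry in re.findall(r'^\d+\.[ \t]*(.*)$', block, flags=re.M):
            entries.append(entry.strip())
    return entries
```

For a list of texts, something like [extract_references(t) for t in texts] would give one list of entries per text. Note that if a text contains several lines reading exactly "References", the entries under all of them are collected.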