I have written some code to help with my GCSE revision (exams in the UK taken at age 16) which converts a string into just the first letter of every word but leaves everything else intact (i.e. special characters at the ends of words, capitalisation, etc.).
For example:
If I input >>> "These are some words (now they're in brackets!)"
I would want it to output >>> "T a s w (n t i b!)"
I feel there must be an easier way to do this than my chain of similar "if" statements. For reference, I am reasonably new to Python but I can't seem to find an answer online. Thanks in advance!
Code:
line = input("What text would you like to memorise?\n")
words = line.split()
letters=''
spec_chars=[
    '(',')',',','.','“','”','"',"‘","’","'",'!','¡','?','¿','…'
]
for word in words:
    if word[0] in spec_chars:
        if word[-1] in spec_chars:
            if word[-2] in spec_chars:
                if word[1] in spec_chars:
                    letters += word[0] + word[1] + word[2] + word[-2] + word[-1] + " "
                else:
                    letters += word[0] + word[1] + word[-2] + word[-1] + " "
            else:
                if word[1] in spec_chars:
                    letters += word[0] + word[1] + word[2] + word[-1] + " "
                else:
                    letters += word[0] + word[1] + word[-1] + " "
        else:
            if word[1] in spec_chars:
                letters += word[0] + word[1] + word[2] + " "
            else:
                letters += word[0] + word[1] + " "
    else:
        if word[-1] in spec_chars:
            if word[-2] in spec_chars:
                letters += word[0] + word[-2] + word[-1] + " "
            else:
                letters += word[0] + word[-1] + " "
        else:
            letters += word[0] + " "
output=("".join(letters))
print(output)
Answer:
Here's one alternative. We keep every punctuation character except the apostrophe, and we keep only the first letter encountered in each word.
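words = "These are some words (now they're in brackets!)"
# characters that count as part of a word (note the apostrophe)
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzé'"
output = []
for word in words.split():
    output.append('')
    found = False  # has the first letter of this word been kept yet?
    for i in word:
        if i in alphabet:
            # keep only the first letter of the word
            if not found:
                found = True
                output[-1] += i
        else:
            # keep every other character (punctuation)
            output[-1] += i
print(' '.join(output))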
Output:
T a s w (n t i b!)
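A possible variant of the same loop, sketched here as a hypothetical first_letters() function: it uses str.isalpha() instead of a hard-coded alphabet, so accented letters such as the é in "Qué" (or ñ, á) count as letters without listing them, and it only drops apostrophes inside the run of letters after the first one, so a quote before or after a word is kept. This is an illustration under those assumptions, not the answer's own code.

```python
def first_letters(text):
    """Keep each word's first letter, drop the letters (and apostrophes)
    that directly follow it, and keep every other character."""
    result = []
    for word in text.split():
        kept = ''
        seen_letter = False  # True once the word's first letter has been kept
        skipping = False     # True while inside the run of letters after it
        for ch in word:
            if not seen_letter and ch.isalpha():
                kept += ch  # the first letter of the word
                seen_letter = skipping = True
            elif skipping and (ch.isalpha() or ch == "'"):
                continue  # rest of the word, e.g. "hey're" in "they're"
            else:
                skipping = False
                kept += ch  # punctuation before or after the letters
        result.append(kept)
    return ' '.join(result)


print(first_letters("These are some words (now they're in brackets!)"))
# T a s w (n t i b!)
print(first_letters("¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'"))
# ¿Q e l m a? '(¡N e d!!)'
```

With the original alphabet, a word starting with an apostrophe such as "'(¡No" would mark the apostrophe as the "first letter" and lose the N; tracking letters and apostrophes separately avoids that.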
Answer:
This might be somewhat overwhelming for now, but I'd still like to point out a much more concise solution using regular expressions, because it's quite instructive in terms of how to approach problems like this.
TL;DR: It can be done in one line (with text holding the input string):
import re
' '.join(re.sub(r"(\w)[\w']*\w", r'\1', word) for word in text.split())
If you look at the words individually after using .split(), it appears that what you need to do is basically remove all letters (and word-internal apostrophes) after the first letter occurring in each word.
[
'"These', # remove 'hese'
'are', # 're'
'some', # 'ome'
'words', # 'ords'
'(now', # 'ow'
"they're", # "hey're"
'in', # 'n'
'brackets!)"' # 'rackets'
]
Another way to think about it is to find sequences consisting of
- a letter x, followed by
- a sequence of 1 or more letters,
and replace the sequence with x. E.g., in '"These', replace 'These' with 'T' to arrive at '"T'; in 'brackets!)"', replace 'brackets' with 'b', etc.
In regular expression syntax, this becomes:
- (\w): A letter is matched by \w (strictly, \w matches any word character, which also includes digits and the underscore). We want to refer to it later, so we need to put it in a group, hence the parentheses.
- A sequence of 1 or more (indicated by +) letters is \w+. We also want to include the apostrophe, so we need a character class, indicated by [], i.e., [\w']+, which means "match one or more instances of a letter or apostrophe".
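To see what each piece of the pattern captures, one can run re.search on a few of the words from the list above and compare the whole match with group 1. This is a small illustration; the word list is just sample data.

```python
import re

# The pattern described above: one captured word character, then one or
# more word characters or apostrophes.
pattern = re.compile(r"(\w)[\w']+")

words = ['"These', "they're", 'brackets!)"']
matches = []
for word in words:
    m = pattern.search(word)  # first place in the word where the pattern fits
    # group(0) is the whole matched run; group(1) is the captured first letter
    matches.append((m.group(0), m.group(1)))
    print(f'{word:12} matched {m.group(0)!r}, group 1 = {m.group(1)!r}')
```

The punctuation around each word is never part of the match, which is why it survives the substitution.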
To replace (substitute) substrings matched by the pattern, we use re.sub(pattern, replacement, string). In the replacement string, we can insert the group we defined before by using the reference \1.
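A minimal sketch of the \1 reference on a single word, next to the equivalent form where re.sub is given a function that receives each match object. Note the raw-string prefix on r'\1': in an ordinary string literal, '\1' is the control character chr(1), not a group reference.

```python
import re

pattern = r"(\w)[\w']+"
word = 'brackets!)"'

# r'\1' in the replacement string is replaced by whatever group 1 captured.
with_backref = re.sub(pattern, r'\1', word)

# Equivalent: re.sub also accepts a function that receives each match
# object and returns the replacement text.
with_function = re.sub(pattern, lambda m: m.group(1), word)

print(with_backref, with_function)  # b!)" b!)"
print(repr('\1'))  # '\x01' (without the r prefix this is not a backreference)
```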
Putting it all together:
# import the re module
import re
# define the regular expression
pattern = r"(\w)[\w']+"
# some test data
texts = ["\"These are some words (now they're in brackets!)\"",
         "¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'",
         "The kids' favourite teacher"]
# testing the pattern
for text in texts:
    words = text.split()
    print(text)
    print(' '.join(re.sub(pattern, r'\1', word) for word in words))
    print()
Result:
"These are some words (now they're in brackets!)"
"T a s w (n t i b!)"
¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'
¿Q e l m a? '(¡N e d!!)'
The kids' favourite teacher
T k f t
To keep a word-final apostrophe (as in kids'), modify the pattern to
pattern = r"(\w)[\w']*\w"
so that the letter-apostrophe sequence must end with a letter. In other words, we now match
- a group consisting of a letter, (\w), followed by
- zero or more (indicated by *) instances of a letter or apostrophe, and
- a letter, \w.
The result is exactly the same as above, except that the last sentence becomes "T k' f t".
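To compare the two patterns directly, here is a small sketch that applies each one word by word to the last test sentence; the helper name shorten is hypothetical.

```python
import re


def shorten(text, pattern):
    """Apply re.sub word by word, keeping only group 1 of each match."""
    return ' '.join(re.sub(pattern, r'\1', word) for word in text.split())


text = "The kids' favourite teacher"
first = shorten(text, r"(\w)[\w']+")     # the final apostrophe is swallowed by [\w']+
second = shorten(text, r"(\w)[\w']*\w")  # the match must end on a word character
print(first)   # T k f t
print(second)  # T k' f t
```

Both patterns need at least two characters to match, so a one-letter word such as "a" is simply left unchanged, which is already the desired result.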
Answer:
The code below works fine for me. Here, I am just checking the left and right ends of each word of the given sentence. Let me know if anything needs clarification.
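words = "¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'"
spec_chars = ['(', ')', ',', '.', '“', '”', '"', "‘",
              "’", "'", '!', '¡', '?', '¿', '…']
s_lst = words.split(' ')
for i in range(len(s_lst)):
    # reset the buffers for every word, so leftovers from a word made only
    # of special characters cannot leak into the next word
    tmp, rev_tmp = '', ''
    for l in s_lst[i]:
        if l in spec_chars:
            # leading special character: keep it
            tmp += l
        else:
            # first letter: keep it, then collect the trailing special characters
            tmp += l
            for j in s_lst[i][::-1]:
                if j in spec_chars:
                    rev_tmp += j
                else:
                    # rev_tmp was built back to front, so reverse it
                    tmp += rev_tmp[::-1]
                    break
            s_lst[i] = tmp
            break
print(' '.join(s_lst))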
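The same left/right check can be written more compactly with str.lstrip() and str.rstrip(), which strip any number of the given characters from each end of a string. A sketch, assuming a hypothetical initials() helper and the answer's spec_chars list joined into one string; unlike the question's nested ifs, it is not limited to two special characters on each side.

```python
# The answer's spec_chars list, joined into one string for lstrip()/rstrip().
SPEC_CHARS = ''.join(['(', ')', ',', '.', '“', '”', '"', "‘",
                      "’", "'", '!', '¡', '?', '¿', '…'])


def initials(sentence):
    """Keep leading/trailing special characters and the first letter of each word."""
    out = []
    for word in sentence.split():
        # number of special characters at the start of the word
        left = len(word) - len(word.lstrip(SPEC_CHARS))
        # index where the special characters at the end of the word begin
        right = len(word.rstrip(SPEC_CHARS))
        if left >= right:
            out.append(word)  # the word is made only of special characters
        else:
            out.append(word[:left] + word[left] + word[right:])
    return ' '.join(out)


print(initials("¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'"))
# ¿Q e l m a? '(¡N e d!!)'
```

Because the apostrophe is in spec_chars, a word-final apostrophe is kept here ("kids'" becomes "k'"), just as in the question's code.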