Is there a way to simplify my deep string of "if" statements? None of them actually repeat

I have written some code to help with my GCSE revision (exams in the UK taken at age 16) which converts a string into just the first letter of every word but leaves everything else intact (i.e. special characters at the ends of words, capitalisation, etc.).

For example:

If I input >>> "These are some words (now they're in brackets!)"

I would want it to output >>> "T a s w (n t i b!)"

I feel like there must be an easier way to do this than my string of similar "if" statements... For reference, I am reasonably new to Python, but I can't seem to find an answer online. Thanks in advance!

Code:

line = input("What text would you like to memorise?\n")
words = line.split()
letters = ''

spec_chars = [
    '(', ')', ',', '.', '“', '”', '"', "‘", "’", "'", '!', '¡', '?', '¿', '…'
]

# keep up to two special characters at each end of a word, plus its first letter
for word in words:
    if word[0] in spec_chars:
        if word[-1] in spec_chars:
            if word[-2] in spec_chars:
                if word[1] in spec_chars:
                    letters += word[0] + word[1] + word[2] + word[-2] + word[-1] + " "
                else:
                    letters += word[0] + word[1] + word[-2] + word[-1] + " "
            else:
                if word[1] in spec_chars:
                    letters += word[0] + word[1] + word[2] + word[-1] + " "
                else:
                    letters += word[0] + word[1] + word[-1] + " "
        else:
            if word[1] in spec_chars:
                letters += word[0] + word[1] + word[2] + " "
            else:
                letters += word[0] + word[1] + " "
    else:
        if word[-1] in spec_chars:
            if word[-2] in spec_chars:
                letters += word[0] + word[-2] + word[-1] + " "
            else:
                letters += word[0] + word[-1] + " "
        else:
            letters += word[0] + " "

output = "".join(letters)
print(output)

Answer:

Here's one alternative. We keep every punctuation character except the apostrophe, and we keep only the first letter we encounter in each word.

words = "These are some words (now they're in brackets!)"
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzé'"

output = []
for word in words.split():
    output.append('')
    found = False              # has this word's first letter been kept yet?
    for i in word:
        if i in alphabet:
            if not found:      # keep only the first letter
                found = True
                output[-1] += i
        else:
            output[-1] += i    # punctuation is always kept
print(' '.join(output))

Output:

T a s w (n t i b!)
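A possible variant of the loop above, offered as a sketch rather than the answerer's code: it replaces the hard-coded alphabet with str.isalpha(), so letters outside that alphabet (for example 'ñ' or 'Á') are also treated as letters. It also assumes an apostrophe should count as part of a word only when it directly follows a letter, so a quote at the start or end of a word (as in "'(¡No" or "dibujo!!)'") is kept. The function name first_letters is hypothetical.

```python
def first_letters(text):
    """Keep every non-letter character, but only the first letter of each word."""
    result = []
    for word in text.split():
        kept = []
        found = False  # have we already kept this word's first letter?
        prev = ''      # previous character in the word
        for ch in word:
            # Assumption: an apostrophe is part of the word only when it
            # directly follows a letter (as in "they're" or "kids'").
            is_letter = ch.isalpha() or (ch == "'" and prev.isalpha())
            if not is_letter:
                kept.append(ch)        # punctuation is always kept
            elif not found:
                kept.append(ch)        # first letter of the word
                found = True
            prev = ch
        result.append(''.join(kept))
    return ' '.join(result)


print(first_letters("These are some words (now they're in brackets!)"))
```

For the example sentence this prints T a s w (n t i b!), the same as the loop above.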

Answer:

This might be somewhat overwhelming for now, but I'd still like to point out a much more concise solution using regular expressions, because it's quite instructive in terms of how to approach problems like this.

TL;DR: It can be done in one line

import re
text = "These are some words (now they're in brackets!)"
print(' '.join(re.sub(r"(\w)[\w']*\w", r'\1', word) for word in text.split()))

If you look at the words individually after using .split(), it appears that what you need to do is basically remove all letters (and word-internal apostrophes) after the first letter in each word.

[
 '"These', # remove 'hese'
 'are',    # 're'
 'some',   # 'ome'
 'words',  # 'ords'
 '(now',   # 'ow'
 "they're", # "hey're"
 'in',      # 'n'
 'brackets!)"' # 'rackets'
]

Another way to think about it is to find sequences consisting of

a letter x, followed by
a sequence of one or more letters

and replace the whole sequence with x. For example, in '"These', replace 'These' with 'T' to arrive at '"T'; in 'brackets!)"', replace 'brackets' with 'b', and so on.

In regular expression syntax, this becomes:

(\w): A letter is matched by \w, but we want to refer to it later, so we need to put it in a group, hence the parentheses. (Strictly speaking, \w matches any word character: letters, digits and the underscore.)
A sequence of 1 or more (indicated by +) letters is \w+. We also want to include the apostrophe, so we use a character class, indicated by [], i.e., [\w']+, which means "match one or more instances of a letter or apostrophe".
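As a presentational sketch, the same pattern can be written with the re.VERBOSE flag, which ignores whitespace and # comments outside character classes, so each piece can carry its own explanation. The loop also shows two edge cases: a one-letter word such as 'a' has nothing for the pattern to match, so it is left alone, and because \w matches digits too, '2nd' becomes '2'.

```python
import re

# The answer's pattern in verbose mode: whitespace and comments are ignored.
pattern = re.compile(r"""
    (\w)      # group 1: the first word character, which we keep
    [\w']+    # one or more further word characters or apostrophes, which we drop
""", re.VERBOSE)

for word in ['"These', "they're", '(now', 'a', '2nd']:
    print(repr(word), '->', repr(pattern.sub(r'\1', word)))
```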

To replace/substitute substrings matched by the pattern we use re.sub(pattern, replacement, string). In the replacement string we can tell it to insert the group we defined before by using the reference \1.
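To see what the group reference does, this small sketch prints the whole match (group 0) next to group 1 for a single word, then performs the substitution twice: once with the r'\1' replacement string and once with a function, which re.sub also accepts in place of a replacement string.

```python
import re

pattern = r"(\w)[\w']+"
word = "(now"

m = re.search(pattern, word)
print(m.group(0))  # the whole match: now
print(m.group(1))  # only the captured group: n

# Replacement string: \1 is replaced by whatever group 1 captured
print(re.sub(pattern, r'\1', word))                      # (n
# Equivalent: the function receives each match object and returns the replacement
print(re.sub(pattern, lambda match: match.group(1), word))  # (n
```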

Putting it all together:

# import the re module
import re

# define the regular expression
pattern = r"(\w)[\w']+"

# some test data
texts = ["\"These are some words (now they're in brackets!)\"",
         "¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'",
         "The kids' favourite teacher"]

# testing the pattern
for text in texts:
    words = text.split()
    print(text)
    print(' '.join(re.sub(pattern, r'\1', word) for word in words))
    print()

Result:

"These are some words (now they're in brackets!)"
"T a s w (n t i b!)"

¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'
¿Q e l m a? '(¡N e d!!)'

The kids' favourite teacher
T k f t

To keep word-final apostrophes, modify the pattern to

pattern = r"(\w)[\w']*\w"

so that the letter-apostrophe sequence must end with a letter. In other words, we now match

a group consisting of a letter (\w), followed by
zero or more (indicated by *) instances of a letter or apostrophe, and
a final letter (\w).

The result is exactly the same as above, except the last sentence becomes "T k' f t".
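A quick side-by-side check of the two patterns on the last test sentence; initials is a hypothetical helper name, not part of the answer. Note that the second pattern still handles two-letter words such as "in", because [\w']* may match nothing.

```python
import re

def initials(text, pattern):
    """Apply the answer's substitution to every whitespace-separated word."""
    return ' '.join(re.sub(pattern, r'\1', word) for word in text.split())

text = "The kids' favourite teacher"
for pattern in [r"(\w)[\w']+", r"(\w)[\w']*\w"]:
    print(pattern, '->', initials(text, pattern))
```

This prints T k f t for the first pattern and T k' f t for the second.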

Answer:

The code below works fine for me.

Here, I am just checking the left and right ends of each word of the given sentence.

Let me know if anything needs clarification.

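words = "¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'"
spec_chars = ['(', ')', ',', '.', '“', '”', '"', "‘",
              "’", "'", '!', '¡', '?', '¿', '…']
s_lst = words.split(' ')
for i in range(len(s_lst)):
    # reset the buffers for each word, so a word made only of special
    # characters cannot leak its characters into the next word
    tmp, rev_tmp = '', ''
    for l in s_lst[i]:
        if l in spec_chars:
            tmp += l          # leading special character: keep it
        else:
            tmp += l          # first ordinary character: keep it
            # walk the word backwards, collecting trailing special characters
            for j in s_lst[i][::-1]:
                if j in spec_chars:
                    rev_tmp += j
                else:
                    tmp += rev_tmp[::-1]   # restore their original order
                    break
            s_lst[i] = tmp
            break
print(' '.join(s_lst))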
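The same "check both ends" idea can be written more compactly with str.lstrip and str.rstrip, which treat their argument as a set of characters to remove. This is a swapped-in technique rather than the answer's code, sketched under the assumption that everything between the leading and trailing special characters should collapse to its first character; abbreviate and abbreviate_text are hypothetical names. Unlike the question's nested if statements, it handles any number of special characters at either end, and because the apostrophe is in the list, a word-final apostrophe as in "kids'" is kept.

```python
# The special characters from the question, joined into one string for str.strip
SPEC_CHARS = ''.join(['(', ')', ',', '.', '“', '”', '"', "‘",
                      "’", "'", '!', '¡', '?', '¿', '…'])

def abbreviate(word, chars=SPEC_CHARS):
    """Keep leading/trailing special characters and the first remaining character."""
    core = word.strip(chars)   # the word without special characters at either end
    if not core:
        return word            # the word consists only of special characters
    lead = word[:len(word) - len(word.lstrip(chars))]
    trail = word[len(word.rstrip(chars)):]
    return lead + core[0] + trail

def abbreviate_text(text):
    return ' '.join(abbreviate(w) for w in text.split())

print(abbreviate_text("¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'"))
```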