How to split the word "ActionAction-AdventureShooterStealth" into list of separate words?

Time:12-29

Question: The genres column contains the genres present in each game. All of a game's genres are written together without any spaces or special characters. The game's major genre is listed first, followed by the other genres. Refer to the table below:

Game | Genres
A    | ActionComedyAdventure
B    | AdventureComedy
C    | NarrationShooting

In the table above, the major genres for games A, B, and C are Action, Adventure, and Narration, respectively.

Your job is to extract the major genre for each game and store it in a new column named “Major Genre”. (Hint: all genre names start with an uppercase letter.)
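Following the hint, the major genre can be read as the first capitalized word (including any hyphenated continuation such as "Action-Adventure") at the start of each genres string. The sketch below assumes the data is a plain list of (game, genres) pairs taken from the table, and `major_genre` is a hypothetical helper name; it is one possible approach, not the assignment's required solution.

```python
import re

# Hypothetical helper pattern: an uppercase letter plus lowercase letters,
# optionally followed by "-Word" parts (e.g. "Action-Adventure").
MAJOR_GENRE = re.compile(r"[A-Z][a-z]+(?:-[A-Z][a-z]+)*")


def major_genre(genres):
    """Return the first genre in a concatenated genres string, or None."""
    match = MAJOR_GENRE.match(genres)  # match() only looks at the start of the string
    return match.group(0) if match else None


# Sample rows copied from the table in the question.
rows = [("A", "ActionComedyAdventure"), ("B", "AdventureComedy"), ("C", "NarrationShooting")]
for game, genres in rows:
    print(game, major_genre(genres))
```

If the data lives in a pandas DataFrame with a column called "Genres" (an assumption), the same idea could be written as `df["Major Genre"] = df["Genres"].str.extract(r"^([A-Z][a-z]+(?:-[A-Z][a-z]+)*)", expand=False)`.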

I want to split the string "ActionAction-AdventureShooterStealth" into a list of words in the following format:

['Action', 'Action-Adventure', 'Shooter', 'Stealth']

I tried the following approach, but it didn't work:

text = "ActionAction-AdventureShooterStealth"
res = text.split(',')
print(res)
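`str.split` only cuts at a literal separator, and this string contains no commas, so the result is a single-element list holding the whole string. What is needed is a split at *positions*: between a lowercase letter and the following uppercase letter. A minimal sketch of that idea with `re.split` and zero-width lookarounds (splitting on empty matches requires Python 3.7 or later):

```python
import re

text = "ActionAction-AdventureShooterStealth"

# No comma in the string, so split(',') returns the whole string unchanged.
print(text.split(','))

# Split at the empty position between a lowercase letter and an uppercase letter.
# A hyphen is neither, so "Action-Adventure" stays in one piece.
parts = re.split(r"(?<=[a-z])(?=[A-Z])", text)
print(parts)
```

The first `print` shows `['ActionAction-AdventureShooterStealth']` and the second shows `['Action', 'Action-Adventure', 'Shooter', 'Stealth']`.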

Answer:

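One way to do this is with the re module. The pattern below matches an uppercase letter followed by one or more lowercase letters, then zero or more occurrences of a hyphen followed by another capitalized word, so "Action-Adventure" is kept together. The group is written as non-capturing, (?:...), so that re.findall returns the whole matches rather than the group's contents.

import re

string = "ActionAction-AdventureShooterStealth"
pattern = r"[A-Z][a-z]+(?:-[A-Z][a-z]+)*"
string_list = re.findall(pattern, string)
print(string_list)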

Output:

['Action', 'Action-Adventure', 'Shooter', 'Stealth']

Answer:

This regex works with your examples: r'[A-Z][a-z]+(?:-[A-Z][a-z]+)*'

That is:

  • [A-Z]: an uppercase letter
  • [a-z]+: one or more lowercase letters
  • (?:-[A-Z][a-z]+)*: zero or more occurrences of a -, then an uppercase letter, followed by one or more lowercase letters
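The same pattern can be written with `re.VERBOSE`, which ignores whitespace inside the pattern and allows `#` comments, so each part listed above sits next to its explanation. This is only a sketch; the name `GENRE_PATTERN` is chosen for illustration.

```python
import re

GENRE_PATTERN = re.compile(
    r"""
    [A-Z]          # an uppercase letter
    [a-z]+         # one or more lowercase letters
    (?:            # non-capturing group, repeated zero or more times:
        -          #   a literal hyphen
        [A-Z]      #   an uppercase letter
        [a-z]+     #   one or more lowercase letters
    )*
    """,
    re.VERBOSE,
)

print(GENRE_PATTERN.findall("ActionAction-AdventureShooterStealth"))
```

It matches exactly the same strings as the one-line form; only the layout differs.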

When using re.findall, we use (?:...) instead of simply (...) to make the group non-capturing; otherwise re.findall returns the captured groups instead of the whole matches.
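To see the difference, the sketch below runs both versions of the pattern. With one capturing group, `re.findall` returns only what that group captured: an empty string when the group did not take part in a match, and the last repetition when it repeated. `re.finditer` with `match.group(0)` is another way to get whole matches regardless of groups.

```python
import re

text = "ActionAction-AdventureShooterStealth"

capturing = r"[A-Z][a-z]+(-[A-Z][a-z]+)*"
non_capturing = r"[A-Z][a-z]+(?:-[A-Z][a-z]+)*"

# One capturing group: findall returns the group's text, not the whole match.
print(re.findall(capturing, text))      # ['', '-Adventure', '', '']

# Non-capturing group: findall returns the whole matches.
print(re.findall(non_capturing, text))  # ['Action', 'Action-Adventure', 'Shooter', 'Stealth']

# finditer yields match objects; group(0) is always the whole match.
print([m.group(0) for m in re.finditer(capturing, text)])
```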

Demo:

import re

pattern = re.compile(r'[A-Z][a-z]+(?:-[A-Z][a-z]+)*')
pattern.findall('ActionAction-AdventureShooterStealth')
# returns: ['Action', 'Action-Adventure', 'Shooter', 'Stealth']

pattern.findall('ActionAction')
# returns: ['Action', 'Action']

pattern.findall('Action-ShooterStealth')
# returns: ['Action-Shooter', 'Stealth']