I'm trying to organise the following list of gaming genres.
I want to separate the concatenated words, but my regex doesn't handle the uppercase words that represent acronyms (e.g. PVP, MMORPG, MOBA, DeFi) properly.
Currently, my regex code is as follows:
[re.sub(r"(\w)([A-Z])", r"\1 \2", ele) for ele in genre_list]
As you can see below, sometimes it works and sometimes it doesn't:
['Collectible Open-World Virtual-World', 'Breeding Card PV P', 'Auto-Battler Breeding Strategy', 'Minigame Open-World Virtual-World', 'Action Simulation Sports', 'Adventure MM OStrategy', 'Adventure Casual Puzzle', 'Sports', 'Collectible Sci-Fi Virtual-World', 'Battle-Royalee Sports MO BA', 'Action PV PShooter', 'P VP Sci-Fi Tower-Defense', 'Action Battle-Royale', 'P VP Sci-Fi Shooter', 'Breeding Collectible Mining', 'Collectible De Fie Sports', 'Action Adventure Shooter', 'City-Building Collectible Simulation', 'Action Strategy', 'Adventure Open-World', 'Breeding Racing Sports', 'Open-World Virtual-World', 'Collectible Idle', 'Action Adventure', 'Card Collectible PV P', 'Battle-Royale Fantasy MO BA', 'City-Building', 'Building MM OStrategy', 'Adventure MM OR PG', 'Action Adventure Idle', 'M OB AR PG Strategy', 'M MO RP GStrategy', 'Card Collectible Idle', 'Open-World PV PR PG', 'De Fi MM OSpace', 'Collectible', 'Card Collectible PV P', 'Auto-Battler De Fi RP G', 'Adventure MM OOpen-World', 'Collectible Open-World Virtual-World', 'Collectible Idle RP G', 'Card Collectible PV P', 'Action Adventure PV P', 'Sci-Fi Shooter Survival', 'Action Strategy', 'Arcade Minigame', 'Breeding PV PRacing', 'M OB AP VP', 'Action Sports', 'P VP Space Turn-based', 'M MO Strategy Tower-Defense']
Can you help me work out which regex would work best here? Or does regex just not work for this list? Thanks!
CodePudding user response:
The pattern you are looking for is ([A-Z][a-z]+(?:-[A-Z][a-z]+)*|[A-Z]+$|[A-Z]+(?=[A-Z]))
genre_list = ['CollectibleOpen-WorldVirtual-World', 'BreedingCardPVP', 'Auto-BattlerBreedingStrategy', 'MinigameOpen-WorldVirtual-World', 'ActionSimulationSports', 'AdventureMMOStrategy', 'AdventureCasualPuzzle', 'Sports', 'CollectibleSci-FiVirtual-World', 'Battle-RoyaleeSportsMOBA', 'ActionPVPShooter', 'PVPSci-FiTower-Defense', 'ActionBattle-Royale', 'PVPSci-FiShooter', 'BreedingCollectibleMining', 'CollectibleDeFieSports', 'ActionAdventureShooter', 'City-BuildingCollectibleSimulation', 'ActionStrategy', 'AdventureOpen-World', 'BreedingRacingSports', 'Open-WorldVirtual-World', 'CollectibleIdle', 'ActionAdventure', 'CardCollectiblePVP', 'Battle-RoyaleFantasyMOBA', 'City-Building', 'BuildingMMOStrategy', 'AdventureMMORPG', 'ActionAdventureIdle', 'MOBARPGStrategy', 'MMORPGStrategy', 'CardCollectibleIdle', 'Open-WorldPVPRPG', 'DeFiMMOSpace', 'Collectible', 'CardCollectiblePVP', 'Auto-BattlerDeFiRPG', 'AdventureMMOOpen-World', 'CollectibleOpen-WorldVirtual-World', 'CollectibleIdleRPG', 'CardCollectiblePVP', 'ActionAdventurePVP', 'Sci-FiShooterSurvival', 'ActionStrategy', 'ArcadeMinigame', 'BreedingPVPRacing', 'MOBAPVP', 'ActionSports', 'PVPSpaceTurn-based', 'MMOStrategyTower-Defense']
import re

result = []
for genre in genre_list:
    # Join every token the pattern finds with single spaces.
    result.append(" ".join(re.findall(r"([A-Z][a-z]+(?:-[A-Z][a-z]+)*|[A-Z]+$|[A-Z]+(?=[A-Z]))", genre)))
Finally, result will hold the genres separated by spaces, for example 'Collectible Open-World Virtual-World', 'Breeding Card PVP', 'Adventure MMO Strategy' and 'Action PVP Shooter'.
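To make the three alternatives in that pattern easier to read, here is a minimal sketch that spells them out with re.VERBOSE. The names GENRE_TOKEN and split_genres are made up for illustration and are not part of the original answer; the pattern itself is the same one.

```python
import re

# Same three alternatives as the pattern above, one per line.
# GENRE_TOKEN and split_genres are illustrative names only.
GENRE_TOKEN = re.compile(
    r"""
      [A-Z][a-z]+(?:-[A-Z][a-z]+)*   # capitalised word, optionally hyphenated (Open-World)
    | [A-Z]+$                        # run of capitals at the very end of the string
    | [A-Z]+(?=[A-Z])                # run of capitals that stops before the next capitalised word
    """,
    re.VERBOSE,
)

def split_genres(text):
    """Join every token the pattern finds with single spaces."""
    return " ".join(GENRE_TOKEN.findall(text))

print(split_genres("BreedingCardPVP"))         # Breeding Card PVP
print(split_genres("ActionPVPShooter"))        # Action PVP Shooter
print(split_genres("AdventureMMOOpen-World"))  # Adventure MMO Open-World
```

Two things to keep in mind with this approach: findall only keeps what the pattern matches, so the lowercase tail of 'PVPSpaceTurn-based' is dropped and it comes out as 'PVP Space Turn', and two acronyms written back to back ('MOBARPGStrategy') stay together as 'MOBARPG Strategy' because nothing in the text marks where one ends. 'DeFi' is also split into 'De Fi'.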
CodePudding user response:
Editing based on the comments:
You just need to add a '+' after the [A-Z], i.e. r"(\w)([A-Z]+)".
This will match 1 or more capital letters.
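As a quick check of that suggestion, here is a small sketch that applies the substitution with the added '+'. The second function is not from this answer: it is a hypothetical alternative that inserts spaces at zero-width boundaries using lookarounds, included only for comparison. Both function names are made up.

```python
import re

def split_with_plus(text):
    # The substitution from the question, with "+" added after [A-Z].
    return re.sub(r"(\w)([A-Z]+)", r"\1 \2", text)

def split_at_boundaries(text):
    # Hypothetical alternative (an assumption, not the answerer's method):
    # insert a space at zero-width boundaries instead of consuming characters.
    #   (?<=[a-z])(?=[A-Z])        lowercase letter followed by a capital
    #   (?<=[A-Z])(?=[A-Z][a-z])   end of an acronym before a capitalised word
    return re.sub(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", " ", text)

print(split_with_plus("AdventureMMORPG"))       # Adventure MMORPG
print(split_with_plus("ActionPVPShooter"))      # Action PVPShooter
print(split_at_boundaries("ActionPVPShooter"))  # Action PVP Shooter
```

With the '+', an acronym at the end of a string stays in one piece, but the greedy [A-Z]+ also swallows the first letter of a following word ('PVPShooter'), and an acronym at the start of a string is still split after its first letter. The lookaround version avoids both cases on these inputs, though it still cannot tell where 'MOBA' ends and 'RPG' begins.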