Adding space before capital letters in yaml file (flexget config)

I have some problems with the FlexGet Configuration.

I want to rename and move some movies.

Example

For example, the movie "ElPatriota" (which currently cannot be renamed) cannot be found in The Movie Database (tmdb) when searching for this title without spaces.

So I need to rename it to "El Patriota" first, before I can look it up at tmdb and move it to its correct directory.

What I researched

I saw this function using a regular expression, but I don't know how to implement it in my config or whether it's the correct solution for me.

re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWord")
'Word Word Word'

FlexGet Config YAML

This is a part of the related config:

move movies:
    priority: 3
    template:
      - movies-metainfo
      - telegram
    filesystem:
      path: /downloads/
      recursive: yes
      retrieve: files
      regexp: '.*\.(avi|mkv|mp4)$'
    seen: local
    regexp:
      reject:
        - \b(duo|tri|quadri|tetra|penta)logy\b: {from: title}
        - s\d{2}(e\d{2,})?: {from: title} 
    require_field: 
      - tmdb_name
      - movie_name
    accept_all: yes
    tmdb_lookup:
      language: es
    set:
      title: "{{title|replace('4K','[]')|replace('BD1080','[]')|replace('M1080','[]')}}"  
    move:
      to: "/media/Peliculas/"
      rename: "{{tmdb_name|replace('/','_')|replace(':',' -')|replace(',','')|replace('?','')}}"
      along:
        extensions:
          - sub
          - srt
        subdirs:
          - Subs
      clean_source: 50

CodePudding user response：

The solution is very close, but I'm a complete beginner. This is what happens when I put this in my config:

manipulate:
 - title:
     replace:
       regexp: '([a-z])([A-Z])'
       format: '\1 \2'

E lP at ri ot a[].w ww.n ew pc t.c om

And trying with:

manipulate:
 - title:
     replace:
       regexp: '(?=[A-Z])'
       format: ' '

I get

E l P a t r i o t a []. w w w. n e w p c t. c o m

The solution would be to only separate the upper case after a lower case but I don't know how to do it.

CodePudding user response：

Assumptions on construction of search-terms

From your comment I assume that the step which rewrites the file name, producing the input for the search, is:

    set:
      title: "{{title|replace('4K','[]')|replace('BD1080','[]')|replace('M1080','[]')}}"

Note that | here is the Jinja2 filter operator, not a boolean OR: the title is piped through each replace filter in turn, so the tags 4K, BD1080 and M1080 are each replaced by []:

title|replace('4K','[]')|replace('BD1080','[]')|replace('M1080','[]')
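
To make the effect of this expression concrete: in Jinja2 the | character pipes a value through a filter, so the expression reads as a chain of replacements applied one after another. A minimal Python sketch, where str.replace stands in for the Jinja2 replace filter and the function name apply_title_filters is hypothetical (neither appears in the original config):

```python
# str.replace stands in for the Jinja2 "replace" filter (assumption for illustration).
def apply_title_filters(title: str) -> str:
    """Apply the three replacements from the config's `set` block, in order."""
    for tag in ("4K", "BD1080", "M1080"):
        title = title.replace(tag, "[]")
    return title


print(apply_title_filters("ElPatriotaM1080.www.url.com"))
# ElPatriota[].www.url.com
```

The quality tag is swapped for [] but the camel-cased movie name is untouched, which is why a separate step is still needed to insert the space.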

Regex as solution

Assume further that you can use a regular-expression to substitute the title. Then a regex-substitution adding a space between lower-case and upper-case letters will do:

Step	Value
Input	`ElPatriotaM1080.www.url.com.mkv`
Wanted	`El Patriota M1080.www.url.com.mkv`
Regex	substitute `([a-z])([A-Z])` by `\1 \2`
Output	`El Patriota M1080.www.url.com.mkv`
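
The substitution in the table can be checked quickly in plain Python. This is only a sketch: no flags are passed here, so matching is case-sensitive, and the helper name split_camel_case is made up for this example.

```python
import re


def split_camel_case(name: str) -> str:
    """Insert a space between a lower-case letter and a following upper-case letter."""
    return re.sub(r"([a-z])([A-Z])", r"\1 \2", name)


print(split_camel_case("ElPatriotaM1080.www.url.com.mkv"))
# El Patriota M1080.www.url.com.mkv
```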

`Manipulate` and `replace` by regex

The manipulate plugin with its replace action seems appropriate, as shown in Example 4 of its documentation:

You can control how the regex hits are output using \1, \2, etc in format.
manipulate:
  - title:
      replace:            
        regexp: '(.*)/(.*)/(.*)'
        format: '\2.\1.\3'

⚠️ Caution: Regex matches ignore case by default. Because this regex is case-sensitive (it depends on telling upper-case and lower-case characters apart), the default regex flags of the manipulate plugin's replace action (IGNORECASE and UNICODE) get in the way. Ignore-case must be disabled explicitly by wrapping the regex in a group that switches off the inline flag i, like (?-i:<regex>).
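
How the scoped inline flag behaves can be seen in isolation with Python's re module. A small sketch, assuming the plugin's default flags are as stated above (the constant name FLEXGET_FLAGS is only illustrative):

```python
import re

# Assumed defaults of the replace action, per the caution above.
FLEXGET_FLAGS = re.IGNORECASE | re.UNICODE

plain = re.compile(r"[A-Z]", FLEXGET_FLAGS)          # ignore-case still active
scoped = re.compile(r"(?-i:[A-Z])", FLEXGET_FLAGS)   # ignore-case off inside the group

print(bool(plain.fullmatch("a")))   # True:  [A-Z] also matches lower-case
print(bool(scoped.fullmatch("a")))  # False: case-sensitive inside (?-i:...)
print(bool(scoped.fullmatch("A")))  # True
```

The (?-i:...) syntax requires Python 3.6 or newer.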

Config snippets

In this case the regex captures the lower-case letter as the first group ([a-z]), inserted by reference \1, and the upper-case letter as the second group ([A-Z]), inserted by reference \2. The format puts a space between the two.

With the i flag additionally disabled, the regex to configure is (?-i:([a-z])([A-Z])).

manipulate:
  - title:
      replace:            
        regexp: '(?-i:([a-z])([A-Z]))'
        format: '\1 \2'

Alternatively, without capturing groups: match the position before each upper-case letter with a positive look-ahead (?=[A-Z]) and insert a space there (again with the ignore-case flag switched off):

manipulate:
  - title:
      replace:            
        regexp: '(?-i:(?=[A-Z]))'
        format: ' '
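
To try such a snippet without running FlexGet, the replace action can be approximated in Python. This is a rough sketch, not the plugin's actual code: simulate_replace is a hypothetical stand-in that simply applies re.sub with the flags assumed above, and the input title is inferred from the output posted in the question.

```python
import re


def simulate_replace(title: str, regexp: str, fmt: str) -> str:
    """Hypothetical stand-in for manipulate's replace action.

    Assumes the regex is applied with IGNORECASE and UNICODE enabled.
    """
    return re.sub(regexp, fmt, title, flags=re.IGNORECASE | re.UNICODE)


title = "ElPatriota[].www.newpct.com"  # inferred from the asker's output

# Without the inline flag, [a-z] and [A-Z] both match any letter.
print(simulate_replace(title, r"([a-z])([A-Z])", r"\1 \2"))
# E lP at ri ot a[].w ww.n ew pc t.c om

# With ignore-case switched off for the pattern.
print(simulate_replace(title, r"(?-i:([a-z])([A-Z]))", r"\1 \2"))
# El Patriota[].www.newpct.com
```

The first call reproduces the broken title from the question, which supports the assumption that ignore-case matching is the cause.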

Demo in pure Python

A working demo in pure Python shows how to replace file-names. It was adapted from "How to replace camelCasing in all files in a folder using Python or c#?":

import re

old_name = 'ElPatriotaM1080.www.url.com.mkv'
print(f"Given:           '{old_name}'")

flags=re.I  # default for FlexGet's replace-plugin: ignore-case

regex_1           = '(?=[A-Z])'
regex_1_no_ignore = '(?-i:(?=[A-Z]))'

new_name = re.sub(regex_1, ' ', old_name, flags=flags)
print(f"Regex 1 (I on ): '{new_name}'")
new_name = re.sub(regex_1_no_ignore, ' ', old_name, flags=flags)
print(f"Regex 1 (I off): '{new_name}'")


regex_2           = r'([a-z])([A-Z])'
regex_2_no_ignore = r'(?-i:([a-z])([A-Z]))'

new_name = re.sub(regex_2, r'\1 \2', old_name, flags=flags)
print(f"Regex 2 (I on ): '{new_name}'")
new_name = re.sub(regex_2_no_ignore, r'\1 \2', old_name, flags=flags)
print(f"Regex 2 (I off): '{new_name}'")

Prints:

Given:           'ElPatriotaM1080.www.url.com.mkv'
Regex 1 (I on ): ' E l P a t r i o t a M1080. w w w. u r l. c o m. m k v'
Regex 1 (I off): ' El Patriota M1080.www.url.com.mkv'
Regex 2 (I on ): 'E lP at ri ot aM1080.w ww.u rl.c om.m kv'
Regex 2 (I off): 'El Patriota M1080.www.url.com.mkv'

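Both regex approaches (1 and 2) have almost the same effect: a space is inserted before upper-case letters. Approach 1 also inserts a space before an upper-case letter at the very start of the name, as the leading space in its output shows. The ignore-case flag, however, changes the result drastically: with "I on", [A-Z] also matches lower-case letters, so spaces end up scattered through the whole name. Only with "I off" does the output match what is wanted.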
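
The leading space in the "Regex 1 (I off)" output comes from the look-ahead matching in front of the very first capital letter. One possible variant, which is not part of the demo above, adds a look-behind so that a space is only inserted when a lower-case letter precedes the capital. A sketch under the same flag assumption (insert_spaces is a made-up helper name; empty-match substitution as used here needs Python 3.7 or newer):

```python
import re

FLAGS = re.IGNORECASE | re.UNICODE  # assumed FlexGet defaults, as in the demo


def insert_spaces(name: str, pattern: str) -> str:
    """Insert a space at every position matched by a zero-width pattern."""
    return re.sub(pattern, " ", name, flags=FLAGS)


name = "ElPatriotaM1080.www.url.com.mkv"

# Look-ahead only: also matches before the first letter of the name.
print(repr(insert_spaces(name, r"(?-i:(?=[A-Z]))")))
# ' El Patriota M1080.www.url.com.mkv'

# Look-behind plus look-ahead: only between lower-case and upper-case.
print(repr(insert_spaces(name, r"(?-i:(?<=[a-z])(?=[A-Z]))")))
# 'El Patriota M1080.www.url.com.mkv'
```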