Replacing spaces with underscores within each capture group in Vim using regex

Time:05-24

I often edit Power Query "M" code in Vim while working with Power BI. I usually prefer to rename the auto-generated identifier for each query step by replacing spaces with underscores and converting it to lowercase. Identifiers that contain spaces are written as #-prefixed quoted strings such as #"Change Column Types"; in that example, I would like every instance to become change_column_types instead. I would like to create a key mapping that does this for all instances in any buffer.

An example file is shown below, followed by the desired 'cleaned' file:

Input:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"id", Int64.Type}, {"foo", type text}, {"bar", type text}, {"baz", Int64.Type}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([foo] = "sdf")),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"bar"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
    #"Grouped Rows"

Desired output:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    changed_type = Table.TransformColumnTypes(Source,{{"id", Int64.Type}, {"foo", type text}, {"bar", type text}, {"baz", Int64.Type}}),
    filtered_rows = Table.SelectRows(changed_type, each ([foo] = "sdf")),
    grouped_rows = Table.Group(filtered_rows, {"bar"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
    grouped_rows

The tricky part is that this involves replacing an unknown number of characters within each capture group. Since there are usually only 2 or 3 words in each identifier, I can handle this in a hacky way with two Ex commands:

:%s/\v\#"(\w+)\s(\w+)"/\L\1_\L\2/gc
:%s/\v\#"(\w+)\s(\w+)\s(\w+)"/\L\1_\L\2_\L\3/gc

However, this is clearly not ideal, since it is hardcoded to handle only a fixed number of words. I have tried some nested grouping, but the problem stays the same. Is there a way to define a replacement pattern inside another replacement operation? Any help on how to handle this properly would be appreciated.

Answer:

The following creates a mapping for <Space>c that converts the quoted strings to plain identifiers:

nnoremap <Space>c :%s/#"\([^"]\+\)"/\=substitute(tolower(submatch(1)), " ", "_", "g")/g<CR>

A little explanation of what's happening here: the \= at the start of the replacement part of the substitution tells Vim to evaluate what follows as a Vim script expression rather than treat it as literal text. We use submatch(1) to get the text of the first capture group, convert it to lowercase with tolower(), and then perform a second substitution with substitute() that turns spaces into underscores.
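To see what the expression does on its own, you can evaluate it directly on one of the identifiers from the example. The literal string 'Changed Type' below stands in for submatch(1), which only has a value while a substitution is running:

```vim
" 'Changed Type' stands in for submatch(1).
" tolower() gives 'changed type'; substitute() then replaces every space with '_'.
:echo substitute(tolower('Changed Type'), ' ', '_', 'g')
" prints: changed_type
```

Because the inner substitute() runs once per match of the outer :s, it handles any number of words, which is what the two hardcoded commands in the question could not do.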


Note that there are a few edge cases this can snag on; if need be, I can make the regex more complex (and potentially more brittle) to handle them:

  • If strings can start with a single quote instead of a double quote, this won't match them. I don't know whether that's something Power Query "M" allows.
  • If the strings contain escaped characters (backslashes), this won't handle that.
  • If the string starts with a number, the result may not be a valid identifier in the target language/format (if it's like most languages).
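As a rough sketch of how the replacement expression could be hardened, the vimrc fragment below moves the logic into a helper function. The name PqIdentifier is hypothetical. Besides lowercasing, it collapses any run of non-word characters (not just spaces) into a single underscore, and it prefixes an underscore when the result would start with a digit. It still assumes the quoted name contains no " characters, and it assumes a leading underscore is acceptable in your M code, which you should check:

```vim
" Hypothetical helper: turn the text inside #"..." into a plain identifier.
function! PqIdentifier(name) abort
  " Lowercase, then replace each run of non-word characters with a single '_'.
  let l:id = substitute(tolower(a:name), '\W\+', '_', 'g')
  " Prefix '_' if the result would otherwise start with a digit.
  if l:id =~# '^\d'
    let l:id = '_' . l:id
  endif
  return l:id
endfunction

" The e flag suppresses the 'Pattern not found' error in buffers with no #"..." names.
nnoremap <Space>c :%s/#"\([^"]\+\)"/\=PqIdentifier(submatch(1))/ge<CR>
```

Note that Vim's \w only covers ASCII letters, digits, and underscore, so accented letters in a step name would also be turned into underscores by this sketch.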