Home > Mobile >  Quickest way to delete text from a string in python based on a list? Maybe regex?
Quickest way to delete text from a string in python based on a list? Maybe regex?

Time:10-13

Hi I have a very long string that has the following structure:

"IF ( 
ISFILTERED ( Table1[Column_1] ), 
VAR ___f = FILTERS ( Table1[Column_1] ) 
VAR ___r = COUNTROWS ( ___f ) 
VAR ___t = TOPN ( MaxFilters, ___f, Table1[Column_1])
VAR ___d = CONCATENATEX ( ___t, Table1[Column_1], ", " )
VAR ___x = "Table1[Column_1] = " & ___d & IF(___r > MaxFilters, ", ... [" & ___r & " items selected]") & " " 
RETURN ___x & UNICHAR(13) & UNICHAR(10)
)
& IF (
ISFILTERED ( Table1[Column_2] ), 
VAR ___f = FILTERS ( Table1[Column_2] ) 
VAR ___r = COUNTROWS ( ___f ) 
VAR ___t = TOPN ( MaxFilters, ___f, Table1[Column_2])
VAR ___d = CONCATENATEX ( ___t, Table1[Columnw_1], ", " )
VAR ___x = "Table1[Column_2] = " & ___d & IF(___r > MaxFilters, ", ... [" & ___r & " items selected]") & " " 
RETURN ___x & UNICHAR(13) & UNICHAR(10)
)
& IF (
...

It basically continues like that iterating through each column of each table in a schema. As you can imagine it this string will be very large for a lot of tables/columns.

I have a list of columns from specific tables like this:

['Table1[Column_1]', 'Table2[Column_4]', 'Table6[Column_22]']

These are the only columns I am interested in keeping in the string.

So I need to go through the string and remove the entire IF statement it relates too if the table/column is not in the list.

So based on the above example the expected output would be:

    "IF ( 
ISFILTERED ( Table1[Column_1] ), 
VAR ___f = FILTERS ( Table1[Column_1] ) 
VAR ___r = COUNTROWS ( ___f ) 
VAR ___t = TOPN ( MaxFilters, ___f, Table1[Column_1])
VAR ___d = CONCATENATEX ( ___t, Table1[Column_1], ", " )
VAR ___x = "Table1[Column_1] = " & ___d & IF(___r > MaxFilters, ", ... [" & ___r & " items selected]") & " " 
RETURN ___x & UNICHAR(13) & UNICHAR(10)
)
& IF (
...

we just got rid of the second IF because Table1[Column_2] was not in the list.

Would regex be useful for this case? Or maybe I should iterate through the list and build a new string that just keeps the relevant parts.

I know it is best practise to show what you have attempted so far but I am not sure where to start with this it seems like it should be easy but I am having trouble.

Can anyone help me please?

Python solutions would be best as I know that more, but happy to investigate other methods if easier. I know there are regex tools online maybe I can just use one of those?

CodePudding user response:

Assuming string structure is constant. You can try this, but it depends on structure.

data = YOUR_STRING_FROM_QUESTION

# This is the delimeter, which will help us to split query on parts
prefix = '& IF (\n'

# define list of allowed tables
allowed_tables = ['Table1[Column_1]', 'Table2[Column_4]', 'Table6[Column_22]']

# split full string on parts by prefix
query_by_part = data.split(prefix)

# build clean list of query parts, where tables in allowed_tables
clean_query_parts = [part for part in query_by_part
                     if any(table in part for table in allowed_tables)]

# finally join list to string using prefix and print
print(prefix.join(clean_query_parts))

Input:

IF ( 
ISFILTERED ( Table1[Column_1] ), 
VAR ___f = FILTERS ( Table1[Column_1] ) 
VAR ___r = COUNTROWS ( ___f ) 
VAR ___t = TOPN ( MaxFilters, ___f, Table1[Column_1])
VAR ___d = CONCATENATEX ( ___t, Table1[Column_1], ", " )
VAR ___x = "Table1[Column_1] = " & ___d & IF(___r > MaxFilters, ", ... [" & ___r & " items selected]") & " " 
RETURN ___x & UNICHAR(13) & UNICHAR(10)
)
& IF (
ISFILTERED ( Table1[Column_2] ), 
VAR ___f = FILTERS ( Table1[Column_2] ) 
VAR ___r = COUNTROWS ( ___f ) 
VAR ___t = TOPN ( MaxFilters, ___f, Table1[Column_2])
VAR ___d = CONCATENATEX ( ___t, Table1[Columnw_1], ", " )
VAR ___x = "Table1[Column_2] = " & ___d & IF(___r > MaxFilters, ", ... [" & ___r & " items selected]") & " " 
RETURN ___x & UNICHAR(13) & UNICHAR(10)
)

Output:

IF ( 
ISFILTERED ( Table1[Column_1] ), 
VAR ___f = FILTERS ( Table1[Column_1] ) 
VAR ___r = COUNTROWS ( ___f ) 
VAR ___t = TOPN ( MaxFilters, ___f, Table1[Column_1])
VAR ___d = CONCATENATEX ( ___t, Table1[Column_1], ", " )
VAR ___x = "Table1[Column_1] = " & ___d & IF(___r > MaxFilters, ", ... [" & ___r & " items selected]") & " " 
RETURN ___x & UNICHAR(13) & UNICHAR(10)
)

CodePudding user response:

You can use jinja2 to easily generate the string from scratch in this way:

  1. Add a new file with the double extension 'txt.jinja', e.g., example.txt.jinja2 with the following code inside (this is your template) and save it in the same path of your script:
"
{%- for column in columns_to_add -%}
IF (
ISFILTERED ( Table1[Column_2] ), 
VAR ___f = FILTERS ( Table1[Column_2] ) 
VAR ___r = COUNTROWS ( ___f ) 
VAR ___t = TOPN ( MaxFilters, ___f, Table1[Column_2])
VAR ___d = CONCATENATEX ( ___t, Table1[Columnw_1], ", " )
VAR ___x = "{{column}} = " & ___d & IF(___r > MaxFilters, ", ... [" & ___r & " items selected]") & " " 
RETURN ___x & UNICHAR(13) & UNICHAR(10)
)
& 
{% endfor -%}
"
  1. Execute this Python script:
import os
import jinja2

TEMPLATE_NAME = "example.txt.jinja2"
LIST_OF_TABLES = [
    'Table1[Column_1]',
    'Table2[Column_4]',
    'Table6[Column_22]'
]

template = os.path.join(os.path.dirname(os.path.abspath(__file__)), "")
jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader(template))
template2= jinja_env.get_template(TEMPLATE_NAME)
string = template2.render(columns_to_add=LIST_OF_TABLES)
output = template   TEMPLATE_NAME.split("\\")[-1].replace('.jinja2', '')
with open(output, "w ") as f:
    f.write(string)

This will create a new file (in the same path) called example.txt with your string:

"IF (
ISFILTERED ( Table1[Column_2] ), 
VAR ___f = FILTERS ( Table1[Column_2] ) 
VAR ___r = COUNTROWS ( ___f ) 
VAR ___t = TOPN ( MaxFilters, ___f, Table1[Column_2])
VAR ___d = CONCATENATEX ( ___t, Table1[Columnw_1], ", " )
VAR ___x = "Table1[Column_1] = " & ___d & IF(___r > MaxFilters, ", ... [" & ___r & " items selected]") & " " 
RETURN ___x & UNICHAR(13) & UNICHAR(10)
)
& 
IF (
ISFILTERED ( Table1[Column_2] ), 
VAR ___f = FILTERS ( Table1[Column_2] ) 
VAR ___r = COUNTROWS ( ___f ) 
VAR ___t = TOPN ( MaxFilters, ___f, Table1[Column_2])
VAR ___d = CONCATENATEX ( ___t, Table1[Columnw_1], ", " )
VAR ___x = "Table2[Column_4] = " & ___d & IF(___r > MaxFilters, ", ... [" & ___r & " items selected]") & " " 
RETURN ___x & UNICHAR(13) & UNICHAR(10)
)
& 
IF (
ISFILTERED ( Table1[Column_2] ), 
VAR ___f = FILTERS ( Table1[Column_2] ) 
VAR ___r = COUNTROWS ( ___f ) 
VAR ___t = TOPN ( MaxFilters, ___f, Table1[Column_2])
VAR ___d = CONCATENATEX ( ___t, Table1[Columnw_1], ", " )
VAR ___x = "Table6[Column_22] = " & ___d & IF(___r > MaxFilters, ", ... [" & ___r & " items selected]") & " " 
RETURN ___x & UNICHAR(13) & UNICHAR(10)
)
& 
"

This works fine no matter how big your list of desired columns is, and the best part is that you only have to keep this list updated.

  • Related