Home > Net >  How to speed up dynamic columns with formulas in Power Query
How to speed up dynamic columns with formulas in Power Query

Time:12-04

The Question (How do I make it faster)

I have been playing around with Power Query in Excel for over a year now but for the first time, I have a query that takes 20 minutes to run.

I am sure there is something here I can learn!
While it does currently work I believe if it was well-written it would run much faster.

Data Structure

There are two databases here

  1. Database of Company (Aka attendees) - About 400 rows
Company Title
Rita Book
Paige Turner
Dee End
etc
  1. Database of Events - About 500 rows

    1. An Event can have many Company (Attendees). The database exports this as a comma-separated list in the column [#"Export CSV - Company"]
Event Title Export CSV - Company Date Year
Event 1 Rita Book, Dee End 1/1/2015 2015
Event 2 Paige Turner 2/1/2015 2015
Event 3 Dee End 3/1/2015 2015
Event 4 Rita Book, Paige Turner, Dee End 1/1/2016 2016
etc ... ... ...

Note that I also have a separate query called #"Company Event Count - 1 Years List" which is a list of all years that events have been run.

The Goal

For a visualization, I need to get the data into the following structure:

Company Title 2015 2016 etc
John Smith 10 20 ...
Jane Doe 5 14 ...
etc ... ... ...

The Code

I have done my best to comment on my code below. Feel free to ask any questions.

let
    // This is a function. It was the only way I could figure out how to use [Company Title] from #"Keep only names column" and "currentColumnTitleYearStr" from the dynamically created columns in the same scope
    count_table_year_company = (myTbl, yearStr, companyStr) => 
        Table.RowCount(
            Table.SelectRows(
                myTbl,
                each Text.Contains([#"Export CSV - Company"],  companyStr)
            )
        ),
        
                    

    Source = #"Company 1 - Loaded CSV From Folder",                             // Grab a list of all Company 
    #"Keep only names column" = Table.SelectColumns(Source,{"Company Title"}),  // Keep only the [Company Title] field
    // Dynamically create columns for each year. Example Columns: [Company Title], [2015], [2016], [2017], etc
    #"Add Columns for each year" =
        List.Accumulate(
            #"Company Event Count - 1 Years List",                              // Get a table of all events
            #"Keep only names column", 
            (state, currentColumnTitleYearStr) => Table.AddColumn(
                state, 
                currentColumnTitleYearStr,  // The Year becomes the column title and is also used in filters
                let // I hoped that filting the table by Year at this point would mean it only has to do it once per column, instead of once per cell.
                    eventsThisYearTbl = Table.SelectRows(
                        #"Event 1 - Loaded CSV From Folder",
                        each ([Year] = Number.FromText(currentColumnTitleYearStr)) 
                    )
                in(
                    // Finally for each cell, calculate the count of events. E.g How many events did 'John Smith' attend in 2015
                    each count_table_year_company(eventsThisYearTbl, currentColumnTitleYearStr, [Company Title]) //CompanyTitleVar
                )
            )
        ),
    FinalStep = #"Add Columns for each year"
in
    FinalStep

My Theries

I believe one of a few things may be making it slow

  1. I am using "List.Accumulate(" to dynamically create a column for each year. While this does work I think it may be the wrong formula for the job. Especially because the state field which is like a running total of each cell must be a huge number.

  2. I worry that I have an 'each' where I dont need it but I cant seem to remove any. Its my understanding that every 'each' is effectively a nested loop so removing one may have a dramatic impact on performance.

In Conclusion

While it does currently work I know there is something for me to learn here.
Thank you so much any guidance or suggested readings you can provide :)

CodePudding user response:

Does this do what you want? Converts from left to right. If not please explain more clearly

let  Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
SplitNames = Table.TransformColumns(Source,{{"Names", each Text.Split(_,", ")}}),
#"Expanded Names" = Table.ExpandListColumn(SplitNames, "Names"),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Names",{"Event Title", "Date"}),
#"Added Custom" = Table.AddColumn(#"Removed Columns", "Count", each 1),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Added Custom", {{"Year", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Added Custom", {{"Year", type text}}, "en-US")[Year]), "Year", "Count", List.Sum)
in  #"Pivoted Column"

enter image description here

CodePudding user response:

Try buffering your table.

Change this

#"Keep only names column" = Table.SelectColumns(Source,{"Company Title"}),  // Keep only the [Company Title] field
 

To this

 #"Keep only names column" = Table.Buffer(Table.SelectColumns(Source,{"Company Title"})),  // Keep only the [Company Title] field
  • Related