The Question (How do I make it faster)
I have been playing around with Power Query in Excel for over a year now but for the first time, I have a query that takes 20 minutes to run.
I am sure there is something here I can learn!
While it does currently work I believe if it was well-written it would run much faster.
Data Structure
There are two databases here
- Database of Company (Aka attendees) - About 400 rows
Company Title |
---|
Rita Book |
Paige Turner |
Dee End |
etc |
Database of Events - About 500 rows
- An Event can have many Company (Attendees). The database exports this as a comma-separated list in the column [#"Export CSV - Company"]
Event Title | Export CSV - Company | Date | Year |
---|---|---|---|
Event 1 | Rita Book, Dee End | 1/1/2015 | 2015 |
Event 2 | Paige Turner | 2/1/2015 | 2015 |
Event 3 | Dee End | 3/1/2015 | 2015 |
Event 4 | Rita Book, Paige Turner, Dee End | 1/1/2016 | 2016 |
etc | ... | ... | ... |
Note that I also have a separate query called #"Company Event Count - 1 Years List" which is a list of all years that events have been run.
The Goal
For a visualization, I need to get the data into the following structure:
Company Title | 2015 | 2016 | etc |
---|---|---|---|
John Smith | 10 | 20 | ... |
Jane Doe | 5 | 14 | ... |
etc | ... | ... | ... |
The Code
I have done my best to comment on my code below. Feel free to ask any questions.
let
// This is a function. It was the only way I could figure out how to use [Company Title] from #"Keep only names column" and "currentColumnTitleYearStr" from the dynamically created columns in the same scope
count_table_year_company = (myTbl, yearStr, companyStr) =>
Table.RowCount(
Table.SelectRows(
myTbl,
each Text.Contains([#"Export CSV - Company"], companyStr)
)
),
Source = #"Company 1 - Loaded CSV From Folder", // Grab a list of all Company
#"Keep only names column" = Table.SelectColumns(Source,{"Company Title"}), // Keep only the [Company Title] field
// Dynamically create columns for each year. Example Columns: [Company Title], [2015], [2016], [2017], etc
#"Add Columns for each year" =
List.Accumulate(
#"Company Event Count - 1 Years List", // Get a table of all events
#"Keep only names column",
(state, currentColumnTitleYearStr) => Table.AddColumn(
state,
currentColumnTitleYearStr, // The Year becomes the column title and is also used in filters
let // I hoped that filting the table by Year at this point would mean it only has to do it once per column, instead of once per cell.
eventsThisYearTbl = Table.SelectRows(
#"Event 1 - Loaded CSV From Folder",
each ([Year] = Number.FromText(currentColumnTitleYearStr))
)
in(
// Finally for each cell, calculate the count of events. E.g How many events did 'John Smith' attend in 2015
each count_table_year_company(eventsThisYearTbl, currentColumnTitleYearStr, [Company Title]) //CompanyTitleVar
)
)
),
FinalStep = #"Add Columns for each year"
in
FinalStep
My Theries
I believe one of a few things may be making it slow
I am using "List.Accumulate(" to dynamically create a column for each year. While this does work I think it may be the wrong formula for the job. Especially because the state field which is like a running total of each cell must be a huge number.
I worry that I have an 'each' where I dont need it but I cant seem to remove any. Its my understanding that every 'each' is effectively a nested loop so removing one may have a dramatic impact on performance.
In Conclusion
While it does currently work I know there is something for me to learn here.
Thank you so much any guidance or suggested readings you can provide :)
CodePudding user response:
Does this do what you want? Converts from left to right. If not please explain more clearly
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
SplitNames = Table.TransformColumns(Source,{{"Names", each Text.Split(_,", ")}}),
#"Expanded Names" = Table.ExpandListColumn(SplitNames, "Names"),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Names",{"Event Title", "Date"}),
#"Added Custom" = Table.AddColumn(#"Removed Columns", "Count", each 1),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Added Custom", {{"Year", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Added Custom", {{"Year", type text}}, "en-US")[Year]), "Year", "Count", List.Sum)
in #"Pivoted Column"
CodePudding user response:
Try buffering your table.
Change this
#"Keep only names column" = Table.SelectColumns(Source,{"Company Title"}), // Keep only the [Company Title] field
To this
#"Keep only names column" = Table.Buffer(Table.SelectColumns(Source,{"Company Title"})), // Keep only the [Company Title] field