Home > Software design >  ArrayFormula to calculate previous rows
ArrayFormula to calculate previous rows

Time:07-23

I have 3 sheets:

  • Sheet1 - list of transactions (Account, Credit, Debit, Date)
  • Sheet2 - list of transactions (Account, Credit, Debit, Date)
  • Sheet3 (I plan to lock it) - combined list of transactions, sorted by Date

Sheet3 looks like: enter image description here

I need to add 1 more column to Sheet3 to count current balance for certain row to be like: enter image description here

I'm able to do this with formula:

=SUM(FILTER($B$2:$B$8, ROW($A$2:$A$8) <= ROW($A2), A$2:A$8=$A2)) - SUM(FILTER($C$2:$C$8, ROW($A$2:$A$8) <= ROW($A2), A$2:A$8=$A2))

But this one I need continuously drag down.

Question: Is there way convert this formula to ArrayFormula, to avoid dragging

Live example: https://docs.google.com/spreadsheets/d/1e3IJaiOSlkia6ce9UGgIL_sNR6FK5TOp-BNrSYwS8Lg/edit?usp=sharing

CodePudding user response:

In G2 on sheet 3 I entered

=ArrayFormula(if(A2:A="",,mmult((A2:A=transpose(A2:A))*(row(A2:A)>= TRANSPOSE(row(A2:A)))*(transpose(B2:B)-transpose(C2:C)),row(A2:A)^0)))

See if that works for you?

CodePudding user response:

In Sheet3 row 1, put your headers.

In Sheet3!A2, put

=sort({filter(Sheet1!A2:D,not(isblank(Sheet1!A2:A)));filter(Sheet2!A2:D,not(isblank(Sheet2!A2:A))),4,true)

In Sheet3!E2, put

=mmult(transpose(arrayformula(arrayformula(array_constrain(A2:A,counta(A2:A),1)=transpose(array_constrain(A2:A,counta(A2:A),1)))
                              *arrayformula(array_constrain(row(A2:A),counta(A2:A),1)<=transpose(array_constrain(row(A2:A),counta(A2:A),1))))),
       arrayformula(array_constrain(B2:B,counta(A2:A),1)-array_constrain(C2:C,counta(A2:A),1))

To see why, let's temporarily remove the array_constrain(...,counta(...),1) wrappings, which is meant to auto detect the last data row:

=mmult(transpose(arrayformula(arrayformula(A2:A9=transpose(A2:A9))
                 *arrayformula(row(A2:A9)<=transpose(row(A2:A9))))),
       arrayformula(B2:B9-C2:C9))

arrayformula(B2:B9-C2:C9) are the running sums of column B - column C (ie. credit - debit). It is a column vector with the length of your data size.

We want to, for each row, 1) filter this vector by comparison to column A (ie. account name) & 2) filter this vector by whether the running sums are below or above the row in question.

arrayformula(A2:A9=transpose(A2:A9)) does 1). arrayformula(row(A2:A9)<=transpose(row(A2:A9))) does 2).

We want elementwise product between the 2 matrices in order to compose the filter. Hence, arrayformula(...*...).

The columns of our filters are meant to be applied to the running sums. To use matrix multiplication, we can keep the column vector of running sums as the post-multiplier; and transpose the filter matrix as pre-multiplier so that the rows of the transposed matrix are multiplied (ie. applied) to the running sums. Hence, mmult(transpose(...),...).

Add back the array_constrain trick. And we are done.

Feel free to experiment with alternate placings of arrayformula. But remember to keep the () brackets wherever you omit arrayformula. Example:

=arrayformula(mmult(transpose(((array_constrain(A2:A,counta(A2:A),1)=transpose(array_constrain(A2:A,counta(A2:A),1)))
                                *(array_constrain(row(A2:A),counta(A2:A),1)<=transpose(array_constrain(row(A2:A),counta(A2:A),1))))),
                    (array_constrain(B2:B,counta(A2:A),1)-array_constrain(C2:C,counta(A2:A),1))))

Nonetheless, the 1 formula solution is computationally inefficient compared to individually spread formula per cell. That is because, without mutating the formula per row, we are forced to compute the filters as full n-by-n matrices where n is your data size.

Whereas, if in E2 we put =sum(filter(B$2:B2-C$2:C2,A$2:A2=A2)) and spread to the end by double right-clicking the square on bottom right when you select E2, the formula mutates per row, saving the row index comparison entirely, and also cutting the comparison to column A logarithmically.

Granted, we probably shouldn't rely on Google Sheet for a large database (e.g. >100k entries). But even for thousands of entries, if you square the amount of computations required, getting the results in browser becomes impractically slow well before one may expect.

  • Related