I am trying to "fill in the blanks" with my table below.
I have created a PDF reader that spits out a JSON file that I have manipulated into the below format.
There is very little scope in changing source data, or using a secondary table as a helper, due to the fact that this is meant to handle new data not "seen" before.
What I need help with is getting the Class
and Group
columns filled in. By this I mean that the Class Column will always have a value in the last row, and I need this repeated (upwards) until it comes across a non-blank value in the column. It then needs to repeat this value, until it comes across the next non-blank and so on.
Similarly the Group Column needs the same solution but starting from the first row down.
I have tried LAG()
LEAD()
etc with default values, but it doesn't handle the multiple nulls
.
I also need the Group
column to show the class value when not blank.
I have had a look at cte's but not overly familiar with them and have gotten myself tied in knots today!
Any help is appreciated.
Current Data
ID, Class, Group, Total, Account
1, null, INCOME, null, Fencing
2, null, null, null, Crop
3, Net Income, null, null, Net Income
4, null, Farm Expenditure, null, Irrigation
5, null, null, null, electricity
6, Surplus, null, null, Surplus
7, null, GST, null, GST
8, Closing Balance, null, null, Closing Balance
What I want
ID, Class, Group, Total, Account
1, Net Income, INCOME, null, Fencing
2, Net Income, INCOME, null, Crop
3, Net Income, INCOME, null, Net Income
4, Surplus, Farm Expenditure, null, Irrigation
5, Surplus, Farm Expenditure, null, electricity
6, Surplus, Farm Expenditure, null, Surplus
7, Closing Balance, GST, null, GST
8, Closing Balance, GST, null, Closing Balance
CodePudding user response:
This gives you the output you want with the data you give. In your sample data there is only one "group" in each "class", but it looks like there could maybe be multiple groups per class? If that is the case, it will be a bit more complicated, but the principal will be the same.
CREATE TABLE #data (ID INT, Class VARCHAR(50), [Group] VARCHAR(50), Total INT, Account VARCHAR(50));
INSERT INTO #data(ID, Class, [Group], Total, Account) VALUES
(1, null, 'INCOME', null, 'Fencing'),
(2, null, null, null, 'Crop'),
(3, 'Net Income', null, null, 'Net Income'),
(4, null, 'Farm Expenditure', null, 'Irrigation'),
(5, null, null, null, 'electricity'),
(6, 'Surplus', null, null, 'Surplus'),
(7, null, 'GST', null, 'GST'),
(8, 'Closing Balance', null, null, 'Closing Balance');
-- Find the break points that signify the end of a Class
WITH breaks as(
SELECT IIF(Class IS NOT NULL, 1, 0) AS breakpoint, ID, Class, [Group], Total, Account
FROM #data
),
-- count the breakpoints passed so each group will have a number we can group by
grp AS (
SELECT ISNULL(SUM(breakpoint) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp,
ID,Class,[Group],Total,Account
FROM breaks
)
SELECT MAX(grp.Class) OVER (PARTITION BY grp.grp) AS Class,
MAX(grp.[Group]) OVER (PARTITION BY grp.grp) AS [Group],
grp.Total,
grp.Account
FROM grp
CodePudding user response:
Thanks To James, I have now tweaked his code to give the correct Group, as the class and group needed different logic.
-- Find the break points that signify the end of a Class
WITH classbreaks as(
SELECT IIF(Class IS NOT NULL, 1, 0) AS breakpoint, ID, Class, [Group], Total, Account
FROM [PedGroup_db].[dbo].[Cashflow]
),
-- Find the break points that signify the end of a Group
grpbreaks as(
SELECT IIF([Group] IS NOT NULL, 1, 0) AS breakpoint, ID, Class, [Group], Total, Account
FROM [PedGroup_db].[dbo].[Cashflow]
),
-- count the breakpoints passed so each class will have a number we can group by
clss AS (
SELECT ISNULL(SUM(breakpoint) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp,
ID,Class,[Group],Total,Account
FROM classbreaks
),
-- count the breakpoints passed so each group will have a number we can group by
grp AS (
SELECT ISNULL(SUM(breakpoint) OVER (ORDER BY ID),0) AS grp,
ID,Class,[Group],Total,Account
FROM grpbreaks
)
--join the two sub queries together on ID
SELECT MAX(clss.Class) OVER (PARTITION BY clss.grp) AS Class,
case when clss.Class = grp.Account Then grp.Account else MAX(grp.[Group]) OVER (PARTITION BY grp.grp) end AS [Group],
grp.Total,
grp.Account
FROM clss left join grp on grp.ID = clss.ID