Home > Enterprise >  SQL - Get PDF items that do NOT have a matching PNG item. Or maybe get PDF items with no PNG item
SQL - Get PDF items that do NOT have a matching PNG item. Or maybe get PDF items with no PNG item

Time:09-16

I have a table of metadata about attachments. Sometimes the auto PDF conversion does not work and it does not create a PNG file. I want a query I can run to see which PDF files this is happening on so I can go fix it..

In this case I want the results:

EBE71C1C-643C-48F3-985C-9F369AC08D9F    FW_-[EXTER214627673.pdf
F19EC965-F283-4DFD-9C8C-C20E77987986    Maria-Kim-215001342.pdf

So the query to list ALL attachment metadata is

SELECT tf.id, tf.CreatedOn, a.FileName
FROM TrainingForm tf
INNER JOIN Attachment a ON tf.Id = a.FormId
ORDER BY CreatedOn DESC

and results are like..

EBE71C1C-643C-48F3-985C-9F369AC08D9F    FW_-[EXTER214627673.pdf
2DAA9CD3-72C2-4534-A56B-3CE8F57EF61F    ABA-Conf.-211722906.pdf
2DAA9CD3-72C2-4534-A56B-3CE8F57EF61F    ABA-Conf.-2117229061.png
2DAA9CD3-72C2-4534-A56B-3CE8F57EF61F    Tatham-Rec213220321.pdf
2DAA9CD3-72C2-4534-A56B-3CE8F57EF61F    Tatham-Rec2132203211.png
2DAA9CD3-72C2-4534-A56B-3CE8F57EF61F    Tatham-Rec2132203212.png
2DAA9CD3-72C2-4534-A56B-3CE8F57EF61F    Tatham-Rec2132203213.png
2DAA9CD3-72C2-4534-A56B-3CE8F57EF61F    Tatham-Rec2132203214.png
2DAA9CD3-72C2-4534-A56B-3CE8F57EF61F    Tatham-Rec2132203215.png
2DAA9CD3-72C2-4534-A56B-3CE8F57EF61F    Tatham-Rec2132203216.png
297A98DF-E026-45D6-90B3-7935B17A267C    Marina-Con215227983.jpg
297A98DF-E026-45D6-90B3-7935B17A267C    Payment-Re210605370.pdf
297A98DF-E026-45D6-90B3-7935B17A267C    Payment-Re2106053701.png
297A98DF-E026-45D6-90B3-7935B17A267C    Payment-Re2106053702.png
5BB4D0AB-A8B9-436D-BDDB-00929A401F63    Receipt215800891.pdf
5BB4D0AB-A8B9-436D-BDDB-00929A401F63    Receipt2158008911.png
5BB4D0AB-A8B9-436D-BDDB-00929A401F63    Receipt2158008912.png
5BB4D0AB-A8B9-436D-BDDB-00929A401F63    Receipt2158008913.png
5BB4D0AB-A8B9-436D-BDDB-00929A401F63    onepagethi215843940.pdf
5BB4D0AB-A8B9-436D-BDDB-00929A401F63    onepagethi2158439401.png
F19EC965-F283-4DFD-9C8C-C20E77987986    Maria-Kim-215001342.pdf
F19EC965-F283-4DFD-9C8C-C20E77987986    Kim-Receip210738424.pdf
F19EC965-F283-4DFD-9C8C-C20E77987986    Kim-Receip2107384241.png
3DFBBCED-D8A7-4598-9680-D9A199621D18    Watershed-212950831.pdf
3DFBBCED-D8A7-4598-9680-D9A199621D18    Watershed-2129508311.png
3DFBBCED-D8A7-4598-9680-D9A199621D18    Watershed-213024234.pdf
3DFBBCED-D8A7-4598-9680-D9A199621D18    Watershed-2130242341.png
3DFBBCED-D8A7-4598-9680-D9A199621D18    Watershed-2130242342.png

EDIT 1: There is a case where the rename back to original name is not working. It seems to be in the case where the appended number goes two digits. Here is the raw data.

8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery214213464.PDF
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery2142134641.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery21421346410.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery21421346411.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery21421346412.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery21421346413.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery21421346414.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery21421346415.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery21421346416.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery2142134642.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery2142134643.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery2142134644.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery2142134645.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery2142134646.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery2142134647.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery2142134648.png
8DF92926-20A9-4BE8-A11A-0BCB2B868A31    2021-07-27 16:42:13.247 eDiscovery2142134649.png


SELECT COUNT(tf.id) AS cnt
            ,SUBSTRING(REPLACE(a.FileName, '.png', ''), 0, LEN(REPLACE(a.FileName, '.png', ''))) AS [FileName]
        FROM TrainingForm tf
        INNER JOIN Attachment a ON tf.Id = a.FormId
        WHERE a.FileName LIKE '%.png'
            AND datediff(dd, tf.CreatedOn, getdate()) < 60
        GROUP BY tf.id          ,a.FileName 

And the results of this you see are TWO values not one because of the double digit.

1   eDiscovery214213464
1   eDiscovery2142134641
1   eDiscovery2142134641
1   eDiscovery2142134641
1   eDiscovery2142134641
1   eDiscovery2142134641
1   eDiscovery2142134641
1   eDiscovery2142134641
1   eDiscovery214213464
1   eDiscovery214213464
1   eDiscovery214213464
1   eDiscovery214213464
1   eDiscovery214213464
1   eDiscovery214213464
1   eDiscovery214213464
1   eDiscovery214213464

CodePudding user response:

Assuming you always have .pdf data, but the .png may not convert as mentioned:

SELECT tf.id, tf.CreatedOn, a.FileName
FROM TrainingForm tf 
INNER JOIN Attachment a 
ON tf.Id = a.FormId
inner join
(
    --select the id and filename of items that have a COUNT < 2
    --ie. has a pdf but not a png
    SELECT x.id, x.FileName
    FROM
    (
        --union together groups of pdfs and pngs
        SELECT tf.id, REPLACE(a.FileName, '.pdf', '') AS [FileName]
        FROM TrainingForm tf
        INNER JOIN Attachment a 
        ON tf.Id = a.FormId
        where a.FileName like '%.pdf'
        group by tf.id, a.FileName
        UNION ALL
        select a.id, a.filename 
        from
        (
            SELECT b.[id], REPLACE(pdfs.[filename], '.pdf', '') AS [FileName] 
            FROM
            (
                SELECT tf.id
                FROM TrainingForm tf
                INNER JOIN Attachment a 
                ON tf.Id = a.FormId
                where a.FileName like '%.png'
                group by tf.id
            ) as b
            inner join Attachment as pdfs
            on b.[id] = pdfs.[Formid]
            where pdfs.[filename] like '%.pdf'
        ) as a
        inner join Attachment as f
        on f.Formid = a.id
        where CHARINDEX(REPLACE(a.filename, '.pdf', ''), f.filename) > 0 and f.filename like '%.png'
        group by a.id, a.filename
    ) as x
    group by x.id, x.FileName
    having COUNT(x.id) < 2
) as z
ON z.id = tf.id and a.FileName = z.FileName   '.pdf'
group by tf.id, tf.CreatedOn, a.FileName
ORDER BY tf.CreatedOn DESC
  • Related