Home > Software design >  count the number of times a combination of values occurs
count the number of times a combination of values occurs

Time:04-15

Dataset looking at the types of crime for a given city.

Incident ID Incident Code Incident Category Incident Subcategory Incident Description
618691 4134 Assault Simple Assault Battery
618691 15300 Offences Against The Family And Children Other Hate Crime (secondary only)
618701 7053 Vehicle Impounded Vehicle Impounded Vehicle, Impounded
618701 65010 Traffic Violation Arrest Traffic Violation Arrest Traffic Violation Arrest
618701 65050 Other Miscellaneous Other Driving While Under The Influence Of Alcohol
626010 5043 Burglary Burglary - Residential Burglary, Residence, Unlawful Entry
626010 6381 Larceny Theft Larceny Theft - Other Embezzlement from Dependent or Elder Adult by Caretaker
626010 7041 Recovered Vehicle Recovered Vehicle Vehicle, Recovered, Auto
626010 16650 Drug Offense Drug Violation Methamphetamine Offense

Each IncidentID has 2, 3, or 4 Incident Codes associated with it.

I want to be able to count the number of times each combination of 2, 3, or 4 Incident Codes appears in the entire dataset.

For example:

Incident Codes 4134, 15300: x amount of times

Incident Codes 7053, 65010, 65050: x amount of times

Incident Codes 5043, 6381, 7041, 16650: x amount of times

I apologize if I've given a poor explanation - this is my first post on SO and quite frankly I don't know how to best communicate this question.

I don't know what SQL code to run to get my answer. The closest I've come to finding an answer is this post, Select combination of two columns, and count occurrences of this combination, but it already has the data separated into two columns, which my data is not there.

My thought is to split the additional codes into other columns, but perhaps there is a way to avoid doing that by having the code run the calculation for me without it.

I appreciate any and all input you may be able to give!

CodePudding user response:

Let's suppose your table is named "TableX". I think this query should be near to what you need:

Select T1.IncidentCode, T2.IncidentCode, T3.IncidentCode, T4.IncidentCode, Count(1) AS AmountOfTimes
From TableX T1
Join TableX T2      ON T2.IncidentID = T1.IncidentID AND 
                       T2.IncidentCode <> T1.IncidentCode
Left Join TableX T3 ON T3.IncidentID = T1.IncidentID AND 
                       T3.IncidentCode <> T1.IncidentCode AND 
                       T3.IncidentCode <> T2.IncidentCode
Left Join TableX T4 ON T4.IncidentID = T1.IncidentID AND 
                       T4.IncidentCode <> T1.IncidentCode AND 
                       T4.IncidentCode <> T2.IncidentCode AND 
                       T4.IncidentCode <> T3.IncidentCode
Group By T1.IncidentCode, T2.IncidentCode, T3.IncidentCode, T4.IncidentCode

CodePudding user response:

You would probably be best to try and NOT get all 3 parts in one query and here is why. Lets say for example that one officer enters their data as codes 1, 2, 3. Another enters codes as 3, 1, 2, and yet another enters as 2, 3, 1. They are all the same "set" of codes just in different order. If you rely on just being the first being the same, you would be getting 3 different rows showing the same thing each with 1 count.

You would be better served by running 3 distinct queries with a WHERE and HAVING clause based on just the codes you are interested in the "set". Something simple like

select
      YT.IncidentID,
      count(*) HowMany
   from
      YourTable YT
   where
      YT.IncidentCode in ( 4134, 15300 )
   group by
      YT.IncidentID
   having
      count(*) = 2

This will return all incidents that have BOTH parts, even if the incident was associated with any 3rd and/or 4th additional codes in a given incident. Having the total records IS your count.

So, now, take your codes of interest ex: 1 & 2, and you have the possibility of 2 more incident codes per incident, and you add an additional 30 combinations of codes 3 & 4 into the mix. If you dont care about the others that may be "extra", it does not screw up your count on the precise piece(s) you are looking for.

Then, all you have to do to get your other "what if" scenario counts is change your IN clause once and the having to match the count. Since you are only filtering based on the specific codes in question, you only want those that have the same count regardless of extra incident codes per example stated.

      YT.IncidentCode in ( 7053, 65010, 65050 )
   group by
      YT.IncidentID
   having
      count(*) = 3



      YT.IncidentCode in ( 5043, 6381, 7041, 16650 )
   group by
      YT.IncidentID
   having
      count(*) = 4

Now, if you only really care about the final count of each respectively, just wrap that up one more to get the count of rows returned such as

select
      count(*) NumberOfIncidents
   from
      ( select
              YT.IncidentID,
              count(*) HowMany
           from
              YourTable YT
           where
              YT.IncidentCode in ( 4134, 15300 )
           group by
              YT.IncidentID
           having
              count(*) = 2 ) PreQualified

Then, if you wanted to do this on some time period basis such as you have a given date of the incident, and you wanted to keep running the same query / counts, you could expand and do something like this by doing a UNION to each query.

select
      'Assault and Offenses against Family and Children' as Activity,
      count(*) NumberOfIncidents
   from
      ( select
              YT.IncidentID,
              count(*) HowMany
           from
              YourTable YT
           where
                  YT.IncidentCode in ( 4134, 15300 )
              AND WhateverDateFilters...
           group by
              YT.IncidentID
           having
              count(*) = 2 ) PreQualified
UNION
select
      'Vehicle Impound, Traffic Arrest, Other Misc' as Activity,
      count(*) NumberOfIncidents
   from
      ( select
              YT.IncidentID,
              count(*) HowMany
           from
              YourTable YT
           where
                  YT.IncidentCode in ( 7053, 65010, 65050 )
              AND WhateverDateFilters...
           group by
              YT.IncidentID
           having
              count(*) = 3 ) PreQualified
UNION
select
      'Burglary, Theft, Drugs and Vehicle Recovery' as Activity,
      count(*) NumberOfIncidents
   from
      ( select
              YT.IncidentID,
              count(*) HowMany
           from
              YourTable YT
           where
                  YT.IncidentCode in ( 5043, 6381, 7041, 16650 )
              AND WhateverDateFilters...
           group by
              YT.IncidentID
           having
              count(*) = 4 ) PreQualified

Notice each query in the UNION returns the same number, and order of columns. So it will just return a list (in this case) of 3 rows with a description and count per category regardless of the physical order the incident codes were entered, even IF they were entered in the 3rd and 4th when only looking for 2 code possibilities.

Sometimes a generic query (as in the left-join sample) is ok, and nothing wrong with it, but ask yourself the flexibility and do you want to drill into each permutation just to get your final result numbers.

  •  Tags:  
  • sql
  • Related