Home > Back-end >  SQLite|Pandas|Python: Select rows that contain values in any column?
SQLite|Pandas|Python: Select rows that contain values in any column?

Time:06-24

I have an SQLite table with 13500 rows with the following SQL schema:

PRAGMA foreign_keys = false;

-- ----------------------------
-- Table structure for numbers
-- ----------------------------
DROP TABLE IF EXISTS "numbers";
CREATE TABLE "numbers" (
  "RowId" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  "Date" TEXT NOT NULL,
  "Hour" TEXT NOT NULL,
  "N1" INTEGER NOT NULL,
  "N2" INTEGER NOT NULL,
  "N3" INTEGER NOT NULL,
  "N4" INTEGER NOT NULL,
  "N5" INTEGER NOT NULL,
  "N6" INTEGER NOT NULL,
  "N7" INTEGER NOT NULL,
  "N8" INTEGER NOT NULL,
  "N9" INTEGER NOT NULL,
  "N10" INTEGER NOT NULL,
  "N11" INTEGER NOT NULL,
  "N12" INTEGER NOT NULL,
  "N13" INTEGER NOT NULL,
  "N14" INTEGER NOT NULL,
  "N15" INTEGER NOT NULL,
  "N16" INTEGER NOT NULL,
  "N17" INTEGER NOT NULL,
  "N18" INTEGER NOT NULL,
  "N19" INTEGER NOT NULL,
  "N20" INTEGER NOT NULL,
  UNIQUE ("RowId" ASC)
);

PRAGMA foreign_keys = true;

Each row contain non repeating numbers from 1 to 80, sorted in ascending order.

I want to select from this table only the rows that contain numbers only these numbers: 10,20,30,40,50,60,70,80 but not more than 3 of them (I mean EXACTLY 3 and not more and not less).

I did the following:

First step:

e.g. for selecting only the rows that contains ANY of these numbers on the column N1 I did this command:

SELECT * FROM numbers WHERE N1 IN (10,20,30,40,50,60,70,80);

Of course that this is giving to me rows with just one of these numbers but also rows with let's say 5 or even all these numbers which I do not want, I want exactly 3 of these numbers on ANY column.

Second step:

For selecting rows which contain any of these numbers on columns N1 and N2 we just run this command:

SELECT * FROM numbers WHERE N1 IN (10,20,30,40,50,60,70,80) AND N2 IN (10,20,30,40,50,60,70,80);

But this will give also columns with 2 or more (even all numbers) which I do not want because this is not exactly 3 of this numbers on any of this columns.

Third step:

Retrieving rows that contain any of these numbers on N1, N2 and N3 with this command:

SELECT * FROM numbers WHERE N1 IN (10,20,30,40,50,60,70,80) AND N2 IN (10,20,30,40,50,60,70,80) AND N3 IN (10,20,30,40,50,60,70,80);

This is almost good because of giving the rows with any 3 of these numbers but also gives rows that could have more than 3 of these numbers like 4, 5 or even all numbers which I don't need.

Also, one idea is to modify this command by adding AND NOT N4 IN (10,20,30,40,50,60,70,80) AND NOT N5 IN (10,20,30,40,50,60,70,80) and so on until reach the N20.

On the other hand, any of these numbers (10,20,30,40,50,60,70,80) could be on N1, N2,N3 but also in any given column like N1, N12, N18 and any other combination of columns which means I should create any possible combination of 3 columns taken from 20 columns in order to get what I need.

Is there any smarter way to do this?

Thank you in advance!

P.S.

  1. I have already read this which is somehow something I need but I want to avoid because of to many combinations (and also it is in the Java language section), this which is doing what I need (I think) but it is in Python and pandas not SQLite syntax and I think this one is the same but also in Python and pandas, also, keep in mind that the last two do not look for any possible combination but just for a give combination to look for in any given column which partially what I need.
  2. Also, If you can do it in Python and pandas it is very good too because I could use that too (so, I'm adding tags for these in order to be seen as well maybe there is someone which is looking for that solution too, if you don't mind).

CodePudding user response:

Here's an SQLite query that will give you the results you want. It creates a CTE of all the values of interest, then joins your numbers table to the CTE if any of the columns contain the value from the CTE, selecting only RowId values from numbers where the number of rows in the join is exactly 3 (using GROUP BY and HAVING) and then finally selecting all the data from the rows which match that criteria:

WITH CTE(n) AS (
  VALUES (10),(20),(30),(40),(50),(60),(70),(80)
),
rowids AS (
  SELECT RowId
  FROM numbers
  JOIN CTE ON n IN (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10,
                    n11, n12, n13, n14, n15, n16, n17, n18, n19, n20)
  GROUP BY RowId
  HAVING COUNT(*) = 3
)
SELECT n.*
FROM numbers n
JOIN rowids r ON n.RowId = r.RowId

I've made a small demo on db-fiddle.

  • Related