Home > Enterprise >  How to join 2 tables with one of them containing multiple values in a single column
How to join 2 tables with one of them containing multiple values in a single column

Time:09-09

I have two tables users and interests which i'm trying to join. Inside users table i have columns as id, name, interest, etc. The interest column contain multiple values as "1,2,3". My second table interests have 2 columns id and name as:

id | name
-------------
1  | business  
2  | farming 
3  | fishing 

What i want to do is join interests table with users table so i get the following output:

users table:

id | name     | interest | interest_name 
----------------------------------------------
1  | username | "1,2"    | "business, farming"
2  | username | "2,3"    | " farming, fishing"

I wrote the following query to achieve this:

select users.*, interests.name as interest_name
  from users
  left join interests on users.interest = interests.id;

Results i got:

id | name     | interest | interest_name
----------------------------------------
1  | username | "1,2"    | "business"
2  | username | "2,3"    | " farming"

Problem:

I'm only getting the name of first values from interest column whereas i want all the values from interest column i have already tried using group_concat and find_in_set but getting the same results.

CodePudding user response:

In the case you cannot create an additional database table in order to normalize the data...

Here's a solution that creates an ad hoc, temporary user_interests table within the query.

SELECT users.id user_id, username, interests, interests.interest
FROM users
INNER JOIN (
  SELECT
    users.id user_id,
    (SUBSTRING_INDEX(SUBSTRING_INDEX(users.interests, ',', ui.ui_id), ',', -1)   0) ui_id /*  0 converts to number */
  FROM users
    INNER JOIN (SELECT id AS ui_id FROM interests) ui 
      ON CHAR_LENGTH(users.interests) - CHAR_LENGTH(REPLACE(users.interests, ',', '')) >= (ui.ui_id - 1)
    INNER JOIN interests ON ui.ui_id = interests.id
) user_interests ON users.id = user_interests.user_id
INNER JOIN interests ON user_interests.ui_id = interests.id
ORDER BY user_id, ui_id;

Outputs:

user_id | username | interest_ids | interest
-------- ---------- -------------- ---------
1       | fred     | 3,4,8,6,10   | fishing
1       | fred     | 3,4,8,6,10   | sports
1       | fred     | 3,4,8,6,10   | religion
1       | fred     | 3,4,8,6,10   | science
1       | fred     | 3,4,8,6,10   | philanthropy
2       | joe      | 7,11,8,9     | art
2       | joe      | 7,11,8,9     | science
2       | joe      | 7,11,8,9     | politics
2       | joe      | 7,11,8,9     | cooking

As you can see...

SELECT
  users.id user_id,
  (SUBSTRING_INDEX(SUBSTRING_INDEX(users.interests, ',', ui.ui_id), ',', -1)   0) ui_id
FROM users
  INNER JOIN (SELECT id AS ui_id FROM interests) ui 
    ON CHAR_LENGTH(users.interests) - CHAR_LENGTH(REPLACE(users.interests, ',', '')) >= (ui.ui_id - 1)
  INNER JOIN interests ON ui.ui_id = interests.id

...builds and populates the temporary table user_interests, and provides:

user_id | ui_id
-------- ------
1       | 3
1       | 4
1       | 6
1       | 8
1       | 10
2       | 7
2       | 8
2       | 9
2       | 11

Which is then INNER JOIN'ed between the users and interests tables.


Try it here: https://onecompiler.com/mysql/3yfggctau

-- create
CREATE TABLE users (
  id INT PRIMARY KEY,
  username VARCHAR(20),
  interests VARCHAR(20)
);

CREATE TABLE interests (
  id INT PRIMARY KEY,
  interest VARCHAR(20)
);

-- insert
INSERT INTO users VALUES (1, 'fred', '3,4,8,6,10'), (2, 'joe', '7,11,8,9');
INSERT INTO interests VALUES (1, 'business'), (2, 'farming'), (3, 'fishing'), (4, 'sports'), (5, 'technology'), (6, 'religion'), (7, 'art'), (8, 'science'), (9, 'politics'), (10, 'philanthropy'), (11, 'cooking');

-- select
SELECT users.id user_id, interests.interest
FROM users
INNER JOIN (
  SELECT
    users.id user_id,
    (SUBSTRING_INDEX(SUBSTRING_INDEX(users.interests, ',', ui.ui_id), ',', -1)   0) ui_id
  FROM users
    INNER JOIN (SELECT id AS ui_id FROM interests) ui 
      ON CHAR_LENGTH(users.interests) - CHAR_LENGTH(REPLACE(users.interests, ',', '')) >= (ui.ui_id - 1)
    INNER JOIN interests ON ui.ui_id = interests.id
) user_interests ON users.id = user_interests.user_id
INNER JOIN interests ON user_interests.ui_id = interests.id
ORDER BY user_id, ui_id;

Inspired by Leon Straathof's and fthiella's answers to this SO question.

CodePudding user response:

Pull the interest column out of the users table and create a user_interests table that contains the user ids and interest ids:

user_id | interest_id
-------- ------------
1       | 1
1       | 2
2       | 2
2       | 3

Then join the users table to the user_interests table, and the user_interests table to the interests table:

SELECT users.username, interests.interest 
  FROM users 
  LEFT JOIN user_interests ON users.id = user_interests.user_id 
  LEFT JOIN interests ON user_interests.interest_id = interests.id 
  WHERE interest_id IS NOT NULL;

Outputs:

username | interest
--------- ---------
Clark    | business
Clark    | farming
Dave     | farming
Dave     | fishing

Then use your server programming language to compile the query results.

Try it here: https://onecompiler.com/mysql/3yfe5pp7x

-- create
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  username TEXT NOT NULL
);

CREATE TABLE user_interests (
  user_id INTEGER,
  interest_id INTEGER,
  UNIQUE KEY user_interests_constraint (user_id,interest_id)
);

CREATE TABLE interests (
  id INTEGER PRIMARY KEY,
  interest TEXT NOT NULL
);

-- insert
INSERT INTO users VALUES (1, 'Clark'), (2, 'Dave'), (3, 'Ava');
INSERT INTO interests VALUES (1, 'business'), (2, 'farming'), (3, 'fishing');
INSERT INTO user_interests VALUES (1, 1), (1, 2), (2, 2), (2, 3);

-- fetch 
SELECT users.username, interests.interest 
  FROM users 
  LEFT JOIN user_interests ON users.id = user_interests.user_id 
  LEFT JOIN interests ON user_interests.interest_id = interests.id 
  WHERE interest_id IS NOT NULL;
  • Related