What I try to do is create a wp plugin to get related tags of a tag and order by intersection rate.
For example, in a WordPress site, data structure like below:
- 40 posts has tags
B
, - 200 posts has tags
C
, - 20 posts has tags
A, B
, - 40 posts has tags
A, C
Now, I want to get related tags of tag A
.
The formula of intersection rate:
INTERSECT_RATE = INTERSECT_COUNT / ALL_POST_COUNT_IN_A_TAG
So,
- Rate for B is
20 / 40 = 0.5
, 50% - Rate for C is
40 / 200 = 0.2
, 20%
The problem is I'm not very familiar with the complex query of mysql. What I have done so far:
function get_related_terms($args = []) {
$base_term_id = $args['term_id'];
$base_taxonomy = $args['taxonomy'];
$related_taxonomy = $args['rel_taxonomy'];
$post_type = $args['post_type'];
$number = $args['number'] ?? 20;
global $wpdb;
$key = $base_taxonomy . ':' . $related_taxonomy . ':' . $base_term_id;
if ($terms = wp_cache_get($key, 'related_terms')) {
return $terms;
}
$results = $wpdb->get_results($wpdb->prepare(
"SELECT
related.term_id,
COUNT(*) as rel_count
FROM
{$wpdb->prefix}terms related
INNER JOIN {$wpdb->prefix}term_taxonomy related_tax ON (related_tax.term_id = related.term_id)
INNER JOIN {$wpdb->prefix}term_relationships related_rel ON (related_tax.term_taxonomy_id = related_rel.term_taxonomy_id)
INNER JOIN {$wpdb->prefix}posts posts ON (related_rel.object_id = posts.ID)
INNER JOIN {$wpdb->prefix}term_relationships base_rel ON (posts.ID = base_rel.object_id)
INNER JOIN {$wpdb->prefix}term_taxonomy base_tax ON (base_rel.term_taxonomy_id = base_tax.term_taxonomy_id)
INNER JOIN {$wpdb->prefix}terms base ON (base.term_id = base_tax.term_id)
WHERE
related_tax.taxonomy = '%s'
AND base_tax.taxonomy = '%s'
AND posts.post_type = '%s'
AND posts.post_status = 'publish'
AND base.term_id = %d
AND related.term_id != base.term_id
GROUP BY related_tax.term_id
ORDER BY rel_count/related_rel.object_id DESC
LIMIT 0, %d",
$related_taxonomy,
$base_taxonomy,
$post_type,
$base_term_id,
$number
));
// var_dump($results);
$terms = array();
foreach ($results as $result) {
$term = get_term((int)$result->term_id, $related_taxonomy);
$term->rel_count = (int)$result->rel_count;
$terms[] = $term;
}
wp_cache_set($key, $terms, 'related_terms');
return $terms;
}
The query results for related tags are not bad, but I doubt that maybe I did something wrong due to my bad mysql skill level, or there is something should be improved for better relevance or performance.
Any help? Thanks first.
CodePudding user response:
Step 1: Create and populate a simple table with the tags. I'm going to insist on making this new table because I fear there will be a gnarly mess of joins considering how WP works. I have enough JOINs in what I am presenting to not have to worry about WP.)
CREATE TABLE tags (
post_id INT UNSIGNED NOT NULL,
tag VARCHAR(99) NOT NULL,
PRIMARY KEY(post_id, tag)
) ENGINE=InnoDB;
INSERT INTO tags
SELECT tr.post_id, tt.term
FROM term_relationships AS tr
JOIN term_taxonomy AS tt
...; ???
-- I don't think we need to touch wp_posts.
Step 2: Check that new table to see if its data makes sense:
-- Count number of posts for each tag:
SELECT tag, COUNT(*)
FROM tags
GROUP BY tag
ORDER BY ct DESC LIMIT 11;
-- Count number of posts for each pair of tags:
SELECT a.tag, b.tag, COUNT(DISTINCT post_id)
FROM tags AS a
JOIN tags AS b USING(post_id)
WHERE a.tag != b.tag
GROUP BY a.tag, b.tag
ORDER BY ct DESC LIMIT 11;
-- Note: that will deliberately duplicate things:
-- A,B and B,A
-- I expect step 3 to be simplified because of this.
Step 3: Get the "intersect rates":
Do you want all combinations? Just the highest? Lowest? For a given tag
?
-- intersect (assuming only A is 'flowers')
SELECT pairs.b_tag, c.tag,
pairs.ct, c.ct,
ROUND(100*pairs.ct/c.ct, 1) AS pct
FROM ( SELECT tag, COUNT(*) AS ct
FROM tags
WHERE tag = 'flowers'
GROUP BY tag
) AS c
JOIN ( SELECT a.tag AS a_tag,
b.tag AS b_tag,
COUNT(*) AS ct
FROM tags AS a
JOIN tags AS b USING(post_id)
WHERE a.tag != b.tag
AND a.tag = 'flowers'
GROUP BY a.tag, b.tag
) AS pairs
ON pairs.a_tag = c.tag
ORDER BY pairs.ct DESC
limit 4;
With some sample data that I had, I came up with
-------- --------- ----- ------ ------
| b_tag | tag | ct | ct | pct |
-------- --------- ----- ------ ------
| white | flowers | 124 | 1748 | 7.1 |
| purple | flowers | 114 | 1748 | 6.5 |
| red | flowers | 101 | 1748 | 5.8 |
| yellow | flowers | 90 | 1748 | 5.1 |
-------- --------- ----- ------ ------