Summary of the problem:
I have a BigQuery SQL table formatted as such:
Source | Target | Weight |
---|---|---|
A | B | 1 |
A | C | 2 |
D | A | 3 |
E | A | 4 |
I am trying to rearrange the table so that, e.g., all the A
values will be moved to the Source
column and out of the Target
column, with the expected result:
Source | Target | Weight |
---|---|---|
A | B | 1 |
A | C | 2 |
A | D | 3 |
A | E | 4 |
I can do this for a single value, but my problem is, I want to shift the Source
and Target
values for multiple values.
In reality, the weights are Jaccard similarity coefficients between two sets. They were estimated in BigQuery, but because they were not estimated in a pairwise fashion given BigQuery's tabular format, values I'd expect to be in the Source
column appear in the Target
column.
What I've tried:
For a single value, e.g., A
only, I have the following solution:
WITH temp AS (
SELECT
*, "A" AS new_source
FROM networks.my_table
WHERE `Source`= "A" OR `Target` = "A"
ORDER BY jaccard DESC
),
output AS (
SELECT *,
CASE WHEN `Target` = new_source THEN `Source` ELSE `Target` END new_target
FROM temp
)
SELECT * FROM output
But I cannot seem to generalize this for all cases.
Alternatively, if there is a way to transform the data into a matrix before applying the Jaccard coefficient estimates, I would accept that solution as well.
CodePudding user response:
Consider below approach
select if(Target = 'A', struct(Target as Source, Source as Target, Weight), t).*
from your_table t
if applied to sample data in your question - output is