Home > Software design >  Split values from column based on the name of the column
Split values from column based on the name of the column

Time:11-24

I work with QGIS and PostgreSQL with PostGIS. I need help with dynamic queries for PostgreSQL.

Information is structured in tables that contain votes for parties, and other types of information like geographic area or election date.

Some columns contains values that have to be splitted among several parties. For example, we can have a column with name "PartyA_PartyB" and a value of 10, and it should be splitted 5 votes to PartyA and 5 votes to PartyB. Additionally we will have independent columns for PartyA and PartyB (separated), so we need to compute a column where we allocate the original PartyA PartyA_PartyB/2.

So for example for the given the tables “Election Results” and "Parties":

create table election_results ("Country" text,  "PartyA" text,  "PartyB" text,  "PartyC" text, "PartyA_PartyB" text);
insert into election_results
VALUES
  ('Argentina', 100, 10, 20, 2),
  ('Uruguay', 3, 5, 1, 0),
  ('Chile', 40, 200, 50, 10)
;


create table parties (party text);
insert into parties
VALUES
  ('PartyA'),
  ('PartyB'),
  ('PartyC'),
  ('PartyD'),
  ('PartyE')
;

I need to create a new table with a column where 'new' PartyA = PartyA PartyA_PartyB/2 and 'new' PartyB = PartyB PartyA_PartyB/2

So with previous data desired result is:

Country PartyA PartyB PartyC
Argentina 101 11 20
Uruguay 3 5 1
Chile 45 205 50

In all cases the special characters that separates the names to be splitted is '_'.

We can have n parties in the column names (for example PartyA_PartyB_PartyD_PartyE). Votes have to be splitted among the n parties.

With my limited understanding I think iterate over the columns could be a solution, look for the '_' character and recalculate.

CodePudding user response:

Note: Please store your values not as text but as a numeric type.

  1. Create your new table:

    CREATE TABLE parties (
         "Country" text,  
         "PartyA" numeric,  
         "PartyB" numeric,  
         "PartyC" numeric
    );
    
  2. Copy values for the "single" columns:

    INSERT INTO parties
    
    SELECT "Country", "PartyA", "PartyB", "PartyC"
    FROM election_results;
    
  3. Update the columns with a function

    SELECT * FROM split_and_update_parties();
    

The function could look like this:

CREATE OR REPLACE FUNCTION split_and_update_parties()
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
    i record;
    j text;
    n integer;
BEGIN 
  FOR i in 
     
      SELECT                                       
           column_name,                                  -- 1
           string_to_array(column_name, '_')             -- 2
       FROM information_schema.columns
       WHERE table_name   = 'election_results'
           AND column_name ~ 'Party'     
     
   LOOP
       n = cardinality(i.string_to_array);               -- 3
       IF n > 1 THEN
           FOREACH j in array i.string_to_array LOOP
            
               EXECUTE format('
         
                   UPDATE parties p                      -- 4
                   SET %I = p.%I   s.val / %s
                   FROM (
                       SELECT %I as val, "Country"
                       FROM election_results
                   ) s
                   WHERE p."Country" = s."Country"
          
               ', j, j, n, i.column_name);

           END LOOP;
       END IF;
   END LOOP;

END
$func$;

Explanation:

  1. Fetch column names from internal information schema
  2. Immediately split the names and convert them into arrays
  3. Count the elements of the arrays to know the divider needed furtherly in the calculation
  4. Loop through all these multiple-party-arrays/columns (with more than 1 element), fetch the original values from the election_results table and update the single-party-columns in the new table
  • Related