Home > Software engineering >  Transpose a group of repeating columns in large horizontal dataframe into a new vertical dataframe u
Transpose a group of repeating columns in large horizontal dataframe into a new vertical dataframe u

Time:04-30

This question although may seem previously answered it is not. All transposing seem to relate to one column and pivoting the data in that column. I want to make a vertical table from a horizontal set of columns, for example:-

Take this example:-

MyPrimaryKey Insurer_Factor_1_Name Insurer_Factor_1_Code Insurer_Factor_1_Value Insurer_Factor_2_Name Insurer_Factor_2_Code Insurer_Factor_2_Value Insurer_Factor_[n]_Name Insurer_Factor_[n]_Code Insurer_Factor_[n]_Value
XX-ABCDEF-1234-ABCDEF123 Special SP1 2500 Awesome AW2 3500 ecetera etc 999999

[n] being any number of iterations

transforming it into a new vertical representation dataframe:-

MyPrimaryKey Insurer_Factor_ID Insurer_Factor_Name Insurer_Factor_Code Insurer_Factor_Value
XX-ABCDEF-1234-ABCDEF123 1 Special SP1 2500
XX-ABCDEF-1234-ABCDEF123 2 Awesome AW2 3500
XX-ABCDEF-1234-ABCDEF123 [n] ecetera etc 999999

There is also the possibility that the “Code” column may be missing and we only receive the name and value therefore requiring null to be added to the code column.

I've searched High and low for this, but there just doesn't seem to be anything out there?

Also there could be many rows in the first example...

CodePudding user response:

The reason you haven't found it is that there is not a magic trick to move a 'interstingly' designed table into a well designed table. You are going to have to hand code a query to either union the rows into your table, or select arrays that you then explode.

Sure you could probably write some code to generate the SQL that you want, but really they're isn't a good feature to magically translate this feature format into a row based format.

CodePudding user response:

In order of preference:

Revisit your decision to send multiple files: It sounds like it would save a lot of work if you just sent multiple files.

Change the column schema: Put a delimiter (every 4th column) into the column schema allowing us to see the rows. We can then suck the file in as rows. Using a delimiter.

Write your own custom datasource: You can use the existing text one as a example for how you can write your own, that cold interpret every 3 columns as a row.

Write a custom UDF that takes all columns as a parameter and returns an array of rows, that you then call explode on to turn them into rows. This will be slow so I give it to you as the final option.

  • Related