I have a database of resume data in json format which I am trying to transform. One of the sections in each jsib is work_history, this is in the form of a json array i.e.
"work_experience":[
{
"job_title":"title",
"job_description":"description"
},
{
"job_title":"title",
"job_description":"description"
}
]
I am iterating over each resume (json file) and importing this data into a new table using dbt and postgreSQL with each element of the array being a new row with the associated metadata of the resume. Here is the code I used for this
select
json_array_elements(rjt.raw_json::json -> 'data' -> 'work_experience') as we,
json_array_elements(rjt.raw_json::json -> 'data' -> 'work_experience') -> 'job_title' as "name",
rjt.uuid as uuid
from raw_json_table rjt
The last thing that I need to do is add a column that lists the index that each job came from within its individual workexperience array i.e. if a job was the third element in the array it would have a 2 in the "source_location" column. How can I generate this index such that it starts at 0 for each new json file.
CodePudding user response:
Move the function to the FROM clause (where set-returning functions should be used). Then you can use with ordinality
which also returns a column that indicates the index inside the array
select w.experience as we,
w.experience ->> 'job_title' as "name",
w.experience ->> 'job_description' as "description",
w.idx as "index",
rjt.uuid as uuid
from raw_json_table rjt
left join json_array_elements(rjt.raw_json::json -> 'data' -> 'work_experience') with ordinality
as w(experience, idx) on true
The left join
is necessary so that rows from raw_json_table
that don't contain array elements are still included.