Multiple array_agg in sqlalchemy-CodePudding

I am working with postgres. I want to fetch multiple fields using array_agg in sqlalchemy. But I couldn't find examples of such use anywhere. I made my request. But I can't process the result of array_agg. I'd like to get a list of strings, or better yet a list of tuples.
It would also be nice to get rid of func.distinct, it's only needed because I can't write it like this: func.array_agg((Task.id, Task.user_id))

My query:

data = session.query(
    Status.id, func.array_agg(func.distinct(Task.id, Task.user_id), type_=TEXT)
).join(Task).group_by(Status.id).limit(5).all()

I got:

(100, '{"(91,1)","(92,1)","(93,1)","(94,1)"}')
(200, '{"(95,1)","(96,1)","(97,1)","(98,1)","(99,1)"}')

But I want:

(100, ["(91,1)","(92,1)","(93,1)","(94,1)"])
(200, ["(95,1)","(96,1)","(97,1)","(98,1)","(99,1)"])

Or better:

(100, [(91,1),(92,1),(93,1),(94,1)])
(200, [(95,1),(96,1),(97,1),(98,1),(99,1)])

I try also:

func.array_agg(func.distinct(Task.id, Task.user_id), type_=ARRAY(TEXT))

I got:

(100, ['{', '"', '(', '9', '1', ',', '1', ')', '"', ',', '"', '(', '9', '2', ',', '1', ')', '"', ',', '"', '(', '9', '3', ',', '1', ')', '"', ',', '"', '(', '9', '4', ',', '1', ')', '"', '}'])
(200, ['{', '"', '(', '9', '5', ',', '1', ')', '"', ',', '"', '(', '9', '6', ',', '1', ')', '"', ',', '"', '(', '9', '7', ',', '1', ')', '"', ',', '"', '(', '9', '8', ',', '1', ')', '"', ',', '"', '(', '9', '9', ',', '1', ')', '"', '}'])

CodePudding user response：

The problem here is that Postgresql's aaray_agg function is returning an array of unknown type; the default behaviour of the psycopg2 connector in this situation is to simply return the array literal as-is.

The bug report was raised in this bug report from 2016. SQLAlchemy's maintainer, SO user zzzeek proposed creating a custom type to handle this case. I have modified the solution slightly to convert the tuple elements to integers, and to work with v1.4:

import re
from sqlalchemy.types import TypeDecorator

class ArrayOfRecord(TypeDecorator):
    impl = sa.String
    # cache_ok = True seems to work, but I haven't tested extensively
    cache_ok = True

    def process_result_value(self, value, dialect):
        elems = re.match(r"^\{(\". ?\")*\}$", value).group(1)
        elems = [e for e in re.split(r'"(.*?)",?', elems) if e]
        return [tuple(
            map(int, re.findall(r'[^\(\),] ', e))
        ) for e in elems]

Using it like this:

with Session() as session:
    data = (
        session.query(
            Status.id,
            sa.func.array_agg(
                sa.func.ROW(Task.id, Task.user_id), type_=ArrayOfRecord
            ).label('agg')
        )
        .join(Task)
        .group_by(Status.id)
    )
    print()
    for row in data:
        print(row)
    print()

outputs

(100, [(91, 1), (92, 1), (93, 1), (94, 1)])
(200, [(95, 1), (96, 1), (97, 1), (98, 1)])

CodePudding user response：

probably not the faster way to do it, but you can try making a text-formating function like:

def txt_fmt(a:str):
    out=[]
    for v in a.replace("{","").replace("}","").split('","'): 
        t=list(v.replace('"',"").replace("(","").replace(")","").split(","))
        out =[tuple([int(j) for j in t])]
    return out

so:

a = '{"(91,1)","(92,1)","(93,1)","(94,1)"}'
print(txt_fmt(a))

will return:

[(91, 1), (92, 1), (93, 1), (94, 1)]