How to search a collection and return a list of sub document with mongo (Sping-data-mongo)-CodePudding

Given this Collection of Documents (Workflow):

[
{ 
 id: 1,
 name: 'workflow',
 status: 'started',
 createdDate: '2021-02-10'
 tasks: [
  {taskId: 'task1', value:'new'}
  {taskId: 'task2', value:'started'}
  {taskId: 'task3', value:'completed'}
 ]
},
{ 
 id: 2,
 name: 'workflow',
 status: 'started',
 createdDate: '2021-02-10'
 tasks: [
  {taskId: 'task1', value:'new'}
  {taskId: 'task2', value:'started'}
  {taskId: 'task3', value:'completed'}
 ]
},
{ 
 id: 3,
 name: 'workflow',
 status: 'started',
 createdDate: '2021-02-10'
 tasks: [
  {taskId: 'task1', value:'new'}
  {taskId: 'task2', value:'started'}
  {taskId: 'task3', value:'completed'}
 ]
}
]

I already have a search function that return me a list (Page) of Workflows matching a bunch of criteria's using Query and mongoTemplate.find();

What i need to do is to transform this result into something like this: (lets pretend the query return all elements

[

 { 
 id: 1,
 name: 'workflow',
 status: 'started',
 createdDate: '2021-02-10'
 tasks: [
  {taskId: 'task1', value:'new'}
 ]
},
 { 
 id: 1,
 name: 'workflow',
 status: 'started',
 createdDate: '2021-02-10'
 tasks: [
  {taskId: 'task2', value:'started'}
 ]
},
 { 
 id: 1,
 name: 'workflow',
 status: 'started',
 createdDate: '2021-02-10'
 tasks: [
  {taskId: 'task3', value:'completed'}
 ]
},
{ 
 id: 2,
 name: 'workflow',
 status: 'started',
 createdDate: '2021-02-10'
 tasks: [
  {taskId: 'task1', value:'new'}
 ]
},
{ 
 id: 2,
 name: 'workflow',
 status: 'started',
 createdDate: '2021-02-10'
 tasks: [
  {taskId: 'task2', value:'started'}
 ]
},
.... etc
]

In other words, i would like to return a flatten version of my workflows with only 1 task per workflow. Pageable if possible!!

another version i could work with would be to return a list of tasks with the aggregated workflow object (parent) into an added field eg:

[
 {taskId: 'task1', value:'new', workflow: {the workflow object}},
 {taskId: 'task2', value:'started', workflow: {the workflow object}},
]

I played a bit with Aggregation and unwind etc. but im new to mongodb and i dont find examples that help me.

Thanks in advance!

CodePudding user response：

So I will try to answer using example code. I am using SpringTemplates rather than SpringRepositories. While repositories can do aggregations they are fundamentally too basic for most enterprise applications where templates have much more control. In my opinion I will only ever use templates and never use repositories - but this is just my opinion.

Keep in mind - SpringData wants to map POJOs to data in a MongoDB collection. The response from a query is easy because the two are in sync with each other - the POJO matches the expected structures found in the database. When performing an aggregation the results often are reshaped for a variety of reasons.

In your use case, it appears you want to unwind the field "tasks" and have only one task per higher level parent object. This means the parent fields will repeat - much like your expected output shown in your original post. When performing an unwind the array no longer exists but a single document is in its place. For this reason the output has a slightly different shape. For Spring, this means a different class (Inheritance can help here). For this reason, in my example code, I have two POJOs - one called Workflow that represents the original saved document shapes including an array for field tasks, and another POJO called Workflow2 that represents the reshaped aggregation results. The only difference is the field tasks. One has a List<Task> whereas the other has a Task sub object.

So, actually I have 3 POJOs:

Workflow
Workflow2
Task

Task is a class to define the sub documents in the field task. Whether or not its an array - it still needs a class to hold the two sub-document fields taskId and value.

I am using maven for dependency management. For extra clarity I fully qualify every object with no import statements.

So, without further adieu here is the code.

File pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.3.RELEASE</version>
        <relativePath/>
    </parent>
    <groupId>test.barry</groupId>
    <artifactId>test</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>test</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
        <start-class>test.barry.Main</start-class>
        <mongodb.version>4.3.4</mongodb.version> <!-- BARRY NOTE: FORCE SPRING-BOOT TO USE THE MONGODB DRIVER VERSION 4.4.0 INSTEAD OF 4.0.5 -->
    </properties>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongodb-driver-sync</artifactId>
            <version>4.3.4</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-mongodb</artifactId>
        </dependency>
    </dependencies>
</project>

File src/main/resources/application.properties

spring.data.mongodb.uri=mongodb://testuser:mysecret@localhost:50011,localhost:50012,localhost:50013/?replicaSet=replSet&w=majority&readConcernLevel=majority&readPreference=primary&authSource=admin&retryWrites=true&maxPoolSize=10&waitQueueTimeoutMS=1000
spring.data.mongodb.database=javaspringtestX
spring.data.mongodb.socketconnecttimeout=60

File src/main/java/test.barry/Main.java

package test.barry;

@org.springframework.boot.autoconfigure.SpringBootApplication
public class Main {
    public static void main(String[] args) {
        org.springframework.boot.SpringApplication.run(Main.class, args);
    }
}

File src/main/java/test.barry/MySpringBootApplication.java

package test.barry;

@org.springframework.boot.autoconfigure.SpringBootApplication
public class MySpringBootApplication implements org.springframework.boot.CommandLineRunner {

  @org.springframework.beans.factory.annotation.Autowired
  org.springframework.data.mongodb.core.MongoTemplate mongoTemplate;

  public static void main(String[] args) {
    org.springframework.boot.SpringApplication.run(org.springframework.boot.autoconfigure.SpringBootApplication.class, args);
  }

  @Override
  public void run(String... args) throws Exception {

    System.out.println("Drop collections for automatic cleanup during test:");
    System.out.println("-------------------------------");
    this.mongoTemplate.dropCollection(test.barry.models.Workflow.class);

    java.util.Calendar calendar = java.util.Calendar.getInstance();
    calendar.set(2021, 2, 10);

    test.barry.models.Workflow workflow1 = new test.barry.models.Workflow();
    workflow1.id = 1;
    workflow1.name  = "workflow";
    workflow1.status = "started";
    workflow1.createdDate = calendar.getTime();
    workflow1.tasks.add(new test.barry.models.Task ("task1", "new"));
    workflow1.tasks.add(new test.barry.models.Task ("task2", "started"));
    workflow1.tasks.add(new test.barry.models.Task ("task3", "completed"));

    this.mongoTemplate.save(workflow1);

    test.barry.models.Workflow workflow2 = new test.barry.models.Workflow();
    workflow2.id = 2;
    workflow2.name  = "workflow";
    workflow2.status = "started";
    workflow2.createdDate = calendar.getTime();
    workflow2.tasks.add(new test.barry.models.Task ("task1", "new"));
    workflow2.tasks.add(new test.barry.models.Task ("task2", "started"));
    workflow2.tasks.add(new test.barry.models.Task ("task3", "completed"));

    this.mongoTemplate.save(workflow2);

    test.barry.models.Workflow workflow3 = new test.barry.models.Workflow();
    workflow3.id = 3;
    workflow3.name  = "workflow";
    workflow3.status = "started";
    workflow3.createdDate = calendar.getTime();
    workflow3.tasks.add(new test.barry.models.Task ("task1", "new"));
    workflow3.tasks.add(new test.barry.models.Task ("task2", "started"));
    workflow3.tasks.add(new test.barry.models.Task ("task3", "completed"));

    this.mongoTemplate.save(workflow3);

    org.springframework.data.mongodb.core.aggregation.Aggregation pipeline = org.springframework.data.mongodb.core.aggregation.Aggregation.newAggregation (
            org.springframework.data.mongodb.core.aggregation.Aggregation.unwind("tasks")
    );

    org.springframework.data.mongodb.core.aggregation.AggregationResults<test.barry.models.Workflow2> aggregationResults = this.mongoTemplate.aggregate(pipeline, test.barry.models.Workflow.class, test.barry.models.Workflow2.class);
    java.util.List<test.barry.models.Workflow2> listResults = aggregationResults.getMappedResults();
    System.out.println(listResults.size());
  }
}

File src/main/java/test.barry/SpringConfiguration.java

package test.barry;

@org.springframework.context.annotation.Configuration
@org.springframework.context.annotation.PropertySource("classpath:/application.properties")
public class SpringConfiguration {

    @org.springframework.beans.factory.annotation.Autowired
    org.springframework.core.env.Environment env;

    @org.springframework.context.annotation.Bean
     public com.mongodb.client.MongoClient mongoClient() {
         String uri = env.getProperty("spring.data.mongodb.uri");
         return com.mongodb.client.MongoClients.create(uri);
     }
    @org.springframework.context.annotation.Bean
    public org.springframework.data.mongodb.MongoDatabaseFactory mongoDatabaseFactory() {
        String uri = env.getProperty("spring.data.mongodb.uri");
        String database = env.getProperty("spring.data.mongodb.database");
        return new org.springframework.data.mongodb.core.SimpleMongoClientDatabaseFactory(com.mongodb.client.MongoClients.create(uri), database);
    }

    @org.springframework.context.annotation.Bean
    public org.springframework.data.mongodb.core.MongoTemplate mongoTemplate() throws Exception {
        return new org.springframework.data.mongodb.core.MongoTemplate(mongoClient(), env.getProperty("spring.data.mongodb.database"));
    }
}

File src/main/java/test.barry/models/Workflow.java

package test.barry.models;

@org.springframework.data.mongodb.core.mapping.Document(collection = "Workflow")
public class Workflow
{
    @org.springframework.data.annotation.Id
    public int id;

    public String name;
    public String status;
    public java.util.Date createdDate;
    public java.util.List<Task> tasks;

    public Workflow() {
        this.tasks = new java.util.ArrayList<Task>();
    }

    public Workflow(String name, String status, java.util.Date createdDate) {
        this();
        this.name = name;
        this.status = status;
        this.createdDate = createdDate;
    }

    @Override
    public String toString() {
        return String.format("Workflow[id=%s, name='%s', status='%s', createdDate='%s']", id, name, status, createdDate);
    }
}

File src/main/java/test.barry/models/Workflow2.java

package test.barry.models;

@org.springframework.data.mongodb.core.mapping.Document(collection = "Workflow")
public class Workflow2
{
    @org.springframework.data.annotation.Id
    public int id;

    public String name;
    public String status;
    public java.util.Date createdDate;
    public Task tasks;

    public Workflow2() {
        this.tasks = new Task();
    }

    public Workflow2(String name, String status, java.util.Date createdDate) {
        this();
        this.name = name;
        this.status = status;
        this.createdDate = createdDate;
    }

    @Override
    public String toString() {
        return String.format("Workflow[id=%s, name='%s', status='%s', createdDate='%s']", id, name, status, createdDate);
    }
}

File src/main/java/test.barry/models/Task.java

package test.barry.models;

public class Task
{
    public Task() {}

    public Task(String taskId, String value) {
        this.taskId = taskId;
        this.value = value;
    }

    public String taskId;
    public String value;
}

Conclusion

When using MongoShell we see the following records are created:

Enterprise replSet [primary] javaspringtestX> db.Workflow.find()
[
  {
    _id: 1,
    name: 'workflow',
    status: 'started',
    createdDate: ISODate("2021-03-10T23:49:46.704Z"),
    tasks: [
      { taskId: 'task1', value: 'new' },
      { taskId: 'task2', value: 'started' },
      { taskId: 'task3', value: 'completed' }
    ],
    _class: 'test.barry.models.Workflow'
  },
  {
    _id: 2,
    name: 'workflow',
    status: 'started',
    createdDate: ISODate("2021-03-10T23:49:46.704Z"),
    tasks: [
      { taskId: 'task1', value: 'new' },
      { taskId: 'task2', value: 'started' },
      { taskId: 'task3', value: 'completed' }
    ],
    _class: 'test.barry.models.Workflow'
  },
  {
    _id: 3,
    name: 'workflow',
    status: 'started',
    createdDate: ISODate("2021-03-10T23:49:46.704Z"),
    tasks: [
      { taskId: 'task1', value: 'new' },
      { taskId: 'task2', value: 'started' },
      { taskId: 'task3', value: 'completed' }
    ],
    _class: 'test.barry.models.Workflow'
  }
]

To view the results of the aggregation we must use the debugger. I am using the IntelliJ IDEA for debugging, and show the results in a list of type Workflow2. Not sure how to show them here. My testing has shown this to work as I understand it. Please evaluate and let me know if this needs tweaking...

By the way, the concept of pagination is best suited to be managed by your application and not the database. In practice you may find usages of skip() and limit() but for large datasets having many pages, you may find requeries for next pages cause performance issues as each time they must identify all the documents then identify which to skip. Better to keep track of the range shown on the prior page, then requery for only records on the next page. I.e., limit the resultset for better performance.

EDIT - 2021-12-09 Upon review of the saved data it shows weird dates. Apparently the deprecated use of java.util.Date myDate = java.util.Date(2021, 2, 10); creates invalid dates. For this reason I have added java.util.Calendar calendar = java.util.Calendar.getInstance();

CodePudding user response：

MongoDB aggregation is what you need:

db.Workflow.aggregate([
  {
    $match: {} // put here your search criteria
  },
  {
    $unwind: "$tasks"
  },
  {
    $addFields: {
      tasks: [
        "$tasks"
      ]
    }
  },
  //pageable
  {
    $skip: 0
  },
  {
    $limit: 100
  }
])

MongoPlayground

SpringBoot way:

@Autowired
private MongoTemplate mongoTemplate;

...

List<AggregationOperation> pipeline = new ArrayList<>();

//$match (put here your filter)
pipeline.add(Aggregation.match(Criteria.where("status").is("started")));

//$unwind
pipeline.add(Aggregation.unwind("tasks"));

//$addFields
pipeline.add(Aggregation.addFields().addFieldWithValue("tasks", Arrays.asList("$tasks")).build());

//$skip
pipeline.add(Aggregation.skip(0L));
    
//$limit
pipeline.add(Aggregation.limit(100L));

Aggregation agg = Aggregation.newAggregation(pipeline)
    .withOptions(Aggregation
        .newAggregationOptions().allowDiskUse(Boolean.TRUE).build());

return mongoTemplate.aggregate(agg, Workflow.class, Workflow.class).getMappedResults();