Assuming I have the following class:
public class Book {
    private String title;
    private String year;
    private String author;
}
I'd like to avoid using a database engine, since the data is only short-lived. Still, to improve performance, I'd like to answer queries such as "retrieve books where author = "Foo" and year = "2022"" or just "year = "2022"" efficiently (not O(n)).
When using a Map, I can specify only one key that must be fully known, e.g. year, author, or both. Then I could get a list from there. But I need to answer both questions. So I'd need a Map with Author -> Index, Year -> Index and check if an entry is in both.
Is there any other API, like list.addIndexOn(Book::title)?
I need to perform lookups on fields (and combinations of fields) known at coding time, like books.getYear(2022).andAuthor("X"), books.getYear(2022), and books.getAuthor("X").
So I can say in advance that I need an index on year and author.
CodePudding user response:
If you really need to support just those two types of queries, you can use a Map<String, Map<String, Set<Book>>> where the first key is the year and the second is the author. That's the equivalent of a Guava Table.
This would look like this (incomplete code):
Map<String, Map<String, Set<Book>>> booksByYearAndAuthor;
// books where author = "Foo" and year = "2022"
booksByYearAndAuthor.get("2022").get("Foo") // add necessary null checks or default value
// books where year = "2022"
booksByYearAndAuthor.get("2022").values().stream().flatMap(...).collect(...) // add necessary null checks or default value
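To make the snippet above concrete, here is a minimal sketch of the nested map with the null handling filled in. The constructor and getters on Book, and the helper class YearAuthorIndex with its method names, are assumptions for illustration and not part of the original question.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Stand-in for the Book class from the question; constructor and getters are assumed.
final class Book {
    private final String title;
    private final String year;
    private final String author;

    Book(String title, String year, String author) {
        this.title = title;
        this.year = year;
        this.author = author;
    }

    String getTitle() { return title; }
    String getYear() { return year; }
    String getAuthor() { return author; }
}

// Hypothetical helper wrapping the nested map: year -> author -> books.
final class YearAuthorIndex {
    private final Map<String, Map<String, Set<Book>>> booksByYearAndAuthor = new HashMap<>();

    void add(Book book) {
        booksByYearAndAuthor
                .computeIfAbsent(book.getYear(), y -> new HashMap<>())
                .computeIfAbsent(book.getAuthor(), a -> new HashSet<>())
                .add(book);
    }

    // Books for a given year and author: two hash lookups, empty set if nothing matches.
    Set<Book> byYearAndAuthor(String year, String author) {
        return booksByYearAndAuthor
                .getOrDefault(year, Collections.emptyMap())
                .getOrDefault(author, Collections.emptySet());
    }

    // Books for a given year: one hash lookup, then the per-author sets of that year are merged.
    Set<Book> byYear(String year) {
        return booksByYearAndAuthor
                .getOrDefault(year, Collections.emptyMap())
                .values().stream()
                .flatMap(Set::stream)
                .collect(Collectors.toSet());
    }
}
```

Note that this layout only helps when the year is known: a query by author alone would still have to visit every year.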
If you need to support arbitrary queries, I don't think you can get away from linear scans without implementing something yourself to add secondary indices, at the cost of extra memory. Do you have enough items to require doing this? That's the question, and if you don't need a DB, I would say: probably not.
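For the fields named in the question, hand-rolled secondary indices could look roughly like the sketch below: one map per indexed field, with combined queries answered by intersecting the two result sets. The class IndexedBooks, its method names, and the constructor and getters on Book are hypothetical and only serve the illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stand-in for the Book class from the question; constructor and getters are assumed.
final class Book {
    private final String title;
    private final String year;
    private final String author;

    Book(String title, String year, String author) {
        this.title = title;
        this.year = year;
        this.author = author;
    }

    String getTitle() { return title; }
    String getYear() { return year; }
    String getAuthor() { return author; }
}

// Hypothetical container that keeps one secondary index per queried field.
final class IndexedBooks {
    private final Map<String, Set<Book>> byYear = new HashMap<>();
    private final Map<String, Set<Book>> byAuthor = new HashMap<>();

    // Every insert updates both indices; this is the extra memory and write cost.
    void add(Book book) {
        byYear.computeIfAbsent(book.getYear(), k -> new HashSet<>()).add(book);
        byAuthor.computeIfAbsent(book.getAuthor(), k -> new HashSet<>()).add(book);
    }

    Set<Book> withYear(String year) {
        return Collections.unmodifiableSet(byYear.getOrDefault(year, Collections.emptySet()));
    }

    Set<Book> withAuthor(String author) {
        return Collections.unmodifiableSet(byAuthor.getOrDefault(author, Collections.emptySet()));
    }

    // Intersection of both indices: iterate the smaller set and probe the larger one.
    Set<Book> withYearAndAuthor(String year, String author) {
        Set<Book> yearMatches = withYear(year);
        Set<Book> authorMatches = withAuthor(author);
        boolean yearIsSmaller = yearMatches.size() <= authorMatches.size();
        Set<Book> smaller = yearIsSmaller ? yearMatches : authorMatches;
        Set<Book> larger = yearIsSmaller ? authorMatches : yearMatches;
        Set<Book> result = new HashSet<>();
        for (Book book : smaller) {
            if (larger.contains(book)) {
                result.add(book);
            }
        }
        return result;
    }
}
```

Single-field lookups are one hash lookup each; the combined query costs time proportional to the smaller of the two matching sets rather than to the whole collection. The sketch does not handle removal or changes to an already indexed book, which would need both maps updated as well.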
CodePudding user response:
Doing it yourself is quite hard, and takes you back in time to COBOL and navigational databases. Before investing much time and effort in such a side-track, consider the following alternatives.
1. No indexing
Say you have 2000 books and 4 fields.
- Reading the data probably takes some time.
- Filling the indexes takes time too, and you might need not just 4 single-field indexes but compound indexes as well. In fact, you do not know in advance which indexes will actually be used.
So if you build the indexes along with the table data, loading takes even longer.
It might therefore be advisable to forget about indexes (such as a HashMap) altogether.
One very expressive form would be a Stream<Book>:
List<Book> books = ...
books.parallelStream()
    .filter(b -> "2022".equals(b.getYear())) // year is a String in Book
    .filter(b -> "X".equalsIgnoreCase(b.getAuthor()))
    .sorted(Comparator.comparing(Book::getTitle))
    .forEachOrdered(b -> System.out.println(b)); // forEachOrdered keeps the sorted order on a parallel stream
2. An embedded database
An H2 database is really simple to create and has several advantages, such as versatile (SQL) queries and indexing. O/R mapping in the form of JPA, for example with EclipseLink, is also easy. Then you have the (type-safe) Criteria API.