Abstracting Data Access Layer with Entity Framework code first-CodePudding

I have two separate databases. Database A is small and contains 6 tables and some stored procedures, database B is larger and contains 52 tables, 159 views, and some stored procedures. I have been tasked with creating an application that does operations on both of these databases. There is already code that has been written that accesses these databases but it is not testable and I would like to create the DAL code from scratch. So I made a code first Entity Framework connection and let it create all of the POCOs for me as well as the Context classes. Now I have 3 DatabaseContext classes. 1 for database A which is just the 6 separate tables. 1 for the tables from database B and 1 for the views from database B. I split the Context class for database B into 2 separate context classes because the tables and the views are used in 2 different scenarios.

I have created 2 projects in a single solution. DomainModel and DataAccessLayer. The DomainModel project holds all of the POCOs which the Entity Framework code first generated for me. DataAccessLayer is where the DatabaseContexts are and where I would like to create the code which other projects can use to access the database. My goal being to make a reusable and testable data access layer code.

The QUESTION is: What do I create in the DataAccessLayer project to make the DAL resusable in other projects that want to be able to talk to the database in an abstract way and what is the best way to structure it?

What I have done: In my search on the internet I have seen that people recommend the repository pattern (sometimes with UnitOfWork) and it seems like that is the way to go, so I created an interface called IGenericRepository and an implementation called GenericEntityFrameworkRepository. The interface has methods such as GetByID, Insert, Delete, and Update. The implementation takes a DbContext as a parameter in the constructor and implements all of the methods. The next step might be to create specific implementations for the different tables and views but that is going to be a very tedious thing to do since there are so many tables and views. I have also read that Entity Framework IS the DAL. Then it might be enough to just create another project that holds the business logic and returns IEnumerables. But that seems like it will be hard to test.

What I am trying to achieve is a well structured and testable foundation to access our database that can be tested thoroughly and expanded as other projects start to require other functionality to the database.

The folder structure for the project can be seen in the following picture:

CodePudding user response：

I guess you tumbled between selecting right architecture for a given solution. a software must be created for meet the business needs. so depending on the business, complexity of the domain, we have to choose the write architecture.

Microsoft elaborated this in a detail PDF you will find it here https://docs.microsoft.com/en-us/dotnet/architecture/modern-web-apps-azure/

You want some level of abstraction between your Data and the BL then "N-Layer" architecture is not the best solution. even though i am not telling the N-Layer entirely out dated but there is more suitable solutions for this problem .

And if you using EF or EF core then there is no need to implement Repository cus EF itself contain UOW and Repositories.

Clean Architecture introduce higher level of abstraction between layers. it abstract away your data layer and make it reusable over different project or different solutions. Clean architecture is a big topic i can't describe here in more details. but i learn the subject from him.

Jason Taylor- Clean Architecture

https://www.youtube.com/watch?v=dK4Yb6-LxAk

https://github.com/jasontaylordev/CleanArchitecture

CodePudding user response：

Design and architecture is all about trade-offs. Be careful when you want to consider abstracting your data access for purposes of reuse. Within an application this leads to overly complex and poorly performing solutions to what can be a very simple and fast solution.

For instance, if I want to build a business domain layer as a set of services to be consumed by a number of different applications (web applications and services/APIs that will serve mobile apps or third party interests) then that domain layer becomes a boundary that defines what information will come in, and what information will go out. Everything in and out of that abstraction should be DTOs that reflect the domain that I am making available via the actions I make available. Within that boundary, EF should be leveraged to perform the projection needed to produce the relevant DTOs, or update entities based on the specific details passed in. (After verification) The trade-off is that every consumer will need to make due with this common access point, so there is little wiggle room to customize what comes out unless you build it in. This is preferable to having everything just do their own thing accessing the database and running custom queries or calling Stored Procedures.

What I see most often though is that within a single solution, developers want to abstract the system not to "know" that the data domain is provided by EF. The application passes around POCO Entities or DTOs and they implement things like Generic Repositories with GetById, GetAll, Update, Upsert, etc. This is a very, very poor design decision because it means that your data domain cannot take advantage of things like projection, pagination, custom filtering, ordering, eager loading related data, etc. Abstracting the EF DbContext is certainly a valid objective for enabling unit testing, but I strongly recommend avoiding the performance pitfall of a Generic Repository pattern. IMO attempting to abstract away EF just for the sake of hiding it or making it swappable is either a poor performing and/or an overly complex mess.

Take for example a typical GetAll() method.

public IEnumerable<T> GetAll()
{
    return _context.Set<T>().ToList();
}

What happens if you only want orders from the last 3 months?

var orders = orderRepository.GetAll()
    .Where(x => x.OrderDate >= startDate).ToList();

The issue here is that GetAll() will always return all rows. If you have 4 million orders you can appreciate that is not desirable.

You could make an OrderRepository that extends Repository to implement a method like GetOrdersAfter(DateTime) but soon, what is the point of the Generic Repository? The next option would be to pass something like an Expression<Func<TEntity>> Where clause into the GetAll() method. There are plenty of examples of that out there. However, doing that is leaking EF-isms into the consuming code. The Where clause expression has to conform to EF rules. For instance passing order => order.SomeUnmappedProperty == someValue would cause the query to fail as EF won't be able to convert that down to SQL. Even if you're Ok with that the GetAll method will need to start looking like:

IEnumerable<TEntity> GetAll(Expression<Func<TEntity>> whereClause = null, 
    IEnumerable<OrderByClause> orderByClauses = null, 
    IEnumerable<string> includes = null, 
    int? pageNumber = null, int? pageSize = null )

or some similar monstrosity to handle filtering, ordering, eager loading, and pagination, and this doesn't even cover projection. For that you end up with additional methods like:

IEnumerable<TEntity, TDTO> GetAll(Expression<Func<TEntity>> whereClause = null, 
    IEnumerable<OrderByClause> orderByClauses = null, 
    IEnumerable<string> includes = null, 
    int? pageNumber = null, int? pageSize = null )

which will call the other GetAll then perform a Mapper.Map to return a desired DTO/ViewModel that is hopefully registered in a mapping dependency. Then you need to consider async vs. synchronous flavours of the methods. (or forcing all calls to be synchronous or async)

Overall my advice would be to avoid Generic Repository patterns and don't abstract away EF just for the sake of abstraction, instead look for ways to leverage the most you can out of it. Most of the problems developers run into with performance and odd behaviour with EF stem from efforts to abstract it. They are so worried about needing to abstract it to be able to replace it if they need to, that they end up with those performance and complexity problems entirely because of the limitations imposed by the abstraction. (A self-fulfilling prophecy) "Architecture" can be a dirty word when you try to prematurely optimize a solution to solve a problem you don't currently, and probably will never face. Always keep YAGNI and KISS at the forefront of all design considerations.