Home > database >  A little problem in data warehouse, wish you a great god channel channel
A little problem in data warehouse, wish you a great god channel channel

Time:09-23

It is a description about incremental extraction of ETL process is:
A member list, for example, it is the primary key of the memberID, and members of the state is possible to update every day,
After we first extract, generated A standby list A, consists of two fields, the first is the memberID,
The second is all other fields besides memberID pieced together, then do a Hash generated field,
Next time extraction in the source table the same treatment, to generate tables, B to B and A left, Hash fields of unequal
For change of record, and some of the new record,
According to the records of the two parts memberID to extract the corresponding source table records,
According to the above description can be divided into new combination, data update, and then to update the ODS layer data, so if one of the data to be deleted in the source system, 'how can the data be deleted in ODS, or ODS data in a layer need not be deleted,

CodePudding user response:

Plug in, if the source member in the table 1 be deleted, should find this member ID 1 and then to the target table is removed, it should have a delete group, but to see a lot of related articles are divided into the update and new group, I hope you solve doubt,

CodePudding user response:

ODS layer data is generally is the latest data or full amount can also, if you remove a data source system, don't need you to delete your ODS layer handler should be handled automatically,

CodePudding user response:

In the first place: why do you want to delete the data? Even cancel membership should be modified status field, data warehouse users portrait is an important function in digging the potential customers, and you directly to the user deleted, how can the mining?

CodePudding user response:

This problem belongs to the source system is not standard, belongs to the scope of data governance
Is the specification of firms to delete data using a soft delete or recycle table handling

If the data warehouse to find this kind of situation, can only go through full table primary key than cyclical manner

CodePudding user response:

If first to extract A of the table have memberID='001', as you say the source system which data is deleted, then the second extraction is no which data in table B, A, B table make A comparison in order to find the need to delete the data, has commented that don't need to delete the data, this part of the data was wrong, wrong data how to provide effective value?

CodePudding user response:

If there is a physical delete the source, extraction based on log way to embody the advantages, catch the log according to their own needs to handle the increment table, and then, and it is disposed of in accordance with the logic of the update and new open logging database, deleting records must be log,

CodePudding user response:

Secretary table if it is physically deleted, good record delete id, redo ods data update, the records deleted again, generally do not do physically deleted in principle, we are doing logic delete, a delete tags, which does not exist the ods are inconsistent with the business system,

CodePudding user response:

What kind of tools, see the specific use of language or technology, look at the way you describe what do you want to be, is the CDC incremental data loading, if the upstream system has a logic to delete logo, you can plug in the logo in the middle of the table, whether it is a hash code or md5 can be, the purpose is to record the current value of the data and introduced to delete logo can identify the source system logic to delete,

CodePudding user response:

Your question is actually two:
1, how to capture data deleted? Usually several variety of ways: 1) the data log, 2) the trigger (poorly), 3) data source Auditing (seems to be no), 4) target table scan (performance)
2, don't need to delete the ODS need? You can but it is best not to do so,

CodePudding user response:

1. The data source table delete data is add a normal field, a deletion flag, and then record modification timestamp,
2. Is there really delete case, it is suggested that build a CDC table is used to record the operation, read the CDC deleted records in the table to update the ODS table, (SQL Server)
  • Related