1. Introduction
It took a bit of time, but I needed some time to get back into Oracle and the datawarehouse stuff. What I tried to implement was a possibility, to check data validity during datawarehouse-load in a generic and easy way and in particular field wise.
2. Datawarehouse Setup
What I often found in datawarehouse-projects is an at least two-tier implementation where the "raw" data is put into an import-table via SQLLoader. I know, that in reality the N in n-tier could be quite a lot, but let's assume, we just have a 2-tier database-load. These import-table just consist of VARCHAR2, so there is no validation done during import.
In a second step these data is transformed (e.g. the numbers are transformed into NUMBER fields) into a second table, the target table. Let's assume, the fields in the target table are named equally to the import table.
3. Finding Source and Target Fields
So first of all I want to determine all source fields that should be stored into the corresponding target field. I use the meta table all_tab_columns to achieve that.
4. Creating transformation statements
Now I use the information to create a select statament for the source table and an insert statement for the target table.
The v_insert string holds all the matching fields seperated by a comma. The v_bind holds all variables (the a colon combine with the field_name, e.g. :DATE_FIELD).
These variables are used, to build the queries:
5. Dynamic execution
In this step I wanted to execute them dynamically.
What we got, is a simple way for processing each column seperately. Now we can get a result for each columns transformation. Imagine the line where I bound the value to the insert statement. We could exchange that with a datatype-dependant field processing:
The above example shows the transformation for NUMBER and DATE fields. If an error occurs, it is logged field wise in the v_error variable.
6. Conclusion
This is a easy way to process an ETL with a field-wise type cast. You get an error for every field that doesn't match and not just one error per dataset.