Monday, March 8, 2010

Oracle field wise in-line data check

1. Introduction

It took a bit of time, but I needed some time to get back into Oracle and the datawarehouse stuff. What I tried to implement was a possibility, to check data validity during datawarehouse-load in a generic and easy way and in particular field wise.

2. Datawarehouse Setup

What I often found in datawarehouse-projects is an at least two-tier implementation where the "raw" data is put into an import-table via SQLLoader. I know, that in reality the N in n-tier could be quite a lot, but let's assume, we just have a 2-tier database-load. These import-table just consist of VARCHAR2, so there is no validation done during import.

In a second step these data is transformed (e.g. the numbers are transformed into NUMBER fields) into a second table, the target table. Let's assume, the fields in the target table are named equally to the import table.

3. Finding Source and Target Fields

So first of all I want to determine all source fields that should be stored into the corresponding target field. I use the meta table all_tab_columns to achieve that.

4. Creating transformation statements

Now I use the information to create a select statament for the source table and an insert statement for the target table.

The v_insert string holds all the matching fields seperated by a comma. The v_bind holds all variables (the a colon combine with the field_name, e.g. :DATE_FIELD).

These variables are used, to build the queries:

5. Dynamic execution

In this step I wanted to execute them dynamically.

What we got, is a simple way for processing each column seperately. Now we can get a result for each columns transformation. Imagine the line where I bound the value to the insert statement. We could exchange that with a datatype-dependant field processing:

The above example shows the transformation for NUMBER and DATE fields. If an error occurs, it is logged field wise in the v_error variable.

6. Conclusion

This is a easy way to process an ETL with a field-wise type cast. You get an error for every field that doesn't match and not just one error per dataset.

No comments:

Post a Comment