Friday, August 22, 2008

A comparative analysis between SQL*LOADER and UTL_FILE utility

In implementing new systems we come across problems of importing "alien" data. This may be coming from a legacy system or an on-going system. This data is transported via extract files from the legacy system to the Oracle system. The gateway to Oracle for this data is SQL*Loader and data is loaded into tables via a control script into tables.

Typically, the older systems do not have very normalized data, nor have they been operating with fully implemented database constraints. The lack of constraints over the years in legacy system can lead to bad data that has crept in. Therefore, while bringing external data into oracle system we need a refined set of checks and balances to ensure that we get good data. This requires a lot of programmatic control in the process of data-loading.

The approach applied in case of SQL* Loader is as follows :
1. Load the data into temporary tables via SQL*Loader via control file and make the data native to ORACLE.
2. Write a PL/SQL program to do the processing.
3. Load the data into live tables.
This approach has a lot of dependencies as well as a strong lack of integration of steps and programmatic control. To overcome this, we have analyzed another facility in that has been release Oracle 7.3.x onwards. It is called the UTL_FILE package. With some creative use of this package we can achieve whatever SQL*LOADER offers and in addition to that do some high level validation and complex data loading. In the following discussion a study of two tools is done.

A BRIEF OVERVIEW OF SQL*Loader:
SQL*Loader is a server utility for loading data from external data files into Oracle database. The basic advantage of using SQL*Loader is for simple loads and fast loading of data. It can load data into myriad data formats, perform elementary filtering, load data into multiple tables, and create one logical record from one or more physical records.

It creates a detailed log file, a bad file that contains rejected records and a discard file to hold the records that are selectively not loaded. The tool is executed from a command line and a username and password and the control file name and location are required to run it.

A BRIEF OVERVIEW OF UTL_FILE:
PL/SQL does not have text file input output capabilities but acquires it via UTL_FILE package. It provides rudimentary utility for reading ( as well as writing) files from within a PL/SQL program. The lines in the file are read sequentially and hence it effects the performance of the program.

The UTL_FILE package can be wrapped around with a PL/SQL program and since this package is integrated with PL/SQL it provides us the tremendous ability for flexing our "programming muscles." Some procedures and functions can be added to this wrapper program that serve as a handy "tool" for doing normal file reading operations. With this approach we can achieve whatever SQL*Loader can do and much more. The security mechanism for UTL_FILE is achieved by defining a parameter in INIT.ora file called utl_file_dir parameter. The directories that UTL_FILE can read from and write to need to have permissions of Oracle instance owner and the user running the package.

CONCLUSIONS:
The comparative analysis of SQL*Loader and UTL_FILE reveals that these tools are suitable to our environment subject to the conditions of our needs.

If the data load is complex (as is the case in relational databases) UTL_FILE seems to be the tool of choice. This tool does require programmatic effort in terms of writing a wrapper package but the subsequent effort in this direction is greatly reduced once the initial tool kit is built for your environment.

The UTL_FILE tightly integrates the file input with the programmatic control and the data manipulation inside a single PL/SQL unit. There are disadvantages of speed in loading in case of UTL_FILE but these are totally offset by the programmatic control it offers and the integration it brings in.

Thus we find that UTL_FILE tool bridges the gap left by SQL*Loader for complex data loads.

No comments: