ETL (Extract, Transform, Load) processes can be considerably complex to design, build and deploy. And once deployed, operational problems often arise that demand significant care and feeding of the ETL system. For years, ETL tools have been sold to CIOs and IT leaders on the premise that a graphical user interface and other utilities will simplify this complex data movement process and make development easier. Traditional ETL tools also come with the myth that they are somehow self-documenting, because they provide a “picture” of the ETL mapping. The reality for ETL developers and administrators is something very different.
Your ETL tool is failing you.
In a never-ending quest to satisfy all manner of data movement scenarios, traditional ETL tools have become a vast wasteland of GUI complexity. They have become so complex, in fact, that more time and effort is often spent learning the tool than producing content. Companies selecting a traditional ETL tool find themselves investing hundreds if not thousands of person-hours and consulting dollars learning complex proprietary user interfaces. This not only creates vendor lock-in but also drives up license and consulting costs.
Why is your ETL tool failing you?
- They make the simple things hard
- They are not good at doing the complex things you need
One of the most common tasks in ETL development is the staging of data: making a duplicate of source tables and/or files in the target database or data lake. When implemented as a full refresh (replacing the entire contents of the target on each run), this should be among the easiest and quickest tasks in the ETL world. Instead, it can take an average developer up to an hour per table to design, build and deploy such a simple mapping in a traditional tool. On an example project with 900 tables to stage, costs quickly add up to fifty thousand dollars ($50,000) or more for this necessary but low-value staging task.
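In SQL terms, a full-refresh stage is just “drop the target and recreate it from the source.” Here is a minimal sketch in Python using the standard sqlite3 module; the table names are hypothetical, and sqlite stands in for a real source/target pair:

```python
import sqlite3

def full_refresh(conn: sqlite3.Connection, source: str, stage: str) -> None:
    """Replace the entire contents of the stage table with the source table."""
    conn.execute(f"DROP TABLE IF EXISTS {stage}")
    # CREATE TABLE ... AS SELECT copies both structure and data in one step,
    # so no separate hand-coded DDL is required.
    conn.execute(f"CREATE TABLE {stage} AS SELECT * FROM {source}")
    conn.commit()

# Hypothetical usage: stage a customers table into stg_customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
full_refresh(conn, "customers", "stg_customers")
print(conn.execute("SELECT COUNT(*) FROM stg_customers").fetchone()[0])  # 2
```

The point is not this particular snippet but its size: the entire mapping is two statements, which is why an hour per table is so hard to justify.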
Let’s consider this traditional ETL workflow for building a simple stage mapping:
- Import Table Structure from Source
- Create Target Table in Target System (typically hand coded DDL outside of the ETL tool)
- Import Table Structure for Target
- Drag Source and Targets on to the Palette
- Connect Columns or Verify Auto Mapping
- Provide Overrides for Datatype Mismatches
- Save Mapping
- Repeat 900 times…
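The whole per-table workflow above collapses once the tool reads the source catalog itself. A hedged sketch, again using sqlite3 as a stand-in: the staging DDL is derived from each source table, so none of the manual import, drag, and map steps remain:

```python
import sqlite3

def stage_all_tables(conn: sqlite3.Connection, prefix: str = "stg_") -> list:
    """Generate and run a full-refresh staging mapping for every source table."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")
        if not r[0].startswith(prefix)]  # skip already-staged copies on reruns
    for name in tables:
        conn.execute(f"DROP TABLE IF EXISTS {prefix}{name}")
        # The target DDL is derived from the source table automatically.
        conn.execute(f"CREATE TABLE {prefix}{name} AS SELECT * FROM {name}")
    conn.commit()
    return tables

# Hypothetical demo with two source tables instead of 900.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
staged = stage_all_tables(conn)
print(sorted(staged))  # ['customers', 'orders']
```

Whether the catalog holds 2 tables or 900, the human effort is the same: one call.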
For the cost of your ETL tool, this workflow should be automated for you.
Ask any veteran ETL developer and he or she will tell you the dirty ETL secret: traditional ETL tools do NOT generate efficient, effective or accurate SQL in many complex situations. Hiding behind layers of complex GUIs, traditional ETL tools are essentially SQL code generators. Those same veteran ETL developers will tell you that every day they are creating database views, custom SQL source qualifiers, SQL overrides and other SQL hacks because their ETL tool does not consistently generate optimal SQL.
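The view trick those developers describe is simple: put the hand-written SQL in a database view and point the ETL mapping at the view, so the tool only has to generate a trivial SELECT. An illustrative sketch (the tables and the view are hypothetical, with sqlite standing in for the source database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
CREATE TABLE customers (id INTEGER, region TEXT);
INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 25.0);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
-- The hand-written join and aggregation live in one reviewable, tunable
-- place instead of being scattered across a generated mapping.
CREATE VIEW v_sales_by_region AS
SELECT c.region, SUM(o.amount) AS total
FROM orders o JOIN customers c ON o.customer_id = c.id
GROUP BY c.region;
""")
print(conn.execute("SELECT * FROM v_sales_by_region ORDER BY region").fetchall())
# [('EU', 10.0), ('US', 25.0)]
```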
In the end:
The only thing traditional ETL tools excel at is the middle ground: mappings complex enough not to be easily automated, yet simple enough for the tool to generate optimal SQL. As an ETL developer, trying to guess which mappings are too complex for the tool is a lost cause. Instead, most veterans have developed methodologies (like database views or custom SQL source qualifiers) that take the guesswork out of the middle ground.
So, what is the answer?
The answer is SQL. Rather than invest in a traditional ETL tool with proprietary GUI lock-in and sky-high consulting costs, companies should be investing in SQL skills and talent. By investing in SQL skills, companies not only get stronger at ETL and data movement, but also gain the freedom to move quickly and cost-effectively to new vendor solutions. If you ask your veteran ETL developers, you will probably find that the top talent among them are also the best SQL developers. Sure, they work in a traditional ETL tool, but in the end, it’s just an expensive framework for their SQL.
Should you just hand code your ETL system? No, rather you should look for a modern data movement platform that:
- Automates staging mappings (copying source tables to the target) for both full-refresh and incremental loads
- Generates and executes the target table DDL when no target table exists
- Parallel-processes the initial bulk loads
- Provides a graphical workflow engine for building comprehensive parallel and serial load plans
- Provides a simple scheduler for mappings and workflows
- Allows custom SQL mappings for all moderate and complex scenarios
- Scales out its server framework to avoid becoming a man-in-the-middle bottleneck
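The workflow-engine idea in that list can be illustrated with a toy load plan: groups of mappings that run in parallel, with the groups themselves executed serially to respect dependencies. A hypothetical sketch (the plan contents and `run_mapping` are placeholders, not a real product API):

```python
from concurrent.futures import ThreadPoolExecutor

# A hypothetical load plan: each inner list is a parallel group;
# the groups run one after another, in dependency order.
LOAD_PLAN = [
    ["stg_customers", "stg_orders", "stg_products"],  # independent stages
    ["dim_customer", "dim_product"],                   # depend on staging
    ["fact_sales"],                                    # depends on dimensions
]

def run_mapping(name: str) -> str:
    # Placeholder: a real platform would execute the mapping's SQL here.
    return name

def run_plan(plan):
    completed = []
    for group in plan:
        # Mappings within a group run concurrently; map() preserves order.
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            completed.extend(pool.map(run_mapping, group))
    return completed

print(run_plan(LOAD_PLAN))
```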
Interested in introducing such a modern data movement platform to your environment? Contact DataRoad Technologies to learn how our DataRoad Reflect product can replace that string of repetitive per-table tasks, 900 tables over, with these simple steps and zero repetitions required:
- Select Source Tables to Import into Staging
- Generate 900 Source Table Mappings with a Single Mouse Click
- Run 900 Source Table Mappings to Import Data with a Second Mouse Click
- Repeat Nothing!