Home >> Database Design>> Common Lookup Tables

Common Lookup Tables

Introduction

In many typical SQL shops, data oriented business applications are sometimes developed with less emphasis on data integrity for a variety of reasons. Database developers often wear several hats to fit roles starting from system analyst to data modeler/database designer. In the midst of satisfying the client's wish lists, they are often tempted to use several kludgy design schemes. One such scheme among the most prevalent kludges is a common lookup table.

What is it?

Within the commonly accepted definitions in the computer field, a lookup table is simply an array-like data structure used to replace a runtime computation with a simpler lookup operation. In the arena of data management, the concept is somewhat similar.

Consider a typical business system where orders are placed by several customers. Naturally, several entities like customers, orders, statuses, locations etc. are involved in this business model. Let us assume at a given point in time a customer has a certain status, his order has certain priority level, the location where an item is shipped has a certain code, the freight selected has a discount type, nature of the warehouse has a certain classification code and so forth. For simplicity, let the logical schema supporting these entities is represented as:
            OrderStatus ( status_id, status_description )
            status_id       status_description
            ----------------------------------
            1               Ordered
            2               Cancelled
            3               Shipped
                       
            AssetPriority ( priority_code, priority_description )
            priority_code   priority_description
            ------------------------------------
            0               Unspecified
            1               Over 5 Million
            3               3 - 5 Million
            4               1 - 3 Million
            5               500,000 - 1 Million
            6               Less than 500,000
                                       
            DiscountType ( discount_code, discount_details )
            discount_code   discount_details
            --------------------------------
            1               Half Quota
            2               One or more references

            LocationCode ( location_nbr, location_details )
            location_nbr    location_details
            --------------------------------
            1               Northeast region
            2               Western states excluding CA
            3               Southern states and West Indies

            WarehouseType ( warehouse_type, warehouse_description ) 
            warehouse_type  warehouse_description
            -------------------------------------
            1               Central Headquarters
            2               Colocation 
            3               Inventory shipping
These entity types are often referenced by several other entity types and therefore the resulting schema might have several foreign key constraints referencing these tables. When the complexity of the business model increases, the number of such entities increases as well. And sometimes, this increase might give a certain unpleasant sense of clutter especially to those familiar with programming background. Some folks address this "clutter" by advancing approaches that reduce the number of tables. And the solution they often arrive at is a single lookup table!
            GenericLookup( category, code, description )
            category        code            description 
            -------------------------------------------
            1               1               Ordered
            1               2               Cancelled
            1               3               Shipped
            2               0               Unspecified
            2               1               Over 5 Million
            2               3               3 - 5 Million
            2               4               1 - 3 Million
            2               5               500,000 - 1 Million
            2               6               Less than 500,000
            3               1               Half Quota
            3               2               One or more references
            ...
These structures are often known as One-True-Lookup-Table. Some online resources have termed them as MUCK tables ( Massively Unified Code-Key tables ) or domain tables or code tables etc.

Superficial Benefits

These so called benefits are really long term nightmares. I am detailing them so that you can be familiar with the reasons why some are lured into using these designs.

  • Easy to use: Rather than understanding the underlying conceptual model and the tables that representing them, all the developers need to know is the actual category name or the type and they can include it as a parameter.
  • Quick and Simple: Which is simpler -- Lots of tables or just one column? To the untrained eye, simply adding a column makes it quick solution. After all most of the queries may look simpler with less number of tables to join, union, intersect etc.
  • Reducing data access code: Rather than using separate data access procedures, with such common lookup tables developers can now write a single all encompassing procedures passing an additional parameter. This boon multiplies when updates, inserts and deletes can all be bundled up in a single procedure.
  • Agile and flexible: Frankly I don't even know what the exact meaning of the term agile in the context of data management. However I have seen many using it to mean some property of a certain schema to be extended without breaking existing constraints. Essentially using a common lookup table seems to allow addition any number of categories without altering the schema or adding additional tables. This seems to be one of the most compelling temptations for a developer to opt for such a design solution.
Drawbacks and shortcomings

Now that we have gone through the "benefits", let us look at the common lookup table approach from the perspective of data management and isolate the problems. It is quite possible that some the problems may not appear to be real problems unless we really dig through the fundamentals of relational database design. Also in general current business environments typically favor high performance over data integrity and therefore there could be tendency to ignore the problems altogether.

  • Lack of Referential Integrity:
    Referential Integrity refers to the ability of a DBMS to enforce foreign key constraints for insert, update and delete operations with appropriate restrict and cascading options. They ensure consistent references among tables emphasizing the kinds of relationships that exist among applicable entities. A common lookup table cannot ensure this consistency and this is a critical drawback of such approaches.

    While a direct foreign key constraint can be defined, it is essentially meaningless since it offers no real referential integrity. Sometimes it is suggested that one can have a composite foreign key constraint over multiple columns, case in point over category and code columns. The problem here is with reusing the composite alternate key ( category, code ) in all other tables. And the category column may obviously lead to unwanted dependencies in the referencing tables. Alternatively the integrity enforcement will have to be done external to the DBMS which is a sure source of disaster.

  • Ignoring types:
    In a properly implemented DBMS, a rich type system that can represent any value of arbitrary complexity is an essential component. However, with the commercial systems supported by SQL cannot boast of such a luxury. Therefore intermingling of different types becomes a source of problem with common lookup tables. In the example stated above, if the discount code is CHAR(3) and location_nbr is INT(4), what is the type of code column in GenericLookup table?

  • Mediocre constraints:
    This is a significant drawback of using common SQL lookup table since we have no way of restricting the values that can be shoved into this table. Limiting the size and range of values becomes a nightmare given that there is no way to predict a safe, reasonable maximum size. For example, priority_description could be CHAR(10) while warehouse_description could be NVARCHAR(800). What would be an appropriate CHECK() for such or more complex discrepancies?

  • Limited expressiveness:
    This is again an extension of the problems described above. Every query or data manipulation statement would have to include the category value while referring to this table. A composite key would become a necessity to prevent redundancy. And since SQL allows NULLs in foreign keys the logic in the references by itself breaks down. When the user inserts a referencing row with a value for category and NULL for code, should the DBMS allow the insert? Since logically DBMS cannot preserve the integrity of the relationship should it abort the insert? Or should it consider the validity of the category value alone and proceed with the insert? The same issues exist while updating the referencing values as well.

  • Rigidity:
    How can such a seemingly simple and "flexible" design be rigid? Well considering the above common lookup table scheme, just imagine the LocationCode table to include another column say, region. Or say adding a status to the DiscountType table? Just to change a single category you'll have to consider make way for all the rows in the table regardless of the new column is applicable to them or not. Again developers tend to make such columns nullable and "logic" using nulls is a topic for another day.

  • Hidden Complexity:
    This is something that is not very apparent. Often the idea of using common lookup tables come from the idea of generalizing entities where by a single table represents a "thing" -- pretty much anything. Contrast this with the fundamental that a well designed table represents a set of facts about entities or relationships of the same kind. The problem with generalizing entities is that a table becomes a pile of unrelated rows. Consequently it results in the loss of meaning, leading to confusion and often unwanted complexity.

    Another suggestion is to use multiple tables with a view with a UNION ALL query. This, in some cases, can be beneficial providing logical data independence and appropriate security, however view update requirements vary from product to product. Moreover one can argue the same for a common lookup table using multiple views for each category. This again will have to be weighed against the arguments against the lack of constraints and unnecessary complexity that arises from managing them just for the sake of doing it.

    The main goal of a DBMS is to enforce the rules that govern how the data is represented and manipulated. Make sure you do not confuse the terms "generalize", "reuse" etc. in the context of database design to the extent where you have no control over what is being designed.

  • Physical Implementation Issues:
    While logical design is a totally separate concern from physical implementation, commercial DBMS products supported by SQL lack sufficient insulation between them. As a result drawbacks in physical implementations are often attributed to logical design and vice-versa. In large enterprises, such common lookup tables can become hundreds of thousands of rows leading to the need for heavy physical database tuning. Locking and concurrency issues with such large tables will have to be controlled. The internal representation of a particular set of row in physical storage can be a determining factor in how efficient the values can be accessed and manipulated by SQL queries.
Recommendations

Now that you have some idea about the drawbacks of using common lookup tables here is a brief list of general recommendations that you can use in your database design projects.
  • First and foremost, avoid generic lookup approaches as a rule. Use separate tables identifying appropriate columns with proper types, constraints and references.
  • Write distinct procedures to access and manipulate data in the tables
  • View approaches should be avoided since view expressions in SQL limit constraints, violate closure and have drawbacks with updating them.
Conclusion

Whether it is used as a short term make shift solution or as a long term viable solution, common lookup tables have no place in sensible database design. While application enforced integrity is often touted by many in the developer crowd, the fact that DBMS is the centralized enforcer of all integrity constraints still stands. In that regard, considering preserving data integrity and logical correctness as the foremost goal in a given database design, common lookup tables are one of the worst kind of mistakes that one can make in a stable data management solution.

Acknowledgements

Jake Christian