Introducing Metadata

David Schlinker, Senior Data Architect

June 9, 2020

As the volume and complexity of data continue to rise dramatically, the need to organize and make sense of it becomes increasingly important.  That is why many organizations have implemented or are considering a Metadata Management practice to optimize the value of their data.  Metadata is data that provides information about other data – or simply, data about data.  Although metadata is not always understood, it’s important for an organization to understand the purpose of metadata and its value within the organization.  Organizations with a formal Metadata Management practice promote internal/external communication and enable the business to optimize decision-making.

What is Metadata?

We see and use metadata every day without even knowing it.  The nutritional information on your breakfast cereal box is metadata.  It tells us what’s inside the box and explains the nutritional value of what we’re eating.  The album name and artist for the song you are listening to on your headphones are metadata and identify what song we’re listening to, where it comes from, and who sings it.  The date and location for a photograph on your smartphone are metadata and help us pinpoint when and where the photo was taken.  The column headings for a table in your Excel spreadsheet are metadata and explain what is in each column.  If no headings were there to identify the data underneath them, we’d be lost in space, calling the spreadsheet creator to ask what’s in column D.

Metadata describes the characteristics of and context for data.  It answers the who, what, when, where, why, and how of a piece of information. A good Metadata Management system answers the 5 Ws and 1 How of all the data it describes – or at least the key metadata required to help us understand the meaning of data we’re dealing with.  It is critical for improving analytics and decision-making within an organization. 
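To make the 5 Ws and 1 How concrete, here is a minimal sketch in Python using the smartphone photo example.  The class name, field names, and sample values are assumptions made purely for illustration; they are not taken from any particular Metadata Management tool.

```python
from dataclasses import dataclass, asdict


@dataclass
class MetadataRecord:
    """Hypothetical record holding one answer per question."""
    what: str
    who: str
    when: str
    where: str
    why: str
    how: str


def missing_answers(record):
    """Return the questions that this record leaves unanswered."""
    return [question for question, answer in asdict(record).items()
            if not answer.strip()]


# Illustrative values only: a photo whose author and purpose were never recorded.
photo = MetadataRecord(
    what="Photograph",
    who="",
    when="2020-06-09",
    where="Calgary, Alberta",
    why="",
    how="Smartphone camera",
)

print(missing_answers(photo))  # ['who', 'why']
```

A check like this is one simple way to see which key metadata is still missing before anyone relies on the data.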

In the context of other data management concepts, metadata is one of eleven knowledge areas in the Data Management Body of Knowledge (DMBOK) wheel.

When people hear the term “metadata”, they often think of a business glossary (business definitions and rules) and a data dictionary (technical definitions and rules).  These are key components of a Metadata Management system but there are other equally important elements that should be considered:

1. Workflows. A key step in the implementation of a Metadata Management system is the development of business processes to support the creation and maintenance of definitions for business and technical terms. These processes can be developed using generic business process tools or through tools specifically designed for Metadata.

2. Data lineage. A proper Metadata Management system must document the flow of data through its lifecycle.  Data lineage identifies who created, updated, accessed, or deleted the data, and how and when they did so.  More importantly, it describes how the data has been transformed at various stages.  This transformation is critical for describing how a value has been derived, or the context of a piece of information.  Even though a Metadata Management system strives for consistency and repeatability, there are legitimate scenarios where the same data can result in two similar but different pieces of information, such as “customer” meaning different things to different people within the organization.  My favourite customers are “paying” customers.  Executives are often frustrated by this because they do not have the metadata that explains the differences.

3. Data Quality. A Data Quality system includes metrics such as “ratio of errors to data” or “transformation error rate”.  There are specific tools used to analyze and measure data quality, but the metrics and the results of the data quality analysis should be stored within a Metadata Management system.

4. Data Policy. A Metadata Management system should implicitly or explicitly support company policies such as Data Governance, Data Security and Privacy. These are topics for another article.

5. Data Stewardship and Data Committees. A key aspect of Data Governance is accountability. A Metadata Management system must directly support data stewardship roles and responsibilities for key data within an organization.  For example, for a given data element, it must explicitly identify the data steward accountable for the data.  Questions regarding the meaning or use of that data element should be directed to the data steward identified in the Metadata Management system.
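The lineage, data quality, and stewardship elements above can be sketched together in a few lines of Python.  This is only an illustration under stated assumptions: the class names, the “Paying Customer” definition, the steward title, the dates, and the error counts are all hypothetical, and a real Metadata Management tool would model these things in its own way.

```python
from dataclasses import dataclass, field


@dataclass
class LineageStep:
    """One hypothetical lineage entry: who did what, when, and how."""
    actor: str
    action: str            # e.g. "created", "updated", "accessed", "deleted"
    timestamp: str
    transformation: str = ""


@dataclass
class DataElement:
    """Hypothetical critical data element with an accountable steward."""
    name: str
    business_definition: str
    steward: str
    lineage: list = field(default_factory=list)


def describe_derivation(element):
    """List the transformations applied to the element, in order."""
    return [step.transformation for step in element.lineage if step.transformation]


def error_ratio(error_count, record_count):
    """Ratio of errors to data, one of the quality metrics named above."""
    if record_count <= 0:
        raise ValueError("record_count must be positive")
    return error_count / record_count


# Illustrative values only.
paying_customer = DataElement(
    name="Paying Customer",
    business_definition="A customer with at least one paid invoice.",
    steward="Sales Data Steward",
)
paying_customer.lineage.append(
    LineageStep("billing system", "created", "2020-06-01"))
paying_customer.lineage.append(
    LineageStep("reporting job", "updated", "2020-06-09",
                "kept only customers with a paid invoice"))

print(paying_customer.steward)               # who to ask about this element
print(describe_derivation(paying_customer))  # how the value was derived
print(error_ratio(25, 1000))                 # 0.025
```

Keeping the steward and the transformation notes on the same record as the definition is what lets someone explain why two “customer” counts differ.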

 

Why Metadata Matters

There are four key benefits of a Metadata Management system:

1. Analytics and Reporting. The need to understand and leverage data continues to grow as more and more organizations strive to become ‘data-driven’. It is very difficult to achieve this goal without a Metadata Management system to formalize and standardize the definitions and lineage of your data.  Metadata not only makes Data Scientists more efficient, but it also provides consistency and repeatability across the entire organization.  For example, the analysis performed by Financial Analysts is consistent with the analysis performed for a marketing campaign because both rely on the same definitions.  Executives and managers across divisions can feel more confident in the decisions they make.

2. Risk and Regulatory Compliance. Metadata directly supports appropriate controls around your data. Whether it’s an audit or a regulatory compliance issue, Metadata documents the definitions, rules, and lineage associated with your key data.  It assists security and risk personnel in defining policies and controls for your data assets.

3. Data Governance. As I discussed in my previous IT Architects article Introducing Data Governance, data governance is “formalizing behavior around the definition, creation, and usage of data to manage risk, improve quality and improve usability”. Metadata Management is a critical component of Data Governance.  In order to achieve your Data Governance objectives, you must have in place a system and business processes that formally describe the characteristics of your data and assign accountability for its meaning and usage.

4. Operational Efficiency. An effective Metadata Management system will help to reduce research time and allow analysts and data scientists to spend more time analyzing and less time collecting and compiling.  It will also help to identify redundant and inconsistent data, and significantly improve the efficiency and accuracy of impact analysis required for large system modernization projects.
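As a rough illustration of how documented lineage supports impact analysis, the sketch below walks a small lineage map to find everything downstream of a changed source.  The asset names and the map itself are hypothetical, and the traversal is a plain breadth-first search, not the method of any specific product.

```python
from collections import deque

# Hypothetical lineage map: each key feeds the assets listed as its value.
DOWNSTREAM = {
    "crm.customer": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["report.sales_by_customer", "dashboard.churn"],
    "report.sales_by_customer": [],
    "dashboard.churn": [],
}


def impacted_assets(changed, downstream):
    """Return every asset that directly or indirectly depends on `changed`."""
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in downstream.get(current, []):
            if dependent not in seen:  # guard against revisiting shared or cyclic paths
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


print(impacted_assets("crm.customer", DOWNSTREAM))
# ['dashboard.churn', 'report.sales_by_customer', 'warehouse.dim_customer']
```

Without recorded lineage, the same answer usually has to be pieced together by interviewing the owners of each report and system.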

The Challenge

The biggest challenge in Metadata Management is organizational change management (OCM).  This is particularly true in organizations where operations are siloed.  Individuals, departments, and divisions may be reluctant to share the definitions and rules they use to derive values in databases, reports, dashboards, etc.  This happens because they are fearful of their work being taken over by some other part of the organization or because the definition or rule may be questioned as “incorrect”.  These concerns are similar to those that arise when a department or division is audited by an internal or external body.  As with any major enterprise-wide initiative, executive support and promotion are key.  As part of the communication and training effort for a Metadata initiative, I collaborate with key executives to ensure that they understand the importance of metadata and to help them communicate the message down through the organization.

One of the first steps in establishing a Metadata practice is the definition of data domains and critical data elements (also known as “key business terms”).  While this may seem simple, it is often challenging to shift people’s way of thinking from “my organizational unit” or “my computer application” to an enterprise-wide, subject-oriented perspective.  When initiating a Metadata practice, I like to start off with an exercise where we imagine there are no divisions or departments in the company and there are no computers.  I ask, “What types of information do you need to run your business?”  It typically results in a list of subject areas and business terms that can be converted into data domains and critical data elements.

Another challenge is the identification of the “right” Data Domain Stewards.  While this is part of an overall Data Governance initiative, it is critical to the successful implementation of a Metadata practice.  A natural tendency is to assign this role to subject matter experts.  Subject expertise is important, but a good Data Domain Steward is someone who thinks strategically and has an enterprise view of the organization and its data.  They are good facilitators with an uncanny ability to resolve differences of opinion.

A final challenge deals with resource requirements.  While people at all levels of an organization usually support the concepts of metadata, the responsibility for managing it can be met with resistance.  Many will see the new responsibilities as additional and unnecessary work in an already busy business day.  To be sure, there is some additional work, but in his book Non-Invasive Data Governance: The Path of Least Resistance and Greatest Success, Rob Seiner “focuses on formalizing what already exists and addressing opportunities for improvement”.  At the early stages of establishing a Metadata practice, we need to identify the things people already do in relation to data.  For example, when you publish a key report, who do the readers call when they have questions?  Who explains the underlying meaning and rules used to derive values in the report?  Where did the raw data come from?  How current is the report?  We need to uphold the integrity of data and assign Data Stewards to maintain its currency and optimize its value to the organization.

Mr. David Schlinker is a Senior Data Architect affiliated with IT Architects in Calgary, Alberta, and has worked in various industries, including Health Care, Energy, Mining, Utilities, Telecommunications, Transportation, Manufacturing, and Retail. IT Architects (www.itarchitects.ca) is an information consulting firm specializing in business process optimization, system evolution planning, and the deployment of leading-edge technologies. If you require further information, David can be reached at info@itarchitects.ca or 403-874-9927.