Metadata Management

"Metadata Management" includes all the processes to collect, capture and maintain metadata, and to provide the means by which metadata can be shared by "metadata users" (which may be people or processes). It is the responsibility of metadata management to set up the necessary infrastructure and tools to maintain metadata at the required quality in a secure and efficient manner.

WHAT are "Metadata"?

The well-known definition of metadata as "data about data" can be extended:
Metadata are data that describe the properties of data and the relationships between these data. This may still not be very precise, but it is a good start. It introduces the concept of levels of abstraction: since metadata are also data, there must be metadata (metametadata?) about these (meta)data.

This recursive definition seems to lead to an endless loop. OMG (the Object Management Group) uses metamodeling to define UML (Unified Modeling Language). OMG introduced the concept of a four-level (meta)model structure, which resolves the endless-loop problem. This structure distinguishes four levels (M0 to M3), where each level has a well-defined purpose.

The basic idea of this model, which was devised to define a language, can also be applied to structure metadata:

  • the M0-level - the instances (whatever they may be),
  • the M1-level - the metadata (describe the instances),
  • the M2-level - the models (describe the structure of the metadata) and
  • the M3-level - the top metamodel (the most abstract level, with the basic building blocks: entities, attributes and relationships between entities).
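
The four levels listed above can be illustrated with a minimal Python sketch. All names in it (ModelElement, MetadataItem, level_chain, the CUSTOMER example) are hypothetical and chosen only for illustration; they are not taken from OMG specifications or from any product.

```python
from dataclasses import dataclass, field

# M3: the top metamodel, reduced here to the names of its basic building blocks.
M3_BUILDING_BLOCKS = {"Entity", "Attribute", "Relationship"}


@dataclass
class ModelElement:
    """M2: an element of a model; it is an instance of one M3 building block."""
    name: str
    m3_kind: str

    def __post_init__(self):
        # Reject model elements that do not refer to a known M3 building block.
        if self.m3_kind not in M3_BUILDING_BLOCKS:
            raise ValueError(f"unknown M3 building block: {self.m3_kind}")


@dataclass
class MetadataItem:
    """M1: a metadata record whose structure is described by an M2 element."""
    name: str
    model_type: ModelElement
    attributes: dict = field(default_factory=dict)


# M2: the model says that there is an entity type called "Table".
table_type = ModelElement("Table", "Entity")

# M1: the metadata describe one concrete table.
customer_table = MetadataItem("CUSTOMER", table_type, {"owner": "Sales"})

# M0: the instances, i.e. the actual data described by the metadata.
customer_rows = [{"id": 1, "name": "Alice"}]


def level_chain(item):
    """Walk upwards from an M1 item to the M3 building block that grounds it."""
    return [
        ("M1", item.name),
        ("M2", item.model_type.name),
        ("M3", item.model_type.m3_kind),
    ]
```

In this sketch the chain ends at M3 because the building blocks are a fixed set rather than instances of yet another level, which is how the four-level structure avoids the endless loop.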

WHY manage all these Metadata?

Metadata are required by IT auditors to verify the compliance of IT processes. The audit framework COBIT (4.1) describes 34 core processes. The process PO2 (Define Information Architecture)

  • ".. creates and regularly updates a business information model and defines the appropriate systems to optimise the use of this information.
  • This encompasses the development of a corporate data dictionary with the organisation’s data syntax rules, data classification scheme and security levels.
  • This process improves the quality of management decision making by making sure that reliable and secure information is provided, and it enables rationalising information systems resources to appropriately match business strategies.
  • This IT process is also needed to increase accountability for the integrity and security of data and to enhance the effectiveness and control of sharing information across applications and entities."

HOW to manage all these Metadata?

Metadata are fine-grained, complex (highly connected by relationships) data of medium volume (in the range of millions of entities and relationships, i.e. more than just a few thousand but not hundreds of millions). They are in constant evolution (change). A technical platform is a prerequisite to successfully manage metadata and make them accessible to metadata users.

The minimum requirements for such a data platform are:

  • Medium number (millions) of fine-grained entities
  • Medium number (2 x millions) of relationships
  • Substantial number (a thousand or more) of types (entities, relationships, attributes)
  • Multi-user access (protected by transactions)
  • Open architecture of the database
  • Multi-layer structure M0 to M3
  • Create and maintain submodels on M2-level
  • Flexibility to extend and change the structure
  • Versioning to support planning and graceful migration of releases
  • Query facility to analyze the database and to create reports / export data
  • Create reports and graphics from the data
  • Access to the database via an API
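
To make a few of these requirements concrete, the following is a minimal in-memory sketch in Python of typed entities, typed relationships and a simple query reachable through an API. The class MetadataStore and all of its methods are hypothetical and do not represent the API of any real product. The sketch deliberately leaves out multi-user access, transactions, versioning and reporting.

```python
class MetadataStore:
    """Toy in-memory entity-relationship store (illustrative only)."""

    def __init__(self):
        self.entity_types = set()        # M2-level: permitted entity types
        self.relationship_types = set()  # M2-level: permitted relationship types
        self.entities = {}               # entity id -> (type name, attributes)
        self.relationships = []          # (relationship type, source id, target id)

    def define_entity_type(self, name):
        """Extend the structure with a new entity type."""
        self.entity_types.add(name)

    def define_relationship_type(self, name):
        """Extend the structure with a new relationship type."""
        self.relationship_types.add(name)

    def add_entity(self, entity_id, type_name, **attributes):
        """Store a fine-grained entity of a previously defined type."""
        if type_name not in self.entity_types:
            raise KeyError(f"undefined entity type: {type_name}")
        self.entities[entity_id] = (type_name, attributes)

    def relate(self, rel_type, source_id, target_id):
        """Connect two existing entities with a typed relationship."""
        if rel_type not in self.relationship_types:
            raise KeyError(f"undefined relationship type: {rel_type}")
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("both ends of a relationship must exist")
        self.relationships.append((rel_type, source_id, target_id))

    def targets(self, source_id, rel_type):
        """Query: all entities reached from source_id via rel_type."""
        return [t for (r, s, t) in self.relationships
                if r == rel_type and s == source_id]


store = MetadataStore()
store.define_entity_type("Application")
store.define_entity_type("Database")
store.define_relationship_type("uses")
store.add_entity("crm", "Application", owner="Sales")
store.add_entity("custdb", "Database")
store.relate("uses", "crm", "custdb")
```

A call such as store.targets("crm", "uses") then answers the question of which databases an application depends on. The linear scan over all relationships is acceptable only for a toy; at the volumes named above, a real platform would need indexed storage.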

It is fairly obvious that these requirements cannot be fulfilled with UML text or pictures. Nor is it a solution to flatten the complex structures into the tables of a relational database. XML files are useful for data exchange but not as a storage platform.

Metasafe – an entity-relationship database – is the platform to build a metadata management infrastructure in an easier and more efficient manner. -> ITMap – documenting the landscape of IT.