What is Metadata?
Metadata, also known as intermediary data or relay data, is data about data. It mainly consists of information describing the properties of data, and it supports functions such as indicating storage locations, recording history, searching for resources, and recording files. Metadata works like an electronic catalog: to serve the purpose of cataloging, the content or characteristics of the data must be described and stored, which in turn assists data retrieval. The Dublin Core Metadata Initiative (DCMI) is one application of metadata. It originated in a workshop co-sponsored by the Online Computer Library Center (OCLC) and the National Center for Supercomputing Applications (NCSA) in March 1995, which invited 52 librarians and computer experts to jointly develop specifications and create a set of elements describing electronic files on the Internet.
- Chinese name: Metadata
- Foreign name: Metadata
- Explanation: Data about data
- Pinyin: yuánshùjù
- Advantages: Self-description, design
- Purpose: Identify, evaluate, and track resources for effective management
- Nature: Information describing data attributes
- Metadata is information about the organization of data, data fields, and their relationships. In short, metadata is data about data. [1]
Metadata definition
- Metadata is defined as: data describing data, descriptive information about data and information resources.
- Metadata is data about other data, or structured data used to provide information about a resource. It describes objects such as information resources or data. Its purposes are to identify resources; evaluate resources; track changes to resources during use; manage large amounts of networked data simply and efficiently; and support the effective discovery, lookup, integration, and management of information resources. The basic characteristics of metadata are:
- a) Once metadata is established, it can be shared. The structure and completeness of metadata depend on the value and use environment of the information resources; metadata is usually developed and used in a changing, distributed environment; and no single format can fully meet the different needs of different groups.
- b) Metadata is first and foremost a coding system, used to describe digital information resources, especially networked ones. This is the fundamental difference between metadata and traditional data coding systems. Its most important feature and function is to establish a machine-understandable framework for digital information resources.
- The metadata system constructs the logical framework and basic model of e-government, which determines the functional characteristics, operation mode and overall performance of the system. The operations of e-government are based on metadata. Its main functions are: description function, integration function, control function and agent function.
- Because metadata is also data, it can be stored and retrieved in a database in a similar way to data. If the organization providing the data element also provides metadata describing the data element, the use of the data element will become accurate and efficient. When using data, users can first look at their metadata so they can get the information they need.
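- Because metadata can be stored and queried like ordinary data, the idea can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table and column names (`readings`, `column_metadata`) and the helper `describe` are hypothetical and do not come from any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (station_id TEXT, temp_c REAL);
    -- Hypothetical metadata table: one row per described data element.
    CREATE TABLE column_metadata (
        table_name TEXT, column_name TEXT, description TEXT, unit TEXT
    );
""")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("A1", 21.5), ("B2", 19.0)])
conn.executemany("INSERT INTO column_metadata VALUES (?, ?, ?, ?)", [
    ("readings", "station_id", "Identifier of the weather station", None),
    ("readings", "temp_c", "Air temperature", "degrees Celsius"),
])

def describe(table):
    """Return {column: (description, unit)} read from the metadata table."""
    rows = conn.execute(
        "SELECT column_name, description, unit FROM column_metadata "
        "WHERE table_name = ? ORDER BY column_name", (table,))
    return {name: (desc, unit) for name, desc, unit in rows}

# A user consults the metadata first, then reads the data itself.
info = describe("readings")
```

- Here the metadata lives in the same database as the data it describes, so a user can look up what `temp_c` means and which unit it uses before querying `readings`.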
Metadata data warehouse
- In the field of data warehouses, metadata is divided by usage into technical metadata and business metadata. First, metadata can provide user-oriented information: for example, metadata recording the business description of a data item helps users use the data. Second, metadata supports the system's management and maintenance of data: for example, metadata about how a data item is stored lets the system access the data in the most efficient way. Specifically, in a data warehouse system, the metadata mechanism mainly supports the following five types of system management functions:
- (1) Describe which data is in the data warehouse;
- (2) Define the data to be entered into the data warehouse and the data generated from the data warehouse;
- (3) Record the time schedule for data extraction according to the occurrence of business events;
- (4) Record and test the system data consistency requirements and implementation;
- (5) Measure data quality.
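- As a rough sketch of how business and technical metadata might sit side by side, and of function (5), the following Python example attaches both kinds of metadata to one table and computes a simple completeness metric. The class `TableMetadata`, its fields, and the sample values are assumptions made for illustration, not part of any warehouse product.

```python
from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    name: str
    business_description: str   # business metadata: helps users understand the data
    storage_location: str       # technical metadata: where the system finds the data
    load_schedule: str          # technical metadata: when extraction runs
    columns: list = field(default_factory=list)

def completeness(rows, column):
    """Share of rows whose value for `column` is not None."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(column) is not None)
    return filled / len(rows)

orders_meta = TableMetadata(
    name="orders",
    business_description="One row per confirmed customer order",
    storage_location="warehouse.sales.orders",
    load_schedule="after daily close of business",
    columns=["order_id", "customer_id", "amount"],
)
rows = [
    {"order_id": 1, "customer_id": "c1", "amount": 10.0},
    {"order_id": 2, "customer_id": None, "amount": 5.0},
    {"order_id": 3, "customer_id": "c3", "amount": 7.5},
    {"order_id": 4, "customer_id": "c4", "amount": None},
]
# One quality measurement per column listed in the metadata.
quality = {c: completeness(rows, c) for c in orders_meta.columns}
```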
Metadata Software Construction
- Definition in the field of software construction
- In the field of software construction, metadata is defined as data that is not itself the object processed by a program, but that changes the program's behavior when its value changes. It controls the behavior of the program at run time in an interpretive way. By placing metadata with different values at different points in the program, you can obtain program behavior equivalent to the original.
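- A minimal Python sketch of this idea follows. The rule table `FIELD_RULES` and the function `validate` are hypothetical names: the function interprets the rules at run time, so editing the rule values changes what the program accepts without changing its code.

```python
# Hypothetical field metadata: changing these values changes how
# `validate` behaves, without touching the function itself.
FIELD_RULES = {
    "age":  {"type": int, "min": 0, "max": 150},
    "name": {"type": str, "max_len": 20},
}

def validate(record, rules=FIELD_RULES):
    """Return a list of error messages for `record`, interpreted from `rules`."""
    errors = []
    for field_name, rule in rules.items():
        value = record.get(field_name)
        if not isinstance(value, rule["type"]):
            errors.append(f"{field_name}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field_name}: below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{field_name}: above {rule['max']}")
        if "max_len" in rule and len(value) > rule["max_len"]:
            errors.append(f"{field_name}: longer than {rule['max_len']}")
    return errors

ok = validate({"age": 30, "name": "Ada"})
bad = validate({"age": 200, "name": "Ada"})
# Same code, different metadata, different behavior:
strict = validate({"age": 30, "name": "Ada"},
                  rules={"age": {"type": int, "max": 18}})
```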
Metadata book information
- In the library and information world, metadata is defined as: providing information about
- Metadata and Library Books
- Generally speaking, metadata is data about data, or structured data about data. Judging from existing definitions, the meaning of the term has developed gradually. At first it mainly referred to data describing network resources, used to organize network information resources. It was later extended to data describing all kinds of information resources in electronic form. Today the term is used for descriptive records of information resources of every type.
- In addition, metadata has its corresponding definitions and applications in geography, life sciences and other fields.
Metadata Features
- Metadata is structured data about data. It is not necessarily in digital form and can come from different sources. [1]
- Metadata is data related to objects. This data makes it unnecessary for potential users to have a complete understanding of the existence and characteristics of these objects. [1]
- Metadata is a description of the encoding of the information package. [1]
- Metadata contains a set of data elements used to describe the content and location of information objects, which facilitates the discovery and retrieval of information objects in the network environment. [1]
- Metadata describes not only information objects but also the use environment, management, processing, preservation, and use of resources. [1]
- Metadata accumulates naturally over the life cycle of an information object or system. [1]
- In the conventional definition of metadata, "data" means symbols that represent the nature of things: the numerical values on which statistics, calculations, scientific research, and technical design are based, or information in digital, formulaic, coded, or graphical form. [1]
Metadata advantages
- Metadata is the key to a simpler programming model that no longer requires Interface Definition Language (IDL) files, header files, or any other external method of component reference. Metadata lets .NET languages describe themselves automatically in a language-neutral way that is invisible to both developers and users. In addition, metadata can be extended through attributes. Metadata has the following main advantages:
Self-describing metadata
- Common language runtime modules and assemblies are self-describing. A module's metadata contains everything needed to interact with another module. Metadata automatically provides the functionality of IDL in COM, so a single file can be used for both definition and implementation. Runtime modules and assemblies do not even need to be registered with the operating system. As a result, the descriptions used by the runtime always reflect the actual code in the compiled file, which increases the reliability of the application.
Metadata design
- Metadata provides all the information about compiled code that you need in order to inherit a class from a PE file written in a different language. You can create an instance of any class written in any managed language (any language that targets the common language runtime) without worrying about explicit marshaling or writing custom interoperability code.
Metadata attributes
- The .NET Framework allows you to declare specific kinds of metadata, called attributes, in compiled files. Attributes are found throughout the .NET Framework and are used to control more precisely how your program behaves at run time. In addition, you can emit your own custom metadata into .NET Framework files through user-defined custom attributes. For more information, see Extending Metadata with Attributes.
Metadata meaning
- Discussing the meaning of metadata really means discussing its purpose. Any discussion of data warehouses mentions metadata and its division into technical and business metadata, but what is it actually used for? Discussed apart from a goal, metadata covers too much, because it is simply data that describes data.
- Take a customer relationship system as an analogy. Such a system maintains customer information for a purpose: to drive automated processes, uncover the potential value of customers, and provide good customer service. It has no need to maintain personal details such as fingerprints or criminal history, because that information has little to do with the goal of customer relationship management. The same is true of metadata. You could record the structure and size of every data set, when it was created, when it was retired, who used it, and so on; the list can be extended almost without limit. Trying to build a perfect metadata management system without regard to the goal, a purely "top-down" approach, will undoubtedly fail.
Metadata enumeration
- Based on the application, the metadata can be divided into the following types.
- Data structure: the name, relationships, fields, constraints, etc. of a data set;
- Data deployment: the physical location of a data set;
- Data flow: process dependencies (not reference dependencies) between data sets, including the rules by which one data set is derived from another;
- Quality metrics: measurable metrics on a data set;
- Measure logical relationships: the logical operations relating data set measures;
- ETL process: the order in which processes run, in parallel or serially;
- Data set snapshot: the distribution of data across all data sets at a point in time;
- Star schema metadata: fact tables, dimensions, attributes, hierarchies, etc.;
- Report semantic layer: the rules for report indicators, and the correspondence between filter names and business names;
- Data access log: which data was accessed, by whom, and when;
- Quality audit log: when and how metrics were audited, and the results;
- Data loading log: which data was loaded, by whom, and when.
Metadata standard
- 1. Digital Library Resources Organization Framework
- 2. Metadata Development Application Framework
- Basic meaning of metadata
- Metadata is "data about data";
- Metadata provides standardized, universal description methods and retrieval tools for digital information units and resource collections in all forms;
- Metadata provides the integration tools and the glue for distributed information systems (such as digital libraries) that are organically composed of multiple digital resources.
- A digital library without metadata would be a heap of loose sand, unable to provide effective retrieval and processing.
- 3. Metadata application environment
- 3.1 Purpose of Metadata
- (1) Discovery and Identification: helps people retrieve and confirm the resources they need. Data elements are often limited to simple information such as author, title, subject, and location. Dublin Core is the typical representative.
- (2) Cataloging: provides a detailed and comprehensive description of data units. Data elements include content, carrier, location and means of access, methods of production and use, and even related data units. MARC, GILS, and FGDC/CSDGM are typical representatives of this type of metadata.
- (3) Resource Administration: supports the management of resource storage and use. In addition to comprehensive bibliographic description, data elements often include information such as rights and privacy management, digital signatures, resource evaluation (seal of approval or rating), access management, and payment and accounting.
- (4) Preservation and Archiving: supports the long-term preservation of resources. In addition to describing and confirming resources, data elements often include detailed format information, production information, protection conditions, conversion methods, and preservation responsibilities.
- 3.2 Application of Metadata in Different Fields
- According to the data characteristics and application needs of different fields, many metadata formats have appeared since the 1990s. For example:
- Network resources: Dublin Core, IAFA Template, CDF, Web Collections
- Literature: MARC (with 856 Field), Dublin Core
- Humanities: TEI Header
- Social Science Dataset: ICPSR SGML Codebook
- Museums and works of art: CIMI, CDWA, RLG REACH Element Set, VRA Core
- Government Information: GILS
- Geospatial information: FGDC / CSDGM
- Digital images: MOA2 metadata, CDL metadata, Open Archives Format, VRA Core, NISO / CLIR / RLG Technical Metadata for Images
- Archives and Resources Collection: EAD
- Technical Report: RFC 1807
- Moving images: MPEG-7
- 3.3 Application of Metadata Format
- Metadata in different fields are at different stages of standardization:
- In terms of network resource description, after years of international efforts, Dublin Core has become a widely accepted and applied de facto standard;
- In terms of government information, thanks to vigorous promotion by the US government and the implementation of related laws and standards, GILS has become a government information description standard and is used to a considerable extent in several countries; CSDGM is in a similar position;
- However, in some areas technology is developing and changing so quickly that multiple schemes still compete. Digital image metadata is the typical case, where many of the proposed standards are still at the experimental and refinement stage.
- 3.4 Metadata Format "Standardization"
- Experience with metadata development and application shows that a single unified metadata format can hardly meet the data description needs of all fields; even within one field, different but interchangeable metadata formats may be required for different purposes.
- At the same time, a centrally planned, unified metadata format standard is ill-suited to the Internet environment and does not make full use of market mechanisms and the various forces involved.
- Within a single field, however, "standardization" should still be pursued, and across fields the problem of interoperability between formats should be properly solved.
- 4. Metadata Structure
- 4.1 General structure definition method
- A metadata format is defined by a multi-level structure:
- (1) Content Structure, which describes the constituent elements of the Metadata and its definition standards.
- (2) Syntax Structure, which defines the metadata structure and how to describe this structure.
- (3) Semantic Structure, which defines the specific description method of Metadata elements.
- 4.2 Content Structure
- The content structure defines the constituent elements of Metadata, which can include: descriptive elements, technical elements, administrative elements, structural elements (such as links to coding languages, namespaces, data units, etc.).
- These data elements are often selected according to certain standards, so the content structure must state which ones: for example, ISBD, on which MARC records are based; ISAD(G), which EAD references; and the ICPSR Data Preparation Manual, on which the ICPSR codebook is based.
- 4.3 Syntactic Structure
- Syntactic structure defines the format structure and the methods for describing it, such as the division and organization of elements, the rules for selecting and using elements, the method of describing elements (for example, the ISO/IEC 11179 standard used by Dublin Core), the method of describing the element structure (for example, the MARC record structure, SGML structure, or XML structure), and the language for describing structural statements (for example, EBNF notation).
- Sometimes, the syntactic structure needs to indicate whether the metadata is bundled with the described data object, or exists as separate data but is linked to the data object in a certain form. It may also describe the way of linking with the definition standard, DTD structure, and namespace.
- 4.4 Semantic Structure
- Semantic structure defines how individual elements are described, for example the standards, best practices, or custom instructions used when describing them.
- Some metadata formats themselves define the semantic structure, while others specify the semantic structure by specific adopting units. For example, Dublin Core recommends that date elements use ISO 8601, resource types use Dublin Core Types, data formats can use MIME, and identification numbers use URLs or DOI or ISBN;
- As another example, when OhioLINK uses VRA Core, the subject elements use AAT, TGM, and TGN, and the name element uses ULAN.
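- The three levels can be illustrated with a small Python sketch using only the standard library. The content structure is the set of Dublin Core elements in the record, the syntactic structure is the XML serialization, and the semantic check enforces value encodings of the kind Dublin Core recommends. The function names and the sample record are hypothetical, and the checks are deliberately narrow: `date.fromisoformat` accepts calendar dates such as `YYYY-MM-DD` but not every form that ISO 8601 allows, and the MIME check only tests for a `type/subtype` shape.

```python
import xml.etree.ElementTree as ET
from datetime import date

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
ET.register_namespace("dc", DC_NS)

def check_semantics(record):
    """Return the elements whose values break the assumed encoding rules."""
    problems = []
    try:
        date.fromisoformat(record["date"])   # calendar dates such as YYYY-MM-DD
    except ValueError:
        problems.append("date")
    main, _, sub = record["format"].partition("/")
    if not (main and sub):                   # expects a "type/subtype" MIME shape
        problems.append("format")
    return problems

def to_xml(record):
    """Serialize the record as XML with namespaced Dublin Core elements."""
    root = ET.Element("metadata")
    for element, value in record.items():
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")

record = {"title": "An Example Report", "creator": "Doe, Jane",
          "date": "2001-05-17", "format": "text/html"}
problems = check_semantics(record)
xml_text = to_xml(record)
```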
- 5. Metadata coding language and production method
- 5.1 Metadata Encoding Language
- Metadata Encoding Languages refer to the specific grammar and semantic rules that define and describe metadata elements and structures, and are often referred to as definition description languages (DDL).
- In the early stage of metadata development, people often used custom record languages (such as MARC) or database record structures (such as ROADS). As metadata formats multiplied and interoperability requirements grew, people began to describe metadata with standardized DDLs such as SGML and XML, of which XML shows the most potential.
- 5.2 Metadata production method
- (1) Creation in dedicated cataloging modules (for example, MARC, GILS, FGDC)
- (2) Automatic generation during data processing (for example, Dublin Core)
- (3) Automatic generation during the physical processing of data (for example, certain metadata parameters recorded during digital image scanning)
- (4) Shared metadata (for example, OCLC/CORC, IMESH)
- 6. Metadata interoperability
- 6.1 Metadata Interoperability Issues
- Because multiple metadata formats often exist across fields (or even within one field), searching, describing, and using resources across systems described in different formats raises the problem of metadata interoperability:
- the interpretation and conversion of different metadata formats, and transparent retrieval across digital information resource systems described by different formats.
- 6.2 Metadata format mapping
- Converting between metadata formats with a specific conversion program is called metadata mapping or crosswalking.
- A large number of conversion programs already exist for several popular metadata formats, such as:
- Dublin Core and USMARC; Dublin Core and EAD; Dublin Core and GILS
- GILS and MARC; TEI Header and MARC; FGDC and MARC
- You can also use one intermediary format to convert among multiple metadata formats within a single framework. For example, the UNIverse project uses the GRS format to convert between various MARC formats and other record formats. Format mapping converts accurately and efficiently, but its effectiveness is significantly limited in open environments where many metadata formats coexist.
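- A crosswalk can be sketched as a lookup table from source fields to target elements. The Python example below is a simplified illustration, not a complete or authoritative mapping: it pairs a handful of MARC tags with Dublin Core elements, ignores subfields and indicators, and the function name `marc_to_dublin_core` and the sample record are hypothetical.

```python
# Simplified, illustrative crosswalk: MARC tag -> Dublin Core element.
# A real crosswalk also handles subfields, indicators and repeated-field rules.
MARC_TO_DC = {
    "100": "creator",
    "245": "title",
    "260": "publisher",
    "650": "subject",
    "856": "identifier",
}

def marc_to_dublin_core(marc_fields):
    """Convert a list of (tag, value) pairs into a Dublin Core dict of lists."""
    dc = {}
    unmapped = []
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            unmapped.append(tag)      # not covered by this simplified table
            continue
        dc.setdefault(element, []).append(value)
    return dc, unmapped

marc_record = [
    ("100", "Doe, Jane"),
    ("245", "An example title"),
    ("650", "Metadata"),
    ("650", "Cataloging"),
    ("300", "xii, 120 p."),
]
dc_record, lost = marc_to_dublin_core(marc_record)
```

- The `lost` list shows why pairwise mapping scales poorly in open environments: every field missing from the table is dropped, and each new pair of formats needs its own table.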
- 6.3 Standard description framework
- Another way to solve metadata interoperability is to establish a standard resource description framework, and use this framework to describe all metadata formats. So long as a system can parse this standard description framework, it can interpret the corresponding metadata format. In fact, XML and RDF play similar roles from different perspectives.
- XML, through its standard DTD mechanism, allows any system capable of interpreting XML to recognize a metadata format defined by an XML DTD, thereby solving the problem of interpreting different formats.
- RDF defines a basic model consisting of three objects: Resources, Properties, and Statements. The relationship between Resources and Properties is similar to the ER model, and Statements describe this relationship in detail.
- RDF uses this abstract data model to establish a framework for defining and using metadata. Metadata elements can be viewed as attributes of the resources they describe.
- Furthermore, RDF defines a standard Schema, which specifies the mechanism for declaring resource types, declaring related attributes and their semantics, and methods for defining the relationships between attributes and other resources. In addition, RDF also provides a mechanism for using XML Namespace methods to call existing defined specifications.
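- The RDF data model can be approximated in plain Python by treating each statement as a (resource, property, value) triple. This sketch does not use an RDF library and does not cover RDF Schema; the resource URI and the helper functions are hypothetical, while the property URIs are formed from the Dublin Core element namespace.

```python
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
doc = "http://example.org/report"         # hypothetical resource URI

# Each statement is a (resource, property, value) triple.
statements = [
    (doc, DC + "title", "An Example Report"),
    (doc, DC + "creator", "Doe, Jane"),
    (doc, DC + "date", "2001-05-17"),
]

def values(graph, resource, prop):
    """All values asserted for `prop` on `resource`."""
    return [o for s, p, o in graph if s == resource and p == prop]

def describe(graph, resource):
    """Group the statements about `resource` by property."""
    result = {}
    for s, p, o in graph:
        if s == resource:
            result.setdefault(p, []).append(o)
    return result

titles = values(statements, doc, DC + "title")
description = describe(statements, doc)
```

- Because every metadata element is just a property identified by a URI, elements from different formats can coexist in one set of statements, which is the sense in which metadata elements are attributes of the resources they describe.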
- 6.4 Digital Object Method
- Establishing a digital object containing metadata and its transformation mechanism may solve the problem of metadata interoperability from another perspective.
- The Cornell / FEDORA project proposes a composite digital object composed of a Structural Kernel and a Disseminator Layer.
- The kernel can contain the content of the document as a bitstream, metadata describing the document, and data controlling access to the document and its metadata.
- In the disseminator layer, the Primitive Disseminator supports service functions for deconstructing the kernel's data types and reading kernel data. There can also be Content-Type Disseminators, which can embed metadata format conversion mechanisms.
- For example, suppose metadata in MARC format is stored in the kernel of a digital object, and a content-type disseminator that can request the Dublin Core format and its conversion service is loaded in the disseminator layer. When a user of the digital object asks to read the metadata in Dublin Core form, the content-type disseminator requests, over the network, the digital object that holds the Dublin Core format and its conversion service, and then has the MARC metadata in the original object converted to Dublin Core and returned to the user.
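- The kernel and disseminator idea can be sketched in Python as follows. This is a hypothetical illustration of the concept only, not the FEDORA interface: the class `DigitalObject`, its methods, and the tiny converter `marc_to_dc` are invented for the example, and the converter is registered locally rather than fetched over a network.

```python
class DigitalObject:
    """Hypothetical sketch: a kernel plus pluggable disseminators."""

    def __init__(self, content, metadata, metadata_format):
        # Kernel: the bitstream plus metadata in its stored format.
        self._content = content
        self._metadata = metadata
        self._metadata_format = metadata_format
        self._disseminators = {}

    def add_disseminator(self, target_format, convert):
        """Register a converter from the stored format to `target_format`."""
        self._disseminators[target_format] = convert

    def get_metadata(self, requested_format):
        """Return metadata in the requested format, converting if needed."""
        if requested_format == self._metadata_format:
            return dict(self._metadata)
        convert = self._disseminators.get(requested_format)
        if convert is None:
            raise LookupError(f"no disseminator for {requested_format}")
        return convert(self._metadata)

def marc_to_dc(marc):
    # Tiny illustrative conversion; not a complete crosswalk.
    return {"title": marc.get("245"), "creator": marc.get("100")}

obj = DigitalObject(b"...document bytes...",
                    {"245": "An example title", "100": "Doe, Jane"},
                    metadata_format="MARC")
obj.add_disseminator("DublinCore", marc_to_dc)
dc_view = obj.get_metadata("DublinCore")
```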
- 7. Some suggestions
- Track the development of metadata, actively participate in the formulation of metadata standards, accelerate the application of metadata, and pay attention to international standards.
- Accelerate the study of mechanisms that effectively use metadata for retrieval (including transparent retrieval of heterogeneous systems), correlation learning, and personalized processing.
- Accelerate the research on the ways and methods of organic integration of metadata with digital objects and digital resource systems.
- Advance research to use metadata for knowledge-based data organization and knowledge discovery.
Metadata management
- In the early stage of metadata management, metadata management software is typically used to extract the various types of metadata that users care about from application systems that have already been developed, after which annotations and management attributes are added manually. This mode is called basic metadata management. Because the metadata is not acquired in a timely way, there is a risk that some of it is left out to reduce workload, and the approach lacks support for the application experience, so it has not been widely applied in practice. In the new generation of application systems (AS2.0) [2], business functions are usually implemented by assembling the corresponding components through human-computer interaction, in a dialogue guided by the business context. This process produces not only the application software elements that the business application requires, but also the metadata corresponding to those elements. This mode is called active metadata management. The main functions of metadata management are listed below; the last two parts belong to active metadata management.
Basic metadata management
- Meta-model management. A visual user interface provides maintenance functions for metamodels, including adding, deleting, modifying, and publishing them, and lets users intuitively see each metamodel's classification, statistics, usage, change tracking, and life cycle management.
- Metadata management. This provides the basic management functions for metadata: maintenance functions such as adding, deleting, and modifying metadata; relationship maintenance functions such as establishing, deleting, and tracking relationships between metadata; management of the metadata publishing process, to better manage and track the entire life cycle of metadata; and quality checks on the metadata itself, metadata query, statistics, usage analysis, change management, versioning, and life cycle management.
- Metadata analysis. This provides the basic analysis functions for metadata, including lineage analysis (data lineage), impact analysis, entity correlation analysis, entity impact analysis, host topology analysis, and indicator consistency analysis.
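- Lineage and impact analysis both reduce to walking the data-flow relationships recorded as metadata. The Python sketch below assumes a hypothetical `FLOWS` table in which each data set lists the data sets it feeds: impact analysis follows the edges downstream, and lineage analysis follows them upstream.

```python
from collections import deque

# Hypothetical data-flow metadata: each key feeds the data sets in its list.
FLOWS = {
    "raw_orders": ["clean_orders"],
    "raw_customers": ["clean_customers"],
    "clean_orders": ["sales_fact"],
    "clean_customers": ["sales_fact"],
    "sales_fact": ["monthly_report"],
}

def reachable(start, edges):
    """Breadth-first search returning every node reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def invert(edges):
    """Reverse every edge so that downstream data sets point upstream."""
    reverse = {}
    for src, targets in edges.items():
        for dst in targets:
            reverse.setdefault(dst, []).append(src)
    return reverse

impact = reachable("raw_orders", FLOWS)            # what a change would affect
lineage = reachable("sales_fact", invert(FLOWS))   # where the data came from
```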
Metadata capture
- Metadata capture provides metadata support for the various types of application software elements. In the corresponding tool software, the user works through human-computer interaction and follows strict logical steps to define, uniformly and in order, data items, forms, ETL and processing rules, physical tables, multidimensional models, display and result data sets, and other application software elements. While these are being defined, the metadata collection interface of the capture function collects the corresponding metadata into the metadata management platform in a timely manner. Unlike basic metadata management, which has to extract metadata after the fact, the metadata related to each application software element is created and loaded at the moment the element is generated.
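- The difference from after-the-fact extraction can be sketched with a Python decorator that records metadata at the moment an element is defined. Everything here is an assumption for illustration: the list `METADATA_PLATFORM` stands in for a metadata management platform, and `capture` and `drop_cancelled` are hypothetical names.

```python
from datetime import datetime, timezone

METADATA_PLATFORM = []   # stand-in for a metadata management platform

def capture(kind, **attributes):
    """Decorator that records metadata at the moment an element is defined."""
    def register(func):
        METADATA_PLATFORM.append({
            "kind": kind,
            "name": func.__name__,
            "doc": (func.__doc__ or "").strip(),
            "captured_at": datetime.now(timezone.utc).isoformat(),
            **attributes,
        })
        return func
    return register

@capture("etl_rule", source="raw_orders", target="clean_orders")
def drop_cancelled(rows):
    """Remove orders whose status is 'cancelled'."""
    return [r for r in rows if r.get("status") != "cancelled"]

# The metadata already exists before the rule is ever run.
captured = METADATA_PLATFORM[0]
```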
Metadata service
- After the metadata generated by the various application software elements enters the metadata platform, the metadata service function can supply it to the tool software or components that need it. For example, the definitions produced by the tools described above can be packaged according to a corresponding standard protocol into a solution (application script) and provided to lower-level tools in other application environments, such as physical table creation tools, ETL tools, multidimensional model creation tools, and result display tools, so that applications can be reused and shared. The metadata service can also provide auxiliary help information for business application functions, such as descriptions and prompts for the processing results and indicators in a business function, together with their lineage analysis, so that users can clearly and intuitively understand where the data came from and which processes and algorithms produced it.