What is the Semantic Web?

The Semantic Web is a vision for the future of the Web. Today it is often grouped together with Web 3.0, as one of the defining characteristics of that era. Simply put, the Semantic Web is an intelligent network that can understand not only words and concepts but also the logical relationships between them, making communication more efficient and valuable.

The concept of the Semantic Web
Just as the AJAX concept served as the catalyst for Web 2.0, if Web 3.0 takes the Semantic Web concept as its catalyst, a technology analogous to AJAX will emerge: a network standard, markup language, or set of processing tools that extends the World Wide Web and opens the era of the Semantic Web. Enterprises that hold this technology will ride the tide of the Internet age.
The Semantic Web differs from the current WWW. The existing WWW is oriented toward documents, while the Semantic Web is oriented toward the data those documents represent. The Semantic Web emphasizes "understanding and processing" by computers and possesses a degree of judgment and reasoning capability.
The realization of the Semantic Web would mean a large number of intelligent agents (programs) that depend on it, existing widely in computers, communication devices, household appliances, and other objects, and combining to form a rudimentary intelligent network surrounding human life.
The Semantic Web is an extension of the WWW. It points to a bright future for the Web and the Internet revolution it would bring, but its implementation still faces huge challenges:
• Content availability, and the construction and evolution of ontologies
How does the Semantic Web understand and judge?
    The Semantic Web "is different from the existing World Wide Web, whose data is mainly used by humans. The next-generation WWW will provide data that can also be processed by computers, which will make a large number of intelligent services possible." Develop a series of languages and technologies for expressing semantic information that can be understood and processed by computers to support a wide range of effective automatic reasoning in the network environment. "
The World Wide Web we currently use is essentially a medium for storing and sharing images and text. All computers can see is how to display a page; the meaning of its content remains invisible to them.
Although the Semantic Web promises a better network, implementing it is a complicated and vast project. The architecture of the Semantic Web is still under construction and mainly needs support in the following two areas:
First, the realization of a network of data.
That is, network information must be marked up more thoroughly and in finer detail through a unified and complete set of data standards, so that the Semantic Web can accurately identify information and distinguish its role and meaning.
To make Semantic Web search more accurate and thorough, and to make the authenticity of information easier to judge, we first need standards that let publishers add metadata to web content (that is, detailed explanatory markup) and let users state precisely what they are looking for. Next, we need a way to ensure that different programs can share content from different websites. Finally, applications must be able to build further features on top of that shared content.
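As a rough sketch of what such metadata markup can look like in practice, the following Python snippet uses the rdflib library to attach machine-readable metadata to a web page; the vocabulary namespace and URLs are illustrative, not a prescribed standard:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC

# Illustrative page URL and a made-up vocabulary namespace.
page = URIRef("http://example.org/articles/semantic-web")
EX = Namespace("http://example.org/vocab/")

g = Graph()
g.bind("dc", DC)
g.bind("ex", EX)

# Dublin Core statements: machine-readable metadata about the page.
g.add((page, DC.title, Literal("What is the Semantic Web?")))
g.add((page, DC.language, Literal("en")))
g.add((page, EX.topic, EX.SemanticWeb))

# RDF/XML is one of the standard exchange serializations.
print(g.serialize(format="xml"))
```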
Second, the support of key enabling technologies. The implementation of the Semantic Web is based on a set of standard languages and tools, discussed below.
We know that most technological innovations and breakthroughs are recombinations and updates of existing knowledge. The Semantic Web, with its ability to intelligently evaluate the data stored in cyberspace, will inevitably provide endless resources for new technological innovation. Once this technology is widely used, its benefits will be incalculable. For this reason, the Semantic Web has been a hot area of computer science research since its birth.
The W3C is the main promoter and standards body of the Semantic Web, and under its stewardship Semantic Web technology has been gaining momentum. On July 30, 2001, Stanford University held an academic conference entitled "Semantic Web Infrastructure and Applications", the first international conference on the subject. In 2002, the first International Semantic Web Conference was held in Italy; it has been held annually ever since. Meanwhile, companies and universities such as HP, IBM, Microsoft, Fujitsu, Stanford University, the University of Maryland, the University of Karlsruhe in Germany, and the Victoria University of Manchester in the United Kingdom have conducted extensive and in-depth research on Semantic Web technology, producing a series of development and application platforms such as KAON, Racer, and Pellet, along with systems for information integration, query, reasoning, and ontology editing built on Semantic Web technology.
Research Status of the Semantic Web in China
The architecture of the Semantic Web is still under construction, and international research has not yet produced a satisfactory, rigorous logical description and theoretical system for it. Chinese scholars have so far only briefly introduced this architecture on the basis of foreign studies and have not yet treated it systematically.
The implementation of the Semantic Web requires the support of three key technologies: XML, RDF (Resource Description Framework), and Ontology.
Many current World Wide Web technologies are likely to be applied to the Semantic Web, such as:
• DOM (Document Object Model), a set of standard interfaces for accessing the components of XML and HTML documents
• XPath, XLink, XPointer
• XInclude, XML Fragment, XML query languages, XHTML
• XML Schema, RDF (Resource Description Framework)
• XSL, XSLT (Extensible Stylesheet Language and its transformation language)
• SVG (Scalable Vector Graphics)
• SMIL (Synchronized Multimedia Integration Language)
• SOAP
• DTD
• Microformats
• The metadata concept
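To make the first items on this list concrete, here is a small Python example using the standard library's DOM and ElementTree modules to access the components of an XML document (the document itself is invented):

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

doc = """<catalog>
  <book isbn="0-441-17271-7"><title>Dune</title></book>
</catalog>"""

# DOM: standard interfaces for walking document components.
dom = xml.dom.minidom.parseString(doc)
print(dom.getElementsByTagName("title")[0].firstChild.data)

# XPath: ElementTree supports a limited XPath subset.
root = ET.fromstring(doc)
for book in root.findall(".//book[@isbn]"):
    print(book.get("isbn"))
```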
The Semantic Web is a high-level intelligent product of the Internet age, with wide applications and a bright future. Its main application technologies and research directions are introduced below.
Two approaches have emerged: the classic bottom-up approach and the newer top-down approach. The bottom-up approach focuses on explicitly labeled information, represented in RDF so that it is machine-readable. The top-down approach focuses on taking existing page content as it is and automatically extracting meaningful information from it. Both have advanced in recent years. One piece of good news for the bottom-up approach is Yahoo's announcement that its search engine supports RDF and microformats, a win for content publishers, for Yahoo, and for consumers alike: publishers gain an incentive to label their information, Yahoo can use that information more effectively, and users get better and more accurate results. Another is Dapper's announcement of a semantic web service that lets content publishers add semantic annotations to existing web pages. The more such tools exist, the easier it becomes for publishers to label web pages, and the growth of automatic labeling tools and labeling incentives will make the bottom-up approach ever more compelling. Even with tools and incentives in place, however, making the bottom-up approach truly widespread remains quite difficult; indeed, today's Google can already understand unstructured web page information to a certain extent. Top-down semantic tools, for their part, focus on handling existing, imperfect information, mainly by using natural language processing to extract entities. They include text analysis techniques that identify specific entities (names, companies, places, and so on) in a document, and vertical search engines that gather information in specific fields.
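As an illustration of the NLP-based entity extraction these top-down methods rely on, here is a minimal sketch using the spaCy library, assuming it and its small English model are installed; the input sentence is invented:

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Reuters opened a new office in London last March.")

# Named-entity recognition: the core of top-down extraction.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)  # e.g. ORG, GPE, DATE
```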
Top-down techniques focus on extracting knowledge from unstructured information, but they can also process structured information, and the more bottom-up labeling there is, the better top-down methods can perform. Within the bottom-up camp there are several candidate labeling techniques; all are capable, and choosing among them requires a trade-off between simplicity and completeness.

The most complete is RDF: a powerful graph-based language for representing things, their attributes, and the relationships between things. In simple terms, you can think of RDF as a language that expresses facts such as: Alex IS human (a type expression), Alex HAS a brain (an attribute expression), and Alex IS the father of Alice, Lilly, and Sofia (relationship expressions). RDF is powerful, but being highly recursive, precise, and mathematical, it is also complex. At present, RDF is mostly used to solve data interoperability problems. For example, medical organizations use RDF to represent genomic databases; because the information is standardized, previously isolated databases can be queried together and compared with one another. Generally speaking, beyond the semantics themselves, the main benefit of RDF is interoperability and standardization, especially for enterprises (discussed below).

Microformats offer a simple alternative: CSS-style class names that add semantic markup to existing HTML documents, embedding concise metadata directly into the original page. Popular microformat tags include hCard, describing personal and company contact information; hReview, metadata added to a review page; and hCalendar, a tag describing an event. Microformats have gained popularity for their simplicity, but their capabilities are limited. For example, they cannot describe the hierarchical structures that the traditional semantic community considers necessary, and keeping the tag set minimal inevitably makes some meanings vague. This raises another question: is it even appropriate to embed tags in HTML documents? Still, despite these problems, microformats remain popular for their simplicity; Flickr, Eventful, LinkedIn, and many other companies have adopted them, especially after Yahoo's search announcement.

An even simpler method is to put metadata in the page's meta header. This method has seen some use, but unfortunately not widely. The New York Times recently launched an annotation extension for its news pages, and the benefits are already apparent on topic and event pages: a news page can be identified by a set of keywords (place, date, time, person, and category), and a book page can carry the author, ISBN, and book category in its meta header. These methods all differ, but they all work, and the more web pages are tagged and the more standards are adopted, the more powerful and accessible the information becomes.
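As a minimal sketch, the three kinds of RDF statement just described might look like this in Python with rdflib (the namespace and names are illustrative):

```python
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")  # illustrative namespace
g = Graph()
g.bind("ex", EX)

# Type expression: Alex IS human.
g.add((EX.Alex, RDF.type, EX.Human))
# Attribute expression: Alex HAS a brain.
g.add((EX.Alex, EX.has, EX.Brain))
# Relationship expressions: Alex IS the father of Alice, Lilly, and Sofia.
for child in ("Alice", "Lilly", "Sofia"):
    g.add((EX.Alex, EX.fatherOf, EX[child]))

print(g.serialize(format="turtle"))
```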
In discussions of the Semantic Web, users and enterprises have different concerns. From the consumer's standpoint, we need a killer app that delivers real, simple value, because users care only about whether a product is useful, not what technology it is built on. The problem is that, so far, the focus of the Semantic Web has remained largely theoretical, for example on labeling information to make it machine-readable. The promise is that once information is labeled, the network becomes one large RDF database on which a wealth of exciting applications can be built. But skeptics point out that the labeling has to happen first; the promise rests on an assumption that has yet to be fulfilled.
Many applications based on the Semantic Web already exist, such as general and vertical search engines, text assistant tools, personal information management systems, and semantic browsing tools, but they have a long way to go before the general public accepts them. And even if these technologies succeed, users will not care what technology sits behind them, so there is little prospect of promoting Semantic Web technology at the user level.
Enterprises are different. First, companies are more receptive to technical arguments: for them, semantic technology can make products more intelligent and thereby create market value. "Our products are better and smarter because we use the Semantic Web" sounds like good publicity for a business.
At the enterprise level, RDF addresses the problem of data interoperability standards, a problem that in fact dates back to the early days of the software industry. You can forget about the Semantic Web entirely and simply think of RDF as a standard protocol that allows two programs to exchange information, which is undoubtedly of great value to an enterprise. RDF provides an XML-based communication solution, and the prospects it promises make companies less concerned about its complexity. However, scalability remains a problem: unlike relational databases, which are widespread and highly optimized, XML-based databases have not caught on, owing to their scalability and query limitations. Like the object databases of the late 1990s, XML-based databases carry many expectations; let us wait and see.
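A minimal sketch of the interoperability point in Python with rdflib; the genomic facts and URIs are invented for illustration:

```python
from rdflib import Graph

# Two independently published RDF datasets, say from two medical
# organizations; the genomic facts and URIs are invented.
source_a = """
@prefix ex: <http://example.org/genome/> .
ex:BRCA1 ex:locatedOn ex:Chromosome17 .
"""
source_b = """
@prefix ex: <http://example.org/genome/> .
ex:BRCA1 ex:associatedWith ex:BreastCancer .
"""

g = Graph()
g.parse(data=source_a, format="turtle")
g.parse(data=source_b, format="turtle")  # merging is just parsing into one graph

# Once merged, formerly isolated facts can be queried together.
query = "SELECT ?p ?o WHERE { <http://example.org/genome/BRCA1> ?p ?o . }"
for predicate, obj in g.query(query):
    print(predicate, obj)
```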
Semantic APIs have evolved alongside the Semantic Web. These web services take unstructured text as input and output entities and relationships. For example, Reuters' Open Calais API accepts raw text and returns the names, locations, companies, and other entities found in it, marked up in the original text. Another example is TextWise's SemanticHacker API, which offers a $1 million bounty for the best commercial Semantic Web application built on it. This API divides the information in a document into categories (called semantic fingerprints) and outputs the entities and topics it contains. It is similar to Calais, but it also provides a hierarchical structure of topics, with the actual objects in the document as the leaf nodes of that structure. A further example comes from Dapper, a web service that helps extract structured information from unstructured HTML pages. Dapper works by having the user define where on a page the attributes of an object appear; for example, a book publisher might mark where the author, ISBN, and page count are found, after which Dapper can generate an API for the site that reads that information. From a technical perspective this may look like a step backward, but in practice Dapper's approach is very useful: for a website with no API of its own, even a non-technical person can use Dapper to construct one in a short time. It is the most powerful and fastest way to turn a website into a web service.
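As a hedged sketch of how such a service is typically called, the endpoint and response shape below are hypothetical, not the actual Calais or SemanticHacker interface:

```python
import requests

# Hypothetical endpoint and response shape: real services such as
# Open Calais differ in their URLs, authentication, and schemas.
API_URL = "https://semantic-api.example.com/extract"  # placeholder

text = "Reuters opened a new office in London last Thursday."
resp = requests.post(API_URL, json={"text": text}, timeout=10)
resp.raise_for_status()

# Assume the service returns entities as {"entities": [{name, type}, ...]}.
for entity in resp.json().get("entities", []):
    print(entity["type"], ":", entity["name"])
```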
Perhaps the original motivation for the Semantic Web was that search quality had long been hard to improve. The hypothesis that understanding page semantics would improve search quality, however, has yet to be proven. Hakia and Powerset, the two major competitors in semantic search, have made considerable progress, but still not enough, because Google's statistics-based algorithms handle entities such as people, cities, and companies just as well as semantic technology does: ask "who is the French president" and Google returns a good enough answer. More and more people realize that marginal improvements to search technology will not beat Google, so they look instead for the Semantic Web's killer application. Understanding semantics likely helps a search engine, but it is not by itself enough to build a better one; fully combining semantics, novel presentation, and user identification could enhance the next generation of search. Other efforts also try to apply semantics to search results: Google, for instance, is experimenting with separating results into categories and letting users decide which interest them. Search is a race, and many semantic companies are running it. Perhaps there is yet another way to improve search quality, the combination of text processing technology with semantic databases, which we turn to next.

We are seeing more and more text processing tools enter the consumer market. Text navigation applications like Snap, Yahoo Shortcuts, or SmartLinks can "understand" the objects in text and links and attach the corresponding information to them, so that users need not search at all to understand the information. Going a step further, the way text tools use semantics can become more interesting still: instead of parsing keywords typed into a search box, a tool can analyze the web document itself, making its understanding of the semantics more precise, or less speculative, and then offer the user several types of relevant results to choose from. This is fundamentally different from the traditional method of piling a mass of supposedly correct documents in front of the user. More and more text processing tools are also being integrated into browsers; since top-down semantic technology requires nothing from the publisher, context and text tools can plausibly live in the browser itself. Firefox's recommended extensions page lists many text browsing solutions, such as Interclue, ThumbStrips, Cooliris, and BlueOrganizer.
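A toy sketch of the idea behind such text navigation tools, in Python; the entity table and URLs are invented:

```python
import html

# Invented entity table; a real tool would draw on a large knowledge base.
KNOWN_ENTITIES = {
    "Paris": "https://example.org/topic/Paris",
    "Google": "https://example.org/topic/Google",
}

def annotate(text: str) -> str:
    """Wrap known entity names in links to background information."""
    out = html.escape(text)
    for name, url in KNOWN_ENTITIES.items():
        out = out.replace(name, '<a href="{}">{}</a>'.format(url, name))
    return out

print(annotate("Google opened a new office in Paris."))
```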
Semantic databases are one direction for annotation-based Semantic Web applications. Twine, now in beta, aims to build a personal knowledge base of people, companies, events, and places, drawing on unstructured content from forums and on material submitted via bookmarks, email, or by hand. The technology is still maturing, but its potential benefits are obvious: one can imagine a Twine-based personalized search in which results are filtered through a personal knowledge base. Twine's underlying data representation is RDF, so it can be used by other Semantic Web services, but its core algorithms, such as entity extraction, are commercialized through semantic APIs; Reuters offers a similar API.

Another pioneer of the semantic database is a company called Metaweb, whose product is Freebase. In its current form, Freebase looks like a more structured, RDF-based version of Wikipedia, but its goal is to build a database of world information like Wikipedia's, with the crucial difference that it supports precise queries, as a relational database does. Its promise, then, is again better search. The question is how Freebase can keep pace with the world's information. Google indexes web documents daily and so evolves with the web, whereas Freebase's information currently comes only from individual editing and from data pulled from Wikipedia and other databases. To scale the product, the process of harvesting unstructured information from the whole network, analyzing it, and updating the database must be perfected. Keeping up with the world is a challenge for every database approach: Twine needs a constant stream of user data, Freebase a constant stream of data from the network. These problems are not easy, and they must be handled properly before such systems become truly practical.

Every new technology's emergence requires defining concepts and carving out categories. The Semantic Web offers an exciting prospect: better discoverability of information, complex searches, and novel ways of browsing the web. It means different things to different people: it has one definition for enterprises and another for consumers, and it splits along axes such as top-down versus bottom-up and microformats versus RDF. Alongside these patterns we have also seen the rise of semantic APIs and text browsing tools. All of this is still at an early stage of development, but all of it carries the expectation of changing how we interact with information on the network.
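A minimal sketch of what such a precise, relational-style query might look like, using rdflib and SPARQL over invented data:

```python
from rdflib import Graph

# A Freebase-style structured fact base, sketched as RDF; the data is invented.
data = """
@prefix ex: <http://example.org/> .
ex:Dune     a ex:Book ; ex:author ex:FrankHerbert ; ex:year 1965 .
ex:Hyperion a ex:Book ; ex:author ex:DanSimmons   ; ex:year 1989 .
"""
g = Graph()
g.parse(data=data, format="turtle")

# A precise query of the kind a relational database supports:
# books published after 1980.
query = """
PREFIX ex: <http://example.org/>
SELECT ?book ?year WHERE {
    ?book a ex:Book ; ex:year ?year .
    FILTER (?year > 1980)
}
"""
for book, year in g.query(query):
    print(book, year)
```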
In its advanced stage, the Semantic Web would let libraries, ticketing systems, customer management systems, and decision systems all work together well. For example, if you want to travel, you need only give a Semantic Web-backed query system your specific dates and the kind of domestic trip you like, and the matching attractions, the best itineraries, precautions and tips, and travel agency ratings will quickly be assembled on your browser page.
The Semantic Web would ultimately bring this advanced stage of the network to every corner of the world. Each person would have his or her own network identity, carrying personal consumption credit, medical records, files, and so on. At the same time, the online community would be more active than the physical one, and online society more orderly and harmonious.
