At its inception, the young internet was all about sharing knowledge. Then, business concerns came to the web and the focus was downgraded to information. Now, exponential growth turns a surfeit of information into meaningless data, with the looming risk of web contents being once again downgraded. And the danger is compounded by the inroads of physical objects bidding for full netizenship and equal rights in the so-called “internet of things”.
How to put words on a web of things (Ai Weiwei)
As it happens, that double perspective coincides with two basic search mechanisms, one looking for identified items and the other for information contents. While semantic web approaches are meant to deal with the latter, it may be necessary to take one step further and to bring the problem (a web of things and meanings) and the solutions (search strategies) within an integrated perspective.
Down with the System Aristocrats
The so-called “internet second revolution” can be summarized as the end of privileged netizenship: down with the aristocracy of systems and their absolute lid on internet residency; within the new web, everything should be entitled to have a voice.
Before and after the revolution: everything should have a say
But then, events are moving fast, suggesting behaviors unbecoming to the things that used to be. Hence the need for a reasoned classification of netizens based on their identification and communication capabilities:
- Humans have inherent identities and can exchange symbolic and non symbolic data.
- Systems don’t have inherent identities and can only exchange symbolic data.
- Devices don’t have inherent identities and can only exchange non symbolic data.
- Animals have inherent identities and can only exchange non symbolic data.
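The classification above amounts to a small two-way table: inherent identity (yes or no) against the kinds of data exchanged. The Python sketch below merely transcribes the four bullets; the names `Exchange`, `Netizen`, `NETIZENS` and `kinds_with` are illustrative assumptions, not an established vocabulary.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Exchange(Flag):
    """Kinds of data a netizen can exchange (hypothetical names)."""
    SYMBOLIC = auto()
    NON_SYMBOLIC = auto()


@dataclass(frozen=True)
class Netizen:
    kind: str
    inherent_identity: bool
    exchange: Exchange


# The four categories, transcribed from the list above.
NETIZENS = [
    Netizen("human", True, Exchange.SYMBOLIC | Exchange.NON_SYMBOLIC),
    Netizen("system", False, Exchange.SYMBOLIC),
    Netizen("device", False, Exchange.NON_SYMBOLIC),
    Netizen("animal", True, Exchange.NON_SYMBOLIC),
]


def kinds_with(exchange):
    """Return the kinds of netizen able to exchange the given data."""
    return [n.kind for n in NETIZENS if exchange in n.exchange]
```

Under this reading, a device that acquires a symbolic user interface would simply move to the row of systems, since only the `exchange` value changes.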
From that perspective, speaking about the “internet of things” can be misleading because the primary transformation goes the other way: many systems initially embedded within appliances (e.g. cell phones) have made their coming out by adding symbolic user interfaces, mutating from devices into fully-fledged systems.
Physical Integration: The Meaning of Things
With embedded systems colonizing every nook and cranny of the world, the supposedly innate hierarchical governance of systems over objects is challenged as the latter call for full internet citizenship. Those new requirements can be expressed in terms of architecture capabilities:
- Anywhere (Where): objects must be localized independently of systems. That’s customary for physical objects (e.g. geo-localization), but may be more challenging for digital ones on their way across the net.
- Anytime (When): behaviors must be synchronized over asynchronous communication channels. Existing mechanisms used for actual processes (e.g. the Network Time Protocol) may have to be set against modal logic if the latter is used to represent those processes.
- Anybody (Who): while business systems don’t like anonymity and rely on their interfaces to secure access, things of the internet are to be identified whatever their interface (e.g. RFID).
- Anything (What): objects must be managed independently of their nature, symbolic or otherwise (e.g. 3D printed objects).
- Anyhow (How): contrary to business systems, processes don’t have to follow predefined scripts; versatility and non-determinism are the rules of the game.
Take a sortie to a restaurant, for example: the actual event is associated with a reservation; cars and phones are active objects geo-localized at a fixed place and possibly linked to diners; great wines can be authenticated directly by smartphone applications; phones are used for conversations and pictures, possibly for adding to reviews; and friends in the neighborhood can be automatically informed of the sortie and invited to join.
A dinner on the Net: place (restaurant), event (sortie), active objects (car, phone), passive object (wine), message (picture), business objects (reviews, reservations), and social beholders (network friends).
As this simple example illustrates, the internet of things brings together dumb objects, smart systems, and knowledgeable documents. Navigating such a tangle will require more than the Semantic Web initiative because its purpose points in the opposite direction, back to the origin of the internet, namely how to extract knowledge from data and information.
Moreover, while most of those “things” fall under the governance of the traditional internet of systems, the primary factor of change comes from the exponential growth of smart physical things with systems of their own. When those systems are “embedded”, the representations they use are encapsulated and cannot be accessed directly as symbolic ones. In other words, those agents are governed by hidden agendas inaccessible to search engines. That problem is illustrated a contrario (things are not services) by service-oriented architectures, one of whose primary responsibilities is to support service discovery.
Semantic Integration: The Actuality of Meanings
The internet of things is supposed to provide a unified perspective on physical objects and symbolic representations, with the former taken as they come and instantly donned in some symbolic skin, and the latter boiled down to their documentary avatars (as Marshall McLuhan famously said, “the medium is the message”). Unfortunately, this goal is also a challenge: while physical objects can be uniformly enlisted across the web, that’s not the case for symbolic surrogates, which are specific to social entities and managed by their respective systems accordingly.
With the Internet of Systems, social entities with common endeavors agree on shared symbolic representations and exchange the corresponding surrogates as managed by their systems. The Internet of Things for its part is meant to put an additional layer of meanings supposedly independent of those managed at systems level. As far as meanings are concerned, the Internet of Things is flat, whereas the Internet of Systems is hierarchized.
The internet of things is supposed to level the meaning fields, reducing knowledge to common sense.
That goal raises two questions: (1) what belongs to the part governed by the internet of things, and (2) how its flattened governance is to be related to the structured one of the internet of systems.
A World of Phenomena
Contrary to what its name may suggest, the internet of things deals less with objects than with phenomena, the reason being that things must manifest themselves, or their existence be detected, before being identified, if and when it’s possible.
Things first appear on radar when some binary footprint can be associated with a signalling event. Then, if things are to go further, some information has to be extracted from captured data:
- Coded data could be recognized by a system as an identification tag pointing to recorded known features and meanings, e.g. a bar code on a book.
- The whole thing could be turned into its digital equivalent, and vice versa, e.g. a song or a picture.
- Context and meanings could only be obtained by linking the captured data to representations already identified and understood, e.g. a religious symbol.
From data to information: how to add identity and meaning to things
Whereas things managed by existing systems already come with net identities and associated meanings, that’s not necessarily the case for digitized ones, as they may or may not have been introduced as surrogates to be used as their real counterparts: if handshakes can stand as symbolic contract endorsements, pictures thereof cannot be used as contract surrogates. Hence the necessary distinction between two categories of formal digitization:
- Applied to symbolic objects (e.g. a contract), formal digitization enables the direct processing of digital instances as if performed on actual ones, i.e. with their conventional (i.e. business) currency. While those objects have no counterpart (they exist simultaneously in both realms), such digitized objects have to bear an identification issued by a business entity, and that puts them under the governance of standard (internet of systems) rules.
- Applied to binary objects (e.g. a facsimile), formal digitization applies to digital instances that can be identified and authenticated on their own, independently of any symbolic counterpart. As a corollary, they are not meant to be managed or even modified and, as illustrated by the marketing of cultural contents (e.g. music, movies, books …), their actual format may be irrelevant. Provided there are agreed-upon de facto standards, binary objects epitomize internet things.
To conclude on footprints, the Internet of Things appears as a complement more than a substitute, as it abides by the apparatus of the Internet of Systems for everything already under its governance, introducing new mechanisms only for the otherwise uncharted things set loose in outer reaches. Can the same conclusion hold for meanings?
Organizational vs Social Meanings
As epitomized by handshakes and contracts, symbolic representations are all about how social behaviors are sanctioned.
When not circumscribed within organizational boundaries, social behaviors are open to different interpretations.
In system-based networks, representations and meanings are defined and governed by clearly identified organizations, corporate or otherwise. That’s not necessarily the case for networks populated by software agents performing unsupervised tasks.
The first generations of those internet robots (aka bots) were limited to automated tasks, performed on behalf of their parent systems, to which they were due to report. Such neat hierarchical governance is being called into question by bots fired and forgotten by their makers, free of reporting duties, their life wholly governed by social events. That’s the case with the internet of things, with significant consequences for searches.
As noted above, the internet of things can consistently manage both system-defined identities and the new ones it introduces for things of its own. But, given a network’s job description, the same consolidation cannot even be considered for meanings: networks are supposed to be kept in complete ignorance of contents, and walls between addresses and knowledge management must tower well above the clouds. As a corollary, the overlapping of meanings is bound to grow with the expanse of things, and the increase will not be linear.
Contrary to identities, meanings usually overlap when things float free from systems’ governance.
That brings some light on the so-called “virtual world”, one made of representations detached from identified roots in the actual world. And there should be no misunderstanding: “actual” doesn’t refer to physical objects but to objects and behaviors sanctioned by social entities, as opposed to virtual, which includes the ones whose meaning cannot be neatly anchored to a social authority.
That makes searches in the web of things doubly challenging as they have to deal with both overlapping and shifting semantics.
A Taxonomy of Searches
Semantic searches (forms and pattern matching should be seen as a special case) can be initiated by any kind of textual input, keywords or phrases. As searches, they should first be classified with regard to their purpose: finding some specific instance or collecting information about some topic.
Searches about instances are meant to provide references to:
- Locations, addresses, …
- Antiques, stamps, …
- Books, magazines, …
- Alumni, friends, …
- Concerts, games, …
- Cooking recipes, administrative procedures, …
- Status of shipment, health monitoring, home surveillance, …
What are you looking for?
Searches about categories are meant to provide information about:
- Geography, …
- Product marketing, …
- Scholarly topics, market research, …
- Customer relationships, …
- Business events, …
- Business rules, …
- Business processes, …
That taxonomy of searches is congruent with the critical divide between things and symbolic representations.
Things and Symbolic Representations
As noted above, searches can be guided by references to identified objects, the form of digital objects (sound, visuals, or otherwise), or associations between symbolic representations. Considering that finding referenced objects is basically an indexing problem, and that pattern matching is a discipline of its own, the focus is to be put on the third case, namely searches driven by words (as opposed to identifiers and forms). From that standpoint, searches are best understood in the broader semantic context of extensions and intensions, the former being the actual set of objects and phenomena, the latter a selected set of features shared by these instances.
A search can therefore be seen as an iterative process going back and forth between descriptions and occurrences or, more formally, between intensions and extensions. Depending on the request, iterations are made of four steps:
- Given a description (intension), find the corresponding set of instances (extension); e.g. “restaurants” > a list of restaurants.
- Given an instance (extension), find a description (intension); e.g. “Alberto’s Pizza” > “pizzerias”.
- Extend or generalize a description to obtain a better match to request and context; e.g. “pizzerias” > “Italian restaurants”.
- Trim or refine instances to obtain a better match to request and context; e.g. a list of restaurants > a list of restaurants in the Village.
Iterations are repeated until the outcome is deemed to satisfy the quality parameters.
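The four steps and their repetition can be sketched on a toy catalogue. Everything in the Python example below is assumed for illustration only: the restaurants other than “Alberto’s Pizza”, the generalization links in `BROADER`, and the use of a minimum number of results (`min_results`) as a stand-in for the quality parameters.

```python
# Toy data, made up for the example: each instance lists the descriptions
# (intensions) it satisfies and the area where it is located.
CATALOGUE = {
    "Alberto's Pizza": {"categories": {"pizzerias"}, "area": "Village"},
    "Trattoria Roma": {"categories": {"trattorias"}, "area": "Village"},
    "Luigi's": {"categories": {"pizzerias"}, "area": "Uptown"},
}

# Hypothetical generalization links between descriptions.
BROADER = {
    "pizzerias": "Italian restaurants",
    "trattorias": "Italian restaurants",
    "Italian restaurants": "restaurants",
}


def ancestors(description):
    """Yield the description itself, then each broader one in turn."""
    while description is not None:
        yield description
        description = BROADER.get(description)


def instances_of(description):
    """Step 1: from a description (intension) to its instances (extension)."""
    return sorted(
        name for name, entry in CATALOGUE.items()
        if any(description in ancestors(c) for c in entry["categories"])
    )


def describe(instance):
    """Step 2: from an instance (extension) to a description (intension)."""
    return sorted(CATALOGUE[instance]["categories"])[0]


def generalize(description):
    """Step 3: widen the description; None when nothing broader exists."""
    return BROADER.get(description)


def refine(instances, area):
    """Step 4: trim the instances to the requested context."""
    return [i for i in instances if CATALOGUE[i]["area"] == area]


def search(instance, area, min_results=2):
    """Repeat the steps until enough matches are found or no broader
    description is left (min_results stands in for quality parameters)."""
    description = describe(instance)
    while True:
        found = refine(instances_of(description), area)
        broader = generalize(description)
        if len(found) >= min_results or broader is None:
            return description, found
        description = broader
```

With this data, `search("Alberto's Pizza", "Village")` starts from “pizzerias”, finds a single match in the Village, widens the description to “Italian restaurants”, and stops there with two matches.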
The benefit of those distinctions is to introduce explicit decision points with regard to the reference models guiding the searches. Depending on purpose and context, such models could be:
- Inclusive: can be applied to any kind of search.
- Semantic: can only be applied to circumscribed domains of knowledge. That’s the aim of the semantic web initiative and the Web Ontology Language (OWL).
- Organizational: can only be applied within specific institutional or business contexts. They could be available to all or through services with restricted access and use.
From Meanings to Things, and back
The stunning performance of modern search engines comes from a combination of brawn and brains, the brainy part for grammars and statistics, the brawny one for running heuristics on gigantic repositories of linguistic practices and web searches. Moreover, that performance improves “naturally” with the accumulation of data pertaining to virtual events and behaviors. Nonetheless, search engines have grey or even blind spots, and there may be a downside to the accumulation of social data, as it may widen the gap between social and corporate knowledge and consequently undermine the coherence of outcomes.
That can be illustrated by a search about Amedeo Modigliani:
- An inclusive search for “Modigliani” will use heuristics to identify the artist (a). An organizational search for a homonym (e.g. a bank customer) would be dealt with at enterprise level, possibly through an intranet (c).
- A search for “Modigliani’s friends” may look for the artist’s Facebook friends if kept at the inclusive level (a1), or switch to a semantic context better suited to the artist (a2). The same outcome would have been obtained with a semantic search (b).
- Searches about auction prices may be redirected or initiated directly, possibly subject to authorization (c).