The objective of the National Information Exchange Model (NIEM) is to provide a “dictionary of agreed-upon terms, definitions, relationships, and formats that are independent of how information is stored in individual systems.”
NIEM’s model makes no distinction between data and information (Alfred Jensen)
For that purpose NIEM’s model combines commonly agreed core elements with community-specific ones. Weighed against the benefits of simplicity, this architecture overlooks critical distinctions:
- Inputs: Data vs Information
- Dictionary: Lexicon and Thesaurus
- Meanings: Lexical Items and Semantics
- Usage: Roots and Aspects
That shallow understanding of information significantly hinders the exchange of information between business or institutional entities across overlapping domains.
Inputs: Data vs Information
Data is made of unprocessed observations, information makes sense of data, and knowledge makes use of information. Given that NIEM is meant to be an exchange between business or institutional users, it should have no concern with data mining or knowledge management.
As an exchange, NIEM should have no concern with data mining or knowledge management.
The problem is that, as conveyed by “core of data elements that are commonly understood and defined across domains, such as person, activity, document, location”, NIEM’s model makes no explicit distinction between data and information.
As a corollary, it implies that data may not only be meaningful, but universally so. That leads to a critical trap: as substantiated by data analytics, data is not supposed to mean anything before being processed into information. To keep with the examples, even if the definition of persons and locations may not be specific, the semantics of the associated information is nonetheless set by domains, whether institutional, regulatory, contractual, or otherwise.
Data is meaningless; the meaning of information is set by semantic domains.
Not surprisingly, that medley of data and information is mirrored by NIEM’s dictionary.
Dictionary: Lexicon & Thesaurus
As far as languages are concerned, words (e.g. “word”, “ξ∏¥”, “01100”) remain data items until associated with some meaning. For that reason dictionaries are built on different levels, first among them the lexical and semantic ones:
- Lexicons take items at their word and give each of them a self-contained meaning.
- Thesauruses position meanings within overlapping galaxies of understandings held together by the semantic equivalent of gravitational forces; the meaning of words can then be weighted by the combined semantic gravity of neighbors.
In line with its shallow understanding of information, NIEM’s dictionary only caters for a lexicon of core standalone items associated with type descriptions to be directly implemented by information systems. But in the absence of a thesaurus, the dictionary cannot tackle the semantics of overlapping domains: while lexicons alone can deal with one-to-one mappings of items to meanings (a), thesauruses are necessary for shared (b) or alternative (c) mappings.
Shared or alternative meanings cannot be managed with lexicons
With regard to shared mappings (b), distinct lexical items (e.g. qualification) have to be mapped to the same entity (e.g. person). Whereas some shared features (e.g. a person’s birth date) can be unequivocally understood across domains, most are set through shared (professional qualification), institutional (university diploma), or specific (enterprise course) domains.
Conversely, alternative mappings (c) arise when the same lexical item (e.g. “mole”) can be interpreted differently depending on context (e.g. plastic surgeon, farmer, or secret service).
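The three kinds of mappings can be illustrated with a minimal Python sketch. It is not taken from NIEM: the dictionary layout, the domain labels, the wording of the meanings, and the helper names `meanings_of` and `items_for` are all assumptions made for illustration. A lexicon is modeled as a flat one-to-one table, while the thesaurus keys each entry by lexical item and domain.

```python
# Hypothetical lexicon: one lexical item, one self-contained meaning (case a).
lexicon = {
    "birth date": "date on which a person was born",
}

# Hypothetical thesaurus: entries are keyed by (lexical item, domain),
# so the same item or the same target can appear several times.
thesaurus = {
    # (b) shared mappings: distinct items, set by different domains,
    # all attached to the same entity.
    ("professional qualification", "shared"): "person",
    ("university diploma", "institutional"): "person",
    ("enterprise course", "specific"): "person",
    # (c) alternative mappings: one item, several context-dependent meanings.
    ("mole", "plastic surgeon"): "skin blemish",
    ("mole", "farmer"): "burrowing animal",
    ("mole", "secret service"): "infiltrated agent",
}


def meanings_of(item):
    """Return the meaning of a lexical item in each domain that defines it."""
    return {domain: target
            for (word, domain), target in thesaurus.items()
            if word == item}


def items_for(target):
    """Return, sorted, every lexical item mapped to the given target."""
    return sorted(word
                  for (word, _domain), target_ in thesaurus.items()
                  if target_ == target)


print(meanings_of("mole"))
print(items_for("person"))
```

A flat table such as `lexicon` cannot hold the three readings of “mole” without overwriting two of them, which is the limitation described above.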
Whereas lexicons may be sufficient for the use of lexical items across domains (namespaces in NIEM parlance), thesauruses are necessary if meanings (as opposed to uses) are to be set across domains. But thesauruses, being just tools, are not sufficient by themselves to deal with overlapping semantics. That can only be achieved through a conceptual distinction between lexical and semantic envelopes.
Meanings: Lexical Items & Semantics
NIEM’s dictionary organizes names depending on namespaces and relationships:
- Namespaces: core (e.g. Person) or specific (e.g. Subject/Justice).
- Relationships: types (e.g. Counselor/Person) or properties (e.g. PersonBirthDate).
NIEM’s Lexicon: Core (a) and specific (b) and associated core (c) and specific (d) properties
But since lexicons know only names, the organization is not orthogonal, with lexical items mapped indifferently to types and properties. As a result, deprived of reasoned guidelines, lexical items are charted arbitrarily, e.g.:
Based on core PersonType, the Justice namespace uses three different schemes to define similar lexical items:
- “Counselor” is described with core PersonType.
- “Subject” and “Suspect” are both described with specific SubjectType, itself a sub-type of PersonType.
- “Arrestee” is described with specific ArresteeType, itself a sub-type of SubjectType.
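The three schemes can be made concrete with a small Python sketch. It only mirrors the type names quoted above; the empty classes, the `justice_items` table, and the helper `depth_below_core` are hypothetical stand-ins and not NIEM's actual schema definitions.

```python
class PersonType:
    """Stand-in for the core PersonType."""


class SubjectType(PersonType):
    """Stand-in for the Justice-specific SubjectType, a sub-type of PersonType."""


class ArresteeType(SubjectType):
    """Stand-in for the Justice-specific ArresteeType, a sub-type of SubjectType."""


# Lexical item -> type used to describe it, as listed in the text.
justice_items = {
    "Counselor": PersonType,
    "Subject": SubjectType,
    "Suspect": SubjectType,
    "Arrestee": ArresteeType,
}


def depth_below_core(described_by):
    """Count the sub-typing steps between a type and the core PersonType."""
    return described_by.__mro__.index(PersonType)


for name, described_by in justice_items.items():
    print(name, described_by.__name__, depth_below_core(described_by))
```

Similar lexical items end up at depths 0, 1, and 2 below the core type, with nothing in the names themselves to explain why.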
Based on core EntityType:
- The Human Services namespace bypasses core’s namesake and introduces instead its own specific EmployerType.
- The Biometrics namespace bypasses possibly overlapping core Measurer and BinaryCaptured and directly uses core EntityType.
Lexical items are charted arbitrarily
Lest expanding lexical items clutter up dictionary semantics, some rules have to be introduced; yet, as noted above, these rules should be limited to information exchange and stop short of knowledge management.
Usage: Roots and Aspects
As far as information exchange is concerned, dictionaries have to deal with lexical and semantic meanings without encroaching on ontologies or knowledge representation. In practice that can be best achieved with dictionaries organized around roots and aspects:
- Roots and structures (regular, black triangles) are used to anchor information units to business environments, source or destination.
- Aspects (italics, white triangles) are used to describe how information units are understood and used within business environments.
Information exchanges are best supported by dictionaries organized around roots and aspects
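One possible reading of that organization is sketched below in Python. All class names, fields, and sample values (`Root`, `Aspect`, `InformationUnit`, the identifier `p-001`, the feature entries) are assumptions made for illustration; the text does not prescribe any particular representation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Root:
    """Anchors an information unit to a business environment (hypothetical fields)."""
    kind: str         # what is identified, e.g. "Person"
    identifier: str   # identity in the source or destination environment


@dataclass
class Aspect:
    """Describes how a unit is understood and used within one domain."""
    name: str
    domain: str
    features: dict = field(default_factory=dict)


@dataclass
class InformationUnit:
    """A root plus the aspects that different domains attach to it."""
    root: Root
    aspects: list = field(default_factory=list)

    def view(self, domain):
        """Return only the aspects that the given domain defines."""
        return [a for a in self.aspects if a.domain == domain]


# Hypothetical example: one person, understood differently by two domains.
unit = InformationUnit(
    root=Root(kind="Person", identifier="p-001"),
    aspects=[
        Aspect("Counselor", "Justice", {"bar_admission": "unknown"}),
        Aspect("Employer", "Human Services", {"staff_count": 12}),
    ],
)

print([a.name for a in unit.view("Justice")])
```

In this sketch identity travels with the root, while meaning stays with each domain's aspects, so an exchange can share the former without forcing agreement on the latter.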
As it happens, that distinction can be neatly mapped to core concepts of software engineering.