Archive for the ‘Tests’ Category

Deep Blind Testing

March 21, 2017

Preamble

Tests are meant to ensure that nothing will go amiss. Assuming that expected hazards can be duly dealt with beforehand, the challenge is to guard against unexpected ones.

Unexpected Outcome (Ariel Schlesinger)

That would require scripting every possible outcome across an unlimited range of unknown circumstances, and that’s where deep learning may help.

What to Look For

As Donald Rumsfeld once famously said, there are things that we know we don’t know, and things we don’t know we don’t know; hence the need to set things apart depending on what can be known and how, and to build the scripts accordingly:

  • Business requirements: tests can be designed with respect to explicit specifications; yet some room should also be left for changes in business circumstances.
  • Functional requirements: assuming business requirements are satisfied, the part played by supporting systems can be comprehensively tested with respect to well-defined boundaries and operations.
  • Quality of service: assuming business and functional requirements are satisfied, tests will have to check how human interfaces and resources are to cope with users’ behaviors and expectations which, by nature, cannot be fully anticipated.
  • Technical requirements: assuming business and functional requirements, as well as users’ expectations for service, are satisfied, deployment, maintenance, and operations are to be tested with regard to feasibility and costs.

Automated testing has to take into account these differences in scope and nature, from bounded and defined specifications to boundless, fuzzy, and changing circumstances.

Automated Software Testing

Automated software testing encompasses two basic components: first the design of test cases (events, operations, and circumstances), then their scripted execution. Leading frameworks already integrate most of the latter, together with the parts of the former targeting technical aspects like graphical user interfaces or system APIs. Artificial intelligence (AI) and machine learning (ML) have also been tried for automated test generation, yet with a scope limited by their dependency on explicit knowledge, and consequently by the need for some “manual” teaching. That hurdle may be overcome by the deep learning ability to get direct (aka automated) access to implicit knowledge.
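
By way of illustration, the minimal sketch below (using pytest, with a hypothetical withdraw operation) keeps the two components apart: test cases are plain data and their execution a single parametrized script, so whatever generates the cases, manual analysis or a learning process, only has to add rows.

```python
# Minimal sketch (pytest assumed): test cases are plain data, kept apart
# from the script that executes them, so a generator (manual, ML, or DL)
# only needs to produce new entries. The withdraw() operation is hypothetical.

import pytest

def withdraw(balance, amount):
    """Toy business operation under test (hypothetical)."""
    if amount <= 0 or amount > balance:
        raise ValueError("invalid withdrawal")
    return balance - amount

# Test-case design: events, operations, and circumstances as data.
CASES = [
    ("nominal", 100, 30, 70, None),
    ("overdraft", 50, 80, None, ValueError),
    ("zero amount", 50, 0, None, ValueError),
]

# Scripted execution: one parametrized runner for all designed cases.
@pytest.mark.parametrize("label,balance,amount,expected,error", CASES)
def test_withdraw(label, balance, amount, expected, error):
    if error:
        with pytest.raises(error):
            withdraw(balance, amount)
    else:
        assert withdraw(balance, amount) == expected
```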

Reconnaissance: Known Knowns

Systems are designed artifacts, with the corollary that their components are fully defined and their behavior predictable. The design of technical test cases can therefore be derived from what is known of software and systems architectures, the former for unit tests, the latter for integration and acceptance tests. Deep learning could then mine recorded log-files in order to identify the events and circumstances of critical cases.
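
By way of illustration, and leaving the learning machinery aside, the sketch below hints at what mining log-files could look like: records are grouped into sessions and rare event sequences are flagged as candidate test cases (the file path, log format, and threshold are hypothetical; a deep learning model would replace the naive frequency count).

```python
# Minimal sketch: mine a (hypothetical) log file for rare event sequences
# that could seed technical test cases. A learned model would replace the
# naive frequency count, but the input/output contract is the same.

from collections import Counter, defaultdict

def mine_candidate_cases(path, rarity_threshold=0.01):
    sessions = defaultdict(list)
    with open(path) as log:
        for line in log:
            if not line.strip():
                continue
            # Assumed format: "<session_id> <event>" per line.
            session_id, event = line.split(maxsplit=1)
            sessions[session_id].append(event.strip())

    sequences = Counter(tuple(events) for events in sessions.values())
    total = sum(sequences.values()) or 1
    # Rare sequences are the ones most likely to hide untested paths.
    return [seq for seq, count in sequences.items()
            if count / total < rarity_threshold]

# candidates = mine_candidate_cases("system.log")
```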

Exploration: Known Unknowns

Assuming that applications must be tested for use during their expected shelf life, some uncertainty has to be factored in for future business circumstances. Yet, assuming applications are designed to meet specific business objectives, such hypothetical circumstances should remain within known boundaries. In that context deep learning could be applied to exploration as well as policies:

  • Compared to technical test cases that can rely on the content of systems log-files, business and functional ones have to look outside and mine raw data from business environments.
  • In return, the relevancy of observations can be assessed with regard to business objectives, improved, and fed into the policy module in charge of defining test cases, as sketched below.
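
A rough sketch of such a loop, with hypothetical stubs standing in for the mining and learning parts, could read as follows: observations from the business environment are assessed against business objectives, and the relevant ones are fed to the policy in charge of emitting the next test cases.

```python
# Rough sketch of an exploration/policy loop (all functions are hypothetical
# stubs): raw observations from the business environment are assessed against
# business objectives, and the relevant ones feed the policy that defines
# the next batch of test cases.

def explore(environment):
    """Mine raw data from the business environment (stub)."""
    return environment.sample_observations()

def assess(observations, objectives):
    """Keep observations relevant to business objectives (stub)."""
    return [o for o in observations if objectives.relevance(o) > 0.5]

def update_policy(policy, relevant):
    """Reinforce the policy with the assessed observations (stub)."""
    policy.learn(relevant)
    return policy

def test_generation_cycle(environment, objectives, policy):
    observations = explore(environment)
    relevant = assess(observations, objectives)
    policy = update_policy(policy, relevant)
    return policy.emit_test_cases()
```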

Blind Errands: Unknown Unknowns

Even with functional and technical capabilities well tested and secured, quality of service may remain contingent on human quirks: instinctive or erratic behaviors that could thwart the best-designed handrails. On the one hand, due to their very nature, such hazards cannot easily be forestalled by reasoned test cases; on the other hand, they don’t take place in a void but within known functional circumstances. Given that porosity between functional and cognitive layers, the validity of functional test cases may be compromised by unfathomable cognitive associations, opening the door to unmanageable regressions. Enter deep learning and its ability to extract knowledge from insignificance.

Compared to business and functional test cases, hazards are not directly related to business activities. As a consequence, the learning process cannot be guided by business and functional test cases but has to chart unpredictable human behaviors. As it happens, that kind of learning, combining random simulation with automated reinforcement, is precisely what sets deep learning apart.
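
The sketch below caricatures that combination as a “monkey testing” loop with a reward signal: actions are drawn at random from the interface, and sequences leading to anomalies are reinforced so that later runs concentrate on them (the ui object and its methods are hypothetical, and a learned policy would stand in for the crude weighting).

```python
# Caricature of random simulation with automated reinforcement
# ("monkey testing" with a reward signal). The ui object, its actions,
# and the anomaly check are hypothetical.

import random

def reinforced_monkey_run(ui, episodes=100, steps=20):
    weights = {action: 1.0 for action in ui.available_actions()}
    anomalies = []
    for _ in range(episodes):
        ui.reset()
        trace = []
        for _ in range(steps):
            actions = list(weights)
            action = random.choices(actions,
                                    weights=[weights[a] for a in actions])[0]
            ui.perform(action)
            trace.append(action)
            if ui.anomaly_detected():
                anomalies.append(list(trace))
                # Reinforce the actions that led to the anomaly.
                for a in trace:
                    weights[a] *= 1.5
                break
    return anomalies
```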

From Non-regression to Self-improvement

As a conclusion, if non-regression is to be the cornerstone of quality management, test cases are to be set along clear swim-lanes: business logic (independently of systems), supporting systems functionalities (for shared applications), and users’ interfaces (for non-shared interactions). Then, since test cases are also run across swim-lanes, the door is open to feedback, e.g. unit test cases reassessed directly from business rules independently of systems functionalities, or functional test cases reassessed from users’ behaviors.
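
Such swim-lanes can be made explicit in test suites; the minimal sketch below uses pytest markers (marker names and stubs are hypothetical) so that lanes can be run separately, e.g. pytest -m business, or crossed for feedback.

```python
# Minimal sketch: swim-lanes as pytest markers (names and stubs are
# hypothetical). Markers should be registered in pytest.ini to avoid warnings.

import pytest

def compute_discount(order_total):
    """Business rule stub: 10% discount."""
    return order_total * 0.10

class CheckoutScreen:
    """Interface stub rendering the business rule's outcome."""
    def render(self, order_total):
        return f"discount: {compute_discount(order_total):.2f}"

@pytest.mark.business      # business logic, independently of systems
def test_discount_rule():
    assert compute_discount(200) == 20

@pytest.mark.interface     # users' interface, reassessed from the same rule
def test_checkout_screen_shows_discount():
    assert "discount: 20.00" in CheckoutScreen().render(200)
```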

Considering that well-defined objectives, sound feedback mechanisms, and the availability of massive data from systems logs (internal) and business environment (external) are the main pillars of deep learning technologies, their combination in integrated frameworks could result in a qualitative leap toward self-improving automated test cases.


Ergonomy, Fingertips Errors & Automated Testing

February 10, 2014

Objective

When interacting with systems, users do things they aren’t supposed to do and walk along irrelevant, even unthinkable, paths that can put tests designers at a loss. This apparent chink between users’ conscious self and their fingertips can be explained by the way humans assess situations and make decisions. Curtailing it is the aim of ergonomics.

Anatomy of Errors: from brain to fingers (Rembrandt)

Taking a leaf from A. Tversky and D. Kahneman (the latter was awarded the 2002 Nobel Prize in Economics), decision-making relies on two cognitive mechanisms:

  1. The first one “operates automatically and quickly, with little or no effort and no sense of voluntary control”. It’s put in use when actual situations must be assessed and decisions taken rapidly if not instantly.
  2. The second one “allocates attention to the effortful mental activities that demand it, including complex computations”. It’s put in use when situations can be assessed with regard to past experience in order to support informed decision-making.

That distinction can be directly applied to users’ behaviors interacting with systems:

  1. Intuitive behavior: decisions are taken on the basis of the visual context and options as presented by users’ interfaces, before taking into account the underlying business contents and logic.
  2. Rational behavior: decisions are taken on the basis of business contents and logic disregarding supporting systems interfaces.

Set in context, that distinction can be put in parallel (but not confused) with the one between domain and functional requirements, the former dealing rationally with business objects and logic, the latter putting the former to use through interactions with supporting systems.

Functional requirements describe the part played by supporting systems

Assuming that business logic should not be contingent on supporting systems interfaces, the best option would be to test its implementation independently of users’ interactions; moreover, tests targeting intuitive behaviors (i.e. not directly based on domain-specific contents) could then be generated automatically.
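
A minimal sketch of that separation, with hypothetical names: the business rule is implemented and tested as a pure function, without any reference to the screens through which it will eventually be used.

```python
# Minimal sketch (hypothetical names): the business rule is implemented and
# tested as a pure function, so its tests never depend on users' interfaces.

def eligible_for_credit(age, income, outstanding_debt):
    """Business rule, free of any interface concern."""
    return age >= 18 and income > 2 * outstanding_debt

def test_eligibility_rule():
    assert eligible_for_credit(age=30, income=4000, outstanding_debt=1000)
    assert not eligible_for_credit(age=17, income=4000, outstanding_debt=0)
    assert not eligible_for_credit(age=30, income=1000, outstanding_debt=900)
```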

Looking for Errors

Given that testing is meant to find flaws in deliverables, tests are certainly more effective when designers know what they are looking for.

For that purpose phased approaches rely on sequences of differentiated tests dealing successively with programming (unit tests), functional requirements (integration tests), and business requirements (acceptance tests). The unfortunate downside of those policies is that the most wide-ranging flaws are the last to be looked for, with the risk of being found only after cascading and costly consequences for functionalities and programs.

Phased and Iterative approaches to tests

Conversely, agile approaches follow iterative policies, with each development cycle combining the definition, programming, and testing of software products. When properly implemented, those policies significantly improve the early detection and correction of errors, whatever their origin. Yet, since there is no explicit management of intermediate outcomes, it’s difficult to differentiate the tests according to the kind of errors to look for, e.g. a faulty implementation of business rules or a flawed user interface.

Architecture driven approaches may provide an answer, with requirements unambiguously sorted out depending on their architectural footprint: business contents or system functionalities. As a corollary, tests could also be designed along the same lines, targeting business rationale or human behavior.

Errors in Mirrors

Acceptance tests being performed with regard to requirements, they should be designed along the requirements taxonomy, respectively for business logic, users’ interactions, quality of services, and components implementation. Being aligned on requirements, those tests can be neatly defined with regard to closed sets of specifications, functional or otherwise.

Functional tests have to expect the unexpected

But that’s not the case for users’ interactions, because people’s behaviors are not fully predictable; hence, while tests can be systematically designed with regard to the set of users’ actions framed by business and functional requirements, there is no way to comprehensively and unambiguously check each and every possible behavioral contingency. That will make for three levels of functional tests:

  1. Implementation of business logic: tests should be designed directly from business requirements, independently of interactions with users.
  2. Implementation of scenarii: while interactions are defined in reference to business logic, their validation should focus on the presentation of contents and dialog control.
  3. Users’ exceptions: in addition to inputs validity, already checked with business logic, and users’ actions, supposedly secured by interaction scenarii, it is necessary to check that unexpected behaviors have been properly considered.

How to check that unexpected behaviors have been properly considered?

In other words, functional tests will have to look simultaneously for errors in software (defined with regard to a finite set of requirements) and for users’ mistakes (set in an open range of behaviors), as if test designers were to mirror users’ errors in order to look for software ones. So, assuming that errors in business logic and interactions have been considered, what should still be checked, and how?

Fingertips Errors

When faced with choices, users bank on mental maps combining graphical and business layers, with the implicit assumption that maps’ contexts and concerns are kept up to date. Those maps combine three communication mechanisms:

  • Languages, natural or specific, use syntax and semantics to define business contents, logic, and operations.
  • Icons use similarity for the visual representation of business operations or functional primitives (e.g. create, delete, etc.).
  • Signals use proximity to draw users’ attention to predefined events (e.g. sounds for operation completion or incoming emails).

While language-based interactions are supposedly fully covered by business and functional tests, icons and signals make room for “fingertips” reactions which cannot be directly framed within business logic or functional scenarii, and therefore cannot be comprehensively checked for erroneous behaviors.
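
One way to make that distinction operational, sketched below with hypothetical names, is to tag interface events with their communication mechanism, so that icon- and signal-driven interactions can be routed to additional, behavior-oriented checks.

```python
# Sketch (hypothetical names): interface events tagged by communication
# mechanism, so that icon- and signal-driven interactions, which escape
# language-based functional tests, can be routed to extra behavioral checks.

from enum import Enum

class Mechanism(Enum):
    LANGUAGE = "language"   # syntax and semantics: covered by functional tests
    ICON = "icon"           # similarity-based: candidate for intuitive checks
    SIGNAL = "signal"       # proximity-based: candidate for intuitive checks

EVENTS = [
    ("enter shipping address", Mechanism.LANGUAGE),
    ("click trash-can icon", Mechanism.ICON),
    ("new-mail sound", Mechanism.SIGNAL),
]

def needs_behavioral_check(mechanism):
    return mechanism in (Mechanism.ICON, Mechanism.SIGNAL)

extra_checks = [name for name, mechanism in EVENTS
                if needs_behavioral_check(mechanism)]
# extra_checks == ["click trash-can icon", "new-mail sound"]
```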

Icons and signal based communication can trigger unexpected behaviors.

Yet, if instinctive reactions preclude rational considerations, decisions may be swayed by analogies and associations before being informed by the relevant business contents. To prevent that risk, test scenarii built on business logic and functional interactions should be extended in order to take into account the intuitive aspects of users’ behaviors.

Mental Maps & Automated Tests

As noted above, mental maps are built on three layers, one deep (language semantics) and two shallow (icons and signals). While the shallow layers are supposed to reference the deep one, icons and signals may induce instinctive behaviors independently of the referenced business logic. Those behaviors can be triggered by two kinds of mechanisms:

  • Analogy: users will look for similarities and familiar configurations.
  • Proximity: users will look for continuity with regard to scope and operations.

Clearly, lapses in such behaviors will normally escape tests designed for business and functional requirements; yet, being driven by self-contained mechanisms, intuitive behaviors can be checked independently of references to business contents. And that may open the door to automated test generation.

With regard to similarities, tests should look for possible confusion between:

  • Objects with common representation but specific features (inheritance).
  • Operations with shared semantics but different scope (polymorphism).
  • Sequences with shared operations but different timing.

With regard to proximity, tests should look for possible confusion between:

  • Objects and their parts, or between their parts (structural proximity).
  • Operations usually associated into the same activity (functional proximity).
  • Operations usually executed successively (chronological proximity).

Scripts for such tests could be generated through pattern-matching and run by wizard applications.
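
The sketch below gives a flavor of such pattern-matching over a hypothetical interface model: pairs of operations sharing semantics but differing in scope, or sharing successors in usual execution paths, become candidate confusion checks.

```python
# Sketch of pattern-matching test generation over a hypothetical interface
# model: pairs of operations that are similar (shared semantics, different
# scope) or proximate (shared successors) become candidate confusion checks.

from itertools import combinations

# Hypothetical interface model: operation -> semantic tag, scope, successors.
OPERATIONS = {
    "delete_item":  {"semantics": "delete", "scope": "item",  "next": ["confirm"]},
    "delete_order": {"semantics": "delete", "scope": "order", "next": ["confirm"]},
    "save_order":   {"semantics": "save",   "scope": "order", "next": ["confirm"]},
}

def similarity_pairs(ops):
    """Shared semantics, different scope (possible polymorphism confusion)."""
    return [(a, b) for a, b in combinations(ops, 2)
            if ops[a]["semantics"] == ops[b]["semantics"]
            and ops[a]["scope"] != ops[b]["scope"]]

def proximity_pairs(ops):
    """Operations sharing a successor (possible chronological confusion)."""
    return [(a, b) for a, b in combinations(ops, 2)
            if set(ops[a]["next"]) & set(ops[b]["next"])]

candidate_tests = similarity_pairs(OPERATIONS) + proximity_pairs(OPERATIONS)
```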
