Monday, 9 June 2014

ANTIFRAGILE 2014! Part 1 (Keynote speech and first presentation)

As you may know already, the first edition of the ANTIFRAGILE workshop took place on June 3 in Hasselt, Belgium. The workshop was a satellite event of the ANT'14 Conference, hosted at the University of Hasselt.

Being a workshop on computational antifragility, it was only natural that the workshop itself had to be... put to the test! We had to tolerate and learn from a number of problems, both technical and logistical in nature, including a missing remote control for the LCD projector; no computer being available with the projector; and people being sent to the wrong campus because of incorrect information on the Conference website. Still, we can proudly say that we managed to compensate for all those inconveniences at the minimal cost of a 30-minute delay! (Yes, we had considered the possibility of such a delay and used an elasticity strategy to reduce its effects...)

Presentation summary | Presentation | Article
Dr. Kenny H. Jones | Presentation | Article
Vincenzo De Florio | Presentation | Article

Glad to have passed our ordeal and happy to have earned the right to call ours a truly "antifragile workshop," we began our meeting with the insightful keynote speech of Dr. Kenny H. Jones, from the NASA Langley Research Center (LaRC) in Hampton. Dr. Jones' presentation and paper are freely available for download. Among the many important contributions and lessons learned that Dr. Jones shared with us, I found several of the statements in his abstract particularly fitting for the occasion [1]:
"NASA is working to infuse concepts from Complexity Science in to the engineering process. Some [...] problems may be solved by a change in design philosophy. Instead of designing systems to meet known requirements that will always lead to fragile systems at some degree, systems should be designed wherever possible to be antifragile: designing cognitive cyber-physical systems that can learn from their experience, adapt to unforeseen events they face in their environment, and grow stronger in the face of adversity."
Dr. Jones in particular identifies a first "deadly sin" of traditional engineering practice in reductionism, namely the assumption that "Any system, no matter how complicated, can be completely understood if reduced to elemental components". This leads to the fallacy that "By fully understanding the elements, system behavior can be predicted and therefore controlled". While this may be true in some cases, more and more often we are confronted with systems that are more than the sum of their parts (which, incidentally, is the theme of a presentation that I recently gave at the 2014 SuperMinds Event!). System behavior in this case is more difficult to capture, predict, and control, as it is the result of complex interactions among the parts and the environment they are set to operate in. We say that in these complex systems the behavior emerges from those interactions.

Dr. Jones observed that, despite considerable effort and funding, a non-negligible gap exists between theoretical results and practical solutions. This led to partnerships such as the one between NFS and LaRC and to initiatives such as the Inter Agency Working Group, both of which were actions specifically aimed at closing that gap. Apart from partnerships, NASA also initiated internal actions specifically meant to address the engineering practice of complex systems. The Complex Aeronautics Systems Team at LaRC is one such activity. The ultimate aim of those initiatives is to engineer large-scale complex systems that deal more effectively with uncertainty; optimally self-manage their action; are less costly and quicker to develop; and are applicable to general and augmented contexts such as the social, the political, and the economic.

It is at this point that Dr. Jones introduces his main observation: a second "deadly sin" of traditional engineering practice, he states, is that systems are currently designed to be fragile in the first place! Traditional systems are the result of design requirements, and those design requirements systematically introduce Achilles' heels in the system: strict dependences on a reference environment that in practice prevent the system from addressing the unexpected. Any violation of the design requirements inherently translates into an assumption failure. In other words, "If the system is stressed beyond the design requirements, it will fail", and systems "are designed to be fragile at some degree"! Antifragile systems engineering is quite the opposite: a novel practice such that the system becomes stronger when stressed; after all, as the famous Latin quote says, it is per aspera (through difficulties) that we get ad astra (to the stars, a primary objective of NASA by the way!).

In the words of Dr. Jones, "what is needed are new methods producing systems that can adapt functionality and performance to meet the unknown".
Dr. Jones then introduced a non-exhaustive list of very interesting exemplary applications and concluded his speech with a number of statements. His final one is, in my opinion, the major lesson learned and the starting point of our work in computational antifragility:
A change in design philosophy is needed that will produce anti fragile systems: systems able to learn to perform in the face of the unexpected and improve performance beyond what was anticipated.
The speech was interspersed with quick questions and answers and was also attended by some of the organizers of the main Conference, ANT'14.

I had the pleasure and honor to give the second presentation, entitled "Antifragility = Elasticity + Resilience + Machine Learning — Models and Algorithms for Open System Fidelity". Presentation and paper are freely available for download.

The starting point of my discussion is a pair of questions: what is computational antifragility, and why is it different from established disciplines such as dependability, resilience, elasticity, robustness, and safety? My answer is constructed through a number of "moves". Making use of the classic Aristotelian definition, I first focus my attention on resilience, a system's ability to preserve its identity through an active behavior. Aristotle is again quoted as the Giant who first introduced resilience by the name of entelechy (ἐντελέχεια). But what is identity, and what is behavior? We tackle identity first.

We do this via an example: we consider a Voice-over-IP application and a call between two endpoints, and we observe that the identity of this application is not merely the fact that communication between the two endpoints is possible; the identity is preserved only if the quality of experience throughout the call matches the expectations of the two endpoints! This brings the endpoints "in the resilience loop", so to speak. A system is resilient only so long as it is able to adjust its operation to what the two external parties (the users of the system) consider "acceptable"; for instance, if the endpoints are two human beings, this means that the expected quality is that of a conversation between two people talking and listening to each other without any problem.

In practice the experienced quality is a dynamic system, namely one whose characteristics vary with time; and the challenge of resilience is that of being able to compensate for disturbances and keep the experienced quality "not too far away" from the minimal quality expected by the endpoints. We conclude that resilience calls for fidelity, namely quality of representation-and-control between a reference domain and an execution domain. This is in fact an argument brought about by another Giant, Leibniz. As anticipated by Leibniz, systems operate in a resource-constrained world and are characterized by different "powers of representation", namely different fidelity. The higher the system fidelity (that is, the greater its power of representation), the stronger that system's claim for existence: its resilience! Thus fidelity (both reflective fidelity and control fidelity) between a reference domain and an execution domain represents one of the factors that play a significant role in the emergence of quality and resilience.

A typical example is fidelity in cyberphysical systems. As indicated by their very name, cyberphysical systems base their action on the fidelity between properties in the physical world and corresponding properties in the "cyberworld". This fidelity is, in mathematical terms, an isomorphism, namely a bijective function that preserves concepts and operations. Thus in the case of the Voice-over-IP example, fidelity should be able to preserve concepts such as delay, jitter, echo, and latency: physical phenomena should correspond to cyberphenomena, and vice versa. In fact a better approach is to talk of fidelities and consider a fidelity isomorphism for each of the n figures that an open system either senses or controls. I use the terms n-open systems and n-open system fidelities to refer to such open systems and their fidelities.
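
One possible way to write such a fidelity down is sketched here. The symbols P, C, the two operations, the approximate mapping and the drift δ are notation assumed for illustration only; they are not taken from the presentation or the paper.

```latex
% Reference (physical) domain P and execution (cyber) domain C, each with an operation
\Phi : (P, \oplus) \to (C, \otimes), \qquad \Phi \text{ bijective}, \qquad
\Phi(a \oplus b) = \Phi(a) \otimes \Phi(b) \quad \forall a, b \in P

% A real system only realizes an approximation \hat{\Phi} of \Phi;
% the drift measures how far apart the two are under some distance d on C
\delta(a) = d\bigl(\hat{\Phi}(a), \Phi(a)\bigr), \qquad
\text{perfect fidelity} \iff \delta(a) = 0 \quad \forall a \in P

% n-open system: one such mapping per sensed or controlled figure
\Phi_1, \dots, \Phi_n
```

Read this way, a fidelity is "good" to the extent that the drift stays small for the figures that matter, for instance delay or jitter in the Voice-over-IP example.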

Fidelity allows us to reason about a system's identity. In order to exemplify this I use the case of systems that are open to the physical dimension of time. Fidelity in this case is an isomorphism between cybertime and physical time. Several fidelity classes are possible, including for instance the following ones:

[RT]0: Perfect fidelity
In this case we have perfect correspondence between wall-clock time and computer-clock time. No drift is possible and the two concepts can always be reliably related to one another.
[RT]1: Strong fidelity
This corresponds to hard real-time systems. Drifts are possible, but they are typically known and bounded. The system typically enacts simple forms of behavior (see further on).
[RT]2: Statistically strong fidelity
This corresponds to soft real-time systems. Drifts are characterized not by fixed bounds but by averages and standard deviations.
[RT]3: Best-effort fidelity
As a result of quality-vs-cost trade-offs, the quality drifts experienced by the user should be acceptable most of the time and should not discourage the user from using the system.
[RT]4: No fidelity
No guarantee is foreseen; drifts are possible, unbounded, unchecked, and uncontrolled.
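
To make the classes more tangible, here is a minimal Python sketch that assigns a window of observed clock drifts to the strongest class it satisfies. All names and thresholds (`classify`, `hard_bound`, `mean_limit`, `std_limit`, `acceptable`, `min_fraction`) are assumptions introduced for illustration; the talk does not prescribe any particular test, and a finite sample can of course only suggest, not prove, that a class holds.

```python
from enum import IntEnum
from statistics import mean, pstdev


class FidelityClass(IntEnum):
    PERFECT = 0      # [RT]0: no drift observed at all
    STRONG = 1       # [RT]1: every drift within a known hard bound
    STATISTICAL = 2  # [RT]2: mean and standard deviation within limits
    BEST_EFFORT = 3  # [RT]3: drift acceptable most of the time
    NONE = 4         # [RT]4: no guarantee holds


def classify(drifts, hard_bound, mean_limit, std_limit,
             acceptable, min_fraction=0.9):
    """Return the strongest class satisfied by a window of drift samples.

    All thresholds are hypothetical tuning parameters, in the same unit
    as the drift samples (e.g. milliseconds).
    """
    if not drifts:
        raise ValueError("need at least one drift sample")
    magnitudes = [abs(d) for d in drifts]
    if all(m == 0 for m in magnitudes):
        return FidelityClass.PERFECT
    if max(magnitudes) <= hard_bound:
        return FidelityClass.STRONG
    if mean(magnitudes) <= mean_limit and pstdev(magnitudes) <= std_limit:
        return FidelityClass.STATISTICAL
    within = sum(1 for m in magnitudes if m <= acceptable)
    if within / len(magnitudes) >= min_fraction:
        return FidelityClass.BEST_EFFORT
    return FidelityClass.NONE


if __name__ == "__main__":
    # One late sample breaks the hard bound, but the statistics still hold.
    print(classify([1, 2, 8], hard_bound=5, mean_limit=4,
                   std_limit=4, acceptable=10).name)
```

The checks run from the strongest class to the weakest, so a window is never reported in a weaker class than it qualifies for.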
The above classes (or others, defined for instance by differentiating among reference bounds and statistical figures) allow us to provide an operational definition of resilience. Resilience is:
  • Being able to perform one's function ("being at work");
  • Staying in the same class!
Identity is violated as soon as the system changes its class and is no longer able to "stay the same".
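
This operational definition can be sketched in a few lines of Python. The sketch assumes that classes are encoded as integers 0 to 4 (0 being the strongest, as in the list above) and that only a slide towards a weaker class counts as an identity violation; both choices, as well as the function names, are assumptions made here for illustration.

```python
def first_violation(observed_classes, design_class):
    """Index of the first observation weaker than the design class, or None.

    Classes are integers 0..4, where 0 is perfect fidelity and 4 is none.
    """
    for index, observed in enumerate(observed_classes):
        if observed > design_class:
            return index
    return None


def is_resilient(observed_classes, design_class):
    """True if the system stayed in (or above) its design class throughout."""
    return first_violation(observed_classes, design_class) is None


if __name__ == "__main__":
    # A system designed for class 1 that slips to class 2 at the third sample.
    history = [1, 1, 2, 1]
    print(first_violation(history, design_class=1))
    print(is_resilient(history, design_class=1))
```

A stricter reading of "changes its class" would also flag a move to a stronger class; that variant only requires replacing the `>` comparison with `!=`.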
This brings the discussion to a second coordinate of resilience, namely behavior. Behavior is interpreted here as any change an entity enacts in order not to lose its system identity, namely to "stay in the same class". As suggested by Rosenblueth, Wiener, and Bigelow, we can distinguish different cases of behavior, including the following ones:
Passive behavior
corresponding to inert systems.
Purposeful behavior
this is the simplest behavior having a purpose, as is the case with, e.g., servomechanisms. This is the domain of Elasticity: faults, attacks, and disturbances are masked out by making use of redundancy. Said redundancy is predefined and statically fixed as a result of worst-case analyses. So long as the analyses are correct the system is resilient; as soon as this is not the case, the system fails. The resulting systems are inherently fragile: sitting ducks for change!
Teleologic and extrapolatory behaviors
are more complex purposeful behaviors of systems whose action is governed by a feedback loop from the goal or from its extrapolated future state. This is the domain of Resilience: here systems are able to "be at work" and respond to changes — to some degree — making use of perception, awareness, and planning.
And finally,
Auto-predictive behaviors
This class of behaviors extends the set proposed by Rosenblueth, Wiener, and Bigelow and corresponds to systems that plan their resilience by evaluating strategy-environment fits and learning which option best matches which scenario. Evolutionary Game Theory and machine learning are likely to play a significant role in this context.
The final move of my treatise is then made by stating a conjecture: that the domain of auto-predictive behaviors is that of antifragile computing systems. Antifragile systems are thus resilient systems that are open to their own system-environment fit and that are able to develop wisdom as a result of matches between available strategies and obtained results. A general structure to achieve antifragility is also conjectured and introduced: an antifragile computer system should operate as follows:
  • Monitor fidelities;
  • Whenever system identity is not jeopardized:
    • Use computational elasticity strategies;
  • Whenever system identity is jeopardized:
    • Use computational resilience strategies, auto-predictive behaviors, and machine learning to compensate reactively or proactively for the drift; assess strategy-environment fits; and persist lessons learned.
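
The structure above can be illustrated with a toy Python controller. This is a sketch under stated assumptions, not the algorithm from the paper: "identity is jeopardized" is reduced to a drift exceeding a hypothetical `drift_limit`, scenarios are two coarse labels, and the learning step is a plain running average of the residual drift per (scenario, strategy) pair rather than Evolutionary Game Theory or a real machine-learning model. The class and strategy names are invented for the example.

```python
class AntifragileController:
    """Toy loop: monitor drift, absorb small drifts, learn on large ones."""

    def __init__(self, strategies, drift_limit):
        # strategies: name -> callable(drift) returning the residual drift
        self.strategies = dict(strategies)
        self.drift_limit = drift_limit
        # Lessons learned: (scenario, strategy) -> (count, average residual)
        self.fit = {}

    def scenario(self, drift):
        """Hypothetical coarse labelling of the situation being faced."""
        return "severe" if abs(drift) > 2 * self.drift_limit else "moderate"

    def choose(self, scenario):
        """Pick the strategy with the lowest average residual so far.

        Untried strategies count as 0.0, so each one is tried at least once.
        """
        def average(name):
            return self.fit.get((scenario, name), (0, 0.0))[1]
        return min(sorted(self.strategies), key=average)

    def step(self, drift):
        """Handle one monitored drift; return (action taken, residual drift)."""
        if abs(drift) <= self.drift_limit:
            # Identity not jeopardized: predefined elasticity absorbs the drift.
            return "elasticity", abs(drift)
        scenario = self.scenario(drift)
        name = self.choose(scenario)
        residual = abs(self.strategies[name](drift))
        count, avg = self.fit.get((scenario, name), (0, 0.0))
        count += 1
        # Assess the strategy-environment fit and keep the lesson learned.
        self.fit[(scenario, name)] = (count, avg + (residual - avg) / count)
        return name, residual


if __name__ == "__main__":
    controller = AntifragileController(
        {"retry": lambda d: d * 0.8, "reroute": lambda d: d * 0.1},
        drift_limit=1.0,
    )
    for drift in [0.5, 1.5, 1.5, 1.5, 1.5]:
        print(controller.step(drift))
```

In this sketch the lessons learned live only in the in-memory `fit` table; persisting them across runs, as the last bullet calls for, would mean saving and reloading that table.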
Our conclusions are finally stated: by differentiating and characterizing antifragile behaviors from elastic and resilient behaviors, we conclude that computational antifragility is indeed different from other systemic abilities such as elasticity or resilience. A great deal of work is needed to move from ideas and theoretical concepts to an actual antifragile engineering practice of computers and their software; on the other hand, the expected returns are also exceptional and are mandated by the ever-growing complexity of our systems, services, and societies!


1: Text in blue marks original contributions by Vincenzo De Florio.