Monday 25 November 2013

The Black Holes of complexity

Is it good or is it bad if hardware and software complexity are hidden from the software designer? If you're a designer, you will probably tend to say "It's very good! Why the hell should I be interested in the type of transport protocol my software's going to use? What difference would it make if I knew what system calls are being executed in the background? Or whether my software stores local data in SDRAM memory chips or elsewhere?" I'm sure the general opinion would be that none of the above choices really matters that much — apart maybe from some difference in performance!

What is certainly true is that the road to complexity hiding has provided the developer with abstract ways to compose ever larger "blocks" of functionality quickly, easily, and intuitively. First modules and layered organization, then object orientation, and more recently visual languages, services, components, aspects, and models have given the developer the tools to compose and orchestrate highly powerful and sophisticated software systems in a relatively short amount of time.

But... regrettably, there is a but. First of all, though hidden, such complexity is still part of the overall system being developed. Secondly, precisely because it has become so easy to deal with complexity, more and more functionality is being put in place. In other words, software (and computer systems in general) has become a sort of black hole of complexity: it attracts more and more complexity, which simply disappears from our sight while making the system ever "heavier" — harder to predict and control.

As I mentioned elsewhere,
Across the system layers, a complex and at times obscure “web” of software machines is being executed concurrently by our computers. Their mutual dependencies determine the quality of the match of our software with its deployment platform(s) and run-time environment(s) and, consequently, their performance, cost, and in general their quality of service and experience. At our behest or otherwise, a huge variety of design assumptions is continuously matched with the truth of the current conditions.
A hardware component assumed to be available; an expected feature in an OSGi bundle or in a web browser platform; a memory management policy supported by a mobile platform, or ranges of operational conditions taken for granted at all times — all are but assumptions, and all have a dynamically varying truth value. Depending on this value our systems will or will not experience failures. Our societies, our very lives, are often entrusted to machines driven by software; weird as it may sound, in some cases this is done without question — as an act of faith, as it were. This is clearly unacceptable. The more we rely on computer systems — the more we depend on their correct functioning for our welfare, health, and economy — the more important it becomes to design those systems with architectural and structuring techniques that allow software complexity to be decomposed, but without hiding in the process hypotheses and assumptions pertaining, e.g., to the target execution environment and the expected fault and system models.

How to deal with this problem is still a matter of discussion. My idea is that it should be made possible to express, manage, and execute "probes" on the dynamic truth values of our design assumptions. While the "black hole" would remain largely hidden, those probes would help shed light on the likelihood that our hypotheses are actually met by the current conditions of the system and its deployment environment. One possible way to organize those probes could be as a distributed organization of cooperating autonomic digital entities, each of them representing a different unit of information encapsulation (a layer, an object, a service, etc.) and mimicking the structure and organization of the corresponding entities. A fractal social organization of such probes could provide autonomic ways to deal with ever-growing amounts of complexity without reaching the event horizon of unmanageability.
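
To make the idea slightly more concrete, here is a minimal Python sketch of what such probes might look like. Everything in it (the AssumptionProbe and ProbeRegistry names, the two example assumptions) is hypothetical and only illustrates the principle: each probe pairs one design assumption with a predicate that re-evaluates its truth value against the current conditions.

    # Hypothetical sketch (the names are mine, not an existing API): each "probe"
    # pairs a design assumption with a predicate that re-evaluates its truth
    # value against the current state of the system and its environment.
    import os
    import shutil
    from dataclasses import dataclass
    from typing import Callable, Dict

    @dataclass
    class AssumptionProbe:
        name: str                  # human-readable statement of the assumption
        check: Callable[[], bool]  # re-evaluates the assumption right now

        def holds(self) -> bool:
            return self.check()

    class ProbeRegistry:
        """A flat registry of probes; a fractal organization would nest such
        registries so as to mirror the layers, objects, and services probed."""
        def __init__(self) -> None:
            self._probes: Dict[str, AssumptionProbe] = {}

        def register(self, probe: AssumptionProbe) -> None:
            self._probes[probe.name] = probe

        def report(self) -> Dict[str, bool]:
            # Snapshot of which design assumptions currently hold.
            return {name: p.holds() for name, p in self._probes.items()}

    # Example with two made-up assumptions:
    registry = ProbeRegistry()
    registry.register(AssumptionProbe(
        "at least 1 GiB free in the temporary storage area",
        lambda: shutil.disk_usage("/tmp").free > 2**30))
    registry.register(AssumptionProbe(
        "the expected deployment variable is set",
        lambda: "DEPLOYMENT_REGION" in os.environ))
    print(registry.report())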

The Black Holes of complexity by Vincenzo De Florio is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
Permissions beyond the scope of this license may be available at http://win.uantwerpen.be/~vincenz/.

Saturday 16 November 2013

ANTIFRAGILE 2014: 1st Workshop "From Dependable to Resilient, from Resilient to Antifragile Ambients and Systems"

As is well known, dependability refers to a system's trustworthiness and measures several aspects of the quality of its services – for instance how reliable, available, safe, or maintainable those services are. Resilience differs from dependability in that it focuses on the system itself rather than on its services; it implies that the system, when subjected to faults and changes, 1) will continue delivering its services 2) without losing its peculiar traits, its identity: the system will "stay the same". Antifragility goes one step further and suggests that certain systems could actually "get better", namely improve their system-environment fit, when subjected (to some system-specific extent) to faults and changes. Recent studies by Professor N. Taleb introduced the concept of antifragility and provided a characterization of the behaviors enacted by antifragile systems.

The engineering of antifragile computer-based systems is a challenge that, once met, would allow systems and ambients to self-evolve and self-improve by learning from accidents and mistakes in a way not dissimilar to that of human beings. Learning how to design and craft antifragile systems is an extraordinary challenge whose tackling is likely to reverberate across many a field of computer engineering. New methods, programming languages, even custom platforms will have to be designed. The expected returns are extraordinary as well: antifragile computer engineering promises to enable truly autonomic systems and ambients able to meta-adapt to changing circumstances; to self-adjust to dynamically changing environments and ambients; to self-organize so as to track dynamically and proactively optimal strategies to sustain scalability, high performance, and energy efficiency; to personalize their aspects and behaviors to each and every user. And to learn how to get better while doing so.

The ambition and mission of ANTIFRAGILE is to raise awareness of the above challenges and to begin a discussion on how computer and software engineering may address them. As a design aspect cutting across all system and communication layers, antifragile engineering will require multi-disciplinary visions and approaches able to bridge the gaps between "distant" research communities so as to:

  • propose novel solutions to design and develop antifragile systems and ambients;
  • devise conceptual models and paradigms for antifragility;
  • provide analytical and simulation models and tools to measure systems' ability to withstand faults, adjust to new environments, and enhance their resilience in the process;
  • foster the exchange of ideas and lively discussions able to drive future research and development efforts in the area.
The main topics of the workshop include, but are not limited to:
  • Conceptual frameworks for antifragile systems, ambients, and behaviours;
  • Dependability, resilience, and antifragile requirements and open issues;
  • Design principles, models, and techniques for realizing antifragile systems and behaviours;
  • Frameworks and techniques enabling resilient and antifragile applications;
  • Antifragile human-machine interaction;
  • End-to-end approaches towards antifragile services;
  • Autonomic antifragile behaviours;
  • Middleware architectures and mechanisms for resilience and antifragility;
  • Theoretical foundation of resilient and antifragile behaviours;
  • Formal modeling of resilience and antifragility;
  • Programming language support for resilience and antifragility;
  • Machine learning as a foundation of resilient and antifragile architectures;
  • Antifragility and resiliency against malicious attacks;
  • Antifragility and the Cloud;
  • Service Level Agreements for Antifragility;
  • Verification and validation of resilience and antifragility;
  • Antifragile and resilient services.
ANTIFRAGILE is co-located with the 5th International Conference on Ambient Systems, Networks and Technologies, June 2 - 5, 2014, Hasselt, Belgium.

For more information please visit ANTIFRAGILE 2014.

Tuesday 12 November 2013

Lessons From the Past

I must confess that until some time ago I didn't know that the extinction of the dinosaurs is neither the only nor the most severe of the extinction events our Earth has experienced. Quoting from Wikipedia,
"the Cretaceous–Paleogene (K–Pg) extinction event [..] was a mass extinction of some three-quarters of plant and animal species on Earth—including all non-avian dinosaurs—that occurred over a geologically short period of time 66 million years ago".
This certainly sounds quite bad, but in fact not as bad as the so-called Great Dying,
the "Permian–Triassic (P–Tr) extinction event [..] that occurred 252.28 million years ago. [..] It is the Earth's most severe known extinction event, with up to 96% of all marine species and 70% of terrestrial vertebrate species becoming extinct. It is the only known mass extinction of insects. Some 57% of all families and 83% of all genera became extinct."
Thus some 252 million years ago a chain of events produced a catastrophe that affected the terrestrial ecosystem so deeply that it is conjectured "it took some 10 million years for Earth to recover" from it. Nevertheless, the Earth ultimately did recover, and the event marked so big a change in natural history that scientists had to clearly separate what came before from what followed: the Paleozoic ("Old Life") from the Mesozoic ("Middle Life"). Among the many important questions that arise when considering so catastrophic an event, the ones I feel are particularly relevant here are:
  • Q1: Were there any "common reasons" behind the P–Tr extinction event? In other words, were there "common triggers" causing such a widespread correlated failure?
  • Q2: What was the key ingredient, the key defensive strategy that is, that made it possible for the Earth to survive in spite of so harsh a blow?
Now, in order to attempt an answer to the above questions, I recall the following facts:
  • F1: "Mineralized skeletons confer protection against predators" [Knoll]
  • F2: "Skeleton formation requires more than the ability to precipitate minerals; precipitation must be carried out in a controlled fashion in specific biological environments" [Knoll]
  • F3: "The extinction primarily affected organisms with calcium carbonate skeletons, especially those reliant on ambient CO2 levels to produce their skeletons" [Wikipedia].
In other words, one of nature's many independent evolutionary paths was particularly successful (F1) and thus became widespread; regrettably, adopting that solution implies a strong dependence on predefined and stable environmental conditions (F2); and, finally, a correlation exists between the class of species that adopted the solution and the class of species that were most affected by the P–Tr extinction event (F3).

If we read the above in the lingo of computer dependability and resilience, we could say that:

  • A given solution became widespread (for instance a memory technology, a software library, a programming language, an operating system, or a search engine).
  • The solution introduced a weakness: for instance, a dependence on a hidden assumption, or a "bug" depending on certain subtle and very rare environmental conditions.
  • This translated into a common trigger, a single-point-of-multiple-failures: one or a few events "turned on" the weakness and hit hard all the systems that made use of the solution.
A good example of this phenomenon is probably given by the so-called Millennium Bug.

What can we conclude from the above facts and analogies? That solutions that work well in the "common case" are those that become most widespread. Regrettably this decreases disparity, namely inter-species diversity: species that externally appear considerably different from each other in fact share a common trait -- a common design template. This means that whenever the "common case" is replaced by the very rare and very bad "Black Swan", a large portion of the ecosystem is jeopardized. In fact, the rarer the exceptional condition, the more widespread the template and the larger the share of species that will be affected. This provides some elements towards an answer to question Q1: yes, there were common triggers that ultimately produced the P–Tr extinction event -- the wide diffusion of the same "recipes" paved the way to large amounts of correlated failures.

On the other hand, the Earth did survive the Great Dying and other extinction events. Why? My guess for an answer to Q2 is that Nature enforces systemic thresholds that make sure disparity never drops below some minimum. The key ingredient guaranteeing this is diversity: it is not by chance that mutation is an intrinsic mechanism of genetic evolution. Mutation and possibly other mechanisms make sure that, at any point in time, not all species share the same design templates. In turn this guarantees that, at any point in time, not all species share the same fate.

Interestingly enough, similar solutions are sought when designing computer systems. In order to decrease the chance of correlated failures, multiple diverse replicas are executed in parallel or one after the other. This is called design diversity, and it is often based on design templates such as N-version programming or Recovery Blocks.
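
For readers unfamiliar with these templates, the sketch below is a toy Python rendition of a recovery block, not a faithful implementation of any particular system: several independently designed alternates compute the same result, and an acceptance test decides which outcome to trust. The function names and the square-root example are merely illustrative.

    # Toy recovery-block sketch (illustrative only): diverse alternates are
    # tried in order until one passes the acceptance test.
    from typing import Callable, Sequence, TypeVar

    T = TypeVar("T")

    def recovery_block(alternates: Sequence[Callable[[], T]],
                       acceptance_test: Callable[[T], bool]) -> T:
        for alternate in alternates:
            try:
                result = alternate()
                if acceptance_test(result):
                    return result   # first acceptable result wins
            except Exception:
                pass                # a crashed alternate is simply skipped
        raise RuntimeError("all alternates failed the acceptance test")

    # Example: two independently written square-root routines for sqrt(2).
    def sqrt_newton(x: float = 2.0) -> float:
        r = x
        for _ in range(50):
            r = 0.5 * (r + x / r)
        return r

    def sqrt_bisection(x: float = 2.0) -> float:
        lo, hi = 0.0, max(1.0, x)
        for _ in range(100):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if mid * mid < x else (lo, mid)
        return (lo + hi) / 2

    print(recovery_block([sqrt_newton, sqrt_bisection],
                         acceptance_test=lambda r: abs(r * r - 2.0) < 1e-6))

N-version programming differs in that all the versions are executed (for instance in parallel) and a voter adjudicates among their results, rather than trying alternates one after the other.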

(It is worth remarking that the adoption of design diversity templates also decreases disparity... yes, it's a never-ending story.)

The major lesson we need to learn from all this is that diversity is an essential ingredient of resilience. Bring down diversity and you decrease the chance that the ecosystem will be able to withstand the Black Swan when it shows up. (And, given enough time, rest assured it will show up.) High diversity means that a large number of systems will be put to the test by new conditions when the Big One strikes. Even when most of the system-environment fits decree extinction (or system failure), a few systems, by chance so to speak, will have the right elements to pass through the sieves of the Black Swan with limited damage. And it is those limited few that are going to inherit the Earth.

Lessons from the Past by Vincenzo De Florio is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
Permissions beyond the scope of this license may be available at http://win.uantwerpen.be/~vincenz/.

Saturday 9 November 2013

What system is the most resilient?

"What system is the most resilient?": it probably sounds like a silly question, but before jumping to conclusions I invite you to consider a few facts. A human being is a "system" generally considered the highest peak of evolution. Certainly human beings are more 'advanced' than, e.g., dogs. But how do the superior qualities of mankind translate in terms of resilience? Under stressful or turbulent conditions a wo/man will often prove "better" than a dog: superior awareness, consciousness, manual and technical dexterity, and reasoning; an advanced ability to accumulate experience, learn, develop science, and so on and so forth -- they all lead to the 'obvious' conclusion that mankind has a greater ability to adapt and survive. And yet, it is quite easy to find counterexamples. If a threat comes with ultrasonic noise, a dog may perceive the threat and react -- for instance by running away -- while the man may remain unaware until it is too late. Or consider the case of miners: the inability to perceive toxic gases makes them vulnerable to, e.g., carbon monoxide, carbon dioxide, methane, and other lethal gases. A simpler system able to perceive the threat would have more chances of survival.

From the above reasoning we can conclude a first fact: resilience is not an absolute figure; it is the result of a match with a reference environment. Whatever its structure, organization, architecture, capabilities, and resources, a system is only resilient as long as its "provisions" match the current environmental conditions. A better way to express the resilience of a system, say s, could then be to say that s is E-resilient, where E represents in some form the reference environment. (E could be expressed, e.g., as a vector of dynamic systems, each of which represents the evolution in time of a given context figure.) I think it is important to realize that, with reference to an E-resilient system s, a key distinguishing factor is whether or not E is practically immutable and beyond any possibility of revision for s. "Practically" here means that the revisions should occur on a time scale comparable with that of the onset of change.
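
As a purely illustrative sketch (all names and figures below are made up), one could model E as a vector of time-varying context figures and check, at a given time t, whether each figure still falls within the operational range the system s was provisioned for:

    # Illustrative sketch only: E is modeled as a set of time-varying context
    # figures, and s is deemed resilient at time t if every figure falls within
    # the operational range s was provisioned for. All names and numbers are made up.
    from typing import Callable, Dict, Tuple

    Range = Tuple[float, float]

    def is_resilient_at(provisions: Dict[str, Range],
                        environment: Dict[str, Callable[[float], float]],
                        t: float) -> bool:
        for figure, trajectory in environment.items():
            lo, hi = provisions.get(figure, (float("-inf"), float("inf")))
            if not (lo <= trajectory(t) <= hi):
                return False
        return True

    # A device designed for 0..50 degrees C and up to 90% relative humidity:
    provisions = {"temperature_C": (0.0, 50.0), "humidity_pct": (0.0, 90.0)}
    environment = {
        "temperature_C": lambda t: 20.0 + 40.0 * (t > 10.0),  # heat wave after t = 10
        "humidity_pct":  lambda t: 60.0,
    }
    print(is_resilient_at(provisions, environment, t=5.0))   # True: within provisions
    print(is_resilient_at(provisions, environment, t=12.0))  # False: 60 C exceeds 50 C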

So, surprise surprise, what do we end up with here? With yet another blow to our vaunted superiority and our fabled central role in the design of all things universal. Our homocentrism leads us to consider ourselves the Top of the Evolutionary Heap, but we must admit that this is yet another of man's misconceptions.

What I am driving at is that it would be more accurate to state that man is (likely to be) the most advanced natural system. The adjective is important, because all natural systems change their system-environment fit through natural evolution, which is an inefficient and very slow way to cope with change. Nature's answers to this problem include a change of scale -- moving from the individual to the species, and from the species to the ecosystem, in a fractal organization of all things. Thus a role may disappear, but the whole shebang still goes on. On an individual scale, though, and even at the level of the species, our inability to quickly revise our system structure and organization leaves us as evolutionary sitting ducks: it takes literally ages to reshape our features, which often leads to resilience failures for the individual or the species.

Thus what system could be considered more resilient than this? Most definitely, a system able to self-evolve very quickly; a system, that is, that may be better modeled as an E(t)-resilient system. Of course this does not immediately make a cyber-physical thing more resilient than a man; but its system structure is free of the limitations that burden natural systems. We must admit, I believe, that natural systems have reached an evolutionary dead end. Even though not as sophisticated as a human being, a cyber-physical thing can, in a matter of seconds, assess large-scale situations, carry out collective strategies, establish mutualistic relationships, and self-organize into a swarm. Soon non-trivial forms of collective intelligence are likely to emerge. At that point, what system will be the most resilient?
What system is the most resilient? by Vincenzo De Florio is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
Based on a work at http://eraclios.blogspot.be/2013/11/what-system-is-most-resilient.html.
Permissions beyond the scope of this license may be available at http://win.uantwerpen.be/~vincenz/.

Sunday 3 November 2013

Resilient and Antifragile Essences in Actor-Network Theory

Actor-Network Theory (ANT) is a complex social theory based on social constructivism and on the central idea that essences (viz., individuals and societies) are to be interpreted not as “containers” characterized by a physical dimension, e.g., a surface or a sphere, but rather as networks of nodes that have as many dimensions as they have “ties” (i.e., connections). Such ties are “weak by themselves”, though they achieve robustness (“material resistance”) through their social nature: “Each tie, no matter how strong, is itself woven out of still weaker threads [..] Strength does not come from concentration, purity and unity, but from dissemination, heterogeneity and the careful plaiting of weak ties” [Latour, 1996]. “Strength” here refers to the ability of the “essences” to retain their identity in spite of environmental conditions affecting their ties and nodes. A fragile essence is one characterized by one or more points-of-diffusion-failures, as is the case for instance in centralized and hierarchical organizations; conversely, an essence is robust (resilient) if it tolerates discontinuities and other information-diffusion failures.

Be it an individual or a society, an ANT essence is not a static, immutable entity: it “starts from irreducible, incommensurable, unconnected localities, which [..] sometimes end [up] into provisionally commensurable connections” [Latour, 1996]. Strength is sought by conserving identity despite the changes of scale that are necessary to counterbalance turbulent environmental conditions. The above-mentioned “careful plaiting of weak ties” is meant to guarantee that a network “is the same” (cf. the definition of resilience), though “stronger” (which leads to Taleb's Antifragility). In this paper I conjectured that a geometrical interpretation of the ANT concept of strength may be given by the structured addition of complexity that naturally emerges in Fractal Social Organizations.