Applying Architectural Tactics

The use of architectural tactics, as proposed by the Software Engineering Institute, provides a systematic way of dealing with a system's non-functional requirements (sometimes referred to as the system's quality attributes or just qualities). These can be both runtime qualities such as performance, availability and security, as well as non-runtime qualities such as maintainability, portability and so on. In my experience, dealing with both functional and non-functional requirements, as well as capturing them using a suitable modelling tool, is something that is not always handled very methodically. Here's an approach that tries to enforce some architectural rigour using the Unified Modeling Language (UML) and any UML-compliant modelling tool.

Architecturally, systems can be decomposed from an enterprise or system-wide view (i.e. people, processes, data and IT systems), to an IT system view, to a component view and finally to a sub-component view, as shown going clockwise in Figure 1. These diagrams show how an example hotel management system (something I've used before to illustrate some architectural principles) might eventually be decomposed into components and sub-components.

Figure 1: System Decomposition

This decomposition typically happens by considering what functionality needs to be associated with each of the system elements at different levels of decomposition. So, as shown in Figure 1 above, we first associate 'large-grained' functionality (e.g. we need a hotel system) at the system level and gradually break this down into finer and finer grained levels until we have allocated all functionality across all components (e.g. we need a user interface component that handles the customer management aspects of the system).

Crucially, from the point of view of deploying components, we need to have decomposed the system to at least the sub-component level in Figure 1, so that we have a clear idea of the type of each component (i.e. whether it handles user input, manages data, etc.) and know how the components collaborate with each other in satisfying use cases. There are a number of patterns that can be adopted for doing this. For example, the model-view-controller pattern shown in Figure 2 is a way of ascribing functionality to components in a standard way, using rules for how those components collaborate. This pattern has been used for the sub-component view of Figure 1.

Figure 2: Model-View-Controller Pattern
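As an illustration, the collaboration rules of the model-view-controller pattern can be sketched in a few lines of Python. This is a toy check-in flow; the class and method names are illustrative and not taken from any real hotel system's design:

```python
# Minimal model-view-controller sketch for the hotel example.
# All names are illustrative; they are not taken from any real system.

class CustomerModel:
    """Model: owns the data and knows nothing about presentation."""
    def __init__(self):
        self._customers = {}

    def create(self, customer_id, name):
        self._customers[customer_id] = name

    def read(self, customer_id):
        return self._customers.get(customer_id)

class CustomerView:
    """View: renders model data; holds no business logic."""
    def show(self, name):
        return f"Customer: {name}"

class CustomerController:
    """Controller: mediates between user input, the model and the view."""
    def __init__(self, model, view):
        self.model, self.view = model, view

    def check_in(self, customer_id, name):
        self.model.create(customer_id, name)                  # update the model
        return self.view.show(self.model.read(customer_id))   # refresh the view

controller = CustomerController(CustomerModel(), CustomerView())
print(controller.check_in(42, "A. Guest"))  # -> Customer: A. Guest
```

The point of the rules is visible even at this scale: the model never talks to the view directly, so either can be replaced without touching the other.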

So far we have shown how to decompose a system based on functional requirements, thinking about which components will realise those requirements. What about non-functional requirements, though? Table 1 shows how non-functional requirements can be decomposed and assigned to architectural elements as they are identified. Initially, non-functional requirements are stated at the whole-system level, but as we decompose into finer-grained architectural elements (i.e. components) we can begin to think about how those elements also support particular non-functional requirements. In this way non-functional requirements get decomposed and associated with each level of system functionality. Non-functional requirements would ideally be assigned as attributes to each relevant component (preferably inside our chosen UML modelling tool) so they do not get lost or forgotten.

Table 1: Decomposing a Non-Functional Requirement

  • System element: Hotel System (i.e. including all actors and IT systems). Non-functional requirement: The hotel system must allow customers to check in 24 hours a day, 365 days a year. Note that this is typically the level of precision at which non-functional requirements are initially stated; further analysis is usually needed to provide measurable values.
  • System element: Hotel Management System (i.e. the hotel IT system). Non-functional requirement: The hotel management system must allow the front-desk clerk to check in a customer 24 hours a day, 365 days a year, with a 99.99% availability value.
  • System element: Customer Manager (i.e. a system element within the hotel's IT system). Non-functional requirement: The customer manager system element (component) must allow customer details to be created, read or updated (but not deleted) 24 hours a day, 365 days a year, with a 99.99% availability value.
  • System element: Customer Manager Interface (i.e. the user interface that belongs to the Customer Manager system element). Non-functional requirement: The customer manager interface must allow customer details to be created, read or updated (but not deleted) 24 hours a day, 365 days a year, with a 99.99% availability value.
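Outside a UML tool, the idea of attaching non-functional requirements as attributes to each level of the decomposition can be sketched as a simple data structure. The element names and availability values come from Table 1; the structure itself is illustrative:

```python
# Sketch: attaching non-functional requirements (NFRs) to architectural
# elements so they are carried through each level of decomposition.
# Element names and values are from Table 1; the structure is illustrative.

from dataclasses import dataclass, field

@dataclass
class Element:
    name: str
    nfrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

hotel_system = Element(
    "Hotel System",
    nfrs={"availability": "check-in 24 hours a day, 365 days a year"},
    children=[
        Element(
            "Hotel Management System",
            nfrs={"availability": "99.99%"},
            children=[
                Element(
                    "Customer Manager",
                    nfrs={"availability": "99.99%"},
                    children=[
                        Element("Customer Manager Interface",
                                nfrs={"availability": "99.99%"})
                    ],
                )
            ],
        )
    ],
)

def unassigned(element):
    """Return names of elements with no NFRs attached (the forgotten ones)."""
    missing = [element.name] if not element.nfrs else []
    for child in element.children:
        missing += unassigned(child)
    return missing

print(unassigned(hotel_system))  # -> [] : every element carries its NFRs
```

A simple walk like `unassigned` is the programmatic equivalent of the discipline the modelling tool should enforce: no element escapes decomposition without its requirements.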

Once it is understood which non-functional requirements each component needs to support, we can apply the approach of architectural tactics proposed by the Software Engineering Institute (SEI) to determine how to handle them.

An architectural tactic represents “codified knowledge” on how to satisfy non-functional requirements by applying one or more patterns or reasoning frameworks (for example queuing or scheduling theory) to the architecture. Tactics show how (the parameters of) a non-functional requirement (e.g. the required response time or availability) can be addressed through architectural decisions to achieve the desired capability.

In the example we are focusing on in Table 1, we need some tactics that allow the desired quality attribute of 99.99% availability (which corresponds to a downtime of 52 min, 34 sec per year) to be achieved by the customer manager interface. The SEI documents a detailed set of availability tactics, but for the purposes of this example they can be categorised according to whether they address fault detection, recovery, or prevention. Here are some potential tactics for these:

  • Employing good software engineering practices for fault prevention, such as code inspections and usability testing, in the design and implementation of the interface.
  • Deploying components on highly available platforms that employ fault detection and recovery approaches such as system monitoring and active failover.
  • Developing a backup and recovery approach that allows the platform running the user interface to be replaced within the target availability times.
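The arithmetic behind the 99.99% figure is worth making explicit; a quick calculation (pure arithmetic, assuming a 365-day year) confirms the downtime budget quoted above:

```python
# Annual downtime budget implied by an availability target.
minutes_per_year = 365 * 24 * 60           # 525,600 minutes in a 365-day year
availability = 0.9999                      # the 99.99% target from Table 1
downtime_minutes = minutes_per_year * (1 - availability)
minutes, seconds = divmod(round(downtime_minutes * 60), 60)
print(f"{minutes} min, {seconds} sec per year")  # -> 52 min, 34 sec per year
```

Seeing the budget in minutes makes the third tactic concrete: any backup-and-recovery approach has to fit replacement of the platform inside that annual allowance.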

As this example shows, not all non-functional requirements can be realised suitably by a component alone; sometimes full realisation can only be achieved when that component is placed (deployed) onto a suitable computer platform. Once we know which non-functional requirements need to be realised by which components, we can then think about how to package those components together to be deployed onto an appropriate computer platform that supports those non-functional requirements (for example a platform that will support 99.99% availability). Figure 3 shows how this deployment can be modelled in UML, adopting the Hot Standby Load Balancer pattern.

Figure 3: Deployment View

Here we have taken one component, the 'Customer Manager', and shown how it would be deployed with other components (a 'Room Manager' and a 'Reservation Manager') that support the same non-functional requirements onto two application server nodes. A third UML element, an artefact, packages together like components via a UML «manifest» relationship. It is the artefact that actually gets placed onto the nodes. An artefact is a standard UML element that "embodies or manifests a number of model elements. The artefact owns the manifestations, each representing the utilization of a packageable element".
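To make the fault-detection-and-recovery behaviour of the Hot Standby Load Balancer pattern concrete, here is a toy sketch of how such a balancer routes around a failed node. It illustrates the pattern's behaviour only; the node names are invented and a real balancer would use network health probes rather than an in-memory flag:

```python
# Toy sketch of the Hot Standby Load Balancer tactic: route requests to the
# primary node, failing over to the standby when a health check fails.
# Node behaviour is simulated; names and health checks are illustrative.

class Node:
    def __init__(self, name):
        self.name, self.healthy = name, True

    def handle(self, request):
        return f"{self.name} handled {request}"

class HotStandbyBalancer:
    def __init__(self, primary, standby):
        self.primary, self.standby = primary, standby

    def route(self, request):
        # Fault detection: check the primary before routing (a 'ping' tactic),
        # then recovery: divert traffic to the hot standby if it has failed.
        node = self.primary if self.primary.healthy else self.standby
        return node.handle(request)

primary, standby = Node("app-server-1"), Node("app-server-2")
balancer = HotStandbyBalancer(primary, standby)
print(balancer.route("check-in"))   # -> app-server-1 handled check-in
primary.healthy = False             # simulate a failure of the primary node
print(balancer.route("check-in"))   # -> app-server-2 handled check-in
```

The key property of the pattern is visible here: the failover is invisible to the caller, which is what allows the deployed artefact to meet an availability target neither node could meet alone.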

So far all of this has been done at a logical level; that is, there is no mention of technology. However, moving from the logical level to a physical (technology-dependent) level is a relatively simple step. The packaging notion of an artefact can equally be used for packaging physical components; for example, the three components shown in Figure 3 above could be Enterprise Java components or .NET components.

This is a simple example to illustrate three main points:

  1. Architecting a system based on functional and non-functional requirements.
  2. Use of a standard notation (i.e. UML) and modelling tool.
  3. Adoption of tactics and patterns to show how a system's qualities can be achieved.

None of this is rocket science, but it is something you don't see done very often.


How Much Does Your Software Weigh, Mr Architect?

Three apparently unrelated events actually have a serendipitous connection, which has led to the title of this week's blog. First off, Norman Foster (he of "Gherkin" and "Wobbly Bridge" fame) has had a film released about his life and work called How Much Does Your Building Weigh, Mr Foster? As a result there has been a slew of articles about both Foster and the film, including this one in the Financial Times. One of the things that comes across from both the interviews and the articles about Foster is the passion he has for his work. After all, if you are still working at 75 then you must like your job a little bit! One of the quotes that stands out for me is this one from the FT article:

“The architect has no power, he is simply an advocate for the client. To be really effective as an architect or as a designer, you have to be a good listener.”

How true. Too often we sit down with clients and jump in with solutions before we have really got to the bottom of what the problem is. It's not just about listening to what the client says but also to what she doesn't say. Sometimes people only say what they think you want to hear, not what they really feel. So, it's not just about listening but about developing empathy with the person you are architecting for. Related to this is not closing down discussions too early, before making sure everything has been said, which brings me to the second event.

I'm currently reading Resonate by Nancy Duarte, which is about how to put together presentations that really connect with your audience using techniques adopted by professional storytellers (film makers, for example). In Duarte's book I came across the diagram below, which Tim Brown also uses in his book Change by Design.

For me the architect sits above the dotted line in this picture, ensuring that as many choices as possible get made and then making decisions (or compromises) that strike the right balance between the sometimes opposing "forces" of the requirements that arise from those choices.

One of the big compromises that often needs to be made is how much I can deliver in the time available and, if it's not everything, what gets dropped. Unless the timescale can change, it's usually the odd bit of functionality (fine if these functions can be deferred to the next release) or quality (not good under any circumstances). This leads me to the third serendipitous event of the week: discovering "technical debt".

Slightly embarrassingly, I had not heard of the concept of technical debt before, even though it's been around for a long time. It was originally proposed by Ward Cunningham in 1992, who said the following:

Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite… The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation.

Technical debt is a topic that has been taken up by the Software Engineering Institute (SEI), which is organising a workshop on the topic this year. One way of understanding technical debt is to see it as the gap between the current state of the system and what was originally envisaged by the architecture. Here, debt can be "measured" by the number of known defects and the number of features that have not yet been implemented. Another aspect of debt, however, is the amount of entropy that has set in as the system has decayed over time (changes have been made that were not in line with the specified architecture). This is a more difficult thing to measure but has a definite cost in terms of ease of maintenance and general understandability of the system.
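The "gap" view of debt can at least be given a rough, back-of-the-envelope form. The sketch below is purely illustrative: the per-item effort figures are invented, not an established metric, and it deliberately captures only the countable part of the debt:

```python
# Back-of-the-envelope technical debt estimate: remediation effort for the
# visible gap between the system as built and the system as architected.
# The per-item effort figures are invented for illustration only.

def debt_estimate(known_defects, missing_features,
                  hours_per_defect=4, hours_per_feature=40):
    """Return an effort estimate (in hours) for repaying the visible debt."""
    return known_defects * hours_per_defect + missing_features * hours_per_feature

# e.g. 25 known defects and 3 unimplemented features:
print(debt_estimate(25, 3))  # -> 220 hours of 'interest' owed
```

Note that the entropy aspect of debt is exactly what a count-based measure like this misses, which is why it is the harder (and more interesting) part to quantify.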

Which leads to the title of this week's blog. Clearly software (being 'soft') carries no weight (the machines it runs on do not count) but it can nonetheless have a huge, and potentially damaging, weight in terms of the debt it may be carrying in unstructured, incorrect or hard-to-maintain code. Understanding the weight of this debt, and how to deal with it, should be part of the role of the architect. The weight of your software may not be measurable in kilograms, but it surely has a weight in terms of the "debt owed".

Interprise Architecture and Ultra-Large-Scale Systems

In a previous post I introduced the term “Interprise Architecture” to describe how the internet is breaking down the traditional boundaries of the enterprise and thus requires a new approach to Enterprise Architecture that’s not just about describing what’s inside the enterprise but also what’s on the outside. No longer can Enterprise Architects create blueprints for some future state that the enterprise will one day reach with roadmaps for how that state will be achieved. There are too many disruptive influences and new technologies that are impinging on the enterprise that will not only mean the roadmap is sending you in the wrong direction but that you are probably using the wrong mode of transport to get there as well.

I received a few comments on this from folk at the Software Engineering Institute (SEI) as well as Gartner. The work on Ultra-Large-Scale (ULS) Systems from the SEI particularly drew my attention and resonates nicely with some of my own thoughts. Here are some of the key ideas from the SEI report Ultra-Large-Scale Systems: The Software Challenge of the Future, plus some additional musings of my own on what constitutes Interprise Architecture. First, ULS:

  • The SEI report on ULS systems was funded by the United States Department of Defense (DoD), which asked the SEI to consider future systems that could not only contain billions of lines of code but also exhibit some, possibly all, of the following characteristics: decentralisation; conflicting, unknowable, and diverse requirements; continuous evolution and deployment; heterogeneous and changing elements; erosion of the people/system boundary; and normal failures of parts of the system.
  • ULS systems are likely to mean that traditional software and systems engineering approaches will no longer be adequate or can be the primary means by which such systems are designed (architected) or built.
  • ULS systems can be compared with cities whereas traditional systems can be compared with buildings. Buildings can be designed and built to a blueprint whereas cities emerge and are continuously adapting over time.
  • ULS systems are comprised of a dynamic community of interdependent and competing organisms (in this case, people, computing devices, and organizations) in a complex and changing environment. These are referred to as socio-technical ecosystems.
  • ULS systems are ones that are continuously evolving with new behaviour constantly emerging. In this respect they have the attributes of wicked problems where the problem is never definitively solved (or indeed understood).
  • ULS systems expect failure to be the norm and that unusual situations and boundary conditions will occur often enough that something will always be failing.

The SEI report is primarily aimed at allowing the US military to develop new systems however I believe the key ideas that challenge the development of such systems also have wide applicability in business systems, the sort I’m most interested in. I see that what I have characterised as Interprise Architecture could therefore be applied to developing ULS business systems. Here are three examples of ULS business systems that might benefit from an Interprise Architecture approach:

  • Electronic Trading Systems. These are systems that trade securities (such as stocks and bonds), foreign currency, and exchange-traded derivatives electronically. They use IT to bring together buyers and sellers through electronic media and create a virtual market place. Such systems are typically built using proprietary software that has grown and evolved over many years. Investment banks have extremely complex technology requirements, as they have to interface with multiple exchanges, brokers and multi-dealer platforms, as well as their own pricing, profit and loss (P&L), trade processing and position-keeping systems. The challenge here, then, is not only the large number of systems but also the increasingly complicated regulatory environment.
  • Electricity Generation and Metering. The generation and consumption of electricity faces a number of unique challenges, including improved and more efficient use of green technologies as well as smart metering. Traditional electrical meters only measure total consumption and, as such, provide no information about when the energy was consumed. Smart meters provide an economical way of measuring this information, allowing price-setting agencies to introduce different prices for consumption based on the time of day and the season.
  • Retail Systems. As retailers look for ever more cunning ways to get consumers to part with their hard-earned cash, traditional (i.e. high street) and electronic retail will merge more and more. For example, not only can I use my 3G-enabled smartphone from the store I happen to be in to quickly compare prices in other stores in the area, but the store itself can potentially detect that I am shopping there using location-based services and make me an enticing offer.
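The smart metering example above rests on a simple mechanism: price consumption by when it happened rather than just how much of it there was. A minimal sketch of such time-of-use billing, with entirely hypothetical tariff bands and rates:

```python
# Sketch of time-of-use billing enabled by smart metering: the same total
# consumption costs different amounts depending on when it occurred.
# Tariff bands and rates are invented for illustration.

TARIFF_PENCE_PER_KWH = {"peak": 30.0, "off_peak": 10.0}  # hypothetical rates

def band(hour):
    """Classify an hour of the day into a tariff band (07:00-23:00 = peak)."""
    return "peak" if 7 <= hour < 23 else "off_peak"

def bill(readings):
    """readings: list of (hour, kwh) pairs reported by the smart meter."""
    return sum(kwh * TARIFF_PENCE_PER_KWH[band(hour)] for hour, kwh in readings)

# Same 2 kWh of consumption, different cost depending on when it happened:
print(bill([(18, 2.0)]))  # 2 kWh at 6pm (peak)     -> 60.0 pence
print(bill([(2, 2.0)]))   # 2 kWh at 2am (off-peak) -> 20.0 pence
```

In a real ULS setting the interesting part is not this arithmetic but the scale: millions of meters reporting continuously, with tariffs, regulation and consumer behaviour all co-evolving.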

So here are the seven challenges that I see Interprise Architecture must deal with in developing a ULS business system:

  1. Requirements are unknowable. Sometimes the very act of capturing a requirement (in whatever form) changes the nature of that requirement. Interprise Architecture must not only allow for continuously changing requirements but must also recognise that some uses of the system cannot be known up-front; hence the need to build more adaptable systems.
  2. The boundary between people and systems is at best blurred and at worst never established. Sometimes people will be users of the system, sometimes they (or at least the devices they own) will be part of the system.
  3. Development never stops but is a continuous cycle. Development processes as well as the projects that deliver such systems must therefore support this never-ending cycle.
  4. Systems continuously adapt and exhibit emergent behaviour. As new uses of the system are "discovered" by users, the components that make up the system need to be able to adapt to satisfy those new behaviours.
  5. Failure (of parts of the system) is inevitable. Just as a fire in one building in a city can be localised and extinguished without, by and large, affecting the city as a whole, so too must Interprise Architecture allow for the partial failure and reconfiguration of some components.
  6. Development tools and languages need to take account of the unpredictable and maybe even unspecifiable aspects of systems development. Traditional development tools favour left-brain thinkers where logic and reasoning can be applied to develop systems that move from abstract ideas to physical implementations (from models to code if you like). New tools for developing and describing Interprise Architectures need to be able to inject a bit of right-brain thinking by allowing creative multi-disciplinary elements to be added.
  7. Governance needs to be de-centralised. Strong, top-down governance (the sort favoured by Enterprise Architects) cannot work in a system where all the parts may not even be known. Interprise Architecture needs to recognise that some components are outside its control or immediate sphere of influence and have policies in place that allow new components to be added which don’t harm or damage the whole system.

As an interesting postscript to this, the Financial Times recently published an article on Facebook and the plans that CEO Mark Zuckerberg has for advancing his brainchild. Zuckerberg had just announced a new Facebook feature called Deals, which allows smartphone users who have downloaded the Facebook application to check in at a physical location, such as a coffee shop, and get a reward. Zuckerberg says:

If you look five years out, every industry is going to be rethought in a social way. You can remake whole industries. That’s the big thing.

Facebook is one example of how external applications that allow users to impinge on the enterprise are changing how Enterprise Architects must think.

Next, a story about what a ULS business system might look like and how it might work.

When Systems Fail

This week I was a direct victim of a systems failure, which set me thinking about how even mundane activities that we have been doing for tens, if not hundreds, of years (checking into a hotel, in this case) rely on systems that we take for granted and which, when they fail, throw everything into complete chaos. It's a long and not particularly interesting story, but in summary I checked into one of the large chain hotels, which I use a lot, only to find when I opened my room door that the room was in a state of complete chaos and had clearly not been visited by housekeeping that day. On trying to change to another room I was told the system had been down since 4am that morning (it was now 8pm) and the staff could not tell what state the rooms were in. Clearly not a great state of affairs, and not great for client relations (there were a lot of grumpy people queueing in reception, some of whom I would guess will not be going back to that hotel). So what would the architect of such a system do to mitigate such a failure?

  1. I don't profess to know too much about how hotel management systems work, or whether they are provided centrally or locally, but I would have thought one of the basic non-functional characteristics of such a system would be recovery in less than one hour following a system failure (not 16 hours and counting). Learning point: clarify your availability non-functional requirements (NFRs) and realise them in the relevant parts of the system. Maybe not all components need to be highly available (checking in a customer may be more important than checking her out, for example) but those that are need to be suitably 'placed' on high-availability platforms.
  2. There was a clear and apparent need for a disaster recovery plan that involved more than the staff apologising to customers. Learning point: Have a disaster recovery policy and test it regularly.
  3. A system is about more than just the technology; the people that use the system are a part of it as well. Learning point: The architecture of the system should include how the people that use that system interact with it during both normal and abnormal operating conditions.
  4. Often NFRs are not quantified in terms of their business value (or cost). When a problem occurs, is the impact to the business (in terms of lost revenue, irate customers who won't come back, etc.) really understood? Learning point: the risk associated with not meeting NFRs needs to be quantified so that the right amount of engineering can be deployed to address the problems that may occur when NFRs are not met.
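The fourth learning point can be made concrete with a standard expected-cost calculation (probability of the bad event multiplied by its cost). The probability and cost figures below are invented purely for illustration:

```python
# Quantifying the business risk of missing an availability NFR:
# expected annual loss = probability of the outage x cost when it happens.
# All figures are hypothetical.

def expected_annual_loss(outage_prob_per_year, cost_per_outage):
    """Classic expected-value risk figure for a single failure mode."""
    return outage_prob_per_year * cost_per_outage

# Suppose a 16-hour check-in outage costs 50,000 in lost revenue and goodwill,
# and is judged to have a 10% chance of happening in any given year:
risk = expected_annual_loss(0.10, 50_000)
print(risk)  # -> 5000.0
```

That figure gives a rough upper bound on what it is worth spending each year (on redundant platforms, disaster recovery rehearsals and so on) to avoid the outage, which is exactly the "right amount of engineering" question.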

Formal approaches to handling non-functional requirements in a system's architecture are a little thin on the ground. One approach, suggested by the Software Engineering Institute, is the use of architectural tactics. An architectural tactic is a way of satisfying a desired system quality (such as performance) by considering the parameters involved (for example, desired execution time), applying one or more standard approaches or patterns (such as scheduling theory or queuing theory) to address potential combinations of parameters, and arriving at a reasoned (as opposed to random) architectural decision. Put another way, an architectural tactic is a way of putting a bit of "science" behind the sometimes arbitrary approach to making architectural decisions around satisfying non-functional requirements.
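As an example of the "reasoning framework" part of a tactic, elementary queuing theory lets us check whether a desired response time is even achievable given arrival and service rates. The sketch below uses the simplest standard model, an M/M/1 queue, where the mean time in the system is 1/(mu - lambda); the check-in workload figures are illustrative:

```python
# M/M/1 queuing model: mean response time R = 1 / (mu - lambda), valid only
# when the service rate mu exceeds the arrival rate lambda.
# The check-in workload figures are illustrative.

def mean_response_time(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue (same time units as the rates)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)

# A check-in service receiving 8 requests/sec with capacity for 10 requests/sec:
print(mean_response_time(8, 10))  # -> 0.5 seconds mean response time
```

This is the "reasoned as opposed to random" step: if 0.5 seconds misses the NFR, the formula says exactly how much extra service capacity (a bigger mu) the architecture must provide.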

I think this is a field that is ripe for more work, with some practical examples required. Maybe a future hotel management system that adopts such an approach during its development will allow a smoother check-in process as well.