PASSME MUsTeR
While there are plenty of non-functional requirements (NFRs; take a look at ISO 25010, for example), six of them are the key ones that we need to prioritize when designing solutions. A great mnemonic for remembering them is PASSME. I first encountered it at jfdeclerq.biz and it has served me very well. PASSME stands for Performance, Availability, Scalability, Security, Maintainability, Extensibility.
Prioritizing these NFRs makes sense. Very few users are happy to use slow software, and even fewer are happy when it's inaccessible, especially if it starts failing as demand grows. We want our data and services to be secure. Lastly, with initial development costs comprising only 20-40% of the total cost of the software over its life cycle, it is critical to provide a well-written and extensible solution.
PASSME can be extended with another acronym, MUsTeR: Manageability, Usability, Testability and Reliability.
Collectively these ten NFRs should be good enough to cover most of the use cases, but usually the choice of specific NFRs depends on the project, as each additional NFR adds complexity to the design and structural support. So it's critical to pick the fewest characteristics that fulfill the purpose, as the best architecture might be very hard to achieve, and often we need to satisfice in order to produce a working architecture.
While it's unfeasible to provide a detailed description for each of the NFRs, we can at least briefly review what each of them is all about.
Performance
Performance refers to the ability of the system to meet timing requirements. It's often linked to scalability, that is, to increasing the system's capacity for work. Performance can be measured as throughput or latency. Throughput is the amount of work done per unit of time, such as bits per second or transactions per second. Latency is the delay between receiving input and producing output. Other metrics might include processing deadlines or the number of dropped events.
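As a rough illustration of the two metrics, here is a minimal sketch that times a batch of calls and reports throughput and latency. The `measure` and `handle` names are hypothetical, and `handle` is only a stand-in for real request processing.

```python
import statistics
import time


def measure(operation, requests):
    """Run `operation` once per request and report latency and throughput."""
    latencies = []
    started = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        operation(request)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    return {
        "count": len(latencies),
        # Throughput: completed operations per second of wall-clock time.
        "throughput_per_s": len(latencies) / elapsed,
        # Latency: delay between receiving one input and producing its output.
        "mean_latency_s": statistics.mean(latencies),
        "max_latency_s": max(latencies),
    }


def handle(request):
    # Hypothetical stand-in for real work.
    return sum(range(request))


report = measure(handle, [10_000] * 200)
print(report)
```

In practice latency is usually reported as percentiles rather than a mean, since a few slow requests can hide behind a good average.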
To satisfy performance requirements, there are two key approaches: we either control resource demand or manage the available resources. Demand is controlled by managing the sampling rate, limiting event response or prioritizing events. Resources are managed by adding new resources, introducing concurrency and replicating either data or computation.
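Two of the demand-side tactics, limiting event response and prioritizing events, can be sketched with a bounded priority queue. This is only one possible design; the `BoundedPriorityQueue` class and the sample events are assumptions made for illustration.

```python
import heapq


class BoundedPriorityQueue:
    """Hold at most `capacity` pending events; on overflow drop the least important."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Entries are (priority, sequence, event); a lower priority number is more important.
        self._heap = []
        self._sequence = 0
        self.dropped = 0

    def offer(self, priority, event):
        """Try to enqueue an event; return False if it was rejected."""
        entry = (priority, self._sequence, event)
        self._sequence += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        worst = max(self._heap)
        self.dropped += 1
        if entry < worst:
            # The new event is more important: evict the worst pending one.
            self._heap.remove(worst)
            heapq.heapify(self._heap)
            heapq.heappush(self._heap, entry)
            return True
        return False

    def poll(self):
        """Return the most important pending event."""
        _, _, event = heapq.heappop(self._heap)
        return event


queue = BoundedPriorityQueue(capacity=2)
queue.offer(5, "refresh cache")
queue.offer(1, "process payment")
queue.offer(3, "send email")        # evicts "refresh cache"
queue.offer(9, "collect analytics")  # rejected, the queue holds more important work
processed = [queue.poll(), queue.poll()]
```

The `dropped` counter matters as much as the queue itself: the number of dropped events is one of the performance metrics mentioned earlier.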
Availability
Availability is all about minimizing service outages by mitigating faults. The system should be able to mask or repair faults so that an outage does not exceed an agreed time interval. Availability is closely related to security. For example, a denial-of-service attack would clearly impact system availability. Another NFR that impacts availability is performance. It is often hard to tell whether the system is failing or is just very slow.
When speaking about availability, two important items are failures and faults. Failures are downtime events observed by users. Faults are the causes of those downtime events (failures). To fix a failure, we need to address the fault, which leads us to the measure of availability, commonly written as the ratio:
availability = MTBF / (MTBF + MTTR)
where MTBF is the mean time between failures and MTTR is the mean time to repair.
This is where we get all those nines of availability: 99%, 99.9%, 99.99%, etc., which translate into roughly 3.65 days, 8.8 hours and 52.6 minutes of downtime per year, respectively. To minimize faults, and thus increase system availability, the focus can be placed on the following areas:
- fault prevention - removal of faulty parts of the system from service, using transactions, predicting faults
- fault detection - using ping/echo, monitoring, heartbeat, timestamps, etc.
- recovery from a fault - this is where redundancy kicks in: hot/warm/cold replicas, exception handling, rollback and so on
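The downtime figures behind the nines are easy to check with a short calculation. The sketch below assumes the common formulation of availability as MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair, and a 365-day year; the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(mtbf_hours, mttr_hours):
    """Fraction of time the system is up, given mean time between failures and to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(availability_ratio):
    """Expected yearly downtime implied by an availability target."""
    return (1 - availability_ratio) * HOURS_PER_YEAR


for target in (0.99, 0.999, 0.9999):
    hours = downtime_hours_per_year(target)
    print(f"{target:.2%} available -> {hours:.2f} hours of downtime per year")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why the cost of the redundancy needed for it tends to grow quickly.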
Scalability
Scalability is the ability of the system to support the required quality of service as the load increases, without changing the system. A scalable system will keep responding within acceptable limits as the load grows.
We usually distinguish between vertical scalability (scaling up), which is implemented by adding more resources to the physical unit (more RAM, a faster CPU, etc.), and horizontal scalability (scaling out), which is implemented by adding more resources to the logical unit (such as a new server to an existing cluster). In a cloud environment, the ability to scale out and back in automatically as demand changes is often referred to as elasticity.
Security
Security is a measure of the system's ability to protect data and information from unauthorized access while still providing access to people and systems that are authorized. The simplest approach to security involves three characteristics:
- Confidentiality - how well data and services are protected from unauthorized access
- Integrity - how well data and services are protected from unauthorized manipulation
- Availability - ensuring data and services are available for legitimate use
These characteristics are supported through:
- Authentication - verification of the identities of the parties to a transaction and checks to ensure that they are who they claim to be
- Non-repudiation - guarantees that the sender of a message cannot later deny that they sent the message
- Authorization - granting user privileges to perform a task
We ensure a system is secure through attack detection, resistance, reaction and recovery.
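To make integrity and authentication a bit more concrete, here is a minimal sketch that tags a message with an HMAC from the Python standard library and rejects any message whose tag does not match. The key and the messages are made up for the example.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real system would load it from a secret store.
SECRET_KEY = b"demo-key-not-for-production"


def sign(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Return a hex tag that only holders of `key` can produce for `message`."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Check the tag in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(sign(message, key), tag)


tag = sign(b"transfer 10 to alice")
ok = verify(b"transfer 10 to alice", tag)            # untouched message
tampered = verify(b"transfer 1000 to mallory", tag)  # modified in transit
print(ok, tampered)
```

Because both parties share the same key, this scheme does not provide non-repudiation: either side could have produced the tag. That property needs asymmetric digital signatures.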
Maintainability
Maintainability is the ability to correct flaws in the existing functionality without impacting other components of the system. This can be accomplished through proper system development practices, such as adherence to SOLID principles, low coupling and high cohesion, keeping units reasonably small and maintaining test coverage.
Extensibility
Extensibility is often related to maintainability and describes the system's ability to serve future business needs. In other words, it is about how hard it would be to implement additional system features without impacting the existing system functionality. This is usually done through proper system development practices: layering, interfaces, dependency minimization, encapsulation and regression testing.
It almost makes sense to join maintainability and extensibility together under a topic of modifiability, but then PASSME will not sound as nice.
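The role of interfaces in extensibility can be shown with a small sketch. All names here (`Exporter`, `CsvExporter`, `JsonExporter`, `build_report`) are hypothetical; the point is only that a new output format is added as a new class, while the existing code stays untouched.

```python
import json
from typing import Protocol


class Exporter(Protocol):
    """The interface that the rest of the system depends on."""

    def export(self, rows: list[dict]) -> str: ...


class CsvExporter:
    """The original feature."""

    def export(self, rows):
        if not rows:
            return ""
        header = ",".join(rows[0].keys())
        lines = [",".join(str(value) for value in row.values()) for row in rows]
        return "\n".join([header, *lines])


class JsonExporter:
    """A later extension: added without editing CsvExporter or build_report."""

    def export(self, rows):
        return json.dumps(rows)


def build_report(rows, exporter: Exporter) -> str:
    # Depends only on the interface, not on any concrete exporter.
    return exporter.export(rows)


rows = [{"id": 1, "name": "Ada"}]
print(build_report(rows, CsvExporter()))
print(build_report(rows, JsonExporter()))
```

Regression tests on the existing exporter would then confirm that adding the new one changed nothing for current users.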
Manageability
Manageability is the ability to manage the system to ensure its continuous health with respect to all other non-functional requirements. It deals with system monitoring and with the ability to change the system configuration in order to improve the system without a code change.
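One common way to allow tuning without a code change is to read settings from the environment at startup. The sketch below assumes made-up variable names (`APP_POOL_SIZE`, `APP_REQUEST_TIMEOUT_S`, `APP_LOG_LEVEL`) and defaults; it is an illustration, not a recommendation for specific values.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    pool_size: int
    request_timeout_s: float
    log_level: str


def load_settings(env=os.environ) -> Settings:
    """Build settings from environment variables, falling back to defaults."""
    return Settings(
        pool_size=int(env.get("APP_POOL_SIZE", "10")),
        request_timeout_s=float(env.get("APP_REQUEST_TIMEOUT_S", "2.5")),
        log_level=env.get("APP_LOG_LEVEL", "INFO").upper(),
    )


defaults = load_settings({})
tuned = load_settings({"APP_POOL_SIZE": "50", "APP_LOG_LEVEL": "debug"})
print(defaults)
print(tuned)
```

An operator can then raise the pool size or the log verbosity by changing the deployment configuration and restarting, with no new build.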
Usability
Usability is concerned with how easy it is for the end user to accomplish a desired task. A system can have adequate functionality but inadequate usability because it is too difficult to use. This metric is somewhat subjective, as it depends on the end user. Usability is addressed by enabling users to take the initiative, for example by cancelling a long-running command, pausing a current process or undoing an action.
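Undo is the easiest of these tactics to sketch. The hypothetical `TextBuffer` below keeps a history of previous states so that the user can step back from a mistake; a real editor would more likely store reversible commands than full snapshots.

```python
class TextBuffer:
    """A tiny editable buffer that supports undoing the last change."""

    def __init__(self):
        self.text = ""
        self._history = []  # previous states, most recent last

    def append(self, fragment):
        # Remember the state before the change so it can be restored.
        self._history.append(self.text)
        self.text += fragment

    def undo(self):
        """Restore the previous state; return False if there is nothing to undo."""
        if not self._history:
            return False
        self.text = self._history.pop()
        return True


buffer = TextBuffer()
buffer.append("Hello")
buffer.append(", wrold")  # a typo the user wants to take back
buffer.undo()
print(buffer.text)
```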
Testability
Testability is the ease with which a system can be made to demonstrate its faults through testing. In other words, it's the probability that software containing at least one fault will fail on its next test execution. For a system to be testable, it should be possible to control each component's input and internal state and then to observe its outputs.
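Controlling a component's inputs and state often comes down to passing its dependencies in from outside. In this sketch the hypothetical `SessionToken` takes its clock as a parameter, so a test can replace real time with a `FakeClock` and observe the expiry behavior deterministically.

```python
import time


class SessionToken:
    """A token that expires `ttl_seconds` after creation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._clock = clock  # injected so that tests can control time
        self._expires_at = clock() + ttl_seconds

    def is_valid(self):
        return self._clock() < self._expires_at


class FakeClock:
    """A controllable clock for tests."""

    def __init__(self, now=0.0):
        self.now = now

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
token = SessionToken(ttl_seconds=30, clock=clock)
valid_at_start = token.is_valid()
clock.advance(30)
valid_after_ttl = token.is_valid()
print(valid_at_start, valid_after_ttl)
```

Without the injected clock, the same check would need a real 30-second wait or patching of the time module, both of which make faults harder to expose.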
Reliability
Reliability ensures the integrity and consistency of the application and all its transactions. Reliability is closely connected with scalability, although ensuring reliability can have a negative effect on scalability. A reliable system continues to process requests and handle transactions accurately as the load increases. If the system cannot maintain reliability as the load increases, it is not scalable, so for a system to truly scale it must be reliable.
NFRs are a large topic, and there is no shortage of NFR lists and resources. However, there is no universal list, as NFR prioritization ultimately depends on the use cases and stakeholders. PASSME MUsTeR might be a starting point for a meaningful system design conversation.
To continue learning more about NFRs and architecture, the following books can be helpful: