
Generative AI is being applied to many areas of our lives, both personally and professionally. Software architecture is no exception. In this article you’ll learn about the possibilities of how generative AI can and might be used within the field of software architecture in areas of architectural analysis, governance, augmented behavior, and AI-driven architecture.
(Note: This article is a rewrite of a prior article published in March 2024.)
In the tech industry, software developers have gotten quite comfortable using generative AI for code generation through tools such as Copilot, and as a useful assistant for general software development tasks. But what about software architecture? Can generative AI help a software architect with architectural tasks such as risk analysis, identifying possible bottlenecks, or even finding antipatterns within an architecture? This article explores the use of generative AI within software architecture, including AI as an assistant to a software architect, the use of AI for architectural governance, architecture and the AI ecosystem, and finally AI-driven software architecture.
Given the context of this article, before diving into the use of AI within software architecture it’s useful to understand exactly what software architecture is. Software architecture is all about the structure of a software system or product, much in the same way an architecture describes the structure of a building. For example, the architecture of an office building has a particular shape, external and internal walls, multiple floors, a roof, and so on. How you decorate each office or meeting room doesn’t impact the structure of the building—that would be considered design.
Software architecture also describes the characteristics a system must support, orthogonal to the functionality of the system or product. For example, does the system need to scale to a certain number of concurrent users? Does it require a certain level of responsiveness or performance? Does it need to be available all the time? These characteristics are the foundational aspects of a system’s architecture and define a system’s capabilities. The functionality of a system, on the other hand, defines a system’s behavior.
Ignoring architecture in a software system would be like ignoring architecture in an office building. Imagine a group of bricklayers, electricians, plumbers, and framers getting together and constructing a 10-story office building from scratch without an architecture in place. As you might suspect, this would most likely end in disaster. Unfortunately, this is what happens with many software systems and products: developers just start coding the functionality, ignoring the overall structure of the system. The functionality may work perfectly, but the system might not be able to scale beyond five concurrent users, the response time might be so bad the system becomes unusable, and the system might become nearly impossible to maintain, test, and deploy. Those factors are all about architecture, which is why it’s a critical part of a software system or product.
The first compelling question that comes to mind regarding generative AI and architecture is “Can it help me perform my tasks as a software architect?” There are many areas where generative AI could usefully assist, including the following:
· Risk assessment: “Are there risk areas within the architecture?”
· Risk mitigation: “How should I address the risk?”
· Anti-Patterns: “Are there any common anti-patterns in the architecture?”
· Decisions: “Should I use orchestration or choreography for the workflow?”
For an LLM to understand, learn, and reason about an architectural solution, the architecture must first be described in detail in the form of prompts. You could write a lot of prose describing your architecture in a prompt, something like “Suppose the architecture consists of 10 independently deployed services, with each owning their own data. Here’s what each of the services do and how they interact with each other…”. However, this sort of prompt is completely unstructured, would take forever to write, and would make it difficult to describe ongoing changes to the architecture.
Fortunately, popular software architecture modeling and diagramming tools such as ArchiMate, Structurizr, and PlantUML let you export your architecture diagram and its corresponding structural details in a machine-readable format, which you can then include in a prompt so an LLM can understand your architecture. Tools such as Haiven even allow you to upload your architecture diagram directly into an LLM without having to export it.
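For instance, a PlantUML export describing a small slice of an architecture is already a structured, machine-readable text you can paste into a prompt. The fragment below is illustrative only (the service and queue names are hypothetical, loosely based on the recycling-system domains used in this article):

```plantuml
@startuml
component "Item Quoting Service" as quoting
component "Receiving Service" as receiving
component "Item Assessment Service" as assessment
queue "Assessment Events" as events

quoting --> receiving : item accepted
receiving --> events : publishes
events --> assessment : consumes
@enduml
```

Because every component, queue, and dependency is named explicitly, an LLM can answer structural questions about this text far more reliably than it can about free-form prose.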
Using these tools and techniques, generative AI does a fairly good job of ingesting your architecture and gaining a basic understanding of it. This allows you to ask simple questions about the structure of your system, such as “Describe my architecture”, “Tell me what each service does and how it interacts with other services”, and so on.
While it’s easy to have an LLM understand and learn about a software architecture, we’re unfortunately a long way off from having an LLM reason about the architecture to make the hard decisions architects face. The main reason this is still a roadblock for generative AI as an architect assistant is that everything in software architecture is a tradeoff. Analyzing tradeoffs in architecture and finding the most appropriate decision requires deep knowledge of the business context, business environment, team topologies, technical environment, infrastructure, engineering practices, architectural characteristics, enterprise-related practices and procedures, data topologies, and so on.
This is an incredible amount of ever-changing context to continuously feed into an LLM via prompts, making this dimension of software architecture and generative AI not overly feasible now.
That said, there are many other exciting areas where generative AI and architecture do intersect nicely, namely in areas of architectural governance, the generative AI ecosystem, and AI-driven software architecture. The rest of this article talks about these use cases for generative AI.
Architectural governance involves leveraging what are called fitness functions to ensure that the architecture is implemented correctly, remains aligned with its intended design, and satisfies architectural characteristics such as responsiveness, availability, scalability, fault tolerance, agility, and so on. Fitness functions are executable tests a software architect writes to validate and measure an architecture based on architectural characteristics and other critical factors. These tests and their corresponding analysis and measurements are sometimes performed by tools such as Datadog, but are often custom written by software architects using tools like ArchUnit, ArchUnitNET, NetArchTest, PyTestArch, and TSArch.
Generative AI can be used to continuously analyze measurements and data from observability tools and either notify someone when a negative trend is observed or a particular threshold is reached, make recommendations about the negative results, or even take corrective action when a negative trend or problem is observed. For example, an AI-driven fitness function might analyze a particular system or service for responsiveness, identify the root cause when responsiveness degrades (such as database connection waits), and automatically take corrective action to resolve the problem (such as dynamically increasing the connection thread pool).
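As a minimal sketch of the corrective logic such a fitness function might apply, consider the following Java fragment. The class, method, and threshold values are all hypothetical; a real version would pull its p95 latency from an observability tool and apply the new size to a live connection pool rather than returning it:

```java
// Hypothetical sketch of an AI-driven fitness function's corrective logic.
// In a real system the p95 latency would come from an observability tool
// and the resize would be applied to a live connection pool.
public class ResponsivenessFitnessFunction {

    static final double MAX_P95_MILLIS = 500.0; // assumed responsiveness budget

    // Returns the new connection pool size: unchanged when responsiveness
    // is within budget, doubled (up to a ceiling) when it degrades.
    static int evaluate(double p95Millis, int currentPoolSize, int maxPoolSize) {
        if (p95Millis <= MAX_P95_MILLIS) {
            return currentPoolSize; // within budget: take no action
        }
        return Math.min(currentPoolSize * 2, maxPoolSize); // corrective action
    }

    public static void main(String[] args) {
        System.out.println(evaluate(320.0, 20, 100)); // healthy: stays at 20
        System.out.println(evaluate(870.0, 20, 100)); // degraded: grows to 40
    }
}
```

The interesting part in practice is not the arithmetic but the analysis that precedes it: the AI identifies the root cause (connection waits) before choosing this particular corrective action.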
Moreover, generative AI can also be used to generate governance-related structural and constraints fitness functions. Using a lightweight architecture definition language (ADL), a software architect can describe the architecture using pseudo-code, then use that ADL as a prompt to have an LLM generate structural fitness functions.
To illustrate this valuable use of generative AI, consider the following ADL, which describes the specific domains within an electronic recycling system:
DESCRIPTION Define Domains
DEFINE SYSTEM Electronic Recycling System AS com.ers
DEFINE DOMAIN Item Quoting AS quoting
DEFINE DOMAIN Receiving AS receiving
DEFINE DOMAIN Item Assessment AS assessment
DEFINE DOMAIN Accounting AS accounting
DEFINE DOMAIN Recycling AS recycling
ASSERT(CLASSES are only CONTAINED within SUBDOMAINS within DOMAINS)
Recall from my prior article that a domain in the architecture is identified through the higher-level nodes of a directory or namespace. Therefore, the above ADL specifies that all code should reside in subdomains and components (leaf directories) that reside in higher-level directories named quoting, receiving, assessment, accounting, and recycling. An architect can enforce this constraint through executable source code by feeding the ADL into an LLM prompt and specifying the tool and language in which to generate the code:
PROMPT: Based on this pseudo-code, write an ArchUnit test in Java
Doing this yields the following ArchUnit code in Java, which can now run continuously to ensure alignment between the architecture and its corresponding implementation.
import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import com.tngtech.archunit.lang.ArchRule;
import com.tngtech.archunit.lang.syntax.ArchRuleDefinition;
import org.junit.jupiter.api.Test;

public class ArchitectureTest {

    private static final String BASE_PACKAGE = "com.ers";
    private static final String QUOTING_PACKAGE = BASE_PACKAGE + ".quoting..";
    private static final String RECEIVING_PACKAGE = BASE_PACKAGE + ".receiving..";
    private static final String ASSESSMENT_PACKAGE = BASE_PACKAGE + ".assessment..";
    private static final String ACCOUNTING_PACKAGE = BASE_PACKAGE + ".accounting..";
    private static final String RECYCLING_PACKAGE = BASE_PACKAGE + ".recycling..";

    @Test
    public void defineDomains() {
        JavaClasses importedClasses =
                new ClassFileImporter().importPackages(BASE_PACKAGE);

        // Define the rule to ensure that classes are only
        // contained within their subdomains
        ArchRule domainContainmentRule = ArchRuleDefinition.classes()
                .that().resideInAPackage(BASE_PACKAGE + "..")
                .should().resideInAnyPackage(QUOTING_PACKAGE,
                        RECEIVING_PACKAGE, ASSESSMENT_PACKAGE,
                        ACCOUNTING_PACKAGE, RECYCLING_PACKAGE);

        domainContainmentRule.check(importedClasses);
    }
}
The use of generative AI to generate governance fitness functions (rather than having a software architect write the fitness function code from scratch) has numerous benefits. First, it separates the architecture (described in the ADL) from the implementation platform, allowing for polyglot implementations and language-agnostic architecture descriptions. Second, it allows for more effective and easier change—as the architecture changes (such as adding another domain or component), the software architect only needs to change the ADL and regenerate that specific fitness function through an LLM prompt. Third, it is much easier to understand and maintain the architecture through a lightweight and concise ADL than it is to read through pages of ArchUnit or NetArchTest source code.
Incorporating AI-based behavior into a system is a rapidly growing trend in the industry. Using AI to anonymize resumes and create story profiles in HR systems, or to grade short-answer and essay questions in testing systems, are just a few of the dozens of examples of AI-based behavior seeping into systems. Not surprisingly, incorporating AI-based behavior into a system has an impact on that system’s architecture. Specifically, the two areas software architects should be concerned with and knowledgeable about are guardrails and evaluations.
Guardrails
Guardrails (sometimes referred to as “rails”) are specific ways of controlling the outputs of an LLM. Examples include removing political content from a response, following a pre-defined dialog path, using a particular language style, and so on. NVIDIA’s NeMo Guardrails is one popular example of a guardrails product.
Guardrails frameworks consist of five main architectural components: input rails, retrieval rails, dialog rails, execution rails, and finally output rails.
Input rails are applied to the input from the user. An input rail can reject the input, stopping any additional processing, or alter the input (such as masking potentially sensitive data, rephrasing the prompt, and so on).
Retrieval rails are applied to the retrieved chunks in a RAG (Retrieval-Augmented Generation) scenario. A retrieval rail can reject a chunk, preventing it from being used to prompt the LLM, or alter the relevant chunks (again, to mask potentially sensitive data, exclude data, and so on).
Dialog rails influence how the LLM is prompted. These rails operate on canonical form messages and determine if an action should be executed, if the LLM should be invoked to generate the next step or a response, or if a predefined response should be used instead.
Execution rails are applied to the input and output of the custom actions (tools) the LLM needs to call, much like the way plug-ins are used within many products. Custom actions may be implemented through domain-specific tools, custom tools, or commercial tools.
Finally, output rails are applied to the output generated by the LLM. An output rail can reject the output, preventing it from being returned to the user, or alter it (such as removing sensitive, personal, or irrelevant data).
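To make the idea of a rail concrete, here is a minimal Java sketch of an input rail. The interface and names are hypothetical (NeMo Guardrails itself is Python and configuration-driven); the sketch only illustrates the two options an input rail has: reject the input outright, or alter it before further processing.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical input rail: either rejects the user input, stopping all
// further processing, or returns an altered copy with sensitive data masked.
public class InputRailSketch {

    static final Pattern SSN = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

    // Optional.empty() means "reject: stop any additional processing".
    static Optional<String> applyInputRail(String userInput) {
        if (userInput.toLowerCase().contains("ignore previous instructions")) {
            return Optional.empty(); // reject a likely prompt injection
        }
        // Alter the input: mask anything that looks like a US SSN
        return Optional.of(SSN.matcher(userInput).replaceAll("***-**-****"));
    }

    public static void main(String[] args) {
        System.out.println(applyInputRail("My SSN is 123-45-6789"));
    }
}
```

Output rails follow the same shape, applied to the LLM’s response instead of the user’s input.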
Evaluations
Evaluations (commonly referred to as “evals”) involve measuring and assessing a model’s performance across key tasks. This process uses various metrics to determine how well the model predicts or generates text, understands context, summarizes data, and responds to queries. Evals are essentially fitness functions for an LLM.
Some of the standard metrics used within most eval products (such as Ragas and Deepeval) include context precision, context recall, context entities recall, noise sensitivity, response relevancy, and topic adherence. However, from an architecture standpoint the use of evals goes beyond these metrics to include other types of evaluations, including LLM output comparisons and parallel human validation.
For example, in the AI-based short answer and essay grading use case, an evaluation might include sending random short answer or essay responses to both an LLM and a human. Both the LLM and the human grader grade the question, and the results compared to determine the effectiveness or accuracy of the AI-based grading process.
Evaluations can also include sending a prompt to multiple LLMs in parallel and then comparing the results, similar to the way consensus-based algorithms work across data synchronization and infrastructure nodes.
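A minimal sketch of the parallel human validation described above might look like the following, where an agreement rate is computed between LLM-assigned and human-assigned grades on a sampled set of responses. The class name and the tolerance are illustrative assumptions, not from any eval product:

```java
// Hypothetical parallel-human-validation eval: grade a random sample of
// responses with both an LLM and a human grader, then report how often
// the two agree within a tolerance.
public class GradingEval {

    static double agreementRate(int[] llmGrades, int[] humanGrades, int tolerance) {
        int agreed = 0;
        for (int i = 0; i < llmGrades.length; i++) {
            if (Math.abs(llmGrades[i] - humanGrades[i]) <= tolerance) {
                agreed++;
            }
        }
        return (double) agreed / llmGrades.length;
    }

    public static void main(String[] args) {
        int[] llm = {90, 75, 60};
        int[] human = {88, 70, 40};
        System.out.printf("agreement: %.2f%n", agreementRate(llm, human, 5));
    }
}
```

An architect would treat a falling agreement rate exactly like a failing fitness function: a signal that the AI-based grading behavior has drifted and needs attention.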
Regardless of which products are used to provide AI-based behavior, a software architect should view the LLM itself as a “black box” and focus instead on the surrounding AI ecosystem from an architectural standpoint.
Back in the late 1990s, IBM (as well as a host of other companies) conducted research on what are known as autonomic systems. Autonomic systems are self-managing systems that can automatically adapt to unpredictable changes while hiding complexity from operators and users. Every autonomic system must be able to exhibit the following set of properties, all without human intervention:
· Self-Aware: The ability of the system to monitor and assess its state.
· Self-Configuring: The ability of the system to change its configuration.
· Self-Healing: The automatic discovery and correction of faults.
· Self-Protecting: The proactive protection from arbitrary attacks.
You might remember the 1968 film 2001: A Space Odyssey and one of its main characters, HAL (Heuristically Programmed Algorithmic Computer). The HAL 9000 was the intelligent supercomputer that ran Discovery One, the ship headed for Jupiter to investigate the famous monolith. HAL was, in fact, an autonomic system that exhibited all the properties previously listed. Simply put, HAL was an AI-driven system. We can add this kind of intelligence into software architectures, thereby creating business systems and products that exhibit some or all of the above properties, creating what is known as AI-Driven Software Architecture.
Several architectural patterns already exist for creating AI-driven software architecture. Leveraging basic observability within a system, using either tools or custom code as fitness functions, provides an AI-driven architecture with the data it needs to analyze its environment and current situation, establishing the self-awareness foundation for implementing the other properties autonomic systems support.
The ability of a system to configure itself builds on that self-awareness, using AI and ML to understand its environment, perform measurements, and dynamically update configuration settings through data-driven configuration mechanisms such as a custom configuration service or a product like ZooKeeper. Examples of self-configuration values a system can analyze and adjust include thread pools, database connection pools, timeout values, the number of starting instances of a service, and so on. Another example of self-configuration is the Supervisor Consumer Pattern, which allows a system to observe concurrent load and intelligently spin up or tear down consumers within a service to provide more effective and intelligent scalability.
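As an illustration of the kind of scaling decision behind the Supervisor Consumer Pattern, the following sketch chooses how many queue consumers to run based on observed backlog. The names, capacity figure, and ceiling are assumptions for illustration, not from any product:

```java
// Hypothetical supervisor decision: given the observed queue depth, decide
// how many consumers to run. Capacity and ceiling values are assumed.
public class ConsumerSupervisor {

    static int targetConsumers(int queueDepth, int perConsumerCapacity, int maxConsumers) {
        // One consumer per "capacity" worth of backlog,
        // at least 1 and never more than the ceiling.
        int needed = (int) Math.ceil((double) queueDepth / perConsumerCapacity);
        return Math.max(1, Math.min(needed, maxConsumers));
    }

    public static void main(String[] args) {
        System.out.println(targetConsumers(250, 100, 8));  // moderate load: 3 consumers
        System.out.println(targetConsumers(5000, 100, 8)); // heavy load: capped at 8
    }
}
```

A supervisor running this logic on a timer, fed by observability data, gives a service the self-configuring behavior described above.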
Architectural patterns such as the Producer Control Flow and the Workflow Event Pattern allow a system to observe external conditions, leveraging AI and ML to take protective action to avoid or recover from failures. These patterns are good examples of providing an intelligent and automated self-healing capability to systems or products.
Finally, leveraging AI to observe and learn about attacks allows a system to better understand how attacks are occurring, and therefore reconfigure itself to protect against further attacks.
Look for more patterns and movement in this evolving area of architecture in the coming years as we learn more about the capabilities of generative AI and ML to provide autonomic capabilities to business systems and products.
We are still a long way from having generative AI help software architects analyze and validate architectures. However, we are already there in terms of leveraging generative AI to help provide automated governance and get a good start on creating AI-driven software architecture and autonomic systems. Understanding the AI ecosystem and how it relates to software architecture is no longer an interesting side experiment, but a necessary aspect of software architecture and a necessary skill for a software architect.