This post is part of the technical series introducing Azara.
If you are keen to try out the dev features of Azara, you can sign up for the SDK release here
Welcome back to the technical Azara.ai series. The last post covered the overall architecture. Today's post is much longer and more detailed: it covers the composable architecture that lies at the heart of the platform.
Azara.ai's composable plugin architecture is designed for scalability, enabling seamless integration with third-party systems without overloading the core server. As LLM systems expand, this architecture prevents complexity from spiraling out of control by partitioning code through plugins. This approach balances the robustness of monoliths with the flexibility of microservices, allowing dynamic feature updates without sacrificing stability. It is aimed at developers who want to maintain a scalable, high-performance AI platform while rapidly deploying new integrations.
Channels
Channels are specialized integration plugins.
They include additional methods, implemented on a subset of the integration plugins (e.g. WhatsApp, GMail, WebWidget (WW), Slack), which provide a webhook so that the plugin can act as a conversation channel to an agent in the chatroom.
This allows plugins to serve two purposes with respect to LLM tools:
As a conversation channel, e.g. WhatsApp messages being received, processed by the LLM, and answered with a response.
As a data source / action for the LLM to probe or alter its environment, e.g. get WhatsApp group history (and summarize it), or send a message to a WhatsApp group.
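As a rough illustration of this dual role, here is a minimal sketch. The class and method names (`WhatsAppChannel`, `handle_webhook`, `get_group_history`, `send_message`) are assumptions made for this example, not the actual Azara API, and the LLM is replaced by a plain callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WhatsAppChannel:
    """Hypothetical channel plugin; not the real Azara class."""

    # Stand-in for the agent/LLM: takes the user's text, returns a reply.
    agent: Callable[[str], str]
    # In-memory stand-in for the third-party service's message store.
    history: Dict[str, List[str]] = field(default_factory=dict)

    # Role 1: conversation channel. The service calls this webhook with an
    # inbound message; the agent's reply goes back out on the same channel.
    def handle_webhook(self, payload: dict) -> str:
        group, text = payload["group"], payload["text"]
        self.history.setdefault(group, []).append(text)
        reply = self.agent(text)
        self.send_message(group, reply)
        return reply

    # Role 2: tool functions the LLM can call to read or alter its environment.
    def get_group_history(self, group: str) -> List[str]:
        return list(self.history.get(group, []))

    def send_message(self, group: str, text: str) -> None:
        self.history.setdefault(group, []).append(text)


channel = WhatsAppChannel(agent=lambda text: f"echo: {text}")
channel.handle_webhook({"group": "ops", "text": "hello"})
```

The same `send_message` method is used both by the webhook path and as a standalone action, which is the point of making channels a specialization of integration plugins.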
Agentic Scenarios and Workflows
One of the core features of Azara is the ability to create workflows: both manually, using a workflow builder, and generatively, using a natural language workflow generator (which is implemented as an agentic scenario - see below).
Bad requirements and Human Centered AI
One of the critical lessons from my few decades of building technology is that the technology itself is rarely the issue. For example, although Generative AI is rather difficult, especially when it comes to consistency, we can work around many of the issues using agentic flows etc.
Much of the “It doesn’t work” feedback arises because requirements are poorly specified to the LLM. There’s a lot of nuance and real-world interaction that happens between people that isn’t captured when trying to give commands to an AI system.
Asking your Agent to “create an HR onboarding workflow” is not necessarily going to result in the correct output - it will be generic, miss obvious (to the user) steps, and not be tailored to their expectations.
None of this is the Agent’s fault, nor is it the user’s.
However, it is up to us as developers to understand that users who are not technical are not used to having to give detailed requirements, especially before even getting first results back.
If you want to have successful implementations of Agents working with humans, you need to be thinking in a Human Centered AI approach.
If you require the user to fully specify the requirements before starting, the inertia of doing all that without any guidance will result in them abandoning your agent and application. See the notes on the 1% rule and fear of change below.
Users don’t know how to give requirements, but they excel at giving comparisons when shown something. We are all familiar with the “Uber for XYZ”, or “Facebook but in pink”. So we need to coax the requirements from the user in a gentle manner.
For our Generate Workflow scenario, we do this by asking Clarifying Questions in a structured manner.
See the flow chart in the next section for how we extract requirements.
Actually, we do a more detailed version of this graph:
Generate a high-level plan (ChatGPT generally stops here) - verify with user
Map plugins to each task in the plan - this makes each task actionable - verify with user
For each plugin, verify the inputs are mapped correctly from the original request, and ask the user for any missing required information
Finally, generate the workflow, and present the Authentication / Credentials required for the workflow to run to the user.
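The input-verification step can be sketched as follows. This is a minimal illustration under stated assumptions: the plugin catalogue, the route names, and the shape of the plan are all hypothetical, and in practice the mapping from request to inputs would be done by the LLM rather than supplied by hand.

```python
from typing import Dict, List

# Hypothetical plugin catalogue: route name -> required input names.
REQUIRED_INPUTS: Dict[str, List[str]] = {
    "slack.send_message": ["channel", "text"],
    "gmail.send_email": ["to", "subject", "body"],
}


def clarifying_questions(plan: List[dict]) -> List[str]:
    """Return one question per required input the request did not supply."""
    questions = []
    for task in plan:
        provided = task.get("inputs", {})
        for name in REQUIRED_INPUTS[task["plugin"]]:
            if name not in provided:
                questions.append(
                    f"For '{task['task']}', what should '{name}' be?"
                )
    return questions


# A plan after the first two steps: each task is already mapped to a plugin.
plan = [
    {"task": "Welcome the new hire", "plugin": "slack.send_message",
     "inputs": {"channel": "#general"}},
    {"task": "Email the contract", "plugin": "gmail.send_email",
     "inputs": {"to": "hire@example.com", "subject": "Contract"}},
]
questions = clarifying_questions(plan)
```

Only the gaps are turned into questions, so the user is asked for two missing values here rather than a full specification up front.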
The 1% rule and its impact on successful Agentic Flows
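A meta study (https://en.wikipedia.org/wiki/1%25_rule) which analyzed who was contributing to Wikipedia showed the following breakdown: roughly 1% of users create new content, 9% edit or contribute to existing content, and 90% only view it (the lurkers).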
What makes this very interesting is the large amount of consensus from multiple other domains and studies which validate this ratio of creators / contributors to lurkers.
Fear of change
I first came across this article just over a decade ago, when I was running interviews with all my customers to determine why DevOps, Cloud, etc. weren’t sticking even when there were clear benefits for the users. I had collected almost identical numbers in terms of supporters and detractors.
Most people are highly resistant to change, even good change, particularly in a high-risk environment, e.g. recessions, job retrenchments, Covid, etc.
In my experiments to overcome the above, I’ve only been successful when I stopped trying to convince the 90%. They aren’t in the right mind space (for many reasons, some good).
Everyone in tech (Cloud, DevOps, SRE, AI) is selling to the 1%, but that's a very small attack surface to get traction, especially as those people are unevenly distributed in an organization's hierarchy.
The right method is to focus on the 9%. Going back to how that group is defined (people who contribute by editing or building on existing content rather than creating from scratch), we can see how to move forward: we need to give them something similar to what they want, which they can then build on.
Bootstrapping in this way has proven to be an excellent manner of executing any project.
In terms of Agentic Scenarios and Agents - we do exactly this: We understand the psychology of the user, and we gently ask clarifying questions so that they can build a complex workflow without being overwhelmed.
From a sales perspective, the following book covers this topic really well.
In the next section, we will cover more on scenarios, which are how we extend basic functionality of our agents, other than plugins and channels which deal with information retrieval or taking action in 3rd party services.
Agentic Scenarios
Whereas plugins and channels allow us to extend the environment of the Azara.ai framework to 3rd party services, they don’t change the basic behaviour of agents wrt RAG or function calling (or workflows).
Agentic scenarios are typically langgraph notebooks, and are triggered by keywords or phrases, or by direct user choice. They temporarily suspend the conversational prompts of the agent and apply specific business logic for the use case they are working on, e.g. the Create Workflow scenario implements the following (simplified) state graph:
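To make the idea of a scenario state graph concrete, here is a minimal sketch in plain Python rather than langgraph. The state names follow the Generate Workflow steps described earlier; the transition table and the `confirm` callback (standing in for "verify with user") are assumptions made for this example, not the actual Azara graph.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transition table: state -> (next state if the user confirms,
# state to return to if the user asks for changes).
TRANSITIONS: Dict[str, Tuple[str, str]] = {
    "plan": ("map_plugins", "plan"),
    "map_plugins": ("verify_inputs", "map_plugins"),
    "verify_inputs": ("generate", "verify_inputs"),
    "generate": ("done", "done"),
}


def run_scenario(confirm: Callable[[str], bool], max_steps: int = 20) -> List[str]:
    """Walk the graph until the terminal state, recording visited states."""
    state, visited = "plan", []
    for _ in range(max_steps):
        visited.append(state)
        if state == "done":  # terminal state: control returns to the agent
            break
        on_yes, on_no = TRANSITIONS[state]
        state = on_yes if confirm(state) else on_no
    return visited


# The user rejects the first plan once, then confirms every later step.
answers = iter([False, True, True, True, True])
visited = run_scenario(lambda state: next(answers))
```

The `max_steps` bound is there so that a user who never confirms cannot keep the scenario looping forever.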
Agentic RAG
A very common use case for scenarios, and one where we expect to see continued development by academia and industry, is variations of RAG. Some excellent examples of these can be found in the Langgraph examples folder on github https://github.com/langchain-ai/langgraph/tree/main/examples/rag
The core query pipeline for an agent is fairly simple. It implements simple RAG, similar to the diagram below.
Receive a query message from the user via a channel (WebUI chatroom, WW, Slack, email, etc).
Check if this query relates to any documents in the agent’s collection
Check if this query relates to an enabled plugin tool
If it is a document-related question, respond with sources back to the agent.
If it is tool-related, call the appropriate function on that tool and return the response to the agent
Else check for special commands and scenario trigger keywords or phrases, e.g. “Create workflow” for the Create Workflow scenario, which loads the appropriate agentic scenario and executes that logic via the same chat channel until its terminal state is reached.
Use the agent prompts, objectives, tone and examples to generate a response to the customer via the same channel.
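The routing order in the steps above can be sketched as below. This is a simplified illustration: plain keyword matching stands in for the LLM's relevance checks, and `route_query`, `SCENARIO_TRIGGERS`, and the dictionary shapes are hypothetical names chosen for this example.

```python
from typing import Callable, Dict

# Hypothetical trigger table: phrase -> scenario to load.
SCENARIO_TRIGGERS: Dict[str, str] = {"create workflow": "create_workflow"}


def route_query(
    query: str,
    doc_keywords: Dict[str, str],            # keyword -> source document
    tools: Dict[str, Callable[[str], str]],  # tool name -> tool function
) -> dict:
    """Decide how a query is handled, in the order given in the steps above."""
    text = query.lower()
    # Document question: answer with sources.
    sources = [src for kw, src in doc_keywords.items() if kw in text]
    if sources:
        return {"route": "documents", "sources": sources}
    # Tool question: call the matching tool and return its result.
    for name, fn in tools.items():
        if name in text:
            return {"route": "tool", "result": fn(query)}
    # Scenario trigger: hand the conversation over to the scenario.
    for phrase, scenario in SCENARIO_TRIGGERS.items():
        if phrase in text:
            return {"route": "scenario", "scenario": scenario}
    # Otherwise: ordinary response from the agent's prompts and tone.
    return {"route": "chat"}


docs = {"leave policy": "hr_handbook.pdf"}
tools = {"slack": lambda q: "sent"}
result = route_query("Create workflow for HR onboarding", docs, tools)
```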
Instead of the simple query-then-generate-response flow of this simple RAG, we often need to perform checks for hallucinations, check that results are relevant, or run multi-hop queries.
Agentic RAG (and all scenarios) are designed to be loaded as a single notebook which runs in a secure e2b sandbox, and thus easily adaptable from code examples such as the above, or from academic papers.
Some examples of agentic RAG from the langgraph github repo https://github.com/langchain-ai/langgraph/tree/main/examples/rag are shown graphically below:
Agentic RAG
Adaptive RAG
Self-RAG
Langgraph
We tend to implement our agentic RAG and similar flows using langgraph. As an example, Self-RAG is implemented in langgraph as per the flow chart diagram below:
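For readers without the diagram to hand, the control flow of a Self-RAG-style loop can be sketched in plain Python (deliberately not langgraph, to keep the example dependency-free). The retriever, graders, generator, and query rewriter are passed in as stub callables; in a real flow each would be an LLM or vector store call. This is a simplified reading of the pattern, not the langgraph example's code.

```python
from typing import Callable, List


def self_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],
    rewrite: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Retrieve, filter, generate, and retry until the answer is grounded."""
    for _ in range(max_attempts):
        # Keep only the retrieved documents graded as relevant.
        docs = [d for d in retrieve(question) if is_relevant(question, d)]
        if docs:
            answer = generate(question, docs)
            # Hallucination check: accept only answers supported by the docs.
            if is_grounded(answer, docs):
                return answer
        # No relevant documents or an ungrounded answer: rephrase and retry.
        question = rewrite(question)
    return "I could not find a supported answer."


corpus = ["Plugins are single Python files.", "Slack is a chat tool."]
answer = self_rag(
    "plugin format?",
    retrieve=lambda q: corpus,
    is_relevant=lambda q, d: "plugins" in q.lower() and "plugins" in d.lower(),
    generate=lambda q, docs: docs[0],
    is_grounded=lambda a, docs: a in docs,
    rewrite=lambda q: "what format do plugins use?",
)
```

In this run the first attempt finds no relevant documents, the query is rewritten, and the second attempt returns a grounded answer.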
Azara.ai Scenarios are agentic flows with starting conditions, e.g. keywords, phrases such as “create workflow”, or “hr onboarding”.
As scenarios are implemented as plugins, they are dynamically loadable at runtime, so it is easy to update an existing workflow or agent at runtime without disrupting any other parts of the system.
This allows for rapid iteration of development as well.
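Runtime loading of this kind can be sketched with Python's standard `importlib`. This is a generic illustration of loading a single-file plugin from disk and picking up a new version without restarting the host process; the `load_plugin` helper and the `run()` entry point are assumptions for this example and say nothing about how the Azara PluginManager is actually implemented.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_plugin(path: Path):
    """Import a plugin module from a file path at runtime."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    plugin_file = Path(tmp) / "hello_plugin.py"

    plugin_file.write_text("def run():\n    return 'v1'\n")
    first = load_plugin(plugin_file).run()

    # Replace the file and load it again. The new source differs in size, so
    # the cached bytecode (validated by source mtime and size) is discarded
    # and the new behaviour is picked up.
    plugin_file.write_text("def run():\n    return 'version 2'\n")
    second = load_plugin(plugin_file).run()
```

Each call to `load_plugin` returns a fresh module object, so code that still holds the old module keeps working until it is handed the new one.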
Implementing a plugin
A plugin is a single Python file which implements a class derived from BasePlugin.
It is loaded into the framework either via an internal PluginManager or via the Langchain @tool decorator for custom structured tools suitable for function calling.
See the langchain documentation for an example of a custom tool: https://python.langchain.com/v0.1/docs/modules/tools/custom_tools/
Creating Tool Functions/Routes
For our plugins, we provide a decorator @BasePlugin.route( … ) which is used to provide the User Facing UI text, the tool mapping text, and the input and output parameters.
In the code example below for implementing Slack as an integration plugin, you can see the following sections:
Class definition
UI components for the integration plugin in the WebUI
Slack Plugin
These will appear in the integrations tab of each agent as seen on the RHS.
This top-level integration metadata covers the plugin UI, configuration options, and authentication.
In addition to plugin-level configuration, each route that is implemented in the plugin also has metadata which is used to map function calling and parameters between the Agent LLM and the plugin integration.
For example, below we can see the send_message function for Slack, which sends a message to a channel on behalf of the authenticated user.
The @BasePlugin.route( … ) decorator provides a minimal way to attach metadata to the Python function which implements the integration logic.
Some details are:
Type: lc_tool (default), as this supports langchain tools for function calling. Plugins can, however, also be loaded via the PluginManager and called directly.
Name: the plugin and route identifier.
Title: the user-facing UI text.
Description: the function description used by the LLM for mapping the user query to the tool. NOTE: This will shortly be split into 2 properties, as the LLM mapping often needs more information in practice to map correctly:
description for UI / user-facing text, and
Tool_mapping for the LLM mapping text.
Input_params: the structured inputs to the langchain @tool.
Output_param: the structured output for the langchain tool.
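To show how such a decorator can carry this metadata, here is a minimal sketch. The `BasePlugin` below is a stand-in written for this example, not Azara's real class; the keyword names, the `slack.send_message` naming format, and the `routes()` helper are assumptions, and the Slack API call is replaced by a stub.

```python
from typing import Any, Callable, Dict


class BasePlugin:
    """Stand-in for Azara's BasePlugin; the real class is not shown here."""

    @staticmethod
    def route(**meta: Any) -> Callable:
        """Attach route metadata to the function it decorates."""
        def decorator(func: Callable) -> Callable:
            meta.setdefault("type", "lc_tool")  # the default named above
            func.route_meta = meta
            return func
        return decorator

    @classmethod
    def routes(cls) -> Dict[str, dict]:
        """Collect metadata for every decorated method, keyed by route name."""
        return {
            member.route_meta["name"]: member.route_meta
            for member in vars(cls).values()
            if hasattr(member, "route_meta")
        }


class SlackPlugin(BasePlugin):
    @BasePlugin.route(
        name="slack.send_message",            # hypothetical naming format
        title="Send a Slack message",         # user-facing UI text
        description="Send a message to a Slack channel as the authenticated user.",
        input_params={"channel": "str", "text": "str"},
        output_param={"ok": "bool"},
    )
    def send_message(self, channel: str, text: str) -> dict:
        # A real plugin would call the Slack API here.
        return {"ok": True}


routes = SlackPlugin.routes()
```

Keeping the metadata on the function itself means a loader can discover the routes of a plugin by inspecting the class, without any separate registration file.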
Authentication
The plugin BasePlugin class implements a few different options for authentication, and provides sane default behaviour to make this as painless as possible to implement for 3rd party services.
The following options are supported:
No authentication
API/Access Key
API Key + secret
OAuth2 Access key + secret
1-click OAuth2 (Grant) / Social Login
The Slack plugin above implements Access Key only; the user configures this by clicking on the edit / pencil icon on the Slack integration and pasting in their API token.
The last option gives the best user experience, i.e. as shown in the Google ‘Click to Connect your account’ flow. NOTE: This does, however, require that the Azara.ai team set up an OAuth2 app in that service in order to grant access to your users.
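One way to model the five options is an enumeration plus a table of the credential fields each option needs from the user. This is a hedged sketch: the `AuthType` names, the field names, and the `missing_credentials` helper are hypothetical and are not taken from the BasePlugin implementation.

```python
from enum import Enum
from typing import Dict, List


class AuthType(Enum):
    NONE = "none"
    API_KEY = "api_key"
    API_KEY_SECRET = "api_key_secret"
    OAUTH2_KEY_SECRET = "oauth2_key_secret"
    OAUTH2_ONE_CLICK = "oauth2_one_click"


# Hypothetical mapping: which fields the user must fill in for each option.
REQUIRED_FIELDS: Dict[AuthType, List[str]] = {
    AuthType.NONE: [],
    AuthType.API_KEY: ["access_key"],
    AuthType.API_KEY_SECRET: ["api_key", "secret"],
    AuthType.OAUTH2_KEY_SECRET: ["access_key", "secret"],
    AuthType.OAUTH2_ONE_CLICK: [],  # token arrives via the OAuth2 grant flow
}


def missing_credentials(auth: AuthType, supplied: Dict[str, str]) -> List[str]:
    """List the credential fields still needed before the plugin can run."""
    return [name for name in REQUIRED_FIELDS[auth] if not supplied.get(name)]


# An access-key plugin with nothing configured yet.
todo = missing_credentials(AuthType.API_KEY, {})
```

A table like this also makes it easy to tell the user which credentials a generated workflow still needs before it can run.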
1-click (Social Auth) Login
The GMail plugin below implements social or 1-click authentication. The additional authentication logic and metadata can be seen in the code example below:
auth: Metadata for the UI/UX and for fetching the token for the user’s account
init: Here we need to set up the credentials and client to be used for the authentication
And that’s mostly it - hopefully we have simplified it enough that the task of creating a new 3rd party integration to use with your LLM is as simple as possible.
Till next time, when we cover developer tools.
Contact:
steve@azara.ai aka madscientist
@steve_messina
@ai_azara