Welcome everyone,
Today I'm going to talk about the charter I wrote when we started this company, how it makes us different, and how we have gone about implementing it.
If you're interested in Azara as an agentic platform, don't forget to sign up for the waiting list here.
Azara's vision
One of the things that has always struck me about AI agents is that they are of limited use if they aren't empowered to work on your behalf. An agent you purely converse with isn't useless, but it's mostly confined to blue-sky thinking, or to acting as a coach or mentor in a new field.
However, it makes a poor AI assistant.
Also, most AI solutions are designed to be accessible to engineers, and thus there is a huge gap in the true democratization of AI for the non-technical audience, who make up most of the potential user base for a useful AI.
Azara's Charter
Charters are an important tool for keeping a startup's team and company focused on doing the right thing, avoiding scope creep, and not trying to create the next AGI.
Too many features have been the death of many startups.
A charter should be concise, clear, written in layman's language, and memorable - so that every member of your startup can use it to check the value of any work being done.
As a company we do 5 things that together make us unique in creating true AI Assistants:
AI agents you can delegate to
Agents that you can trust to follow your instructions
Agents can run at scale, and are cheap
You (almost) never need an engineer to get work done
If you do need an engineer, the code is simplified to be as painless as possible
We do this with:
Agents that learn by watching you demonstrate how you do a task, or by generating code for new tasks
Agents that use Human Centered AI and gently ask clarifying questions about the user's requirements
Tasks that can explain themselves
Task executions that are self-correcting
Tasks that take human feedback and corrections
Tasks that group what they learn into reusable skills
A composable ecosystem that makes extending the system very easy
This is how we explain ourselves.
This is also how we protect ourselves from scope creep.
We do only these few things, but we do them better than anyone else. For every piece of work, we ask:
Is this important?
Does it move us further along the road to implementing the vision in the charter?
Is this out of scope?
Problem Statement
Pure conversational AI assistants are like interns you have to manage: smart, but with no experience of the job. The side effect is that the user ends up micromanaging the agent without getting much benefit, because the agent cannot execute, or because its output is error-prone due to hallucinations, inconsistency, and so on.
Situational leadership (https://en.wikipedia.org/wiki/Situational_leadership_theory) is the theory that a leader can, and should, adapt their leadership style to each person they manage (ideally at a task level) depending on that person's competence. It is a useful lens for this relationship. Most managers typically have two leadership styles, and aren't aware that one can (or needs to) change style so dynamically.
The premise of situational leadership is that you choose your leadership quadrant based on the capability of the agent or person being managed for that task.
Current AI chat assistants act much like low-competency interns: smart but inexperienced. We need to tell them everything explicitly, and follow up on everything to ensure it has been done correctly.
This makes situational leadership a useful model for designing an effective AI assistant. We want to move our AI agents from low-competency R1 to high-competency R4, shifting the user's cognitive load from micromanaging (telling) behaviour to delegating behaviour.
In order to do that, we need to understand the limitations of agents, and the current approaches. Below we list the problematic aspects of creating AI Assistants.
Problem 1 - the inability to delegate to them
Assistants need to be empowered so that you can delegate work to them and trust that they will produce results to your level of satisfaction.
Working with a conversational chat agent, which is still the predominant model, becomes frustrating because at some point you have to walk away and implement the recommendations yourself.
Problem 2 - the need to trust what they produce
AI agents are still in their early stages, and their main interface is a chat window of some sort.
We need to be able to train our assistants by giving human feedback, providing examples of data, training the models against that data, helping users write their requirements, and rewriting the prompt instructions they give so that their intent is clearer.
Additionally, we need the ability to do multi-step checking for relevancy, accuracy, and correctness.
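As a sketch of what such a check might look like (the model name and rubric here are illustrative, not our exact implementation), a second LLM pass can grade each draft answer before it reaches the user:

```python
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "You are a strict grader. Given a question and a draft answer, reply with "
    "one word: PASS if the answer is relevant, accurate, and complete, otherwise FAIL."
)

def check_answer(question: str, answer: str) -> bool:
    # Temperature 0 keeps the grading step as repeatable as possible.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("PASS")
```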
Problem 3 - Consistency
LLMs have an element of randomness built in. This is a powerful feature in that it produces excellent human-like responses in conversation, but it wreaks havoc on traditional programming, where we expect consistent results, e.g. at interfaces to third-party services or tools.
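For example, here is a small sketch using the OpenAI Python client (any provider exposes an equivalent sampling knob): the same prompt at a non-zero temperature can return different answers on each run, while temperature 0 is far more repeatable (though still not perfectly deterministic).

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, temperature: float) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model will do for this demo
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# These two calls often disagree with each other...
print(ask("Name one colour.", temperature=1.0))
print(ask("Name one colour.", temperature=1.0))
# ...while temperature 0 usually (not always) returns the same answer.
print(ask("Name one colour.", temperature=0.0))
```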
Problem 4 - Speed and Cost
Agentic flows and agentic programming are still very expensive relative to most other models of software development. To get more reliable results, we have to resort to a few methods, namely:
Using large LLMs, which have higher accuracy on a broad range of tasks, but are slower and more costly.
Using agentic RAG or flows, often implemented in tools such as LangGraph, where we add additional steps to validate any data retrieval (e.g. from a vector db), at considerably more cost. These agentic graphs also aren't guaranteed to converge on a solution.
Using a mixture of experts, or a mixture of LLMs. These are also good at improving outcomes, but cost and throughput are a challenge.
Fine-tuning an LLM against a training data set. This often produces a good jump in results without the massive overhead of creating a new model from scratch (a sketch of preparing such a data set follows this list).
Distillation, which is becoming more popular, uses a teacher-student approach: a large teacher LLM trains a smaller student LLM. This also requires a training data set of questions and answers, and typically works by rewriting prompts so that the student can match the teacher's outputs.
Hardware advances from startups such as Groq are coming to the fore, providing very fast inference, which is excellent for agents going forward.
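To make the fine-tuning option concrete, here is a hedged sketch of turning a Q&A set into the JSONL chat format that OpenAI's fine-tuning endpoint expects (other providers use similar formats; the Q&A content below is placeholder data):

```python
import json

# Placeholder Q&A pairs; in practice these come from your own data.
qa_pairs = [
    {"q": "What documents do I need for a visa application?", "a": "..."},
    {"q": "How long does KYC verification take?", "a": "..."},
]

with open("train.jsonl", "w") as f:
    for pair in qa_pairs:
        record = {"messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": pair["q"]},
            {"role": "assistant", "content": pair["a"]},
        ]}
        f.write(json.dumps(record) + "\n")

# The file can then be uploaded and a fine-tuning job started, e.g.:
#   file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=file.id, model="gpt-4o-mini-2024-07-18")
```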
How we implement our charter
We implement specific functionality in our platform to address each of the problems and requirements listed above. Note that, as I'm sure you are all aware, this is a very rapidly advancing field, and we are constantly updating our approaches to what works best.
AI Agents you can delegate to
This is probably the most critical aspect. From the beginning we designed our agents to be able to create and run workflows.
Workflows are a well-known feature in many technology organizations that have tried to implement RPA (Robotic Process Automation). However, RPA very often results in poor outcomes, for multiple reasons:
It's expensive: it relies on enterprise software and an army of professional services, which puts it out of reach of most SMEs, who often don't even have an IT department. Many of the larger automation companies aren't really driven to adopt AI with LLMs except notionally, as their business models tend to be professional services (which happen to do automation).
There is a discontinuity between domain experts and engineers, neither of whom has much skill in the other's domain. This results in a formal, waterfall requirements-gathering approach that proves inflexible to late discovery of exceptions, so the automation quickly becomes brittle and of little value.
There is a high barrier to learning and implementing RPA automation, if someone did want to empower themselves. From the 1% rule (https://en.wikipedia.org/wiki/1%25_rule, covered in another of my blog posts), we know that only 1% of your ideal audience will attempt such a daunting task.
Workflows need to be consistent once trained, and the current focus on function calling to implement actionability is problematic: we lose reliability and repeatability between runs, and we have created the world's most expensive workflow executor.
Agents that you can trust to follow your instructions
I have found that one of the reasons many people quickly become frustrated with AI agents is that the agents don't seem to follow instructions.
This is very often true, and it occurs for multiple reasons, which I'll cover below.
However, a very important reason that is often overlooked is that most non-technical users are not in the habit of, and have no experience with, providing detailed requirements that remove ambiguity and scope the task correctly for the agent.
Remember, an AI agent in the current generation of technology does not have the world sense and continuous feedback that humans enjoy. We need to provide it with detailed and explicit instructions.
I'm sure you can all see the problem here: we end up back at the 1% rule. Not many people are going to provide a large, detailed requirements document up front just to get started. This alone will put off 99% of your users, according to the rule.
Human Centered AI
In Azara, we use the 1% rule as the centrepiece of our Human Centered AI.
We are cognisant of the need not to cognitively overload the user, so we ask gentle, leading clarifying questions that they can address one by one.
At each point, the AI assistant fills in what it believes are sensible values, getting the user as close as possible to a one-click decision at each step.
This has proved highly effective in engaging users, defining requirements for tasks, and making corrections when implementing repeatable workflows.
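A minimal sketch of the pattern (structure and names here are illustrative, not our actual implementation): the assistant proposes one question at a time, each pre-filled with its best guess, so accepting the default is a single action.

```python
from dataclasses import dataclass

@dataclass
class Clarification:
    question: str
    suggested_answer: str  # the assistant's best guess, shown as the default

def gather_requirements(pending: list[Clarification]) -> dict:
    """Ask one gentle question at a time; accepting the default is one step."""
    answers = {}
    for item in pending:
        reply = input(f"{item.question} [{item.suggested_answer}]: ").strip()
        answers[item.question] = reply or item.suggested_answer
    return answers

requirements = gather_requirements([
    Clarification("Which calendar should invites go to?", "Google Calendar"),
    Clarification("Who approves the final step?", "the requester's manager"),
])
print(requirements)
```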
Human Feedback and reinforcement learning
It is imperative that we provide multiple ways for the user to give feedback, and learning mechanisms that let the AI assistants perform better and thus move up the situational leadership quadrants - managing upwards :)
To implement this, we provide the following methods for training:
The user can edit the inbox (which forms the chat memory of the agent) to correct responses
The user can provide feedback using a thumbs up/down on responses in the WebUI and WW, or inbox
When we optimize an agent and its set of prompts, documents, and web pages, we first generate a synthetic set of Q&A that we can train against. To do this, we use an excellent library called Ragas (https://github.com/explodinggradients/ragas), which generates questions and answers against our vector db, covering simple, medium, and complex queries with answers (a sketch of this step follows the list). We then allow users to edit these manually.
We implement fine-tuning where needed for an agent, using the same Q&A generated in the step above.
We use distillation, a teacher-student model, which allows a large LLM (in our case via AWS Bedrock) to train a smaller LLM, e.g. GPT-4o-mini, to get similar results through prompt rewriting. This has the additional benefit of making an agent's prompts more valuable, by converting the user's intent into actionable prompts.
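Here is a hedged sketch of the Ragas step. The library's API has shifted between releases; this follows the 0.1.x testset generator, so treat the exact calls as indicative rather than definitive:

```python
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

docs = DirectoryLoader("agent_docs/").load()

generator = TestsetGenerator.from_langchain(
    generator_llm=ChatOpenAI(model="gpt-4o-mini"),
    critic_llm=ChatOpenAI(model="gpt-4o"),
    embeddings=OpenAIEmbeddings(),
)

# Simple / medium / complex roughly map to simple / reasoning / multi_context.
testset = generator.generate_with_langchain_docs(
    docs,
    test_size=50,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
testset.to_pandas().to_csv("synthetic_qa.csv")  # hand this to users for manual edits
```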
Agents can run at scale, and are cheap
Executing workflows (created via function calling or other methods) with an LLM seems like a crazy idea right now, given the difficulty of getting the same results reliably across multiple runs, and the cost of executing a known set of tasks, which simply doesn't require any smarts.
To this end, we designed our agents to be smart upfront. They:
Ask clarifying questions to gather requirements, and generate workflows as no-code.
Execute these workflows when they are triggered by external events, e.g. a form being submitted, or on a schedule.
Where we do need some smarts, e.g. conditional logic, we can use a much smaller LLM to handle it, e.g. does the incoming user feedback message contain any upset language? (See the sketch below.)
This approach makes the workflows repeatable, very scalable, and very cheap.
Currently we are able to run a complex workflow, e.g. a visa application process with KYC, passport scanning, calendar invites, email, etc., at 1,000 concurrent workflows with minimal impact on a single server.
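As an illustration of the pattern (not our production code; the function names are hypothetical), the deterministic steps are plain code or plugin calls, and a small model at temperature 0 is consulted only at the condition node:

```python
from openai import OpenAI

client = OpenAI()

def llm_condition(question: str, payload: str) -> bool:
    """A small LLM handles just the conditional logic, at temperature 0."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user",
                   "content": f"{question} Answer only yes or no.\n\n{payload}"}],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def handle_feedback(message: str) -> str:
    # Deterministic steps (sending email, creating tickets) are ordinary
    # plugin calls; only the tone check consults the model.
    if llm_condition("Does this feedback contain upset language?", message):
        return "escalate_to_human"    # e.g. create a support ticket
    return "send_acknowledgement"     # e.g. send a templated email

print(handle_feedback("This is the third time my invoice has been wrong!"))
```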
You (almost) never need an engineer to get work done
Azara is primarily designed to be a no-code platform, allowing non-technical users access to the advanced features of automation, and smart assistants.
It pays careful attention to the 1% rule, ensuring that users are bought in and can quickly be productive without a large upfront learning cost.
For example, to get started, users can generate their agent's prompts and avatar by clicking generate agent and giving a brief description of the role and tone.
This moves users into the contributor zone of the 1% chart. It's much easier to edit something from a bootstrapped starting point than to stare at a blank page; we all know what writer's block feels like.
We do similar things in other areas. To generate a workflow, you don't need to know prompting, how to code, or what to connect to in order to automate a complex task.
You can simply ask your agent, "please create an HR onboarding workflow", and it will walk you through gathering requirements and confirming that all the services and choices are mapped correctly. After only a few minutes, you'll have a sophisticated workflow of the kind our customers are already using in production.
No-code is almost always a failure in the long run
No-code excels at quickly implementing and prototyping simple ideas.
However, there is a trap here: the real world contains complexity that no-code (by design) hides. Thus no-code systems often devolve into a series of hacks, creating more and more workarounds to match the real world.
The reason for this is fairly apparent if one is familiar with Conway's Law (https://en.wikipedia.org/wiki/Conway%27s_law):
[O]rganizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations. — Melvin E. Conway, How Do Committees Invent?
Software will always replicate the (complex) communication structures in the real world.
Thus you, as a developer, will need the ability to adapt the no-code platform to your own requirements, instead of having to hack around a rigid platform.
We provide several mechanisms to allow developers to enhance the platform on behalf of their no-code users:
(Integration) plugins allow the platform to interact with external systems. These are designed to be super simple to implement and to work in multiple modes, e.g. as chat tools, as Python language modules, etc. (A minimal sketch follows this list.)
(Channel) plugins extend integration plugins to make them available as chat-interface conduits. For example, a WhatsApp plugin can be an integration for an agent (summarize the chats in a group), or it can be a channel, e.g. a group of which the agent is a member, where it can handle queries and respond.
Scenarios allow us to change the behaviour and logic of an agent. At its simplest, an agent implements "simple" RAG logic: when a query comes in, we check how best to answer it, pull information from the document KB if necessary, or swap to an expert scenario if needed. For example, when creating a workflow, we use the workflow generator scenario, which assists in asking the clarifying questions and then generating the code for the workflow.
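To make the plugin idea concrete, here is a hedged sketch of what a minimal integration plugin could look like. The base class and method names are hypothetical, not Azara's actual SDK:

```python
from abc import ABC, abstractmethod

# Hypothetical plugin shape, for illustration; the real SDK will differ.
class IntegrationPlugin(ABC):
    name: str

    @abstractmethod
    def run(self, op: str, **kwargs) -> dict:
        """Perform one operation against the external system."""

class SlackPlugin(IntegrationPlugin):
    name = "slack"

    def run(self, op: str, **kwargs) -> dict:
        if op == "send_message":
            # call Slack's Web API here
            return {"ok": True, "channel": kwargs["channel"]}
        raise ValueError(f"unknown op: {op}")

plugin = SlackPlugin()
print(plugin.run("send_message", channel="#general", text="Hello from the agent"))
```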
See my other blog posts on Developer Tooling and on Core Components.
Scenarios are implemented simply as Jupyter notebooks, with a keyword or key-phrase trigger that swaps them into the agent's prompts, like a favorite specialist hat.
The agent will continue to run the workflow until the exit conditions are met (e.g. the workflow is generated, or the user exits), and it then reverts to its usual happy conversational self.
If you do need an engineer, the code is simplified to be as painless as possible
We have put a lot of work into making both plugins and scenarios as simple as possible.
Everything in the Azara platform is a plugin. We eat our own dogfood, because it simplifies life so much.
Plugins are very lean and DRY (Don't Repeat Yourself), so there is minimal code to write.
Each plugin is isolated, so module version issues are a thing of the past
Plugins can be tagged, versioned, and published, either by loading them dynamically via the SDK or by pushing to a branch in the plugins repo, in a continuous-delivery manner. An illustrative example follows.
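Purely for illustration, publishing might look something like this; the import and call names below are hypothetical stand-ins, not the actual Azara SDK:

```python
# Hypothetical SDK calls, for illustration only; the real API may differ.
from azara_sdk import PluginRegistry  # hypothetical import

registry = PluginRegistry(api_key="...")
registry.publish(
    SlackPlugin(),       # the plugin sketched in the previous example
    version="1.2.0",
    tags=["messaging"],
)
```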
Extending the agent's behaviour via agentic scenarios can be as simple as loading one of the LangGraph examples, e.g. Self-RAG from its GitHub repo. Now your agent does hallucination checking, relevancy matching, and so on; a stripped-down version is sketched below.
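For instance, a heavily simplified Self-RAG-style loop in LangGraph might look like this (node bodies are stubs; see the langgraph repo for the full example):

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    documents: list[str]
    answer: str

def retrieve(state: State) -> dict:
    return {"documents": ["..."]}   # fetch from the vector db here

def generate(state: State) -> dict:
    return {"answer": "..."}        # call the LLM with the documents here

def grade(state: State) -> str:
    return "useful"                 # hallucination / relevancy check goes here

graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "generate")
graph.add_conditional_edges("generate", grade, {"useful": END, "retry": "retrieve"})

app = graph.compile()
print(app.invoke({"question": "What is Azara?"}))
```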
Scenarios can also be very complex, e.g. the generate workflow scenario.
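Here is a heavily simplified, illustrative sketch of its shape; the agent methods used are hypothetical stand-ins, not the actual Azara code:

```python
# Illustrative sketch only; the real scenario lives in a Jupyter notebook,
# and the agent methods used here are hypothetical.
TRIGGER_PHRASES = ["create a workflow", "automate this"]

def generate_workflow_scenario(agent, user_message: str) -> str:
    # 1. Gather requirements one gentle clarifying question at a time.
    requirements = agent.ask_clarifying_questions(user_message)
    # 2. Draft the workflow as declarative no-code steps.
    draft = agent.llm_generate(
        f"Produce a JSON list of workflow steps for: {requirements}"
    )
    # 3. Confirm service mappings and choices with the user before saving.
    if agent.confirm_with_user(draft):
        agent.save_workflow(draft)
        return "exit"      # exit condition met; revert to normal chat
    return "continue"      # keep refining with the user
```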
Conclusion
We have covered quite a lot in this post. I hope you find it interesting, and are keen to try us out for your own projects or for your clients.
Our goal is to make it as easy as possible to get productive with AI agents, as quickly as possible.
Do sign up to our Developer waiting list here to get access to the SDK, and tools.
In the meantime, feel free to try out the platform, with the free trial version at https://azara.ai
till next time,
madscientist
contacts
steve@azara.ai aka madscientist
@steve_messina
@ai_azara