Building Knowledge Agents at Aircall: Challenges, Solutions, and Learnings in GenAI-Driven Productivity


By: Sugandha Arora, Engineering Director, and Enrique SƔnchez, Engineering Productivity Lead, Aircall.

At Aircall, we store most of our internal knowledge in Confluence. As a developer productivity initiative, we decided to build an internal knowledge agent that can answer questions from Confluence. Our goal was to not only scale information access and improve productivity, but also to build a proof-of-concept that could be used as a learning experience as we embark on GenAI-based internal productivity tooling investments.

The challenge of cross-channel communication and documentation

Our internal documentation is dispersed across multiple platforms. Primarily, we organize information within Confluence, segmented into various Spaces that correspond to specific domains such as HR, IT, Finance, and Product. Complementing this, we use Google Drive for additional storage and documentation needs, often linking these resources within our Confluence pages. For internal communication, Slack is our go-to platform, where numerous channels contain valuable knowledge that could be repurposed or referenced.

Recognizing that employees frequently pose questions in Slack channels, we saw an opportunity to streamline information retrieval by integrating a Slack bot. This bot would serve as an interface connected to Confluence and Google Drive, enabling seamless access to internal knowledge bases directly from Slack.

However, we faced a significant challenge: our knowledge data was highly unstructured and scattered across different tools. Early assessments indicated that building a broad Retrieval-Augmented Generation (RAG)-based knowledge agent capable of answering any question would likely yield suboptimal results. To address this, we pivoted our strategy towards developing domain-specific agents. These agents focus on narrower scopes, providing more accurate and reliable answers within their specialized domains. To efficiently manage queries, we introduced a routing agent designed to classify incoming questions and direct them to the appropriate domain-specific agent.
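In practice, such a router can be a single lightweight classification call followed by a dispatch to the matching Bedrock agent. Here is a minimal sketch of the idea, assuming hypothetical agent and alias IDs and domain labels; it is not our exact implementation:

```python
import uuid
import boto3

bedrock = boto3.client("bedrock-runtime")
agents = boto3.client("bedrock-agent-runtime")

# Hypothetical mapping of domains to Bedrock agent/alias IDs.
DOMAIN_AGENTS = {
    "HR": ("HR_AGENT_ID", "HR_ALIAS_ID"),
    "IT": ("IT_AGENT_ID", "IT_ALIAS_ID"),
    "FINANCE": ("FIN_AGENT_ID", "FIN_ALIAS_ID"),
}

def route(question: str) -> str:
    """Classify a question into one of the known domains with a single LLM call."""
    prompt = (
        "Classify the question into exactly one of: HR, IT, FINANCE.\n"
        f"Question: {question}\nAnswer with the label only."
    )
    resp = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"].strip().upper()

def answer(question: str) -> str:
    agent_id, alias_id = DOMAIN_AGENTS[route(question)]
    resp = agents.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=str(uuid.uuid4()),
        inputText=question,
    )
    # invoke_agent streams the completion back as chunk events.
    return "".join(
        e["chunk"]["bytes"].decode() for e in resp["completion"] if "chunk" in e
    )
```

Keeping the classifier's output to a fixed label set makes misroutes easy to detect and log.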

For instance, Aircall operates offices in multiple countries, including France, the UK, Spain, India, and the US. HR policies, employee perks, and vacation guidelines can vary significantly by region. By concentrating on these specific use cases, we can ensure that the information provided is precise and trustworthy, reinforcing the agent's role as a reliable source of truth.

When selecting the technical architecture for this project, one of our primary objectives was to democratize the use of generative AI across the company. We opted for Amazon Bedrock due to its minimal setup requirements and rapid onboarding capabilities. An additional advantage of using Amazon Bedrock is the flexibility to experiment with multiple foundation models. This allows us to iterate quickly through customizations and configurations, enabling us to fine-tune performance efficiently.

Building the Agent

At the time of building our agent, Claude Sonnet 3.5 was the latest generative model available in Bedrock, and it also gave us the best performance, especially on reasoning-based questions (for example, figuring out the next holiday, which requires the model to calculate the next holiday relative to the current date). We implemented semantic chunking for the underlying knowledge and used OpenSearch as our vector database. We separately store user questions, responses, and logs for debugging, evaluation, and architecture comparisons.
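For context, semantic chunking groups adjacent sentences while their embeddings stay similar and starts a new chunk when the topic shifts. The sketch below illustrates the idea using Titan text embeddings on Bedrock; the exact chunker and threshold are illustrative assumptions, not our production configuration:

```python
import json
import math
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_chunks(sentences: list[str], threshold: float = 0.75) -> list[str]:
    """Merge consecutive sentences while they remain semantically similar."""
    chunks, current, prev_vec = [], [], None
    for sentence in sentences:
        vec = embed(sentence)
        if prev_vec is not None and cosine(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))  # topic shift: close the chunk
            current = []
        current.append(sentence)
        prev_vec = vec
    if current:
        chunks.append(" ".join(current))
    return chunks
```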

Experiments and Learnings

1. Dealing with images

We realized that a lot of information, especially in technical documentation, is captured in images such as architecture diagrams. Since the vector database we used did not accept image inputs, we decided to use another LLM to describe these images and store those descriptions in our vector database.
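The image-to-text step can be as simple as one multimodal call per image; the resulting description is then embedded like any other text. A hedged sketch using the Bedrock Converse API with a Claude model (the model ID and prompt wording are illustrative):

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def describe_image(image_bytes: bytes, fmt: str = "png") -> str:
    """Ask a multimodal model to describe a diagram so the text can be embedded."""
    resp = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{
            "role": "user",
            "content": [
                {"image": {"format": fmt, "source": {"bytes": image_bytes}}},
                {"text": "Describe this architecture diagram in detail, "
                         "including components and the data flow between them."},
            ],
        }],
    )
    return resp["output"]["message"]["content"][0]["text"]
```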

Since the default Confluence connector available in Amazon Bedrock does not index images, we decided to switch to S3 as the source. For this purpose, we built our own Confluence scraper that scrapes the Confluence data and stores it in S3.
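The scraper itself can stay small: pull each page's storage-format body from the Confluence REST API and mirror it into S3. A minimal sketch, assuming a Confluence Cloud site; the site URL, credentials, and bucket name are hypothetical:

```python
import boto3
import requests

CONFLUENCE_BASE = "https://example.atlassian.net/wiki"  # hypothetical site
AUTH = ("bot@example.com", "API_TOKEN")                  # hypothetical credentials
BUCKET = "knowledge-agent-source"                        # hypothetical bucket

s3 = boto3.client("s3")

def scrape_page(page_id: str) -> None:
    """Fetch one Confluence page body and mirror it into S3."""
    resp = requests.get(
        f"{CONFLUENCE_BASE}/rest/api/content/{page_id}",
        params={"expand": "body.storage,version"},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    page = resp.json()
    s3.put_object(
        Bucket=BUCKET,
        Key=f"confluence/{page_id}.html",
        Body=page["body"]["storage"]["value"].encode(),
        ContentType="text/html",
    )
```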

However, our experiments showed an increased probability of hallucinations when using LLM-translated image data as a source. Since we wanted to bias our agent towards accuracy rather than coverage, we ultimately decided not to use image inputs for our vector embeddings.

2. Dealing with links

Linked-page traversal was straightforward when the linked page was also on Confluence. In these cases, we indexed the linked page as well using our web scraper and stored hierarchy-related metadata. Links to other platforms like draw.io and Google Drive were trickier and needed to be connected to individually for scraping. User permissions on these document types were another consideration; for our first iteration, we only included documents with company-wide access.
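Conceptually, the traversal is a small breadth-first crawl that only follows links back into the same Confluence site and records where each page sits in the hierarchy. A sketch under those assumptions; the host name and metadata fields are illustrative:

```python
from collections import deque
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

CONFLUENCE_HOST = "example.atlassian.net"  # hypothetical site

def crawl(start_url: str, fetch_html) -> dict[str, dict]:
    """Breadth-first traversal of Confluence-internal links, recording hierarchy."""
    pages, queue = {}, deque([(start_url, None, 0)])
    while queue:
        url, parent, depth = queue.popleft()
        if url in pages:
            continue
        html = fetch_html(url)  # e.g. the scraper from the previous section
        pages[url] = {"parent": parent, "depth": depth, "html": html}
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            target = urljoin(url, a["href"])
            if urlparse(target).netloc == CONFLUENCE_HOST:  # skip draw.io, Drive, etc.
                queue.append((target, url, depth + 1))
    return pages
```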

3. Dealing with outdated data

A big problem for our existing documentation is outdated data. For example, our HR space in Confluence may contain old HR policies that are no longer accurate. We explored adding a rule in our scraper to ignore data older than a specific date, but soon realized that there are cases where older data is still relevant (for example, the benefits in a country may not change for many years). We then tried dealing with outdated data at the prompt level, by instructing the model to ignore older data where relevant, but again saw mixed results.

Finally, we decided that this was less of an engineering problem and needed to be solved in the underlying data. One easy solution we found was to add a setting in our web scraper to ignore specific Confluence pages. As we find outdated information in our testing, we flag it to the relevant team as a process; in the meantime, our agent can immediately address it through a list of 'ignore' pages that we maintain, so that we don't scrape any outdated information.
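The ignore list is deliberately low-tech: a maintained set of page IDs the scraper checks before fetching anything. A sketch of the idea; the file name and format are illustrative assumptions:

```python
from pathlib import Path

def load_ignore_list(path: str = "ignored_pages.txt") -> set[str]:
    """One Confluence page ID per line; lines starting with '#' are comments."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip() and not line.startswith("#")}

IGNORED = load_ignore_list()

def should_scrape(page_id: str) -> bool:
    return page_id not in IGNORED
```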

4. Model performance and agent memory

Among the foundation models available in Bedrock as of September 2024, we found Claude Sonnet 3.5 to be the best performing one; however, at the time of building this, the model did not have memory to store previous conversations. We explored our own solution for adding agent memory by storing conversation history in DynamoDB, but eventually settled on Claude Sonnet 3, which comes with its own memory and saved us from building an intermediate memory layer and another potential source of answer inaccuracies.
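For reference, the intermediate layer we explored looked roughly like this: persist each turn in DynamoDB keyed by session, and replay the last few turns into the prompt. A sketch assuming a hypothetical table with partition key `session_id` and numeric sort key `turn`:

```python
import time

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("agent-conversations")  # hypothetical table

def save_turn(session_id: str, question: str, answer: str) -> None:
    table.put_item(Item={
        "session_id": session_id,
        "turn": int(time.time() * 1000),  # sort key: millisecond timestamp
        "question": question,
        "answer": answer,
    })

def recent_history(session_id: str, n: int = 5) -> str:
    """Return the last n turns, oldest first, for prepending to the prompt."""
    resp = table.query(
        KeyConditionExpression=Key("session_id").eq(session_id),
        ScanIndexForward=False,  # newest first
        Limit=n,
    )
    turns = reversed(resp["Items"])
    return "\n".join(f"User: {t['question']}\nAgent: {t['answer']}" for t in turns)
```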

5. Coming up with the right prompt

This was one of the most time-consuming steps. Our experiments showed that the right prompt made a huge difference both in performance and in keeping the model compliant with our company policies. After reading about the latest work in this area and trying our own tuning, we relied heavily on the prompt generator from Anthropic to guide our agent prompt.

Once we found the right prompt to maximise our agent performance, we created an internal super-prompt with parameters that can be adapted across various use cases within Aircall, as sketched below.
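The parameters here are illustrative of the kind of knobs such a super-prompt exposes; the actual internal prompt is longer and company-specific:

```python
SUPER_PROMPT = """You are {agent_name}, an internal knowledge assistant for Aircall.
Answer questions about {domain} using ONLY the retrieved documents.
Audience: {audience}. Respond in a {tone} tone.
If the documents do not contain the answer, say so instead of guessing.
Always cite the source page for every fact you state."""

# Hypothetical instantiation for the HR use case.
hr_prompt = SUPER_PROMPT.format(
    agent_name="HR Helper",
    domain="HR policies, perks, and vacation guidelines",
    audience="Aircall employees across all offices",
    tone="friendly and concise",
)
```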

6. Source Citations for Trustworthy Data

Including source citations goes a long way towards earning user trust in the agent's answers. By default, when asked for a citation, the source displayed is the S3 bucket where the scraper stores the Confluence data. We modified the associated metadata and the Lambda function used to invoke the model to include the underlying Confluence URL when the agent is accessed through Slack.
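In the Lambda layer, this amounts to walking the citations returned by `invoke_agent` and swapping each S3 URI for the Confluence URL stored in the chunk's metadata. A sketch, assuming a hypothetical `confluence_url` metadata field written by the scraper and the citation payload shape Bedrock agents return:

```python
def extract_citation_urls(completion_events) -> list[str]:
    """Collect Confluence URLs from agent citations, falling back to the S3 URI."""
    urls = []
    for event in completion_events:
        chunk = event.get("chunk", {})
        for citation in chunk.get("attribution", {}).get("citations", []):
            for ref in citation.get("retrievedReferences", []):
                url = ref.get("metadata", {}).get("confluence_url")  # set by our scraper
                if not url:
                    url = ref.get("location", {}).get("s3Location", {}).get("uri", "")
                if url:
                    urls.append(url)
    return urls
```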

7. Slack Bot Integration

We created a Slack app that could be installed on our workspace and used to access agents, similar to this. However, while we were working on our project, Amazon released this Chatbot integration tutorial for Bedrock agents, and we found it to be a much more straightforward and faster way to do the initial integration, as well as to remove and add connectors as we iterated on our underlying models.
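Our first custom app was essentially a thin event handler that forwarded mentions to the agent and posted the reply back. A sketch of that shape using Bolt for Python; the tokens and the `answer` helper from the routing sketch above are assumptions:

```python
import os

from slack_bolt import App

app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],
)

@app.event("app_mention")
def handle_mention(event, say):
    # Strip the leading "@bot" mention, then forward the question to the agent.
    question = event["text"].split(">", 1)[-1].strip()
    say(text=answer(question), thread_ts=event.get("ts"))  # answer() from earlier sketch

if __name__ == "__main__":
    app.start(port=3000)
```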

8. Response Personalization

Although our knowledge base does not yet include documents gated by user-level access permissions, we implemented a few useful personalizations in our Lambda function layer. For example, we pass the user's name and timezone from Slack and can use these to tune the agent responses.
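Bedrock agents accept session attributes on each invocation, which is a convenient slot for this kind of context. A sketch under that assumption; the Slack profile lookup, IDs, and attribute names are illustrative:

```python
import uuid

import boto3
from slack_sdk import WebClient

agents = boto3.client("bedrock-agent-runtime")
slack = WebClient(token="SLACK_BOT_TOKEN")  # hypothetical token

def invoke_personalized(question: str, slack_user_id: str):
    user = slack.users_info(user=slack_user_id)["user"]
    return agents.invoke_agent(
        agentId="AGENT_ID",      # hypothetical IDs
        agentAliasId="ALIAS_ID",
        sessionId=str(uuid.uuid4()),
        inputText=question,
        sessionState={
            # promptSessionAttributes are visible to the agent's orchestration prompt.
            "promptSessionAttributes": {
                "user_name": user.get("real_name", ""),
                "timezone": user.get("tz", "UTC"),
            }
        },
    )
```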

9. Guardrails

Amazon Bedrock has built-in guardrails that help block harmful content and hallucinated replies. We tested with guardrails enabled and got good performance. We also tested tuning our prompt to include guardrail instructions and saw equally good results. Our learning was that a good prompt was enough to provide security and privacy adherence, even without adding guardrails: we did thorough testing with adversarial prompts and did not find any instances where our security or privacy policies could be violated. However, in the spirit of adding security at every layer, we added protections across both the prompt and guardrails.
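When a guardrail is applied at the model-invocation layer (rather than attached to the agent in the console), it is referenced by ID and version on each call. A sketch with the Converse API and hypothetical identifiers:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

resp = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user",
               "content": [{"text": "What is our parental leave policy?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "GUARDRAIL_ID",  # hypothetical guardrail
        "guardrailVersion": "1",
    },
)
# If the guardrail intervenes, stopReason is "guardrail_intervened".
print(resp["stopReason"], resp["output"]["message"]["content"][0]["text"])
```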

Future Plans

We are in the process of developing the next version with access to Google Drive, potentially personalized based on the documents the user has access to. For example, a user could ask a question like "How many vacation days do I have this year?", and the agent would be able to look up the vacation balance for the individual asking the question.

We will also be indexing more Confluence spaces and building the routing agent to allow for broad queries.

One of our biggest learnings was that the structure and organization of the underlying data is a huge factor in agent performance. We are working on two distinct initiatives to deal with this going forward: we are working closely with document creators (e.g., the HR team) to define a common structure for all the underlying documents, and, to make this more scalable and automated, we are exploring the use of LLMs to extract and organize facts from a given knowledge article. Another interesting learning was that enriching documents with valuable metadata (links, keywords, word counts, updated dates) made the knowledge easier to store and retrieve, giving us a boost in agent performance.
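With an S3-backed knowledge base, one way to attach such metadata is a sidecar JSON file next to each document, following Bedrock Knowledge Bases' `<file>.metadata.json` convention. The fields below mirror the ones mentioned above and are illustrative:

```python
import json

import boto3

s3 = boto3.client("s3")

def write_metadata(bucket: str, key: str, page: dict) -> None:
    """Store a sidecar metadata file next to the scraped document in S3."""
    metadata = {
        "metadataAttributes": {
            "links": ", ".join(page["links"]),
            "keywords": ", ".join(page["keywords"]),
            "word_count": page["word_count"],
            "updated_date": page["updated_date"],  # e.g. "2024-09-15"
        }
    }
    s3.put_object(
        Bucket=bucket,
        Key=f"{key}.metadata.json",  # e.g. confluence/12345.html.metadata.json
        Body=json.dumps(metadata).encode(),
        ContentType="application/json",
    )
```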

Hear more from our innovators and developers! Have a read of our Tech Team Stories to find out what goes on behind the scenes of our powerful platform.


Published on December 6, 2024.
