Deeply Integrated Agents

It has become increasingly clear just how important context is to an LLM. The more complex the tasks we trust AI with, the more data it requires. If you want an LLM to avoid hallucinating and produce the right output, you need to provide it with context.

This is the problem with using ChatGPT for schoolwork: ChatGPT doesn't know what you've learned previously. When I was in elementary school, a teacher told us that if an answer choice contained something we were never taught, it was wrong. That's the difference between a student and an AI: the student knows what they are supposed to know, while an AI knows everything it knows, plus a universe of extra information. This is why high school students get caught using AI even when the text is undetectable: they shouldn't be writing like a college professor. They also shouldn't use things like em dashes if they haven't been taught to use them yet.

Why Integrations Matter for Academia

Asking ChatGPT to write an essay for your class is like asking it to write a biography of your life. It may be able to search the web for things you've done, but it will never have the true details: your personal experiences. The same is true for schoolwork. Adding integrations with outside services allows a model like o4-mini to gain real insight into your work. This not only produces fundamentally better work but also makes it undetectable after the humanization layer. This is where the value of Vault's services lies: undetectable automation.

How Agents are Integrated

Tool Usage

Vault's Agent uses Retrieval-Augmented Generation (RAG) to add context to any given request. Every major AI provider supports tool use. Tools are simply functions paired with a text description of what they do. There are three types of tools, described below:

Type	Description	Examples
Data	Enable agents to retrieve context and information necessary for executing the workflow.	Query transaction databases or systems like CRMs, read PDF documents, or search the web.
Action	Enable agents to interact with systems to take actions such as adding new information to databases, updating records, or sending messages.	Send emails and texts, update a CRM record, hand-off a customer service ticket to a human.
Orchestration	Agents themselves can serve as tools for other agents (see the Manager Pattern in the Orchestration section of the source report).	Refund agent, Research agent, Writing agent.

Source: OpenAI Agents Technical Report (2024)
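To make "functions with text describing what they do" concrete, here is a minimal, framework-free TypeScript sketch. It is not Vault's code and not the Vercel AI SDK's API: the `Tool` and `ToolRegistry` types and the `listAssignments` and `describeImage` tools are hypothetical. The model only ever sees each tool's name and description; when it emits a tool call, the host program runs the matching function and returns the text result.

```typescript
// Hypothetical, framework-free sketch: a tool is a function plus a description.
// Tool names below are illustrative only, not Vault's actual tools.

type ToolKind = "data" | "action" | "orchestration";

interface Tool {
  name: string;
  kind: ToolKind;
  description: string; // the text the model reads to decide when to call the tool
  execute: (args: Record<string, unknown>) => Promise<string>;
}

// A tool call as a model might emit it: a tool name plus JSON arguments.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  // What gets sent to the model: names and descriptions only, never the code.
  describe(): { name: string; description: string }[] {
    return Array.from(this.tools.values()).map(({ name, description }) => ({ name, description }));
  }

  // Run the function the model asked for and hand its text result back.
  async dispatch(call: ToolCall): Promise<string> {
    const tool = this.tools.get(call.name);
    if (!tool) return `Error: unknown tool "${call.name}"`;
    return tool.execute(call.args);
  }
}

const registry = new ToolRegistry();

// A data tool: retrieves context (stubbed here).
registry.register({
  name: "listAssignments",
  kind: "data",
  description: "List the assignments in a course. Args: { courseId: number }",
  execute: async (args) => `Assignments for course ${String(args.courseId)}: (stub)`,
});

// An orchestration tool: would delegate to another agent (stubbed here).
registry.register({
  name: "describeImage",
  kind: "orchestration",
  description: "Ask a vision-capable sub-agent to describe an image. Args: { imageId: string }",
  execute: async (args) => `Report for image ${String(args.imageId)}: (stub)`,
});
```

In a real agent, the `describe()` output would be translated into the provider's tool schema (usually JSON Schema for the arguments), and the dispatch step would repeat until the model stops requesting tools.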

Vault uses data tools for integrations with the Canvas LMS and orchestration tools for visual reasoning and web search capabilities. Vault's main AI Agent runs internally on OpenAI's o4-mini-high (o4-mini with high reasoning effort), built with Vercel's AI SDK.

File Fetch and Extraction Tool

One of the most important tools in Vault's set is document analysis. Because file attachments are dynamic and must be retrieved in real time, we had to develop an in-house solution for retrieving files and extracting their content in a form the LLM can understand. The four file types students need to complete most assignments are:

Word documents
PDF documents
PowerPoint presentations
Image files (standalone and embedded within documents)
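A first step in handling these four file types is deciding which parser a file needs. This sketch is an assumption about how that could work, not Vault's implementation; `classifyFile` and `embeddedImageEntries` are hypothetical helpers. It relies on two facts about the formats: Canvas reports each file's MIME type, and .docx and .pptx files are ZIP archives whose embedded pictures are stored under `word/media/` and `ppt/media/`.

```typescript
// Hypothetical helpers: map a file's MIME type to one of the four kinds above,
// and pick out embedded images inside Office (OOXML) files.

type FileKind = "word" | "pdf" | "powerpoint" | "image" | "unsupported";

const MIME_KINDS: Record<string, FileKind> = {
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "word",
  "application/pdf": "pdf",
  "application/vnd.openxmlformats-officedocument.presentationml.presentation": "powerpoint",
};

function classifyFile(contentType: string): FileKind {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const mime = contentType.split(";")[0].trim().toLowerCase();
  if (mime.startsWith("image/")) return "image";
  // Legacy binary .doc/.ppt are not in the table, so they fall through here.
  return MIME_KINDS[mime] ?? "unsupported";
}

// .docx and .pptx are ZIP archives; embedded pictures sit under word/media/
// and ppt/media/. Given the archive's entry names (listed with any ZIP
// library), keep the raster formats OpenAI's vision models accept.
function embeddedImageEntries(kind: FileKind, entryNames: string[]): string[] {
  const prefix = kind === "word" ? "word/media/" : kind === "powerpoint" ? "ppt/media/" : null;
  if (prefix === null) return [];
  return entryNames.filter(
    (name) => name.startsWith(prefix) && /\.(png|jpe?g|gif|webp)$/i.test(name),
  );
}
```

Listing the archive entries requires a ZIP library of your choice; passing in only the resulting names keeps the sketch dependency-free. Vector formats such as EMF would need converting before a vision model could read them, so they are skipped here.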

Before calling the file fetch tool, the Agent has to navigate and read the assignments in the user's course. If it then finds a file within an assignment that looks useful, it can decide to call the file fetch tool, passing it the Canvas file ID. The file fetch tool then fetches the file from the Canvas API and uses a suite of JavaScript libraries to parse the document into text.

Once a document is parsed into text, it's just a matter of returning the result to the Agent, which then has the context contained in any attached documents.
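The fetch flow can be sketched as a single tool function. This is a hedged illustration: it assumes Canvas's REST endpoint `GET /api/v1/files/:id` (whose response includes `display_name`, `content-type`, and a download `url`) and a bearer access token. The function name `fetchCanvasFileText` and the parser functions are hypothetical stand-ins for the JavaScript libraries Vault actually uses, which the article does not name. The HTTP function is passed in so the logic can run without a network.

```typescript
// Hypothetical file-fetch tool: Canvas file id in, plain text out.
// Assumes GET /api/v1/files/:id with a bearer token; parsers are placeholders.

interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<any>;
  arrayBuffer(): Promise<ArrayBuffer>;
}
type FetchLike = (url: string, init?: { headers?: Record<string, string> }) => Promise<HttpResponse>;

// One parser per MIME type, each turning raw bytes into text.
type Parser = (bytes: Uint8Array) => Promise<string>;

// The subset of Canvas's File object this sketch reads.
interface CanvasFile {
  display_name: string;
  "content-type": string;
  url: string;
}

async function fetchCanvasFileText(
  baseUrl: string,
  token: string,
  fileId: number,
  parsers: Record<string, Parser>, // keyed by MIME type
  http: FetchLike,
): Promise<string> {
  const headers = { Authorization: `Bearer ${token}` };

  // 1. Look up the file's metadata: display name, MIME type, download URL.
  const metaRes = await http(`${baseUrl}/api/v1/files/${fileId}`, { headers });
  if (!metaRes.ok) return `Error: Canvas returned ${metaRes.status} for file ${fileId}`;
  const file = (await metaRes.json()) as CanvasFile;

  // 2. Pick a parser for this MIME type before downloading anything.
  const mime = file["content-type"].split(";")[0].trim().toLowerCase();
  const parse = parsers[mime];
  if (!parse) return `Error: no parser for ${mime} (${file.display_name})`;

  // 3. Download the bytes and convert them to text the model can read.
  const fileRes = await http(file.url, { headers });
  if (!fileRes.ok) return `Error: download failed with ${fileRes.status}`;
  const text = await parse(new Uint8Array(await fileRes.arrayBuffer()));

  // 4. Label the text with its filename so the Agent knows where it came from.
  return `--- ${file.display_name} ---\n${text}`;
}
```

Returning error messages as strings instead of throwing is a deliberate choice in this sketch: the Agent sees a failure as an ordinary tool result and can decide to try another file. In Node 18 or later, the built-in `fetch` can be passed as the `http` argument.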

Web Search Preview Tool

Serious students need proper citations and sources. Luckily, OpenAI provides a built-in Web Search Tool. The only problem is that o4-mini can't use the Web Search Tool directly, so we route searches through a proxy search agent: o4-mini-high delegates the query to gpt-4o-mini, which performs the search.

Diagram showing o4-mini-high pointing to gpt-4o-mini pointing to a globe symbol.
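The proxy search agent can be sketched as a tool whose body calls a second model. This assumes OpenAI's Responses API (`POST /v1/responses`) with the built-in `web_search_preview` tool on gpt-4o-mini; in that API, answer text arrives in `output_text` content parts of `message` items, with sources attached as `url_citation` annotations. The function names and the injected `post` helper are hypothetical, and this is not necessarily how Vault wires it through the AI SDK.

```typescript
// Hypothetical proxy search tool. Assumes the OpenAI Responses API with the
// "web_search_preview" tool; the HTTP call is injected for offline testing.

interface Annotation { type: string; url?: string; title?: string }
interface ContentPart { type: string; text?: string; annotations?: Annotation[] }
interface OutputItem { type: string; content?: ContentPart[] }
interface ResponsesResult { output: OutputItem[] }

type PostJson = (url: string, body: unknown, headers: Record<string, string>) => Promise<ResponsesResult>;

// Request body the top-level agent's tool sends to the search sub-agent.
function buildSearchRequest(query: string) {
  return {
    model: "gpt-4o-mini",
    tools: [{ type: "web_search_preview" }],
    input: `Search the web and answer with cited sources: ${query}`,
  };
}

// Flatten the sub-agent's reply into text plus a deduplicated source list,
// which becomes the tool result returned to o4-mini-high.
function summarizeSearchResult(result: ResponsesResult): string {
  const texts: string[] = [];
  const sources = new Map<string, string>(); // url -> title
  for (const item of result.output) {
    if (item.type !== "message" || !item.content) continue; // skip web_search_call items
    for (const part of item.content) {
      if (part.type !== "output_text" || part.text === undefined) continue;
      texts.push(part.text);
      for (const a of part.annotations ?? []) {
        if (a.type === "url_citation" && a.url) sources.set(a.url, a.title ?? a.url);
      }
    }
  }
  const list = Array.from(sources, ([url, title]) => `- ${title} (${url})`);
  return list.length > 0
    ? `${texts.join("\n")}\n\nSources:\n${list.join("\n")}`
    : texts.join("\n");
}

async function webSearchTool(query: string, apiKey: string, post: PostJson): Promise<string> {
  const result = await post("https://api.openai.com/v1/responses", buildSearchRequest(query), {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  });
  return summarizeSearchResult(result);
}
```

The tool hands o4-mini-high a plain-text answer plus a deduplicated source list, which gives it what it needs to cite sources. `post` can be implemented with `fetch` by sending `JSON.stringify(body)` with `method: "POST"` and returning the parsed JSON.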

Visual Reasoning Tool

Due to the nature of our service, images obtained from Canvas must be attached during the generation process. This is not possible with OpenAI's current API, which expects images to be attached before generation starts. That makes sense in the context of a chat application, but not for an agent that discovers images mid-task. We got around this limitation by using an orchestration tool to perform visual reasoning for o4-mini-high.

To create a truly autonomous experience for the user, the Agent must understand images, which most commonly appear within Word documents and PowerPoint presentations. To process multiple input images during generation, we use o4-mini-high as the top-level Agent that orchestrates everything. Instead of feeding the images directly to o4-mini-high, we make gpt-4o a tool for o4-mini-high to use. Multiple proxy instances of gpt-4o are invoked, each tasked with analyzing an image and generating a text report to feed back into o4-mini-high. The mastermind o4-mini-high receives a detailed report for each image, and each report is reintegrated into the document's text. This gives o4-mini-high a complete understanding of the document.

Image icon shown going into text paragraph lines.
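The image fan-out can be sketched as follows, assuming the document extractor has already replaced each image with a hypothetical `[[image:N]]` marker and kept the image bytes as base64. `describe` stands in for a call to one gpt-4o instance; `buildVisionRequest` shows a Chat Completions-style message that passes an image as a data URL, which is one way OpenAI's vision models accept images. The prompt text and all names are illustrative, not Vault's.

```typescript
// Hypothetical visual-reasoning fan-out. Assumes images were extracted and
// replaced with "[[image:N]]" markers; `describe` stands in for a gpt-4o call.

interface ExtractedImage {
  id: number;
  mimeType: string;
  base64: string;
}
type DescribeImage = (image: ExtractedImage) => Promise<string>;

// Chat Completions-style request for one image, passed as a data URL.
function buildVisionRequest(image: ExtractedImage) {
  return {
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this image in detail, including any visible text, labels, and figures." },
          { type: "image_url", image_url: { url: `data:${image.mimeType};base64,${image.base64}` } },
        ],
      },
    ],
  };
}

// Start one analysis per image, wait for all of them, then splice each text
// report back into the document where its image marker appeared.
async function integrateImageReports(
  docText: string,
  images: ExtractedImage[],
  describe: DescribeImage,
): Promise<string> {
  const reports = await Promise.all(images.map((img) => describe(img)));
  const byId = new Map<number, string>();
  images.forEach((img, i) => byId.set(img.id, reports[i]));
  return docText.replace(/\[\[image:(\d+)\]\]/g, (marker: string, id: string) => {
    const report = byId.get(Number(id));
    // Leave markers without a report untouched rather than dropping them.
    return report === undefined ? marker : `[Image ${id} description: ${report}]`;
  });
}
```

Because the reports are spliced in where the markers were, o4-mini-high reads each image's description in its original place in the document. Starting every `describe` call before awaiting them with `Promise.all` runs the analyses concurrently; with many images you may want to cap concurrency to stay within rate limits.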

This entire process allows the Agent to move forward with completing the assignment. This type of visual reasoning is especially useful in history-related assignments, where you often have to analyze an image for specific details.