AnythingLLM Document Chat API: Query Your Docs via REST

The AnythingLLM document chat API allows developers to programmatically interact with their self-hosted LLM workspaces, enabling automated document management, vector embedding, and real-time chat capabilities. By leveraging the RESTful interface provided by AnythingLLM, you can transition from manual document uploads to a fully automated Retrieval-Augmented Generation (RAG) pipeline that powers custom applications, internal tools, or customer-facing chat widgets without exposing your entire management dashboard.

What is the AnythingLLM Document Chat API and how does it work?

AnythingLLM operates as a full-stack solution for local or cloud-hosted AI, where the API serves as the bridge between your stored knowledge base and external software clients. At its core, the API provides access to the same functional engine that drives the AnythingLLM web interface, including the workspace manager, the vector database controller, and the LLM inference wrapper. When you make a request to the document chat API, the system identifies the specific workspace context, retrieves relevant text chunks from the vector database, and passes them along with your prompt to the selected LLM backend.
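The retrieve-then-prompt flow described above can be illustrated with a toy sketch. It is not AnythingLLM's implementation: the word-count "embedding", the cosine ranking, and the function names `embed`, `retrieve`, and `build_prompt` are all assumptions chosen to keep the example dependency-free. A real deployment would use an embedding model and a vector database.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join("- " + c for c in retrieve(question, chunks, k))
    return "Context:\n" + context + "\n\nQuestion: " + question


workspace_chunks = [
    "refund policy lasts 30 days",
    "office opens at nine",
    "shipping takes five days",
]
print(build_prompt("what is the refund policy", workspace_chunks, k=1))
```

The point of the sketch is only the order of operations: scope to one workspace's chunks, rank them against the question, then hand the top matches to the model together with the prompt.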

This architecture is designed for scalability and security in a production environment. Unlike simple LLM wrappers, the AnythingLLM API handles the heavy lifting of document processing (splitting PDFs or text files into optimized chunks and managing their embeddings), so your client application only needs to send and receive simple JSON. This makes it a good fit for teams that want to build a custom "Chat with your Docs" feature into their own proprietary software while keeping complete control over their data on a self-hosted AI platform.

How do you generate and manage AnythingLLM API keys?

To begin using the developer features, you must first generate a system-wide API key from within the AnythingLLM administrative dashboard. Navigate to the 'Settings' menu and locate the 'API Keys' section. Here, you can create multiple keys, which is a recommended practice if you are deploying different integrations across various environments. Each key grants full access to the API endpoints, so it is vital to store these securely and never expose them in client-side code such as front-end JavaScript.

Once a key is generated, it must be included in the Authorization header of every HTTP request as a bearer token. For security-conscious deployments, AnythingLLM also provides environment variables that can restrict API access or disable the built-in Swagger documentation entirely. Setting DISABLE_SWAGGER_DOCS=true in your configuration is a common step for production servers to prevent unauthorized users from discovering your API surface area. Keep in mind that keys are issued at the system level, while the actual chat and document operations are scoped to specific workspaces, which you identify by their unique slugs in your API calls.
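As a minimal sketch of the bearer-token requirement, the helper below builds an unsent request with the key attached. The base URL (host, port, and the /api prefix), the helper name `build_request`, and the environment variable name in the comments are assumptions for illustration; adjust them to match your instance.

```python
import urllib.request

# Assumption: the instance listens here and serves REST routes under /api.
BASE_URL = "http://localhost:3001/api"


def build_request(path, api_key, method="GET"):
    """Return an unsent request that carries the API key as a bearer token."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return urllib.request.Request(
        url=BASE_URL + path,
        method=method,
        headers={
            "Authorization": "Bearer " + api_key,
            "Accept": "application/json",
        },
    )


# Typical use: read the key from the environment instead of hard-coding it.
# ANYTHINGLLM_API_KEY is a hypothetical variable name.
#
#   import os
#   req = build_request("/v1/workspaces", os.environ["ANYTHINGLLM_API_KEY"])
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```

Keeping the key in server-side configuration, as in the commented usage, matches the advice above about never shipping it in front-end code.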

Which endpoints are used for programmatic workspace chat?

The primary endpoint for interaction is the /v1/workspace/{slug}/chat route, which handles both simple queries and continuous conversations. When using this endpoint, you must specify the mode in your JSON payload: query or chat. The query mode is optimized for single-turn RAG tasks where the model should only answer based on documents in the workspace, while chat mode allows for a persistent conversation history, making it suitable for standard chatbot implementations.
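A chat call along these lines might look like the sketch below. It only constructs the request; the base URL and the function name `build_chat_request` are assumptions, and the payload is limited to the message and mode fields named in the text.

```python
import json
import urllib.request

# Assumption: the instance listens here and serves REST routes under /api.
BASE_URL = "http://localhost:3001/api"
VALID_MODES = ("query", "chat")


def build_chat_request(slug, message, mode, api_key):
    """Build an unsent POST to the workspace chat route."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be 'query' or 'chat'")
    body = json.dumps({"message": message, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + "/v1/workspace/" + slug + "/chat",
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
    )


# Sending the request and decoding the JSON reply:
#
#   req = build_chat_request("marketing-docs", "Summarize Q3.", "query", key)
#   with urllib.request.urlopen(req) as resp:
#       answer = json.loads(resp.read())
```

Validating the mode on the client catches typos before a round trip to the server.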

In addition to the chat route, developers frequently use the /v1/workspaces endpoint to list available workspaces and retrieve the correct slugs. A typical POST request to the chat endpoint requires a message string and can optionally include a threadId for tracking sessions. Because AnythingLLM supports various backends, such as Ollama or LocalAI, the API keeps the response format consistent regardless of the underlying model you have configured for that specific workspace. This consistency allows you to swap out models on the server side without refactoring your integration code.
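Looking up slugs from the workspace listing could be sketched as follows. The response shape used here (a top-level "workspaces" array whose items carry "name" and "slug") is an assumption; check the actual JSON returned by your instance before relying on it.

```python
import json


def slugs_by_name(response_text):
    """Map workspace names to slugs from a /v1/workspaces response body."""
    payload = json.loads(response_text)
    result = {}
    for workspace in payload.get("workspaces", []):
        slug = workspace.get("slug")
        if slug:
            # Fall back to the slug itself when a workspace has no name.
            result[workspace.get("name", slug)] = slug
    return result


# Hypothetical response body, for illustration only.
sample = json.dumps(
    {
        "workspaces": [
            {"id": 1, "name": "Marketing Docs", "slug": "marketing-docs"},
            {"id": 2, "name": "HR", "slug": "hr"},
        ]
    }
)
print(slugs_by_name(sample))
```

Resolving slugs at startup, rather than hard-coding them, keeps the integration working if a workspace is recreated under a different slug.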

How do you upload and embed documents via the API?

Programmatic document ingestion is one of the most powerful features of the AnythingLLM API. The process involves two distinct steps: uploading the raw file to the server and then 'moving' that file into a workspace to trigger the embedding process. Use the /v1/documents/upload endpoint to send files (PDF, TXT, DOCX, etc.) via a multipart/form-data request. The server will respond with a document object containing a unique identifier and the path where the file is temporarily stored.
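The upload step can be sketched with the standard library alone. The form field name "file", the base URL, and the helper names are assumptions; the route is the one given in the text, so confirm it against your instance's API documentation before use.

```python
import mimetypes
import urllib.request
import uuid

# Assumption: the instance listens here and serves REST routes under /api.
BASE_URL = "http://localhost:3001/api"
UPLOAD_PATH = "/v1/documents/upload"  # route as stated in the text above


def encode_multipart(field_name, filename, content):
    """Encode one file as a multipart/form-data body; returns (body, boundary)."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    disposition = 'Content-Disposition: form-data; name="{}"; filename="{}"'.format(
        field_name, filename
    )
    parts = [
        ("--" + boundary).encode("utf-8"),
        disposition.encode("utf-8"),
        ("Content-Type: " + content_type).encode("utf-8"),
        b"",  # blank line separates part headers from the file bytes
        content,
        ("--" + boundary + "--").encode("utf-8"),
        b"",
    ]
    return b"\r\n".join(parts), boundary


def build_upload_request(filename, content, api_key):
    """Build an unsent multipart POST for a single document."""
    body, boundary = encode_multipart("file", filename, content)
    return urllib.request.Request(
        url=BASE_URL + UPLOAD_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "multipart/form-data; boundary=" + boundary,
        },
    )
```

In practice you would read the file with `open(path, "rb").read()`, send the request with `urllib.request.urlopen`, and keep the stored path from the JSON response for the embedding step.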

To make this document searchable, you must then call the /v1/workspace/{slug}/update-embeddings endpoint. In the body of this request, you provide the document path received in the previous step. AnythingLLM will then invoke its internal document processor to chunk the text and store the mathematical representations in your vector database. Once the embedding process is complete, any subsequent chat requests to that workspace will automatically include the context from the newly uploaded files. This workflow is essential for building dynamic knowledge bases that update automatically as new company reports or technical docs are generated.
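The embedding step might then look like the sketch below. The JSON keys "adds" and "deletes", the base URL, and the example document path are assumptions rather than details taken from the text, so verify the expected body against your instance.

```python
import json
import urllib.request

# Assumption: the instance listens here and serves REST routes under /api.
BASE_URL = "http://localhost:3001/api"


def build_embed_request(slug, document_paths, api_key):
    """Build an unsent POST asking a workspace to embed uploaded documents."""
    # Assumed body shape: paths to embed under "adds", nothing to remove.
    body = json.dumps({"adds": list(document_paths), "deletes": []})
    return urllib.request.Request(
        url=BASE_URL + "/v1/workspace/" + slug + "/update-embeddings",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
    )


# Hypothetical path of the kind returned by the upload step.
req = build_embed_request(
    "marketing-docs", ["custom-documents/report.pdf-1234.json"], "abc123"
)
print(req.full_url)
```

Passing a list of paths lets one call embed a whole batch of freshly uploaded files, which suits scheduled jobs that sync new reports into a workspace.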

For those who do not wish to build a custom UI from scratch, AnythingLLM offers a pre-built chat widget that can be embedded on any website with a simple script tag. However, to make this widget truly integrated, you should use the API to manage its settings and appearance. In the workspace settings, you can enable the 'Public Communication' feature, which generates a unique embedding snippet. This allows you to offer a document-trained chatbot to your users while still managing the underlying data through the API.

Advanced developers often use the API to synchronize the widget's behavior with user sessions. By programmatically adjusting workspace settings via the API, you can change the model's temperature, the system prompt, or the specific subset of documents available to the widget. This ensures that the chat experience on your website always reflects the most current state of your data. Because inference and retrieval run on your own instance, latency for these web-embedded chats depends largely on your hardware and network, and on a well-provisioned server it can be lower than with centralized hosted alternatives.

What are the security best practices for production API deployments?

When moving the AnythingLLM API into a production environment, security must be your top priority. First and foremost, ensure that your API is only accessible over HTTPS. Since you are transmitting sensitive document data and authentication tokens, standard HTTP is insufficient. Following that, implement strict IP whitelisting if the API is only being accessed by specific internal microservices. This prevents the API from being targeted by external actors, even if your keys are somehow compromised.
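Both checks can be expressed in a few lines, shown here as an illustrative sketch rather than a hardened implementation. The allowed networks and function names are hypothetical, and in many deployments the IP check would live in a firewall or reverse proxy instead of application code.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical internal ranges permitted to reach the API.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.10.0/24"),
]


def is_allowed(client_ip):
    """Return True only if client_ip parses and falls inside an allowed network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed input is rejected, not trusted
    return any(address in network for network in ALLOWED_NETWORKS)


def require_https(base_url):
    """Refuse to use an API base URL that is not served over HTTPS."""
    if urlsplit(base_url).scheme != "https":
        raise ValueError("API base URL must use https")
    return base_url


print(is_allowed("10.1.2.3"), is_allowed("8.8.8.8"))
```

Calling `require_https` once when the client starts makes a misconfigured plain-HTTP URL fail fast instead of silently sending tokens in the clear.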

Another critical practice is to audit your workspace permissions regularly. The AnythingLLM API can be configured with WORKSPACE_DELETION_PROTECTION to prevent accidental loss of data through programmatic calls. Furthermore, monitor your server logs for unusual patterns in API usage. High-volume requests to the embedding endpoints can be resource-intensive, so implementing rate limiting at the reverse proxy level (like Nginx) is a smart move. By combining these security layers with a managed deployment, you ensure that your document chat API remains both performant and resilient against common web vulnerabilities.
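The idea behind rate limiting can be illustrated with a small token bucket. This is a conceptual sketch with assumed names and parameters, not a description of how Nginx or AnythingLLM implement it; in production the limit would normally be enforced in the reverse proxy configuration.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate_per_sec` tokens."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock  # injectable so the behavior can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return whether the request may pass."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Example: at most 2 embedding requests in a burst, refilled at 1 per second.
bucket = TokenBucket(rate_per_sec=1.0, capacity=2)
print([bucket.allow() for _ in range(3)])
```

A separate, stricter bucket for the embedding routes than for chat reflects the point above that embedding work is the more resource-intensive operation.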

Frequently Asked Questions

What document formats are supported by the AnythingLLM API?

The AnythingLLM API supports a wide range of formats, including PDF, TXT, DOCX, Markdown, and even CSV files. The internal processing engine handles the heavy lifting of extracting text from these various sources, ensuring that the resulting embeddings are clean and searchable within your workspace. For complex formats like nested PDFs or large documents, the API's chunking logic ensures the context remains coherent for the LLM.

Can I use the AnythingLLM API with Ollama for local-only RAG?

Yes, you can use the API with Ollama for a fully local RAG setup. This configuration ensures that neither your documents nor your chat queries ever leave your private environment. You configure Ollama as the LLM backend in AnythingLLM, and the API routes inference requests to your local model while using whatever vector store you have selected.

How do I handle workspace-scoped document chat programmatically?

Workspace scoping is handled through the URL structure of the API. Every chat request is directed to a specific workspace slug (e.g., /v1/workspace/marketing-docs/chat). This design allows you to maintain strictly separated knowledge bases for different departments or projects. You can programmatically switch between these scopes by simply changing the slug in your API request based on the user's current context in your application.
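Switching scope by slug can be sketched as a simple lookup. The department-to-slug table, the base URL, and the function name are hypothetical; the slugs would normally come from your own application's configuration or from the workspace listing.

```python
from urllib.parse import quote

# Assumption: the instance listens here and serves REST routes under /api.
BASE_URL = "http://localhost:3001/api"

# Hypothetical mapping from an application-level context to a workspace slug.
SLUG_BY_DEPARTMENT = {
    "marketing": "marketing-docs",
    "engineering": "engineering-docs",
}


def chat_url_for(department):
    """Return the chat URL for the workspace that serves this department."""
    if department not in SLUG_BY_DEPARTMENT:
        raise KeyError("no workspace configured for " + repr(department))
    slug = SLUG_BY_DEPARTMENT[department]
    # Percent-encode the slug so unexpected characters cannot alter the path.
    return BASE_URL + "/v1/workspace/" + quote(slug, safe="") + "/chat"


print(chat_url_for("marketing"))
```

Rejecting unknown contexts explicitly, instead of falling back to a default workspace, helps keep the knowledge bases strictly separated.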

Is there a limit to the number of requests for the AnythingLLM developer API?

When self-hosting, there are no software-enforced limits on the number of API requests you can make. However, you are limited by the physical hardware of your server, specifically the GPU/CPU for inference and the RAM for the vector database. For high-traffic applications, deploying on a scalable cloud instance is recommended to avoid performance bottlenecks during heavy document processing or simultaneous chat sessions.

How do I disable the Swagger documentation in a production environment?

To disable the Swagger UI and the corresponding /api/docs endpoint, you should set the environment variable DISABLE_SWAGGER_DOCS=true on your AnythingLLM server. This is a critical security step for public-facing servers to minimize the information available to potential attackers. Once set, the API will still function normally for authenticated requests, but the interactive documentation page will return a 404 or be inaccessible.

Conclusion

The AnythingLLM document chat API provides a robust and flexible framework for building sophisticated AI-driven applications. From automating your knowledge base updates to integrating secure, document-aware chat into your own software, the RESTful interface simplifies the complexities of RAG. By following the security best practices and technical workflows outlined in this guide, you can maximize the potential of your private AI deployment. To get started with a high-performance, ready-to-use environment for your API projects, consider deploying a managed AnythingLLM instance today and start building the future of document interaction.
