Large language models work particularly well with raw text. Companies that want to create their own AI workflow know that it has become extremely important to store and index data in a clean format so that this data can be reused for AI processing.
That’s why Mistral is launching a new API today for developers who handle complex PDF documents. Mistral OCR is an optical character recognition API that can turn any PDF into a text file.
Unlike most OCR APIs, Mistral OCR is a multimodal API, meaning that it can detect when there are illustrations and photos intertwined with blocks of text. The OCR API creates bounding boxes around these graphical elements and includes them in the output.
Similarly, Mistral OCR doesn’t just output a big wall of text. The output is formatted in Markdown, a formatting syntax that developers use to add links, headers and other formatting elements to a plain text file.
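To illustrate why Markdown is easier to work with than a wall of text, here is a minimal Python sketch that splits a Markdown string into sections by header. The sample string and the `split_markdown_sections` helper are hypothetical and not part of Mistral's API; the sketch only assumes that the OCR output arrives as a Markdown string.

```python
import re

# Hypothetical sample of Markdown, standing in for OCR output.
SAMPLE_MARKDOWN = """# Annual Report

Revenue grew in 2024.

## Risks

See the [appendix](https://example.com/appendix) for details.
"""

# Matches ATX-style headers such as "# Title" or "## Subtitle".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_sections(markdown):
    """Split Markdown into (level, title, body) tuples, one per header."""
    sections = []
    level, title, body = 0, "", []
    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            # Close the previous section if it has a title or any text.
            if title or any(part.strip() for part in body):
                sections.append((level, title, "\n".join(body).strip()))
            level, title, body = len(match.group(1)), match.group(2).strip(), []
        else:
            body.append(line)
    # Close the final section once the input is exhausted.
    if title or any(part.strip() for part in body):
        sections.append((level, title, "\n".join(body).strip()))
    return sections


if __name__ == "__main__":
    for level, title, body in split_markdown_sections(SAMPLE_MARKDOWN):
        print(level, title, "->", body)
```

Because the headers survive as plain text markers, a few lines of code are enough to recover the document's outline. This sketch ignores edge cases such as `#` characters inside fenced code blocks.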
Large language models rely heavily on Markdown for their training data set. When you use an AI assistant, such as Mistral’s Le Chat or OpenAI’s ChatGPT, they often generate Markdown to create bullet lists, add links or put some elements in bold. Assistant apps seamlessly format the Markdown output into a rich text output.
“Over the years, organizations have accumulated numerous documents, often in PDF or slide formats, which are inaccessible to LLMs, particularly RAG systems. With Mistral OCR, our customers can now convert rich and complex documents into readable content in all languages,” Mistral co-founder and chief science officer Guillaume Lample said.
“This is a crucial step toward the widespread adoption of AI assistants in companies that need to simplify access to their vast internal documentation,” he added.
Mistral OCR is available on Mistral’s own API platform or through its cloud partners (AWS, Azure, Google Cloud Vertex, etc.). And for companies working with classified or sensitive data, Mistral also offers on-premises deployment.
According to the Paris-based AI company, Mistral OCR performs better than APIs from Google, Microsoft and OpenAI. The company has tested its OCR model with complex documents that include mathematical expressions (LaTeX formatting), advanced layouts or tables. It is also supposed to perform better with non-English documents.

Given that Mistral OCR does one thing and one thing only, the company believes it is also faster than what’s out there. That’s not a surprise if you compare it with a multimodal large language model like GPT-4o, which also has OCR capabilities.
Mistral is also using Mistral OCR for its own AI assistant Le Chat. When a user uploads a PDF file, the company uses Mistral OCR in the background to understand what’s in the document before processing the text.
Developers can also use Mistral OCR with a RAG system to use multimodal documents as input in an LLM. And there are many potential use cases. For instance, I could see law firms using it to help them sift through huge volumes of documents.
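As a rough illustration of the retrieval step in a RAG pipeline, the Python sketch below ranks Markdown chunks against a question by word overlap and assembles a prompt for an LLM. Everything in it is hypothetical: the chunks are made up, `retrieve` and `build_prompt` are illustrative names, word overlap stands in for the embedding search a real system would likely use, and no Mistral API is called.

```python
import re

# Hypothetical chunks, standing in for Markdown produced by an OCR step.
CHUNKS = [
    "## Lease terms\nThe lease runs for five years starting in 2021.",
    "## Payment\nRent is due on the first day of each month.",
    "## Termination\nEither party may terminate with ninety days notice.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, top_k=2):
    """Return up to top_k chunks sharing the most words with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(chunk)), chunk) for chunk in chunks]
    # Sort by score only; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop chunks that share no words with the question.
    return [chunk for score, chunk in scored[:top_k] if score > 0]


def build_prompt(question, chunks):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n\n".join(retrieve(question, chunks))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("When is rent due each month?", CHUNKS))
```

In a law firm scenario, the chunks would come from thousands of converted documents, and only the few passages relevant to a question would be passed to the model.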