Rossum launches its own LLM

Trained only on transactional business documents, Transactional-LLM promises to eliminate hallucination.

Today, Rossum, an IDP software company based out of the Czech Republic, publicly unveiled Aurora, its new intelligent document processing automation platform. We were recently briefed on this by Tomas Gogar, CEO and co-founder, and the product team. Tomas also announced Rossum’s bold new vision “to empower one person to handle a million transactions a year” and stated that Rossum’s focus is on the transactional documents that power everyday business operations.

Before we move on, what exactly are “transactional” documents? This term has been around for many years, first used by document capture vendors (the predecessors to IDP) to differentiate invoice and purchase order documents from other document types. Rossum is well-known for its original focus on invoice processing but has since expanded its platform to handle a wide variety of other business documents (e.g., purchase orders, bills of lading, shipping manifests, tax forms, bank statements, insurance forms, and more).

Focus on the new LLM

To us, the real innovation news here is that Rossum also launched its own large language model (LLM) as part of Aurora. Dubbed “Transactional-LLM” or T-LLM for short, Rossum’s LLM was trained on one of the largest document data sets in the world (Rossum’s own DOCile) containing millions of annotated, real-world transactional documents.

Before we delve into that, why go through the expense and hassle of developing a new LLM in an era of readily available public LLMs from the likes of OpenAI? Tomas told us T-LLM took over two years to build.

The answer is quite simple. After a few months of testing GPT-4, Claude, Google's models, and other so-called foundational LLMs trained on the world's data, Rossum (and every other IDP vendor we've spoken to in the past three months) discovered serious shortcomings in using those LLMs for IDP applications.

Here's how Rossum explained it to us. Simply integrating third-party LLMs is not enough to satisfy document automation demands, including:

  • High variability of formats: the AI needs to learn quickly from basic user feedback; prompt engineering is too complex.
  • High accuracy requirements: there is no room for GenAI hallucinations, and reliable confidence thresholds are a must (see the sketch after this list).
  • High volume of documents: in most IDP applications, batches of documents are processed, so there is no time for "chatting to your document".
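To make the confidence-threshold and batch-processing points concrete, here is a minimal sketch of how an IDP pipeline typically routes extracted documents. The field names, scores, and the 0.95 threshold are our own illustrative assumptions, not Rossum's API or configuration.

```python
# Minimal sketch: route each document by the confidence of its extracted fields.
# Documents whose fields all clear the threshold go straight through; the rest
# are queued for human review -- no "chatting to your document" in the loop.

from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0-1.0, as reported by the extraction model


def route_document(fields: list[ExtractedField], threshold: float = 0.95) -> str:
    """Return 'straight-through' if every field clears the threshold, else 'review'."""
    low_confidence = [f for f in fields if f.confidence < threshold]
    return "straight-through" if not low_confidence else "review"


# Example: one shaky total amount is enough to send the invoice to a human.
invoice = [
    ExtractedField("invoice_number", "INV-2024-0042", 0.99),
    ExtractedField("total_amount", "1,245.00", 0.81),
]
print(route_document(invoice))  # -> "review"
```

In practice the threshold would be tuned per field and per customer; the point is that the model's confidence scores must be trustworthy enough to gate automation, which is exactly where hallucination-prone general-purpose LLMs fall short.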

T-LLM was designed to fix all of these problems and still bring the benefits of GenAI to IDP. Just don’t ask T-LLM to write a review of Dune 2 or suggest a recipe for the contents of your fridge. It’s only designed to read and understand those transactional documents mentioned earlier.

LLM or SLM?

Is this really an LLM, or is it one of the first Small Language Models (SLMs) to hit the market? Foundational LLMs are measured in billions of training parameters. In the past month, Microsoft, IBM, and other LLM leaders have promoted SLMs as the better answer to the LLM problems of hallucination and inaccurate data extraction and summarization.

We asked the Rossum R&D team for help here, and I am paraphrasing their response. While T-LLM has fewer training parameters than GPT-4 and others, its parameter count is comparable to other LLMs optimized for general-purpose use across a broad category. An LLM means one training run covering all use cases; T-LLM can be used for all transactional document use cases. SLMs, by contrast, are fine-tuned to specific use cases; one example would be an invoice SLM, trained only on invoices.

How does T-LLM perform compared to other LLMs? According to Rossum’s internal tests, an order of magnitude better. Rossum ran a benchmark against both OpenAI’s GPT-4 and DocLLM. In the following chart, a higher score is the better outcome.

(chart provided by Rossum)

How is Aurora better than existing IDP solutions? Rossum provided us only with a comparison to its own previous product offering.

  • 10 times fewer training examples needed to reach the desired accuracy.
  • A 37.6% reduction in errors compared to the previous generation.
  • Zero hallucinated values, compared to using GPT alone.

What do customers say about it?

Whenever a vendor makes new product claims like this, we at Deep Analysis always ask: can we talk to a customer who can validate this? That rarely happens until 6 to 12 months after a new product announcement. But chapeau to the Rossum product marketing team for providing us with two real customer testimonials on beta test results.

Adyen

Adyen is a financial technology firm for retail, hospitality, travel, subscription, and digital businesses, with over 4,000 employees in 27 offices across the globe. In 2023 alone, Adyen processed over $970 billion in payments. Adyen processes over 1 million invoices and Boletos (an official Brazilian payment method) every year. While Rossum's previous-generation AI performed well for Adyen, Aurora has pushed accuracy levels even higher. With T-LLM at work, Aurora produced higher data quality, reducing errors by an average of 20.5% while improving extraction accuracy to 93.4% on average. The following chart from Rossum illustrates the dramatic reduction in the number of training documents Adyen needed to reach a higher level of accuracy.

Wolt

Wolt, an express food and grocery delivery platform operating in 25 countries, has over 130,000 merchant partners, 200,000 courier partners, and 33 million registered customers. Wolt processes over 100,000 invoices every year from more than 2,000 suppliers in 20 languages. Rossum deployed Aurora to Wolt's AP operations in the Czech Republic, Slovakia, and Slovenia, then compared the results with its "old AI".

It took an average of only 70 documents to achieve accuracy above 86%, which rapidly increased to 93% with use. Average error rates were reduced by 44%. As a result, straight-through processing rates for invoices increased from 46.5% to 58% within just two days. Now, some of you are thinking: wait, just 58% touchless processing? Then you don't understand how hard it is to automate complex, multinational invoice processing.

Prompts are not always the best

Rossum has also taken a counter-intuitive approach to annotating documents. While most IDP vendors we've seen have added prompting interfaces, Rossum decided to stick with its existing point-and-click user interface for correcting errors or adding data fields. The Rossum team argues that users are already very familiar with pointing and clicking on a document to correct errors or identify text or regions for extraction, so why make them learn prompting? This approach is also more accurate because it solves the problem on the first pass, whereas prompting can require several iterations to get it right. Some may think this is a small difference, but we think it becomes very important if one is reviewing, say, 100 documents a day.
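To illustrate the difference, here is a minimal, hypothetical sketch of what a point-and-click correction boils down to as data: a single structured edit that can be applied on the first pass and retained as a labeled example, rather than an open-ended prompt that may need several rewrites. None of the names or fields below come from Rossum's actual product or API.

```python
# Hypothetical sketch: a reviewer's click becomes one structured correction that
# both fixes the extraction immediately and doubles as a future training example.

from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class FieldCorrection:
    field_name: str                            # the field the reviewer clicked on
    corrected_value: str                       # the value they selected on the page
    region: Tuple[float, float, float, float]  # (x0, y0, x1, y1) they pointed at


def apply_correction(extraction: Dict[str, str], fix: FieldCorrection) -> Dict[str, str]:
    """Overwrite the field in one pass; the edit can also be logged for retraining."""
    extraction[fix.field_name] = fix.corrected_value
    return extraction


# Example: the reviewer clicks the correct due date on the document image.
doc = {"invoice_number": "INV-0042", "due_date": "2024-03-01"}
fix = FieldCorrection("due_date", "2024-04-01", (120.0, 660.0, 210.0, 678.0))
print(apply_correction(doc, fix))  # {'invoice_number': 'INV-0042', 'due_date': '2024-04-01'}
```

The contrast with prompting is that every prompt rewrite is another full pass through the model with no guarantee of convergence, which adds up quickly when a reviewer handles 100 documents a day.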

Summary analysis

We are big believers in LLMs that are fine-tuned for a specific purpose. In March 2023, Deep Analysis was one of the first to call for this approach. So we think Rossum is on to something special with its T-LLM.

We must add a few notes of caution here:

  1. For companies shopping for IDP solutions, please note that Rossum's claims of Aurora's superiority are based solely on comparisons to its own previous-generation product. While there is no doubt this is great for its existing customers, one must be careful not to assume the same metrics can be achieved at, for example, an Instabase or Hyperscience customer. As always, we advise you to verify vendor claims with your own proof-of-concept process.
  2. Rossum has positioned itself as the "only" IDP solution focused on transactional document automation. If invoices are the primary transactional document type (and despite objections to the contrary, they still are), then several other vendors that focus specifically on AI models for invoice processing would beg to differ. To be fair, Rossum can claim that it is today the only IDP solution with a transactional LLM, which is impressive enough in itself.
  3. Rossum tends to position its IDP products against "template-based solutions". That is IDP-world code for "Kofax" (now Tungsten Automation), the perennial industry leader by sales volume and the favorite target of every IDP startup going after invoice processing. While it is true that Tungsten's invoice processing solutions, TotalAgility and ReadSoft, came from the old template-based capture legacy, the company has worked for years to update these products with new AI. On the same day as Rossum's announcement, Tungsten launched a new AI platform with discriminative ML models not dissimilar to Rossum's. Of course, it remains to be seen how well the new Tungsten models perform against Rossum's T-LLM engine.
