Google Launches Gemini

The big news? Gemini scored a 91% success rate on document understanding with no task-specific training (zero shot).

Today Google announced Gemini, its "most capable and general AI model yet." The benchmarks included in the announcement certainly position it as a worthy competitor to GPT-4. And given the recent leadership turmoil at OpenAI, Google can offer developers a way to diversify their LLM supply chain.

For my beat (intelligent document processing, or IDP), the big news here was that Gemini is natively multi-modal: it was trained from the ground up to understand text, images, video and audio. Below is a screengrab from the Gemini benchmarks page, with a fat red arrow pointing at Document Understanding, a core IDP function.

Two benchmark comments stand out:

1. Gemini's 90.9% score, compared with 88.4% for OpenAI's multi-modal GPT-4V. That is incremental but important progress. The last 10% has always been the hardest to automate.

2. Zero shot. That is, the model understood the documents the first time it saw them, without annotations or the hundreds to thousands of sample documents normally needed to train a model.

For readers less familiar with the details of document processing, this means that Gemini understood the benchmark document set at a 91% success rate with no task-specific training (zero shot). While we don't yet know which document types were in the set, the numbers are still impressive and point to the real possibility that future IDP products will not require the extensive (and expensive) collection of document samples and human annotation that is currently the state of the art.

We have written previously about the "magic" of zero-shot learning in the first IDP products to market using GPT-3.5 Turbo or GPT-4. Instabase AI Hub and UiPath Clipboard AI are excellent examples of this extraordinary functionality that anyone can test for free. We have also seen clever zero-shot functionality based on OpenAI models in new releases from Microsoft, Hyperscience, Indico Data, Kofax, Eigen, Rossum, Antworks, Skwiz, Evolution AI, BIS Grooper, and other companies that briefed us in 2023.

If Google can deliver on the promise of the launch, the IDP market will ultimately benefit from competition between two multi-modal LLMs. Until now, OpenAI was the only game in town for most developers. The market could certainly use some price competition to drive down the high cost of using GPT to understand a 20-page contract. (One company told us GPT costs over $2 per document, compared with a fraction of a penny for a discriminative ML model.) Google also has an impressive track record of supporting its developer community, and a lot of street cred earned for launching the transformer revolution.

Stay tuned for more from Deep Analysis. 
