Google Document AI


Founded 1998 | HQ Mountain View, CA | 140,000 employees (approx.) | $239B revenue (2021)

Google Document AI provides good general-purpose capture capabilities combined with the Google brand and competitive pricing. It is worth checking out, although its lack of specialization and automation, and its inability to do batch processing, mean it will probably not be suitable for heavy-duty enterprise use.


The Company

Google is one of the best-known brands in the technology world. Founded in 1998, the company is currently led by CEO Sundar Pichai and has around 140,000 employees, with 2021 revenues of around $239 billion. The focus of this report is Document AI, which was launched in November 2020 and is part of the company’s Google Cloud division. Google Cloud generates around $13 billion in annual revenue and is growing at a fast pace, even though it does not currently turn a profit. The Google Cloud business is led by Thomas Kurian, formerly of Oracle, where he held senior executive positions for 22 years, including president of product development from 2015 to 2018.

The Technology

Google Document AI is a set of services within a much broader and larger Google Cloud business. Structurally, Document AI is composed of several key elements (see Figure 1). First, an application programming interface (API) integrates with the Google Cloud and its associated AI services. Second, Document AI itself represents a core set of three services (or general processors):

  1. Optical character recognition (OCR)
  2. A document splitter (to identify page breaks, etc.)
  3. A form parser that reads and extracts data from fields within a form

And third, in addition to these general-purpose capture capabilities, Google provides a small but growing set of pre-configured processors for common document types like invoices, pay slips, tax forms, US driver’s licenses, etc. It will come as no surprise that these are all machine learning (ML), AI-based services, and hence they rely for accuracy and processing on a massive data bank stored in the Google Knowledge Graph of data points and examples. Again unsurprisingly, Document AI uses existing Google natural language processing (NLP) and computer vision capabilities.
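
As a rough illustration of the structure described above, the sketch below models how a caller might choose between a pre-configured processor and the general form parser. It is a minimal sketch, not Google's API: the processor labels, the document-type keys, and the `pick_processor` function are all hypothetical names introduced for this example.

```python
# Hypothetical labels for illustration only; these are NOT Google API identifiers.
GENERAL_PROCESSORS = ("ocr", "document_splitter", "form_parser")

# Pre-configured processors for the common document types named in the text.
SPECIALIZED_PROCESSORS = {
    "invoice": "invoice_processor",
    "pay_slip": "pay_slip_processor",
    "tax_form": "tax_form_processor",
    "us_drivers_license": "us_drivers_license_processor",
}


def pick_processor(document_type: str) -> str:
    """Return a specialized processor if one exists, else the general form parser."""
    return SPECIALIZED_PROCESSORS.get(document_type, "form_parser")
```

In this sketch, a document type with no pre-configured processor, such as a purchase order, falls back to the general form parser, which is where the accuracy questions discussed in this report are most likely to arise.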

The OCR, from what we can see and what other developers have told us, is as accurate as any other similar service; the form parser, however, seems to have a few more challenges. This is not particularly surprising, as forms processing in general is incredibly hard to get right, and Google’s form parser is by definition a general, all-purpose processor. In our experience, the only capture vendors, large or small, that get high levels of accuracy here are those that specialize and configure their service for particular form types.

The bottom line is that the form-parsing capabilities within Document AI seem to work. Still, you would have to check this and probably experiment extensively to ensure that it is accurate enough for your purposes. We stress this, as some Google partners (and indeed Google itself) claim that Document AI is more accurate than its competitors. We could not find anything in our analysis to support this. We are not saying Document AI is inaccurate; rather, we question its claims to be more accurate than other systems. We also need to stress that these products are relatively new, and as they are ML/AI-based, their accuracy should, by default, improve over time.

In practice, Google Document AI provides an integrated document capture and processing console that comes complete with a library of document parsers/processors. It also has a reviewing function to keep humans in the loop (HITL) to ensure accuracy and continuously train the underlying AI. The challenge here is to identify what Google is doing differently from the legion of other document capture vendors in the market. Other than the fact that this all runs in the Google Cloud and uses Google services, it’s hard to say. But we can point out some of its relative strengths and weaknesses.
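
The human-in-the-loop idea can be sketched as a simple confidence threshold: fields the system is sure about are accepted automatically, and the rest are queued for a person to check. This is a minimal sketch under stated assumptions, not a description of how Google implements its reviewing function. It assumes each extracted field carries a confidence score between 0 and 1; the `ExtractedField` type, the `split_for_review` function, and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def split_for_review(fields, threshold=0.8):
    """Split fields into (auto-accepted, needs human review) by confidence."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


# Example: two fields clear the threshold, one goes to a human reviewer.
fields = [
    ExtractedField("invoice_number", "INV-1001", 0.97),
    ExtractedField("total_amount", "1,250.00", 0.62),
    ExtractedField("supplier_name", "Acme Ltd", 0.88),
]
accepted, needs_review = split_for_review(fields)
```

The threshold is the practical lever: raising it sends more fields to reviewers and costs more labor, while lowering it lets more unverified data through.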

Document AI is pretty straightforward to get up and running. That is a good thing, particularly in contrast to many traditional document capture systems that can be a beast to implement. But what most of the conventional document capture vendors and the many start-ups have that Google does not is a focus on specificity and automation. Document AI is a general approach to capture; most others are specialized and optimized to a granular level. Though some Google partners may take the tech and build more granular solutions with the associated Custom Document AI product, Google’s general approach won’t change. Google also lacks a focus on automation and true enterprise scalability. Capturing and reading a document is only the starting point of a process. Though Google does provide some simple workflow tools, for many organizations this will not be enough; more automation and some simple-to-use RPA would go a long way.

Perhaps the weakest point in the Document AI stack is the inability to do batch processing. This is a cornerstone of ABBYY, Kofax, IBM, and OpenText products. Batch processing means you can submit, for example, 10,000 files to be processed in one go. Document AI treats everything as a single transaction, meaning that in practice it works well for processing small numbers of documents but cannot scale to actual enterprise needs.

Figure 1
Google Document AI Processors for General and Domain-Specific Documents

Our Opinion

The first thing to note is that Google isn’t directly competing with the intelligent document processing (IDP) specialists, so comparisons to ABBYY, OpenText, Kofax, etc. are difficult. Rather, Google has commoditized the underlying data capture technology. And though likely not its intent, this will force the traditional vendors to add value upstream in the document process. Google has also made it possible for a new generation of cloud-native, highly automated IDP solutions to emerge built on the Google AI platform. The new IDP players don’t have to develop the core data capture layer; instead, they place 100% focus on solving specific document process problems by industry and use case. This is a sea change in IDP product development. The flip side is that to date, few start-ups have gravitated toward Google and more have instead gravitated to Amazon services. That could change, as it’s early days and Google is certainly a viable alternative.

Advice to Buyers

At this stage, it’s hard to advise prospective buyers other than to say Google Document AI is at least worth checking out, though we would caution against considering it for truly heavy-duty enterprise use. It is impossible not to compare it with Microsoft’s efforts in the document processing sphere, which are more advanced. Enterprise document processing is a complex set of challenges beyond the core capture functionality that Document AI provides. Most importantly, it involves a vast document processing partner ecosystem (which Microsoft built up through the SharePoint years) and its associated feedback loop, which Google lacks.

On the automation front, a partnership with an RPA vendor (Automation Anywhere) is fine, but it’s hard to fathom how that would work strategically. Again, it is early days and we will keep a close eye on Google’s work in this space. As the product is fleshed out further over the coming year or two, there is a chance Document AI will become a more attractive proposition.


SOAR Analysis

Strengths

  • Good general-purpose capture capabilities
  • The Google brand and competitive pricing

Aspirations

  • Expand capture from its niche into the broader world
  • Become the de facto capture software of choice

Opportunities

  • Develop more industry- and form-specific modules
  • Partner with ISVs and start-ups to develop full-blown applications

Results

  • An already expanding list of modules
  • Some early enterprise adopters

Attribution-NonCommercial-NoDerivatives 4.0 International
CC BY-NC-ND 4.0 license