Founded 1982 | HQ San Jose, CA | 26,000 employees (approx.) | $17.6B revenue FY22 (approx.) | Feb 2023
Adobe Document Cloud offers low-cost, out-of-the-box tools to automate many standard document processes. Its services and APIs are, in our estimation, underused and underappreciated by many enterprises that instead utilize expensive and typically overly complex IDP products. It is worthwhile to consider whether Adobe, possibly in conjunction with Microsoft M365 and Power Automate, is a good fit for the job.
Adobe was founded in 1982 and is headquartered in San Jose, California. The firm has revenues of approximately $17.6 billion as of FY22 earnings and approximately 26,000 employees. It has been led by CEO Shantanu Narayen since 2007 and is publicly traded on the NASDAQ exchange. Adobe is best known for its Experience and Creative Cloud businesses. Over the past decade, the company has become the dominant force in customer experience (CX) management and the go-to technology brand for digital marketing and digital creation. However, the company’s roots actually lie in its lesser-known cloud business, Adobe Document Cloud, which includes the PDF document format and is the focus of this report.
Adobe Sensei is the company’s AI and machine learning technology used across its products to automate tasks, predict behavior, and deliver personalization. Though Sensei has been used extensively in the Creative and Experience cloud offerings, its use in Document Cloud is less well known. As of today, Sensei is used in three key Document Cloud modules (APIs): Adobe PDF Extract API and Adobe PDF Accessibility Auto-Tag API (part of Adobe Acrobat Services), and Liquid Mode for its mobile Acrobat Reader. Sensei detects structures within PDF documents, such as titles, tables, images, and paragraphs, with reading order (see Figure 1). This is important because PDF files are notoriously difficult to run data extraction against; the sheer complexity of the PDF format means that many, if not most, third-party document analysis products struggle with them.
Leveraging Sensei, PDF Extract API undertakes this extraction work and creates a JSON file from any PDF document. It understands the document structure, classifies text objects, and assembles them into a natural and logical order. The fact that Adobe retains the reading order is an important differentiator over competing products, which typically just pull the text through optical character recognition (OCR). On the surface, PDF Extract API is a regular document extraction tool; what sets it apart is that it is optimized for PDF, one of the most challenging formats to extract data from.
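To illustrate why a preserved reading order matters, here is a minimal Python sketch that walks a structured JSON file of this kind and lists its text elements in sequence. The field names (`elements`, `Path`, `Text`, `Page`) are an assumed, simplified view of the output, and the sample document is invented for illustration; it is not real API output.

```python
import json
import re

# Hypothetical, simplified sample; the field names are assumptions.
SAMPLE = json.loads("""
{
  "elements": [
    {"Path": "//Document/H1", "Text": "Quarterly Report", "Page": 0},
    {"Path": "//Document/P", "Text": "Revenue grew this quarter.", "Page": 0},
    {"Path": "//Document/H2", "Text": "Outlook", "Page": 1},
    {"Path": "//Document/P[2]", "Text": "We expect steady demand.", "Page": 1},
    {"Path": "//Document/Figure", "Page": 1}
  ]
}
""")


def element_kind(path):
    """Return the last path segment without any [n] index, e.g. 'P' or 'H1'."""
    last = path.rstrip("/").split("/")[-1]
    return re.sub(r"\[\d+\]$", "", last)


def outline(data):
    """List (kind, text) pairs in the order the elements appear in the file."""
    result = []
    for element in data.get("elements", []):
        text = element.get("Text")
        if text is None:
            # Elements such as figures carry no text in this sample.
            continue
        result.append((element_kind(element.get("Path", "")), text.strip()))
    return result


if __name__ == "__main__":
    for kind, text in outline(SAMPLE):
        print(f"{kind:>3}  {text}")
```

Because the elements arrive already ordered and labeled, the consumer needs no layout analysis of its own; a plain OCR text dump would leave that work to downstream code.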
Moreover, as Adobe owns the PDF format, one would think that PDF Extract would be the tool/API that every intelligent document processor would utilize for this work, but typically it is not; instead, vendors attempt to build their own PDF extraction tools. There are sound commercial reasons for them to do so; however, from a buyer perspective, if you need to extract data from PDF documents, common sense suggests that PDF Extract is the one to use. Extract provides more structural and contextual information about the content of PDFs, thereby simplifying downstream processing and improving the quality of the results. For example, if a user wanted to feed PDF content into an AI-based NLP engine, the additional structural and contextual information Extract provides makes that task easier and increases the NLP engine’s accuracy in interpreting the content.
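As a rough sketch of how structural information can help a downstream NLP step, the Python example below attaches the enclosing headings to each paragraph before it is passed on. It assumes the extraction output has already been reduced to `(kind, text)` pairs, where `kind` is a label such as `H1`, `H2`, or `P`; that input shape, the function name `chunks_with_context`, and the sample data are all hypothetical.

```python
import re

# Matches heading labels H1 through H6 and captures the level.
HEADING = re.compile(r"^H([1-6])$")


def chunks_with_context(pairs):
    """Pair each non-heading text with the trail of headings above it."""
    headings = {}  # heading level -> current heading text
    chunks = []
    for kind, text in pairs:
        match = HEADING.match(kind)
        if match:
            level = int(match.group(1))
            headings[level] = text
            # A new heading closes any deeper sections that were open.
            for deeper in [lvl for lvl in headings if lvl > level]:
                del headings[deeper]
            continue
        trail = " > ".join(headings[lvl] for lvl in sorted(headings))
        chunks.append({"context": trail, "text": text})
    return chunks


if __name__ == "__main__":
    # Invented sample input for illustration only.
    sample = [
        ("H1", "Report"),
        ("P", "Intro."),
        ("H2", "Outlook"),
        ("P", "Steady."),
        ("H1", "Appendix"),
        ("P", "Notes."),
    ]
    for chunk in chunks_with_context(sample):
        print(chunk)
```

Each chunk then carries its own section context, which a classifier or language model can use to disambiguate otherwise similar passages.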
Next up is the PDF Accessibility Auto-Tag API, which utilizes the same Sensei capabilities, but instead of extracting data it automates the tagging and labeling of elements within a document. Ultimately, the auto-tagging allows screen readers to read a document accurately for visually impaired users. This avoids the cost of manual tagging and allows for faster remediation and more inclusivity. As accessibility compliance requirements are now on the horizon (particularly within EMEA), organizations will need this type of tool to make their documents accessible.
Adobe offers REST APIs, SDKs for Java, Node.js, .NET, and Python, as well as connectors for UiPath and Power Automate.
Finally, we have Liquid Mode, introduced in 2020 in the Adobe Acrobat Reader mobile app and now being extended to large screens. Liquid Mode was first released for mobile devices, where reading long-form documents can be notoriously difficult; its purpose was to improve that reading experience by ensuring that documents fit more ergonomically on the screen so readers could navigate them without needing to pinch and zoom and lose the context. Significantly, Adobe is now extending the use and capabilities of Liquid Mode beyond mobile to deconstruct and reassemble PDF documents to be more readable and digestible on any screen or device. In essence, Adobe is unlocking PDF and automating the reconstruction of the text, figures, and tables into more readable structures, depending on the device being used. Liquid Mode checks two necessary boxes: first, supporting accessibility requirements (for example, for those with partial sight), and second, making complex documents much easier to read for the general public.
Taking a step back from these specific modules, Adobe provides a range of supporting services, such as sharing, securing, and storing documents. Simply put, it provides an enterprise file sync and share (EFSS) platform akin to Box or Dropbox. Interestingly, Adobe itself makes little of these capabilities, perhaps in deference to its technology partnerships. Nevertheless, it’s good stuff and essentially brings basic but highly effective ad hoc workflow functionality, which in turn provides a usable and highly secure document management platform.
The critical thing to note here is that the total value of Adobe Document Cloud is much more than the sum of its parts. Individually, each module is a smart and valuable workhorse for document processing. There are a dozen more similar (though not Sensei-enabled) APIs ranging from Adobe Acrobat Sign (Adobe’s digital signature service) to OCR. What seems to be missing today is a visible way to pull all these various content services into a unified whole; the capability is there, but for some reason it lies hidden within the product offering, which appears to be simply a collection of (albeit highly useful) content services. Similarly, Adobe Experience Cloud was a series of unconnected silos in its early days, but it developed over time into a market-leading product/platform set in its own right. Adobe Document Cloud has the same, if not greater, potential. It should be noted that the total addressable market for document capture and processing is larger than that of customer experience management.
Adobe Document Cloud services and APIs are, in our estimation, underused and underappreciated by many enterprises that instead utilize expensive and typically overly complex IDP products to automate many standard document processes. For example, the legion of enterprises and SMBs that leverage applications from Microsoft, Salesforce, or Workday could – by using the APIs – quickly, cost effectively, and securely automate anything from contract review and signing to sharing, annotating, and collaborating on business-critical documents.
Deep Analysis defines the intelligent document processing (IDP) market as “The set of software technologies that enable a computer’s ability to recognize, read, and – in some cases – understand documents.” Adobe, in the form of Liquid Mode and PDF Extract API, does just that, so though unique, ubiquitous, and distinct in the market, Adobe is a major player in the IDP marketplace. Indeed, it is hard to imagine the IDP or the broader document management market without Adobe. Ironically, despite this, the potential of Adobe Document Cloud has been underappreciated by the market in general and, in our estimation, by Adobe corporate itself. It may be that the term PDF masks the power of some of these services that, if used effectively, would eliminate the need for more costly and complex document-centric process solutions across enterprises.
Advice to Buyers
Adobe Document Cloud provides much more than the ability to “click/save to PDF.” It offers many low-cost, out-of-the-box tools to automate standard document processes. Before looking at expensive third-party IDP solutions to automate document-centric business processes, it is worthwhile to consider whether Adobe, possibly in conjunction with Microsoft M365 and Power Automate, is a good fit for the job. In many cases, it will likely prove to be a much lower-cost, faster, and easier-to-deploy alternative to more expensive and complex IDP systems.
- Adobe is one of the best-known brands in the world
- Adobe has deep technical understanding and ownership of the PDF data structure
- Build Document Cloud into a larger business than Experience Cloud
- Move beyond the current market perception of Document Cloud solely being attached to PDF format
- Extend and build out more pre-configured document workflows
- Further integrate with and extend its Microsoft partnership
- Multibillion revenues generated
- Ubiquitous use of PDF products globally
Attribution-NonCommercial-NoDerivatives 4.0 International
CC BY-NC-ND 4.0 license