Founded 2018 | HQ Kirkland, WA | 30 employees | Annual revenue undisclosed
Extracting actionable data from complex, long-form documents has immense potential for enterprise automation. Docugami is not the first vendor to offer unstructured data management tools for long-form documents, but it appears to be setting a new benchmark with its XML data chunk approach.
The Company
Docugami is an AI start-up that was founded in 2018 by Jean Paoli, a former Microsoft executive credited as one of the co-inventors of XML. Docugami competes in the unstructured data management space, using machine learning (ML), natural language processing (NLP), and other AI technologies to read and interpret documents.
The company specializes in managing complex, long-form documents such as master service agreements, contracts, leases, RFPs, bids, and clinical trial documents. Docugami refers to its products as “AI document engineering” solutions, a term that is true to its roots in XML. The company’s name is a mashup of “document” and “origami.”
Docugami raised a $10 million seed round in 2020 and counts Grammarly among its strategic investors. Bob Muglia, formerly CEO of Snowflake and head of both Microsoft’s Office and Azure businesses, joined the company in 2020 as a major investor and board member.
The Technology
Docugami uses deep learning, NLP, Bayesian, evolutionary, and other AI techniques to discover similar patterns, terminology, and relationships across a body of unstructured information. Docugami’s innovation is to use AI to break up long-form documents into logical “chunks” that reflect these repeatable patterns and relationships across documents. The context of each chunk is then used to create accurate labels for the data, relieving the typical burden on subject matter experts.
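To make the chunk idea concrete, the following is a minimal sketch of what labelled XML chunks could look like and how a downstream program might read them. The tag names, the sample lease, and the `labelled_values` helper are all hypothetical and chosen for illustration; they are not Docugami's actual schema or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical labelled chunks for one lease. The tag names are illustrative
# assumptions, not a schema published by Docugami.
SAMPLE = """
<Lease>
  <Parties>
    <Landlord>Acme Properties LLC</Landlord>
    <Tenant>Birch Coffee Inc.</Tenant>
  </Parties>
  <Term>
    <CommencementDate>2021-03-01</CommencementDate>
    <MonthlyRent currency="USD">4500</MonthlyRent>
  </Term>
</Lease>
"""


def labelled_values(xml_text):
    """Return {label: text} for every leaf element that carries text."""
    root = ET.fromstring(xml_text)
    values = {}
    for elem in root.iter():
        # A leaf element has no children; its tag acts as the data label.
        if len(elem) == 0 and elem.text and elem.text.strip():
            values[elem.tag] = elem.text.strip()
    return values


fields = labelled_values(SAMPLE)
print(fields["Tenant"], fields["MonthlyRent"])
```

Because each value travels with its label and its position in the hierarchy, a consuming system can ask for “the tenant” or “the monthly rent” by name rather than by page position.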
A representative use case starts by connecting to a file source to upload documents for analysis. Docugami supports OneDrive, SharePoint, Dropbox, Box, and other popular file repositories. Once the data is “unlocked” from documents, it can be delivered in standard formats to various business systems. The company is also promising to develop connectors for a wide range of business applications such as smart workspaces, RPA, CRM, ECM, and more.
Docugami works on files in native formats, including scanned documents, so there is no need to convert or transform the files. Within a few hours after a set of files is uploaded, the results are classified into document sets.
The software can separate documents by type (for example, this is a lease, that is a contract). The user can examine the document set list and move files around as needed. The system learns as it goes and becomes better at classification over time. Finally, the software uses the data chunks to prepare a summary, an abstract, or a detailed report (see Figure 1). One can easily imagine how this will save time and reduce errors for any task that requires a structured data summary from hundreds or thousands of documents.
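As a rough illustration of the final reporting step, the sketch below rolls labelled values from several documents in one set into a single table. The file names, tag names, and the `build_report` function are hypothetical; the real product's output formats and interfaces may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical document set: file name -> labelled XML for that document.
DOCS = {
    "lease_001.xml": "<Lease><Tenant>Birch Coffee Inc.</Tenant>"
                     "<MonthlyRent>4500</MonthlyRent></Lease>",
    "lease_002.xml": "<Lease><Tenant>Cedar Books Ltd.</Tenant>"
                     "<MonthlyRent>3200</MonthlyRent></Lease>",
}

# Labels to pull into the report (assumed names, for illustration only).
COLUMNS = ["Tenant", "MonthlyRent"]


def build_report(docs, columns):
    """Build a CSV report with one row per document and one column per label."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file"] + columns)
    for name in sorted(docs):
        root = ET.fromstring(docs[name])
        # A label missing from a document yields an empty cell.
        row = [root.findtext(f".//{col}", default="") for col in columns]
        writer.writerow([name] + row)
    return out.getvalue()


print(build_report(DOCS, COLUMNS))
```

The same loop scales from two documents to thousands, which is where a structured summary saves the most manual effort.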
Here are some of the long-form document business problems Docugami is targeting:
- Professional services companies with work controlled by master service agreements and statements of work, requiring vigilance over the terms and conditions to assure delivery and compliance and to manage risk.
- Commercial real estate companies with hundreds of lease and sale contracts, seeking insight into underlying obligations between the parties.
- Commercial insurance brokerages that receive insurance carrier documents in large PDF files and must extract the benefit plan data needed to sell policies.
- Technology companies with many license agreements containing terms and conditions that go out of date as circumstances change.
- Government offices with hundreds of project agreements that need to be audited and updated due to regulatory changes.
In addition to handling unstructured data input, Docugami also provides an assisted authoring function that can, for example, recommend edits to a new agreement based on best practices the system has learned from previous versions. This can save hours of tedious searching through existing agreements.

Figure 1. Docugami AI Creates Structured Reports from Unstructured Documents
Our Opinion
Extracting actionable data from complex, long-form documents has immense potential for enterprise automation. The Docugami team has a deep understanding of document structures and has used that knowledge along with the latest AI techniques to transform unstructured documents into structured reports and data consumable by any other business application and process.
Docugami is realistic about the work needed to optimize its system, and in our analysis that is a good thing: AI is too often overpromised and, as a result, underdelivers. If that work is done, the system can deliver a lot of value, and quickly. Docugami is not the first vendor to offer unstructured data management tools for long-form documents, but in our view it appears to be setting a new benchmark with the XML data chunk approach.
Advice to Buyers
Any organization with business processes that include complex, long-form documents should consider Docugami on its shortlist of vendors to evaluate. Selecting a relatively new company to store and analyze one’s mission-critical data can be a risky proposition for any enterprise. However, Docugami has attracted support from A-list investors and should have the resources to grow and serve its customer base well into the future.
SOAR Analysis
Strengths
- Ability to recognize data patterns in long-form documents
- Deep document structure and NLP expertise
Aspirations
- Create a world where AI assists humans with document creation and understanding
- Become a platform player
Opportunities
- Build partnerships with leading B2B software companies
- Acquisition candidate for several industries
Results
- $10 million raised to date
- Blue-chip investors
Attribution-NonCommercial-NoDerivatives 4.0 International
CC BY-NC-ND 4.0 license