Founded 2015 | HQ New York, NY | 150+ employees | $10M+ annual revenue

Eigen Technologies is an innovative NLP company with a solid vision to eventually break down the barrier between unstructured and structured data. Its initial focus on R&D for complex financial documents, alongside the gathering of millions of domain-specific training documents, has paid off by producing a high-performing, highly accurate, and scalable platform.


The Company

Eigen Technologies is an AI software company specializing in the use of natural language processing (NLP). The company develops and markets a no-code intelligent document processing platform to extract data out of any text document.

Eigen originally focused on the financial services market to help financial professionals analyze the complex documents common in their sector. For example, when LIBOR rates change, Eigen helps currency traders to quickly review their contracts, some of which can run into the hundreds of pages. Eigen also helps financial and insurance companies to analyze asset risks and loan agreements.

The company is named after eigenvalues, a concept from linear algebra that has special significance to the founders. Perhaps the best-known business use of eigenvalues is in Google’s PageRank algorithm.

Eigen is headquartered in New York City, employs more than 150 people, and has over $10 million in annual revenue. The company has raised nearly $80 million to date over three rounds. Goldman Sachs, an Eigen customer, became an investor.

The Technology

Eigen offers a supervised document AI platform that can be deployed across many use cases. There are two modes of operation:

  1. The Enterprise Ecosystem, for building fully integrated, end-to-end processing solutions.
  2. The Eigen Desktop no-code version, designed so that a non-technical user can start reviewing documents out of the box and set up straight-through processing.

Both modes are based on the Eigen Core, which includes an NLP engine, machine learning (ML) model management, and an NLP plugin development kit. When fed a text document (digital or scanned), the NLP engine performs both point extraction (e.g., names, dates, and short phrases) and section extraction (e.g., clauses covering confidentiality, indemnification, and assignments).

Eigen's algorithms aim for fast convergence rather than deep learning, in order to produce better time to value. When the user trains on a small number of documents (fewer than 500), Eigen leverages an optimized probabilistic graphical model with finely curated features. The feature sets are broad enough to apply to a wide variety of industry-specific content but narrow enough to enable highly accurate and fast convergence. The software employs neural networks when there are 1,000 or more documents to process, typically once the client has been in production for a while. In practice, this is rarely needed because the out-of-the-box models perform well enough to achieve acceptable rates of straight-through processing.

The engine has a logic interface that allows data points to be linked together in order to answer complex questions without human oversight. Customers can also integrate their own custom NLP code into the Eigen pipeline via the plugin facility.

Since data accuracy is mission-critical for its customers, Eigen has created a four-pronged process to produce the most accurate results:

  1. Model evaluation to predict the accuracy of extraction in a production setting.
  2. Confidence flags to mark answers as high accuracy when two ML algorithms agree and to separate out low-confidence answers for human review.
  3. A rules-based verification step to validate whether extracted data meets predefined criteria (e.g., is the answer to the date question really a date?).
  4. A human review step for answers flagged as low confidence or needing further verification.

The results of this process are then fed back into the engine to improve the models.
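Steps 2 through 4 of this process can be illustrated with a minimal Python sketch. It is not Eigen's implementation: the `Answer` type, the `effective_date` field name, the ISO date rule, and the `triage` function are all hypothetical names chosen for illustration, and the two model outputs are assumed to be available as plain strings.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Answer:
    """One extracted data point (hypothetical structure)."""
    field: str
    value_a: str  # output of a first, hypothetical ML model
    value_b: str  # output of a second, hypothetical ML model


def is_date(text: str) -> bool:
    """Rules-based check: is the answer to the date question really a date?

    Assumes ISO format (YYYY-MM-DD) purely for illustration.
    """
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical mapping from field name to its validation rule.
RULES = {"effective_date": is_date}


def triage(answer: Answer) -> str:
    """Route an answer to 'accepted' or 'human_review'."""
    # Confidence flag: high confidence only when both models agree.
    if answer.value_a != answer.value_b:
        return "human_review"
    # Rules-based verification against predefined criteria, if any exist.
    rule = RULES.get(answer.field)
    if rule is not None and not rule(answer.value_a):
        return "human_review"
    return "accepted"
```

In this sketch, an answer goes straight through only when the two models agree and the value passes any rule defined for its field; everything else is queued for a human reviewer, whose corrections could then serve as additional training examples.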

Eigen claims that its engine can achieve acceptably high rates of text extraction accuracy with just two to 50 training documents. If so, this would give it a clear advantage over other platforms, such as Microsoft Syntex, that require thousands of training documents.

We’re always on the lookout for “the secret AI sauce” that produces such leaps forward. For Eigen, it’s the training data. To date, Eigen has pre-trained its large-scale language models on over 22 million real-world documents gleaned from financial, legal, insurance, and healthcare customers. Goldman Sachs, an early customer, gave the company one million domain-specific documents to train on.

This corpus of business-specific data is not only the key to the low-sample training requirements, but also an effective barrier to entry for Eigen’s competitors. Every AI company can talk about the struggle to find enough high-quality documents to train a model until it is client-ready.

Eigen does not “pool” training data from different clients, and the company guarantees data privacy for its customers. And in this age of “cloud first,” Eigen can also deploy an on-premises solution for its many customers with sensitive data, keeping this data out of the public cloud.

The pricing model is based on the value delivered. For example, a client processing 5,000 invoices annually would pay in the low five figures, while an organization with 1,000 contracts to analyze might pay in the low six figures due to the complexity of the documents and the value in removing busy work from highly paid financial professionals.

Figure 1
Eigen Technologies Product Roadmap

Our Opinion

Eigen is an innovative NLP company with a solid vision to eventually break down the barrier between unstructured and structured data. Its initial focus on R&D for complex financial documents, alongside the gathering of millions of domain-specific training documents, has paid off by producing a high-performing, highly accurate, and scalable platform. This R&D base will serve it well as it expands into other markets such as insurance and healthcare.

The Eigen team shared their product roadmap with us, and we were impressed by the realistic plans to create what they call a “semi-general AI platform” designed to solve the most complex document problems in any industry (see Figure 1). The company’s vision is that data and documents will be interchangeable: any document will be converted by AI into a 100% complete data schema. That is a worthy quest, and every business would benefit from the results.

Advice to Buyers

Any company in a highly regulated industry may want to consider having Eigen on its shortlist of vendors to evaluate, particularly if the business process involves complex textual documents. Selecting a start-up for mission-critical data can be a risky proposition for an enterprise. However, the backing of investors such as Goldman Sachs and ING suggests that Eigen should have the resources to grow and serve its customer base well into the future.


SOAR Analysis

Strengths

  • Training data set of over 22 million domain-specific documents
  • NLP algorithms fine-tuned for complex documents

Aspirations

  • Eigen AI transforms docs into 100% complete data schemas
  • All knowledge from qualitative data is instantly searchable

Opportunities

  • Continue to build out industry-specific solutions
  • Acquisition candidate for an enterprise software company

Results

  • Blue chip Fortune 500 customer base
  • $80 million raised to date