Founded 2012 | HQ Freiburg, Germany | 20 employees (of which 15 in R&D) | > €2.5M ($3M) annual revenue
Skilja has been flying under the radar for years as a trusted supplier of cognitive capture tools to other companies. The introduction of Tegra services that require only a single call opens the door for Skilja to sell to a much larger and wider market.
Skilja (pronounced skill-yah) takes its name from an Icelandic word meaning “to understand” or “to separate.” This is an apt name for a cognitive capture software company specializing in what it calls “the 4th generation of document understanding”: the application of document classification and data extraction to read and understand scanned and textual documents. The company’s goal is to keep developing AI that automates repetitive human tasks, specifically tasks that involve documents.
Founded in 2012 and headquartered in Freiburg, Germany, the company is led by founder Alexander Goerke, who has more than 20 years of experience developing intelligent capture technologies. Goerke worked for early capture innovators such as Brainware and Learning Computers International, acquired by Kofax in 2006. There, as early as 2002, he used AI technology now known as “fast machine learning.”
Skilja has technical competencies in the following areas of cognitive capture:
- AI algorithms, classification, feature extraction, and deep learning, and their application to document understanding.
- An open, service-oriented document processing platform for high volume and critical enterprise applications. The company says it has over 20 installations with processing volumes of over 10 million pages per year, the largest of which processes 180 million documents per year.
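To put the largest installation's volume in perspective, the back-of-the-envelope calculation below converts 180 million documents per year into a sustained rate. It assumes, purely for illustration, that the load is spread evenly around the clock, every day of the year; real capture workloads are burstier, so peak rates would be higher.

```latex
\frac{180\,000\,000\ \text{documents/year}}{365 \times 24 \times 3600\ \text{s/year}}
  = \frac{180\,000\,000}{31\,536\,000}\ \text{documents/s}
  \approx 5.7\ \text{documents/s}
  \approx 20\,500\ \text{documents/hour}
```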
The company’s business model is to sell its software tools through partners such as OEMs, VARs, and system integrators. For example, a large IT services company in Germany uses Skilja in document capture projects; an OEM customer embeds Skilja software within an ultra-high-speed scanning process; and other customers include well-known ECM and capture vendors.
Skilja’s product line consists of three offerings that can be purchased separately or combined: Laera cognitive recognition SDKs, the Vinna document processing platform, and the new Tegra web services.
Laera comes in two flavors: Classifier and Extraction. Laera is sold as a software development kit (SDK) for other developers to embed within their products.
Laera Classifier is an image- and text-based classifier of documents and pages using semantic analysis and machine learning (ML) tools. It includes layout classification, content classification, ID card classification, photo detection, and document separation.
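Document separation is commonly framed as a page-level classification problem: each page in a scanned stream is labeled either as the start of a new document or as a continuation of the current one. The minimal sketch below shows only that final grouping step. It assumes a classifier has already produced a hypothetical "first"/"other" label per page; the labels and the function name are illustrative and do not reflect Laera's actual output format or API.

```python
from typing import List


def separate_documents(page_labels: List[str]) -> List[List[int]]:
    """Group page indices into documents, starting a new document at each "first" page.

    The "first"/"other" labels are hypothetical page-classifier outputs.
    """
    documents: List[List[int]] = []
    for index, label in enumerate(page_labels):
        # Start a new document on a "first" page, or when the stream begins
        # with a continuation page (there is no open document to attach it to).
        if label == "first" or not documents:
            documents.append([index])
        else:
            documents[-1].append(index)
    return documents


if __name__ == "__main__":
    labels = ["first", "other", "other", "first", "first", "other"]
    print(separate_documents(labels))  # [[0, 1, 2], [3], [4, 5]]
```

In practice the page classifier would also emit a confidence per page, so uncertain split points could be routed to a human for review.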
Laera Extraction provides content recognition and extraction for forms, semi-structured documents, and completely unstructured documents based on trainable semantic and syntactic analysis.
For data extraction, Skilja favors ML over deep learning (DL) because the number of business-document samples available for training is usually far smaller than what DL requires. According to Skilja, DL also ignores standard business conventions and does not allow customers to retrain the models themselves.
For rapid learning of the patterns found in business documents, Skilja developed a cascaded ML method it calls “Fast Machine Learning.” The approach works because it can build on a body of pre-existing knowledge, standard syntax, and grammar, and because the semantics of business documents (e.g., headers, labels, line breaks, fonts) are already known. Skilja’s training algorithms are fast enough to run continuously in a background service, improving the models on the fly based on user corrections. Skilja calls this “Online Learning.”
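Skilja's Fast Machine Learning is a proprietary cascaded method, and its internals are not described here. The sketch below swaps in a plain incremental multinomial naive Bayes text classifier, only to illustrate the general idea behind "Online Learning": each user correction is folded into the model's counts immediately, with no full retraining pass. The class name, token lists, and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set


class OnlineNaiveBayes:
    """Multinomial naive Bayes that learns one example at a time (illustrative stand-in)."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()  # number of examples seen per class
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)  # token counts per class
        self.vocabulary: Set[str] = set()

    def update(self, tokens: List[str], label: str) -> None:
        """Fold one (possibly user-corrected) example into the model without retraining."""
        self.class_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def predict(self, tokens: List[str]) -> Optional[str]:
        """Return the most probable class, or None if nothing has been learned yet."""
        total_examples = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label: Optional[str] = None
        best_score = -math.inf
        for label, example_count in self.class_counts.items():
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(example_count / total_examples)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    model = OnlineNaiveBayes()
    model.update(["invoice", "total", "due"], "invoice")
    model.update(["policy", "insured", "premium"], "policy")
    print(model.predict(["invoice", "due"]))  # invoice
    # A reviewer corrects a misclassified document; the model absorbs it at once.
    model.update(["premium", "invoice"], "policy")
```

Because an update only increments counters, it is cheap enough to run in a background loop as corrections arrive, which is the property the Online Learning description relies on.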
Laera Extraction uses a pretrained language model with semantic elements and entity detection. Each entity is classified using role detection, which is based on the context where the entity was found. To assign entities to fields, Skilja uses trained probability models where each entity can be assigned to one or multiple roles. It looks for logical patterns like labels and headers, semantic patterns that appear within the same sentence, and geometric patterns within semi-structured documents.
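The role-assignment step can be illustrated with a much simpler model than Laera's. The sketch below assumes each candidate entity arrives with a single context cue (the label text found next to it), estimates P(role | context) from hypothetical training pairs, lets an entity take every role above a threshold (so one entity may fill several roles), and keeps the most probable entity per field. It covers only the "label" kind of logical pattern; the sentence-level and geometric patterns described above are not modeled, and all names and data are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical training pairs: (context cue found near an entity, field role).
TRAINING: List[Tuple[str, str]] = [
    ("invoice date", "InvoiceDate"),
    ("date", "InvoiceDate"),
    ("date", "DueDate"),
    ("due date", "DueDate"),
    ("total", "TotalAmount"),
    ("amount due", "TotalAmount"),
]


def train_role_model(samples: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(role | context) from co-occurrence counts."""
    by_context: Dict[str, Counter] = defaultdict(Counter)
    for context, role in samples:
        by_context[context][role] += 1
    model: Dict[str, Dict[str, float]] = {}
    for context, roles in by_context.items():
        total = sum(roles.values())
        model[context] = {role: count / total for role, count in roles.items()}
    return model


def assign_fields(
    entities: List[Tuple[str, str]],
    model: Dict[str, Dict[str, float]],
    min_prob: float = 0.3,
) -> Dict[str, str]:
    """Each entity may take several roles; each field keeps its most probable entity."""
    best: Dict[str, Tuple[str, float]] = {}
    for value, context in entities:
        for role, prob in model.get(context, {}).items():
            if prob >= min_prob and prob > best.get(role, ("", 0.0))[1]:
                best[role] = (value, prob)
    return {role: value for role, (value, _) in best.items()}


if __name__ == "__main__":
    model = train_role_model(TRAINING)
    entities = [("2021-03-01", "invoice date"), ("2021-03-31", "due date"), ("1,250.00", "total")]
    print(assign_fields(entities, model))
    # {'InvoiceDate': '2021-03-01', 'DueDate': '2021-03-31', 'TotalAmount': '1,250.00'}
```

An entity found next to an ambiguous cue such as "date" would receive both date roles with probability 0.5 each, which mirrors the "one or multiple roles" behavior described above.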
Skilja developed its own NLP methods and uses deep neural networks for visual classification of graphic elements. All results are fed back through the Online Learning service, where humans in the loop handle exceptions and their corrections improve the system’s performance.
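A common way to implement human-in-the-loop exception handling is confidence-based routing: results above a threshold pass straight through, the rest go to a reviewer, and any changed values are kept as training signal. The sketch below assumes a hypothetical extractor that reports a 0.0 to 1.0 confidence per field and an arbitrary threshold of 0.85; it illustrates the pattern only and is not how Skilja's Online Learning service is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical extractor


@dataclass
class ReviewLoop:
    threshold: float = 0.85  # hypothetical acceptance threshold
    corrections: List[FieldResult] = field(default_factory=list)  # fed back for learning

    def process(
        self, results: List[FieldResult], review: Callable[[FieldResult], str]
    ) -> List[FieldResult]:
        """Accept confident results; send the rest to a human and record corrections."""
        final: List[FieldResult] = []
        for result in results:
            if result.confidence >= self.threshold:
                final.append(result)  # accepted automatically
                continue
            corrected_value = review(result)  # exception handled by a human reviewer
            corrected = FieldResult(result.name, corrected_value, 1.0)
            if corrected_value != result.value:
                self.corrections.append(corrected)  # training signal for online learning
            final.append(corrected)
        return final


if __name__ == "__main__":
    loop = ReviewLoop()
    results = [
        FieldResult("Total", "1,250.00", 0.97),
        FieldResult("DueDate", "2021-03-13", 0.40),
    ]
    # Stand-in reviewer that fixes the low-confidence due date.
    final = loop.process(results, review=lambda r: "2021-03-31")
    print([(r.name, r.value) for r in final])
    print(len(loop.corrections))
```

The recorded corrections could then be passed to an incremental learner's update method, closing the loop between review and model improvement.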
Vinna Document Processing Platform
Vinna is a service-oriented architecture (SOA) developed for high-volume document processing. Vinna orchestrates the Laera functionality, stitching the various steps together into a processing pipeline that can run unattended (see Figure 1). It comes out of the box with a set of more than 40 predefined activities such as message queueing, IMAP import, media conversion, PDF creation, XSL transformation, associative lookup, import, export, file converters, and more. It also offers a marketplace where customers and partners can share their custom activities.
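Conceptually, a processing pipeline of this kind is an ordered chain of activities, each taking a work item and handing its output to the next. The in-process sketch below shows only that chaining idea. Vinna itself is a service-oriented platform whose activities run as services (with message queueing and similar infrastructure), so the function names and the dictionary-based work item here are hypothetical stand-ins, not Vinna's API.

```python
from typing import Any, Callable, Dict, List

WorkItem = Dict[str, Any]
Activity = Callable[[WorkItem], WorkItem]


def run_pipeline(item: WorkItem, activities: List[Activity]) -> WorkItem:
    """Run activities in order; each one receives the previous activity's output."""
    for activity in activities:
        item = activity(item)
        # Record which activities have processed the item (useful for auditing).
        item = {**item, "history": item.get("history", []) + [activity.__name__]}
    return item


def import_file(item: WorkItem) -> WorkItem:
    # Hypothetical stand-in for an import activity (e.g. file or IMAP import).
    return {**item, "text": item["raw"].strip()}


def classify(item: WorkItem) -> WorkItem:
    # Hypothetical stand-in for a Laera classification step.
    doc_class = "invoice" if "invoice" in item["text"].lower() else "other"
    return {**item, "class": doc_class}


def export(item: WorkItem) -> WorkItem:
    # Hypothetical stand-in for an export activity; here it only marks the item done.
    return {**item, "exported": True}


if __name__ == "__main__":
    result = run_pipeline({"raw": "  Invoice No. 42  "}, [import_file, classify, export])
    print(result["class"], result["history"])
```

Keeping each activity a pure function of its input makes steps easy to reorder, replace, or share, which is roughly the role a marketplace of custom activities plays.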
Vinna runs in the cloud or on-premises. Approximately 50% of current installations run in private clouds, which can be hosted on Azure, AWS, or any other provider. Vinna can also orchestrate RPA activities alongside document processing, which could be useful for BPOs whose workflows are mostly document processing with only a small amount of RPA.
Vinna also has several web-based user interfaces for typical capture tasks such as classification modeling, forms designer, field level review, batch review, and document review.
Tegra Web Services
Responding to growing demand for ad hoc, out-of-process solutions, Skilja has launched Tegra services, which bring the Laera classification and extraction engines to web apps and portals (see Figure 2). With Tegra, cognitive skills can be added to any application through a RESTful API, with a single call per action and no further coding. Skilja’s Online Learning function is also available as a Tegra service, so applications can use ML to continuously improve their results.
Tegra services are RESTful and can be self-hosted as a service in a Docker container or hosted in IIS or Apache. They run on both Windows and Linux. Process orchestration can be done with an RPA platform or with Skilja Vinna.
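From a client's point of view, a "single call" integration amounts to one HTTP request per document. The sketch below uses only Python's standard library to build such a request and, separately, to send it. The URL, route, and content type are hypothetical placeholders: Tegra's real endpoints and payload schema are not documented here, so this shows the shape of a client rather than Tegra's actual API.

```python
import json
import urllib.request

# Hypothetical base URL and route for a self-hosted service; not Tegra's documented API.
TEGRA_URL = "http://localhost:8080/tegra/extract"


def build_extract_request(document_bytes: bytes, url: str = TEGRA_URL) -> urllib.request.Request:
    """Build one POST request that sends a single document for classification and extraction."""
    return urllib.request.Request(
        url,
        data=document_bytes,
        method="POST",
        headers={"Content-Type": "application/pdf", "Accept": "application/json"},
    )


def call_service(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode a JSON response (requires a running service)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_extract_request(b"%PDF-1.4 ...")
    print(req.get_method(), req.full_url)
    # With a service listening at TEGRA_URL, the whole integration is one call:
    # result = call_service(req)
```

The same request could equally be issued from an RPA bot or a Vinna activity; the service does not need to know which orchestrator is calling it.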
Skilja demonstrated Tegra’s ability to extract information from quitclaim deeds (common documents in real estate). Quitclaim deeds are notoriously difficult for data extraction because they are unstructured documents with no fixed layout rules, and in the US their layouts vary from county to county. The recognition and extraction results we saw in the demo were impressive.
Skilja has been flying under the radar for years as a trusted supplier of cognitive capture tools to other companies. It has built a strong array of AI-powered classification and recognition tools, along with a platform for high-volume processing. With the introduction of Tegra services that require only a single call, integration is now very simple. This opens the door for Skilja to sell to a much larger and wider market than previously possible.
Advice to Buyers
If you are an OEM, a system integrator, or a development manager within a large company, and you are responsible for integrating document capture into a complex automation project, consider adding Skilja to your shortlist. Skilja has a solid track record of AI innovation and is known for delivering reliable results for high-volume document processing. The company’s customer references include large BPOs and insurance companies.
- Very focused on document-understanding R&D
- Deep knowledge and experience in leveraging AI for document processing
- Become a market leader in Europe for high-volume enterprise document process automation
- Use its AI platform to move beyond document understanding to case management
- Become a leading supplier of cognitive capture to the RPA market
- Invest in the North American market
- Blue-chip, high-volume customer base
- Broad adoption of Laera (1,000+ installs) and Vinna (100+ deployed)