IDP Everywhere: Are We There Yet?

last updated:

Last year we predicted IDP functionality would someday become commoditized at the OS level. I am surprised at how fast this happened.

Fifty years ago, AI legend Ray Kurzweil started us on a mission to teach computers how to read documents for us. That line of software eventually became known as intelligent document processing (IDP), and Deep Analysis defines it as "the set of software technologies powered by AI that enable a computer's ability to recognize, read, and – in some cases – understand documents."

Our definition is intentionally more expansive and inclusive than those of other analysts. Wherever software is intelligent enough to read our documents, extract the relevant data, and format that data for another application, you'll find us poking around. We research everything from the highest-volume, most complex back-office claims processing all the way down to a knowledge worker trying to extract data from a single document. We look for innovation at every level: large enterprise or small business, high volume or low volume, back office or front office, and personal productivity.

IDP in the OS

Twelve months ago, we covered the launch of Microsoft Copilot. At the time, we predicted IDP functionality would eventually become commoditized at the OS level. I am surprised at how fast this happened.

I run Windows 11 on my notebook computer, and after a recent Windows update, the free Microsoft Copilot generative AI app magically appeared. Copilot is based on OpenAI's GPT models, and it is now multi-modal, which means I can upload images and ask Copilot to read and analyze their content.

This functionality is hardly new to the market; the IDP software industry has hundreds of specialized products and APIs. But we believe this is a major milestone on the journey because it’s the first time the Windows 11 operating system includes a reasonably competent IDP tool.

I no longer need separate software: document-specific AI models to classify the document type, computer vision software to preprocess the image, an OCR app to read the text, natural language processing (NLP) to understand the text, and an app to create an output file. It's all part of Copilot. Allow me to show you. Let's geek out for a bit and kick the tires.
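For readers who have never built one, the conventional multi-stage approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Copilot works internally: it assumes the image preprocessing and OCR stages have already produced plain text, and it swaps in simple keyword rules and regular expressions where a real IDP product would use trained classification and NLP models. The document types, keywords, field names, and patterns are all hypothetical.

```python
import re

# Hypothetical keyword rules standing in for a trained document classifier.
DOC_TYPES = {
    "prescription_receipt": ["rx", "pharmacy", "refills"],
    "invoice": ["invoice", "amount due"],
    "business_card": ["phone", "email"],
}

# Hypothetical field patterns standing in for an NLP extraction model.
FIELD_PATTERNS = {
    "prescription_receipt": {
        "rx_number": r"rx\s*#?\s*(\d+)",
        "quantity": r"qty[:\s]*(\d+)",
        "refills": r"refills[:\s]*(\d+)",
    },
    "invoice": {
        "invoice_number": r"invoice\s*#?\s*(\w+)",
        "amount_due": r"amount due[:\s]*\$?([\d.,]+)",
    },
}


def classify(text: str) -> str:
    """Return the doc type whose keywords appear most often, or 'unknown'."""
    lowered = text.lower()
    scores = {
        doc_type: sum(word in lowered for word in keywords)
        for doc_type, keywords in DOC_TYPES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def extract(text: str, doc_type: str) -> dict[str, str]:
    """Apply the patterns for this doc type; fields that don't match are omitted."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.get(doc_type, {}).items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            fields[name] = match.group(1)
    return fields


def process(ocr_text: str) -> dict:
    """Chain the stages: classify first, then extract type-specific fields."""
    doc_type = classify(ocr_text)
    return {"doc_type": doc_type, "fields": extract(ocr_text, doc_type)}


print(process("Main St Pharmacy\nRx# 123456\nQty: 30\nRefills: 2"))
```

Each stage here is a separate function with its own rules to maintain, which is exactly the plumbing a general-purpose multi-modal model folds into a single request.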

Here’s how it works
  1. Open any image file in Copilot. I tested a drug prescription receipt (PNG file) snapped at the pharmacy with my phone camera.

  2. Within five seconds, Copilot correctly identified the document type, understood what text might be relevant, extracted it accurately from the image, and created a CSV file ready for Excel.

  3. I continued the conversation, asking about the medicine and its side effects. Copilot sourced its reply from the Mayo Clinic, so I felt good about the information.

  4. I finished the conversation by asking how long this supply should last and whether it is refillable. Copilot replied correctly and added some good advice.

That took about 60 seconds from start to finish. The CSV file was good enough to send into a spreadsheet, and Copilot can also produce a JSON file.
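Once fields have been extracted, producing both of the output formats Copilot offered is routine serialization. The sketch below uses Python's built-in csv and json modules to turn one extracted record into a spreadsheet-ready CSV string and a JSON string. The field names and values are hypothetical placeholders, not the data from the receipt tested above, and this is not a description of how Copilot generates its files.

```python
import csv
import io
import json

# Hypothetical extracted fields; a real receipt would yield different names and values.
extracted = {
    "document_type": "prescription_receipt",
    "medication": "Example Drug 10 mg",
    "quantity": 30,
    "refills_remaining": 2,
}


def to_csv(fields: dict) -> str:
    """Render one record as a header row plus a data row, ready for a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields))
    writer.writeheader()
    writer.writerow(fields)
    return buffer.getvalue()


def to_json(fields: dict) -> str:
    """Render the same record as pretty-printed JSON for another application."""
    return json.dumps(fields, indent=2)


print(to_csv(extracted))
print(to_json(extracted))
```

Writing to io.StringIO keeps the example self-contained. To save a file for Excel instead, open it with `newline=""` and pass that file object to `csv.DictWriter`, as the csv module documentation recommends.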

I tested Copilot multi-modal on several other photos: an invoice, a post office receipt, a business card, a driver's license, an Instagram profile screenshot, a product information label from a broken appliance, and more. The text extraction (OCR) was really good: not always perfect, but easy to correct. Copilot may be just good enough to use regularly as a time-saving hack.

Truly useful? Or another GenAI one-trick pony?

At this point, we need to place Copilot in the proper context of the IDP market. Copilot is a one-doc-at-a-time, ad hoc IDP tool for personal use. The free version of Copilot reads only image file formats, so PDFs won't work for now. You cannot build and tune your own AI models. There's no automation feature or batch processing, so it's not going to replace a sturdy enterprise IDP platform like UiPath, Tungsten or Hyperscience for document processing at scale. Nor is it a well-designed transactional document solution connected to a business workflow, like Rossum or Indico Data. If you need to process a specific document type such as expense receipts, you're far better served by specialists like Klippa or Veryfi. If you're building a process and need APIs, you'll have to upgrade to other Microsoft products or use an IDP API specialist like Mindee. Copilot also lacks the process orchestration, scalability, data validation tools, and expert technical support found in IDP platforms.

Yet Copilot is much more than a one-trick pony. This is a nifty productivity tool that could save an office worker several minutes each day simply by reducing the amount of reading and data entry. It's good enough for ad hoc document understanding and data extraction. So if all you need to do each day is snap a few documents and send the data onward, this might do the job. I rate this the easiest way I've seen to date to capture data on a Windows computer.

Are we there yet?

Not quite – but we're closer than ever to our vision of IDP everywhere. This is an important step towards fulfilling Deep Analysis' prophecy that AI will bring intelligent document processing to every device and every operating system. Last year we covered early entrants such as UiPath's Clipboard AI, which put seriously good no-code IDP on the Windows desktop and is easy enough for non-technical people to use. But Copilot multi-modal takes it to the next level of IDP ubiquity: it just shows up free for every Windows 11 user.

Before you get started, please remember that the AI might make stuff up or occasionally miss obvious data. You'll want to double-check the results before you share them in Teams.
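Part of that double-checking can be automated with the kind of simple rule that enterprise IDP platforms build into their data validation tools. The Python sketch below checks that required fields are present and that an invoice's line items add up to its stated total, which catches many OCR misreads of digits. It is a minimal illustration under assumptions: the field names and required-field list are hypothetical, and line items are assumed to have been extracted as (quantity, unit price) string pairs.

```python
from decimal import Decimal

# Hypothetical schema: fields we expect on every extracted invoice.
REQUIRED_FIELDS = ("invoice_number", "line_items", "total")


def validate_invoice(fields: dict) -> list[str]:
    """Return human-readable problems; an empty list means no rule fired."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields
    ]
    if problems:
        return problems
    # Recompute the total from line items, using Decimal to avoid float rounding.
    computed = sum(
        (Decimal(qty) * Decimal(price) for qty, price in fields["line_items"]),
        Decimal("0"),
    )
    stated = Decimal(fields["total"])
    if computed != stated:
        problems.append(f"total mismatch: stated {stated}, computed {computed}")
    return problems


invoice = {
    "invoice_number": "A17",
    "line_items": [("2", "10.50"), ("1", "4.00")],
    "total": "26.00",  # e.g. an OCR misread of 25.00
}
print(validate_invoice(invoice))
```

A rule like this only flags inconsistencies; a person still has to decide which value is wrong. `Decimal` also raises an error on values such as "10,50", so a real pipeline would normalize number formats first.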