Clipboard AI: IDP for the Masses?


TIME Magazine named Clipboard AI to its Best Inventions of 2023 list. Does it live up to the hype?

October 24, 2023. Mark it and remember it. For this is the day that Intelligent Document Processing went mainstream media. TIME Magazine announced its Best Inventions of 2023, and UiPath’s Clipboard AI was named to the Productivity Category.

Clipboard AI replaces the need to manually copy and paste text from one application to another. The user opens a document, presses copy, chooses the destination (could be a form, an app, a table, etc.), then clicks paste and data entry is complete. Under the covers, Clipboard AI uses a combination of generative AI, machine learning models, and OCR to extract the data and to figure out where the data should go. This is IDP at its core: classify, OCR, extract, and deliver.
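For readers who think in code, here is a toy sketch of that classify, OCR, extract, and deliver flow. It is not how UiPath implements Clipboard AI. Every function name, the keyword classifier, and the "Label: value" extractor are hypothetical stand-ins for the ML models, LLM, and OCR engine described above. The toy runs OCR first only because its classifier works on text.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str = ""
    doc_class: str = "Plain Text"
    fields: dict = field(default_factory=dict)


def ocr(page: bytes) -> Document:
    """Stand-in for an OCR engine: here the 'image' is already UTF-8 text."""
    return Document(text=page.decode("utf-8"))


def classify(doc: Document) -> Document:
    """Toy classifier: keyword match (a real product would use trained models)."""
    lowered = doc.text.lower()
    for keyword, doc_class in (("invoice", "Invoice"), ("receipt", "Receipt")):
        if keyword in lowered:
            doc.doc_class = doc_class
            break
    return doc


def extract(doc: Document) -> Document:
    """Toy extractor: read 'Label: value' lines into a dictionary."""
    for line in doc.text.splitlines():
        if ":" in line:
            label, value = line.split(":", 1)
            doc.fields[label.strip()] = value.strip()
    return doc


def deliver(doc: Document, target_labels: list) -> list:
    """Return values in the order the destination form or table expects."""
    return [doc.fields.get(label, "") for label in target_labels]


# Hypothetical sample page and destination columns.
page = b"INVOICE\nInvoice Number: 1001\nTotal: 42.50"
doc = extract(classify(ocr(page)))
row = deliver(doc, ["Invoice Number", "Total"])
print(doc.doc_class, row)
```

The final list is what would be "pasted" into the destination, one value per target field.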

UiPath has cleverly combined document AI with its original RPA feature, screen scraping data from one window to the next, and has done it with the easiest-to-use, friendliest interface we have seen so far.

The point of Clipboard AI is for the computer to automatically extract data so you can paste it directly into the application of your choice. It works just like an old-fashioned Windows screen scraper app: open two windows, then copy and paste data between them. Clipboard AI’s innovation is using a Large Language Model (LLM), machine learning (ML) models for specific document classes, and an AI-driven OCR engine for print and handwriting recognition.

Previous attempts to do this required the end user to click several times on the page – or lasso the region – to identify and copy the data. Without generative AI bringing natural language, end users became confused or discouraged by overly-techie functions and phrases. This same functionality has also been possible for years using 3rd Wave IDP products; but those products required complex setup, training, and retraining, and an expert user, not an average office worker. Clipboard AI is designed for that average worker.

How does Clipboard AI actually perform beyond the hype? I ran it through document tests to find out.

Hands On with Clipboard AI

Version tested: V23.9.2

The first thing I noticed is this is a Windows app with an MSI installer package. Thoroughly accustomed to cloud AI apps, this surprised me. As I used it, the desktop install made sense. Clipboard AI lurks in the background, ready for action when you open a document.

UK Driver’s License (JPG)

Clipboard AI classified it correctly and performed the OCR on the image. It worked fine, with zero-shot classification and extraction. All fields were correct, as expected.  

US Passport (PDF image)

Clipboard AI classified it correctly and performed the OCR on the image. It worked fine, with zero-shot classification and extraction. All fields were correct, as expected. I threw a spanner in the works just to see how Clipboard AI would react. I instructed it to extract the cursive text of the “We The People” statement on my passport.

Not perfect; but not bad at all. “We the Neople Of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranqulit, provide for the common defence, the general Welfare, and secure promote the Blessing of Liberty to ourselces and ORr Posterity do ordain and establish thes Constitusion for the UnitedStates of America.”
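To put a rough number on "not bad at all", one option is a character-level similarity score between the OCR output and the reference text. The sketch below uses Python's standard difflib. The reference string is the standard wording of the Preamble, which I am assuming matches what is printed on the passport page. The score is only a quick proxy, not a formal OCR accuracy metric.

```python
from difflib import SequenceMatcher

# Standard wording of the Preamble (assumed to match the passport page).
reference = (
    "We the People of the United States, in Order to form a more perfect Union, "
    "establish Justice, insure domestic Tranquility, provide for the common defence, "
    "promote the general Welfare, and secure the Blessings of Liberty to ourselves "
    "and our Posterity, do ordain and establish this Constitution for the "
    "United States of America."
)

# Clipboard AI's output, copied from the test above.
ocr_output = (
    "We the Neople Of the United States, in Order to form a more perfect Union, "
    "establish Justice, insure domestic Tranqulit, provide for the common defence, "
    "the general Welfare, and secure promote the Blessing of Liberty to ourselces "
    "and ORr Posterity do ordain and establish thes Constitusion for the "
    "UnitedStates of America."
)

# autojunk=False stops difflib from ignoring frequent characters in long strings.
similarity = SequenceMatcher(None, reference, ocr_output, autojunk=False).ratio()
print(f"Character-level similarity: {similarity:.1%}")
```

A score close to 100% with a handful of word-level slips is consistent with the "not perfect, but not bad" impression.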

Amazon Invoice (PDF)

Clipboard AI classified it as Plain Text. Even though the document says Invoice at the top and is obviously formatted as one, the app failed to recognize this. Using the Data Mapper, I changed to the Invoice model. It worked fine with zero-shot extraction. All fields were extracted successfully, as expected, including a basic table extraction. The line items were neatly pasted into Excel rows, so I did not have to reformat anything. That’s a nice time saver if you do this a lot.

The Clipboard AI Data Mapper – where you can change the document class.

Online store invoice (PDF)

Clipboard AI classified it incorrectly as a Receipt. When I changed to the Invoice model, it worked fine with zero-shot extraction. All fields were extracted as expected, including a basic table extraction.

Online travel receipt (super nicely formatted PDF)

Clipboard AI identified it correctly as a Receipt. Data copy and paste worked fine with zero-shot extraction. It was also able to correctly extract the merchant’s name from a logo. That’s another helpful time saver.

The Data Mapper also has a useful data and document viewer option, so you can see how the data was mapped on the original document. It won’t replace a properly full-sized human-in-the-loop viewer, but it’s good enough for a quick view.

Discount Store receipt (a classic cash register receipt JPG)

Clipboard AI identified it correctly as a Receipt. It performed the OCR on the image. It worked fine, with zero-shot classification and extraction. Almost all fields were correct, including the merchant’s name from a logo. The one exception: it misinterpreted the £ sign as “3”, so the total price was wrong. I was able to correct that easily enough.
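A misread currency sign like this is the kind of error a simple downstream sanity check can catch. The sketch below is a hypothetical helper, not a Clipboard AI feature. It assumes the total should equal the sum of the extracted line items (no separate tax or discount lines), and the amounts are made up for illustration.

```python
from decimal import Decimal


def check_total(line_items, extracted_total):
    """Compare an extracted total against the line items; return (total, note)."""
    expected = sum((Decimal(item) for item in line_items), Decimal("0"))
    if Decimal(extracted_total) == expected:
        return expected, "ok"
    # A currency sign misread as '3' shows up as one extra leading digit.
    if (
        len(extracted_total) > 1
        and extracted_total.startswith("3")
        and Decimal(extracted_total[1:]) == expected
    ):
        return expected, "leading '3' is probably a misread currency sign"
    return Decimal(extracted_total), "mismatch: needs manual review"


# Hypothetical receipt: two items, and a total of 12.50 read as 312.50.
result = check_total(["4.99", "7.51"], "312.50")
print(result)
```

Anything the check cannot explain is still handed back for manual review rather than silently corrected.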

Deep Analysis blog (web page)

Using the Chrome extension, Clipboard AI identified it correctly as Plain Text. Here’s the problem with the Plain Text option: you have to manually instruct Clipboard AI for every label you want extracted. We’ve tested similar AI tools that can auto-recommend labels and fields, and we prefer that extra automation. Clipboard AI performs very well at Named Entity Recognition (NER), that is, extracting the obvious labels like Title, Date, and Address that one finds in a document. On this text, it also did well at inference from context, correctly extracting the names of the companies in the blog even though the word “company” does not appear next to them. ChatGPT, Bard, and Microsoft CoPilot are also very good at this. The data extraction worked much faster than from files on my PC. That’s to be expected (cloud to cloud, vs document to cloud).

Mortgage Statement (PDF)

Clipboard AI classified it incorrectly as a Utility Bill. I changed to the Semi Structured option. As with the Plain Text option, again I had to define everything I wanted extracted. This was a simple bank statement layout, so I expected more automation and less manual labelling work. Clipboard AI was also unable to discern and extract the payment made from the line item in the table, even though I tried various prompts. I corrected this easily enough.

Stock Purchase Agreement (unstructured PDF image)

Clipboard AI classified it incorrectly as a Purchase Order. The document clearly states the title at the top of page 1. I think because Purchase is in the title, the AI gave a good effort but ultimately failed. I changed to the Plain Text option and defined my labels.

Clipboard AI performed the OCR on the image and then correctly extracted Document Title, Date of Agreement, Buyer Name, Purchase Price, and Price per Share. However, it could not correctly calculate the Total number of shares purchased. This instruction was not a simple NER extraction like the other labels; it required a bit more “understanding” from the AI to infer the correct reply from the context. I had to open a calculator app and tote it up myself, then fix the entry.  

First Look Impressions – summarized

Before I share my opinions, it’s only fair to mention some disclaimers from UiPath:

  1. It’s not finished yet. Clipboard AI is in public preview and will be generally available in early 2024. UiPath continues to refine the app based on customer and partner feedback, and the app will only get better over time.
  2. It’s one part of a comprehensive product line. Clipboard AI is a personal productivity tool for individuals. Customers shouldn’t expect the same level of accuracy or feature robustness that they would get from UiPath’s enterprise solutions for Document Understanding and Communications Mining.

Clipboard AI is well-executed as a personal computer productivity tool; UiPath’s product team delivered an exceptional user experience that hides the wonky AI underneath it all. This is hyped as a time-saving tool, and the potential is clearly there. However, I encountered several little time sinks (see the Could Be Better section below) that, taken together, could add time and therefore diminish the value for some users. Clipboard AI needs a bit more work to meet the lofty expectations that come with a TIME award, and as just noted, UiPath is committed to continuous improvement.

Now, here’s the point-by-point analysis:

Good

  • Anyone who can use a Windows computer will be able to use this. This could be IDP for the masses.
  • It’s an ever-present desktop app, ready to work whenever you open a document.
  • It worked well on Specific Models (IDs, receipts, etc.) and Invoices, with true zero-shot extraction (no training required) and very high accuracy.
  • Very good at Named Entity Recognition (NER), the obvious labels one finds in a document. This cannot be overlooked or minimized; prior to LLMs, good NER was hard to deliver.
  • Seems relatively tolerant of misspelled instructions.
  • The Chrome browser extension works very well.

Could be better

  • Can read only one page at a time. For multipage documents, you have to repeat the copy and paste for every page.
  • It’s not persistent. After I spend time building a custom extraction for a semi-structured or unstructured doc, I bring up another doc of the same class and have to start all over again.
  • The zero-shot performance and data accuracy seem to degrade as you move into semi-structured and unstructured documents, and as the instructions move beyond obvious NER labels and into inference based on context.
  • Each time a new field label is added to a custom data mapper, it seems to reload ALL fields.
  • Doesn’t like it when I try to label multiple fields at once. Takes a while to recover.
  • Building a custom model with your own prompts/instructions takes a bit of time, and because this seems to be required for every page of every document, the productivity gains are minimized.
  • Would be nice if I could add a Field to the Specific doc models, but for now they seem to be static. To add a Field, it appears I have to build a custom extraction model from scratch and define all labels manually. That’s a pain for basic forms (IDs, W-2 tax form, etc.) and could negate any time saved. I could have cut and pasted by hand faster than this.

Final Thoughts

Chapeau to the UiPath product team for an innovation so clever that it made the TIME Magazine Best Inventions of 2023 list, and for demonstrating how real business value can be delivered to end users simply and effectively, even though it’s leveraging much-maligned, over-hyped AI. (sarcasm intended)

2023 is a transformative year for IDP, as the 4th Wave of generative AI has inspired creators to build better and more widely accessible IDP solutions like Clipboard AI. UiPath stands shoulder to shoulder with other 4th Wave innovators we’ve covered this year: Microsoft CoPilot; Instabase AI Hub; Indico Data; Hyperscience; Eigen Technology; Base64.ai; AYR; Alphamoon; Kofax; Lucy.ai; Docugami; BIS Grooper; Skwiz; and more.

After 50 years of hard graft, IDP technology is poised to go mainstream and show up in every office worker’s daily routine. I’ve spent nearly 30 of those years making IDP products and it is an honor and a privilege to watch this unfold.

 

Find the news release here: https://ir.uipath.com/news/detail/315/uipath-clipboard-ai-named-one-of-times-best-inventions
