Models and the RAG trade


Having waved goodbye to the newly released “Work Intelligence Market Analysis 2024-2029”, which is now out in the wild, rather than put my feet up for a bit I wanted to start compiling updates to the next chunky report due a 2024 refresh: “Workplace AI Market Analysis: Generative AI and the Desktop (R)Evolution”.

That report – a product of the glut of worker-focused desktop generative AI products and features that cascaded into the marketplace in 2023 – was never designed to be a tech-focused architectural guide. If that’s your thing, there’s a litany of discussions, benchmarking exercises, and associated Rock ‘Em Sock ‘Em model-on-model match-ups you can digest.

Instead, it was designed to provide a work- and worker-focused précis of how the technology might fit within an organization’s ways of working, and how you might find the right points of entry for your everyday processes. It was interesting to try to encapsulate that [coughs] “dynamic” marketplace. I suspect it was rather less enjoyable for our long-suffering editorial and design team to pull together into something readable. Hopefully, by the time the update arrives with them in the summer, some of that scar tissue will have hardened.

The five micro ages of the massive models

One of the interesting areas within the generative AI market dynamic right now is the evolving way in which LLM foundation models are positioned by those offering products derived from them. Our story so far could be summarized in these five micro ages:

  1. Massive models that know everything can perform any task you throw at them. Try them and see!
  2. OK, the massive models only know a certain amount, and you’ll probably want to pair them with a human to check the responses, but still… WOW, eh?
  3. Right, so if you want precise responses, you’ll need to use your own data as well as the massive models to make that happen.
  4. Yeah, that will mean having a good handle on your own data. Have you tried maybe putting it into a semantic search?
  5. Now that the search has fixed everything, you only need to use the massive model to assemble the response, so you don’t need to worry about any knowledge deficiencies. Plus, it means you can limit the number of times you ask anything!

In summary, in pretty short order, we’ve gone from “ask anything!” to “ask the bare minimum of questions, and when you do so, send it everything!”. So, how did we get to this little bump in the road before the meter man has really set up shop?
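That “ask less, send everything” pattern can be sketched in a few lines: retrieve the most relevant passages first, then hand only those to the model inside a single prompt. The word-overlap scoring below is a toy stand-in for a real semantic search, and all function names and sample passages are illustrative, not any particular product’s API.

```python
def score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words found in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k passages ranked by the toy relevance score."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:top_k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble one grounded prompt: retrieved context plus the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

corpus = [
    "Our travel policy caps hotel spend at 150 GBP per night.",
    "The cafeteria menu changes every Monday.",
    "Expense claims must be filed within 30 days of travel.",
]
print(build_prompt("What is the hotel spend cap in the travel policy?", corpus))
```

The model never needs to “know” the travel policy; the search does the knowing, and the model only assembles the answer from what it is sent.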

Models vs Knowledge

As will be familiar to anyone who has hung out in or near new technology when it first meets work, it is usually the technology that sustains the impact damage, while work rumbles on barely dented. Those dents tend to land pretty close to where the people closest to the technology’s development predicted. In the case of LLMs, well before the first derived products came into view for organizations, researchers from Meta/UCL had already proposed the need for a more precise method to manipulate specific knowledge.

Their proposal of “Retrieval-Augmented Generation” (RAG), set out in a 2020 paper, was devised because “[LLMs’] ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures”. It now forms the basis for how the assembled masses of products that are mainly, but not exclusively, called Copilot are prefaced. It has also prompted a rapid reappraisal, for many, of their strategy around unstructured data.

The term “grounding” is often used alongside RAG in this discussion, sometimes interchangeably. It is a borrowed term that has been around for a long while, describing how machine-extracted or generated data can be attached – or grounded – to a specific verified data point (for example, a link to a reference document) to give it precise veracity. As LLMs are, and were, deliberately general or foundational, this specificity to the work task was never baked in. RAG allows them to stay general and builds a supporting information architecture around them.
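A small sketch may help show what grounding adds on top of retrieval: each piece of context carries a verifiable reference (here, a document URL), so the assembled answer can cite exactly where its claims came from. The class, field names, and URLs below are all illustrative assumptions, not a real product’s schema.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str  # the verified data point this text is grounded to

def grounded_context(chunks: list[Chunk]) -> str:
    """Format retrieved chunks with numbered citations for the prompt."""
    return "\n".join(
        f"[{i}] {c.text} (source: {c.source})"
        for i, c in enumerate(chunks, start=1)
    )

chunks = [
    Chunk("Hotel spend is capped at 150 GBP per night.",
          "https://intranet.example.com/policies/travel"),
    Chunk("Claims must be filed within 30 days.",
          "https://intranet.example.com/policies/expenses"),
]
print(grounded_context(chunks))
```

Because each numbered citation maps back to a source link, a human checker can verify any claim in the generated response rather than taking the model’s word for it.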

Model tuning or mini models?

Now, you might be thinking that it seems somewhat backward to build an entirely new information architecture around LLMs – in part to avoid using the LLM because of its lack of knowledge – when you could just teach it new stuff. Isn’t that how this has often been sold? Always learning, constantly improving?

Well, yes. But in practice, this becomes a bit tricky. And then it gets really tricky and horrifically expensive.

This 2023 blog post from Microsoft explains some of it well: the process of creating a model with a set of additional knowledge – fine-tuning – is well enough understood to be ruled out in most cases as too time-consuming and expensive for most tasks. It would also need to be treated as constant, ongoing maintenance, requiring the retention of expensive human skills and a lot of resources to stand up, whether you’re putting it in your own data center or into the infrastructure of one of the hyperscalers. You’d have to be super confident that those sums add up before embarking on that journey, and for most, it seems, they don’t – at least for organization-specific tunings.

But what about industry-specific models? Or task-specific models? These mid-scale models, where the tunings are designed for scenarios that require much more precision than LLMs offer, but not the organization-specific precision that would make them too costly to consider, are already with us in many cases, especially around code creation.

Where the audience is broad enough and the use case valuable enough, mid-scale models for financial, legal, and clinical/medical markets are potentially viable. As adjuncts to existing trusted repositories, it’s possible to see such models being created and licensed to plug into existing or emerging platforms, particularly where they would gain the headline-grabbing skills their LLM cousins bring to the table (e.g. compilation and summarization).

Avoiding the parlor trick?

As we noted last year, it’s vital for generative AI to avoid becoming a mere “parlor trick”, and as approaches from work-focused software platforms iterate at pace, that fate looks increasingly unlikely. We’re relieved that this is the case.

The underlying complexity that both grounding/RAG and whatever-we-end-up-calling-these-small-models bring to their respective business scenarios should not be underestimated, even if they appear at first glance to be tied up with a neat bow (unstructured data, hint hint).

As always, we’re here to help. If you want to chat, just shout.

Robot image via Microsoft Copilot Designer
