In my last blog post – concerning the importance of not assuming that AI will just find a suitable use case in your organization – I also mentioned that the soon-to-arrive 2024 update of the Work Intelligence Market Analysis will this time have some additional data points within it, concerning geographic location and company size.
As this report is very much the child of its primary writer, one of the sections that I spend the most time worrying about is perhaps the bit that the fewest people actually read in detail: the methodology. The reason for this is that it’s really important to ensure that whatever research we publish is credible and defensible. That credibility is in part based on the weight of collective research and allied experience that we bring to each publication, along with details such as the number of companies within the research and the volume of data points.
It’s the defensibility of the research, though, that most keeps me focused. This is the need to ensure that if any claim is challenged, we have the data to back up what we have published to such an extent that the claim can be sustained. For example, if someone wanted to query the growth rates within this report, we’ve got a detailed model which shows how these are calculated at a sub-market level over the time period examined, so it’s possible to see how the math checks out.
For other areas, it’s not just simple math that is required to understand what the data means, and that brings me on to the biggest update in the methodology section of the new report: the introduction of initial geographic and company size data into the formal analysis. If you will indulge me a little, here are the challenges with each.
On the face of it, adding a new data point which determines where each vendor within an examined marketplace is based is easy. You look at where the HQ is, you record it and – as we have done in our data – then you band it into a specific region (e.g. Europe, North America etc.). For most normal people, that would be a relatively trivial task. However, I am not normal and, as a result, the simplicity hid something that I needed to record as a significant caveat in the use of that data.
The primary issue is that the location for each vendor is recorded as their self-declared HQ. In short, if you say you are headquartered in NYC, then we record you as USA, North America in our data. As this data is collated en masse, we can draw out regional splits by market and sub-market and make broad claims that say “x% of this market are vendors based in North America”. Indeed, the data shows that across the sub-markets within Work Intelligence, the share of vendors based in North America ranges from a low of 45% (Work Engines) to a high of 75% (Programmatics), which combines to 54% across Work Intelligence as a whole.
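To make the arithmetic behind a regional split concrete, here is a minimal sketch of how a share per sub-market and a combined share could be calculated from self-declared HQ regions. The vendor records, sub-market names and function names are all hypothetical and purely illustrative. They are not the report's data or its actual model.

```python
# Hypothetical vendor records: (vendor, sub_market, self_declared_hq_region).
# These are made-up rows for illustration, not data from the report.
vendors = [
    ("Vendor A", "Sub-market X", "North America"),
    ("Vendor B", "Sub-market X", "Europe"),
    ("Vendor C", "Sub-market X", "North America"),
    ("Vendor D", "Sub-market X", "Europe"),
    ("Vendor E", "Sub-market Y", "North America"),
    ("Vendor F", "Sub-market Y", "North America"),
]


def region_share(records, region):
    """Return the fraction of vendor records whose HQ region equals `region`."""
    if not records:
        return 0.0
    hits = sum(1 for _, _, hq_region in records if hq_region == region)
    return hits / len(records)


def share_by_sub_market(records, region):
    """Group records by sub-market and compute the regional share for each."""
    grouped = {}
    for record in records:
        grouped.setdefault(record[1], []).append(record)
    return {market: region_share(rows, region) for market, rows in grouped.items()}


per_market = share_by_sub_market(vendors, "North America")
overall = region_share(vendors, "North America")

print(per_market)
print(round(overall, 3))
```

One thing this sketch shows is that the combined figure is weighted by the number of vendors in each sub-market, so it will not generally equal the simple average of the sub-market percentages.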
So why isn’t that enough for me? I mean I know the data is accurate based on that self-identification, so why am I being weird about it? It’s because that self-identification obscures a great deal about the narrative of those vendors. Here are just two points that should help you understand what I mean.
Within the data set we have some vendors that have been around for 12 months, some for 12 years and a handful for over 50 years. Over the lives of these companies, the location of the HQ may well have changed as they have grown and matured. For example, take UiPath, which is now very much a US company, with an HQ in NYC and a listing on the NYSE, and is recorded as such in the data. That it is also a great Romanian start-up success story is therefore obscured in the data, even though those like us who have followed their story to date are well aware of their proud backstory in central Europe.
Companies migrate their HQs across continents for all sorts of reasons: to help raise money, to be closer to their bigger, faster growing customer bases or as part of their listing on financial markets. The result of this is that you cannot use this data in isolation to draw conclusions about, say, the strength of an emerging market by geography. You’d need other data points, such as founding date, in order to make that analysis defensible.
The second point to bear in mind is that the HQ data often obscures where the majority of employees might be located. This is an important data point that pretty much anyone who is building a proper market sizing needs to know, as that will help understand the approximate cost base of operations. There are a great number of vendors within this – and, I’d wager, every – software market who have a significant chunk of their development team located in a different country or continent to that self-declared HQ. So for example, to understand the size of an industry by employees, you’d need a great deal more than the geodata that we’re currently making available.
Company Size Data
This brings us neatly on to the other new set of data, that of company size. We’re recording this on full-time employee numbers and then banding that data into blocks (1-50, 51-200, 201-1000, 1001-5000 and 5001+). As we’ve already discussed, where those employees are located is an important bit of data that doesn’t show up when we start to analyze the result here. However, there are a couple of additional factors that you need to bear in mind when you’re extrapolating this data: size of company and product focus.
In the new 2024 Work Intelligence data, we can see that across the market 42% of vendors are in the 1-50 band. These are small, single product companies, where we can say with a strong degree of confidence that each one of those people is contributing toward Work Intelligence revenue. Where this gets more difficult is when we look at the larger bandings. In that same data, 8.7% of vendors are in the 5001+ band. No vendor in that banding is a single product company, and their employees are not always solely contributing to the Work Intelligence revenue number (being engaged in other contiguous markets).
We are of course already managing revenue splits (both estimates and actual where available) within these large vendors and while we don’t publish this data explicitly on an individual basis, it all contributes to the revenue numbers you see in the report. With employees, it is much more complicated to build a reliable estimated proportionate model for all sorts of reasons (boring ones, but feel free to ask me about them, as I have no fear in boring people), so we don’t split them out. This means that we are stating vendors by overall size, not by the Work Intelligence proportion alone.
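As a small illustration of the banding step, here is a minimal sketch that maps a full-time employee count onto the five bands listed above. The band edges come from this post. The function name `size_band` and the use of Python's `bisect` module are my own assumptions for illustration, not a description of how the report's model is actually built.

```python
import bisect

# Inclusive upper bounds of the first four bands; anything above the last is "5001+".
BAND_UPPER_BOUNDS = [50, 200, 1000, 5000]
BAND_LABELS = ["1-50", "51-200", "201-1000", "1001-5000", "5001+"]


def size_band(fte_count: int) -> str:
    """Return the size band label for a full-time employee count (hypothetical helper)."""
    if fte_count < 1:
        raise ValueError("fte_count must be at least 1")
    # bisect_left finds the first upper bound that is >= fte_count,
    # so a count sitting exactly on a boundary stays in the lower band.
    return BAND_LABELS[bisect.bisect_left(BAND_UPPER_BOUNDS, fte_count)]


for count in (1, 50, 51, 5000, 5001):
    print(count, size_band(count))
```

Note that the band only describes the vendor's overall headcount. It says nothing about where those people sit or what proportion of them work on a given product.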
Additionally, while we include that broad banding within the new report, we do not break that out by sub-markets, even though we have that data (and indeed, I have the data in chart form in front of me right now). This is because of the danger of comparing sub-market to sub-market and therefore double counting the employees in doing so. Comparative analysis requires a safety net that you should always assume isn’t there, unless you’re told expressly that it is.
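To show why that double counting matters, here is a minimal sketch. It assumes (my assumption, for illustration) that a vendor active in more than one sub-market carries its whole headcount into each of them, because we only hold overall size and not a per-market split. The vendors, headcounts and sub-market names are invented.

```python
# Hypothetical vendors: overall headcount plus the sub-markets each one appears in.
vendors = {
    "Vendor A": {"employees": 6000, "sub_markets": {"Sub-market X", "Sub-market Y"}},
    "Vendor B": {"employees": 40, "sub_markets": {"Sub-market X"}},
    "Vendor C": {"employees": 300, "sub_markets": {"Sub-market Y"}},
}


def employees_by_sub_market(vendor_data):
    """Sum overall headcount per sub-market; multi-market vendors count in each."""
    totals = {}
    for info in vendor_data.values():
        for market in info["sub_markets"]:
            totals[market] = totals.get(market, 0) + info["employees"]
    return totals


per_market = employees_by_sub_market(vendors)
sum_of_sub_markets = sum(per_market.values())
unique_total = sum(info["employees"] for info in vendors.values())

print(per_market)
print(sum_of_sub_markets, unique_total)
```

In this made-up example the sub-market totals add up to 12,340 employees, while only 6,340 people actually exist, because Vendor A has been counted twice.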
If you read through that, then well done. If you glossed over it and skipped to the end, I don’t blame you, life is far too short to get into the weeds of this with me right now. So here’s the short version:
– We collect a lot more data than we use for analysis and only introduce new bits when we’re happy it is sound, we’ve thought about the implications and can guide you as to how to use it.
– We’re always adding to the data set with new data points which may or may not get used in the future, and we’ll never just dump data into our reports without painstakingly talking it to death as a team first.
– We take this care when we’re building any data product, so for example when we built our all-new Economic Spectrum Report model in 2023, we gave it the same degree of rigor within the underlying design as you’d expect (if you’d like to see one out in the wild, here’s the first one that we created for Hyland Software).
If you’d like to discuss this report in more detail or the recently released 2024 IDP Market Report, then please get in touch. We love talking data.