Auto-classification – where it works, and where it doesn’t seem to (yet).

I’ve been digging into auto-classification lately.

It’s one of a number of very promising technologies that will help Records Managers clean up after users who don’t do what they should, and reduce the burden on users working in really complex situations.

And the technology works.

For me, the main bar it had to clear was beating human accuracy.

The numbers I’ve seen show that people are around 65% accurate when classifying. At a recent demonstration, the auto-classification algorithm’s confidence level on its recommendations was about 80%. The system demonstrated is in active use at an Australian government agency, so these are real-world figures, not lab results.

It’s an excellent tool. What I haven’t seen yet is it being applied to the problem of organising content in ways that are useful for people: making content “findable”, rather than just well classified.

What I mean is that it’s excellent at applying a classification: “this is a development application”.

Where there doesn’t seem to be an answer yet is in understanding context and relating one object to another. It’s good at saying “this is a development application”, but not at saying “this is the development application for 30 Reed Street Bristol, and these are all of the documents related to it”. That means it can classify, and be useful for sentencing, but it can’t file.

For the moment, it seems we’re stuck with filing as the simplest way of showing the relationships between pieces of content.

While I think auto-classification is still a meaningful and useful improvement that can reduce compliance risk, I don’t think that reduction is as large as it first appears.

My current read on the technology is that its best use is in manage-in-place scenarios, where we tag content with a classification and leave it where the creator or user organised it, so the original context survives. Without that context, I think we’re creating a problem for ourselves later. We’ll have better sentencing from a subject standpoint, but complying with instruments that require us to understand more context will be difficult. I think it has the potential to make it harder to comply with subpoenas and open government style requests (FOI, GIPAA, RTI, SAR etc.).
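To make the classify-versus-file distinction concrete, here is a minimal Python sketch of manage in place. It assumes a hypothetical `Record` type whose classification label is simply hard-coded (no real auto-classification product or API is modelled), and the folder paths and the second address are invented for illustration. Grouping by classification puts every development application into one bucket; grouping by where the user left the content keeps each application’s documents together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    location: str        # hypothetical: where the creator/user filed it (the context)
    classification: str  # hypothetical: label added by an auto-classifier, record not moved


records = [
    Record("site-plan.pdf", "Development Applications/30 Reed Street Bristol", "Development application"),
    Record("objection-letter.docx", "Development Applications/30 Reed Street Bristol", "Development application"),
    Record("site-plan.pdf", "Development Applications/12 Example Road", "Development application"),
]


def group_by(items, key):
    """Group record names by whatever attribute `key` extracts."""
    groups = defaultdict(list)
    for r in items:
        groups[key(r)].append(r.name)
    return dict(groups)


# Classification alone: one bucket, good for sentencing, no per-application context
by_class = group_by(records, lambda r: r.classification)

# Location kept in place: each application's documents stay together
by_file = group_by(records, lambda r: r.location)

print(by_class)
print(by_file)
```

The design choice is to add the label as metadata rather than moving the record, so both views remain available.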

On balance, I’m an advocate. I think that for lots of organisations it will take them from a place of poor compliance to a place of meaningful compliance, and enable better lifecycle management. Until we have tools that can assess and understand context, however, it should be adopted alongside really good, usable file plans, not as a replacement for them.

The economics of Information Management and Records Management are different – so we should treat them differently.

The value of what we do in people’s hearts and minds is totally dependent on how they understand it.

Records management is about evidence of compliance.

Information management is about driving organisational performance by using information.

That sounds simple, and it is, but they have very different future values, and very different investment constraints.

Getting records management wrong can put you out of business quickly. It can also tie you up in audits, and put you on the wrong side of regulators who can make doing business hard.

The future value of getting records right is being able to stay in business (or retain the trust of the public). When we do Records projects, we’re either bringing ourselves into compliance, or reducing the costs associated with compliance.

Getting information management wrong can also put you out of business – but it’s going to do it slowly.

It does so because you lose to competitors who are getting more value from their information.

Too often we treat them as if they’re the same, and they’re not.

The economics are totally different. 

We stop doing Records Management projects when we’re compliant and can no longer meaningfully reduce the costs of compliance. The constraints are the risks of non-compliance and the costs of being compliant.

We stop doing Information management projects (and knowledge management projects) when the organisational performance gain will no longer be sufficient to pay for the project. The constraints are the available market, the availability of capital, and the ability of the organisation to absorb change.

Too often, I think, we lump the disciplines together. There are large skill overlaps, and obviously every record is made up of information, but they are different. If we don’t think about them differently, talk about them differently, and have a shared understanding with our executive and staff about the difference, we can’t expect that they’ll be valued differently.

Instead, it’s seen as either all compliance, and to be avoided, or all value delivery, or some average of the two, so neither discipline gets its full value in people’s hearts and minds.

Does your organisation have a measure of “records debt”?

There’s a commonly used concept in software development called “technical debt”.

It describes all the bad and dodgy (or maybe just sub-optimal) code written in the past that will need to be cleaned up later.

I think it’s a concept that we can usefully apply to Records management.

In record keeping, we need to measure two things –

  1. Direct costs – cleanup costs, and the costs of holding records until cleanup.
  2. Indirect costs – costs incurred because records are no longer reliable.

Cleanup is the cleanest measure. Projects for cleanup work can be estimated with relative certainty as long as you know the information exists.

Indirect costs are obviously harder.

They include the costs of contract renewals that are missed, or contracts poorly enforced, because their records are not managed; the generally higher cost of responding to subpoenas and information access requests; and reduced productivity from the extra time spent searching for specific records.
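As a rough illustration, a records debt figure could be tallied as direct costs (cleanup plus holding until cleanup) plus estimated indirect costs. This is a minimal Python sketch: the `RecordsDebt` class, its fields, and every dollar figure below are hypothetical, and producing honest estimates for the indirect categories is the hard part the sketch does not solve.

```python
from dataclasses import dataclass, field


@dataclass
class RecordsDebt:
    """Hypothetical records debt tally: direct costs plus estimated indirect costs."""
    cleanup_cost: float            # estimated cost of the cleanup project(s)
    holding_cost_per_month: float  # cost of holding records until cleanup
    months_until_cleanup: int
    indirect_costs: dict[str, float] = field(default_factory=dict)  # estimates by category

    def direct(self) -> float:
        # Cleanup plus the cost of holding the records until cleanup happens
        return self.cleanup_cost + self.holding_cost_per_month * self.months_until_cleanup

    def indirect(self) -> float:
        # Sum of estimated costs incurred because records are no longer reliable
        return sum(self.indirect_costs.values())

    def total(self) -> float:
        return self.direct() + self.indirect()


# All figures are invented for illustration only
debt = RecordsDebt(
    cleanup_cost=50_000,
    holding_cost_per_month=1_000,
    months_until_cleanup=12,
    indirect_costs={
        "missed or poorly enforced contract renewals": 20_000,
        "subpoena and information access responses": 8_000,
        "extra search time": 15_000,
    },
)
print(f"Direct: {debt.direct():,.0f}  Indirect: {debt.indirect():,.0f}  Total: {debt.total():,.0f}")
```

Reported regularly, the direct figure should fall as cleanup projects complete; the indirect figure is where the estimates, and the conversation with executives, get harder.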

Records debt is a concept we can borrow from software engineering, and a useful barometer that could be presented to executives on a regular basis.