Autoclassification – where it works, and where it doesn’t (seem to yet).

I’ve been digging into auto-classification lately.

It’s one of a number of very promising technologies that will help Records Managers clean up after users who don’t do what they should, and reduce the burden on users operating in really complex situations.

And the technology works.

For me, the main bar it had to clear was beating human accuracy.

The numbers I’ve seen show that people are around 65% accurate when classifying. At a recent demonstration, the auto-classification algorithm’s confidence level on its recommendations was about 80%. The demonstration was given by an Australian government agency, of a system in active use – so this is real-world accuracy.

While it’s an excellent tool, what I haven’t seen yet is its application to the problem of organising content in ways that are useful for people – making content “findable” rather than just well classified.

What I mean by this is that it is excellent at applying a classification: “This is a development application”.

Where there doesn’t seem to be an answer yet is in understanding context, and relating one object to another. It seems to be good at saying “this is a development application”, but not at saying “this is the development application for 30 Reed Street Bristol, and these are all of the documents related to that development application”. That means it can classify, and be useful for sentencing, but it can’t file.
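To make the distinction concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the Document fields, the "matter" key and the sample records (including the second address) are hypothetical and don’t describe any particular product. The point is only that a class label lets you group by type, while filing needs a second piece of information that ties documents to the same matter.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    name: str
    classification: str           # the kind of label an auto-classifier can supply
    matter: Optional[str] = None  # hypothetical field: the context a file supplies


# Hypothetical sample records, invented for this sketch.
docs = [
    Document("application_form.pdf", "development application", "30 Reed Street"),
    Document("site_plan.pdf", "development application", "30 Reed Street"),
    Document("application_form.pdf", "development application", "12 High Street"),
]


def group_by(documents, key):
    """Group document names by the value of the given attribute."""
    groups = defaultdict(list)
    for doc in documents:
        groups[getattr(doc, key)].append(doc.name)
    return dict(groups)


# Grouping by classification puts every document in one bucket.
by_class = group_by(docs, "classification")

# Grouping by matter separates the two applications, which is what a file does.
by_matter = group_by(docs, "matter")
```

With only the classification, the two application forms are indistinguishable. It is the matter (in practice, the file) that tells you which site plan belongs with which application.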

It seems that, for the moment, we’re stuck with filing as the simplest way of showing the relationships between pieces of content.

While I think auto-classification is still a meaningful and useful improvement that can reduce compliance risk, I don’t think that reduction is as large as it seems.

My current read on the technology is that its best use is in manage-in-place scenarios, where we can tag the content with a classification and leave it organised the way its creator or user left it, so that the context is preserved. Without that context, I think we’re creating a problem for ourselves later. We’ll have better sentencing from a subject standpoint, but complying with instruments that require us to understand more context will be difficult. I think it has the potential to make it harder to comply with subpoenas and open-government-style requests (FOI, GIPAA, RTI, SAR etc.).
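As a rough illustration of what manage in place could look like, the Python sketch below tags each item with a classification but leaves its path exactly as the creator organised it. The paths, the classify() stand-in and the related_to() helper are all hypothetical; a real classifier would read the content rather than the file extension.

```python
from pathlib import PurePosixPath

# Hypothetical paths, left exactly as the creator organised them.
paths = [
    "planning/30 Reed Street/application_form.pdf",
    "planning/30 Reed Street/objection_letter.docx",
    "planning/12 High Street/application_form.pdf",
]


def classify(path: str) -> str:
    """Stand-in for an auto-classifier (assumption: not a real product's logic)."""
    return "correspondence" if path.endswith(".docx") else "development application"


# Tag in place: add a classification, keep the original location as context.
tagged = [{"path": p, "classification": classify(p)} for p in paths]


def related_to(folder: str):
    """Use the retained folder structure to answer a context-based request."""
    target = PurePosixPath(folder)
    return [t for t in tagged if target in PurePosixPath(t["path"]).parents]


# Everything about one matter, regardless of how each item was classified.
reed_street = related_to("planning/30 Reed Street")
```

The classification tag drives sentencing, while the untouched folder path still answers the “give me everything about 30 Reed Street” kind of request.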

On balance, I’m an advocate. I think that for lots of organisations it will take them from a place of poor compliance to a place of meaningful compliance, and enable better lifecycle management. However, until we have tools that can assess and understand context, it should be adopted in addition to really good, usable file plans, not as a replacement for them.
