Thursday, 22 November 2007

Bag is cat of out the the

Excuse the flat pack cliché title of this particular rant (Requires a little user assembly and a screwdriver). I’m reading the latest on the HMRC data handling cock-up with increasing frustration.

Wherever the disks are, I find myself thinking that if the agencies concerned checked down the back of the proverbial filing cabinet, the offending disks would be rapidly found. On the other hand, did some lazy slaphead say that they had burned off a couple of CDs and not actually done what they said?

What really beggars belief is the Senior Management assertion that desensitising the data would ‘cost too much’. Okay guys, here’s a heads up on data transfer. Most databases (in fact all the decent ones I ever came across) have a data import / export facility available to those who have sufficient access rights to the database application that manages the stored data. Although I'm a bit out of touch with these things, the menu options generally go something like this:

Select 'File' Menu
Select Import / Export from File menu dropdown
Select - Export from menu options
Select fields to be exported per record - a.k.a. 'Building a database query'
E.g. Salutation, First name, Surname, Street, Town, Relationship to parent / child, file number for cross reference. (Do not select 'All', only a chimp does that.)
Select export file format (CSV, Text, whatever)
Encoded / Not encoded - Yes / No
Password protect? - Yes / No or go straight to password option
Type in password
Type in password again in next field to confirm
Do you really want to do this? Yes / No
(Yes – now go and get a cup of coffee or have lunch while your machine merrily chunters away, you can even 'lock' your screen if you like)
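
For anyone who prefers code to menus, the steps above boil down to something like the following sketch. It assumes a made-up `claimants` table in an in-memory SQLite database; the table, the column names and the sample row are all invented for illustration and have nothing to do with HMRC's real schema. Password protection is left out because Python's standard library does not write encrypted archives.

```python
import csv
import io
import sqlite3

# Hypothetical field list: only what the recipient actually needs.
EXPORT_FIELDS = [
    "salutation", "first_name", "surname",
    "street", "town", "relationship", "file_number",
]


def build_demo_db():
    """Create an in-memory database with one invented record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE claimants ("
        "salutation TEXT, first_name TEXT, surname TEXT, street TEXT, "
        "town TEXT, relationship TEXT, file_number TEXT, "
        "date_of_birth TEXT, ni_number TEXT, bank_account TEXT)"
    )
    conn.execute(
        "INSERT INTO claimants VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        ("Mr", "John", "Smith", "1 Example Street", "Anytown", "parent",
         "F0001", "1970-01-01", "XX000000X", "00000000"),
    )
    return conn


def export_selected(conn, out):
    """Write only EXPORT_FIELDS as CSV to `out`; return the number of rows."""
    # Name the columns explicitly. Never SELECT *.
    query = "SELECT {} FROM claimants".format(", ".join(EXPORT_FIELDS))
    writer = csv.writer(out)
    writer.writerow(EXPORT_FIELDS)
    rows_written = 0
    for row in conn.execute(query):
        writer.writerow(row)
        rows_written += 1
    return rows_written


if __name__ == "__main__":
    buffer = io.StringIO()
    count = export_selected(build_demo_db(), buffer)
    print(count, "record(s) exported")
    print(buffer.getvalue())
```

The sensitive columns (date of birth, NI number, bank account) never leave the database because the query never asks for them.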

Total time taken to select fields: ten minutes maximum. The actual number of records would govern the processing time required, and a query of this nature would have a lot of records to go through. Record size, let’s be greedy and say 2KB for a full record. Twenty-five million of those come to roughly 50GB, which will not go on 1 CD, and that is precisely why you export only the handful of fields needed: the slimmer the exported record, the fewer disks. For added safety, compress the file with a secondary password required to decompress the data. Actual processing time with compression, probably no more than three to four hours, maximum, maybe a lot less depending upon how fast their processors are and how wide the query has to cast its net to select the data. At any rate, selecting and compressing the data to a point where it can be securely burned onto a CD, four to five working hours. If you have a secure Intranet between Government agencies you could copy the file up to a secure FTP location and e-mail the password(s) in an encrypted attachment to the desired recipient. Job done. Piss of piece.

Compressing 500MB of data, password encoding it and burning the CD? One hour tops, and it’s a long time since I had to do this sort of number processing. My figures could be way too slow because modern machines and processors should be much faster than when I were a lad.
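
A quick back-of-envelope check of the sizes, as a sketch only. The 25 million records and the 2KB record size are the figures used above; the 700MB CD capacity, the decimal units (1KB = 1000 bytes) and the smaller per-record sizes are my assumptions, picked purely for illustration.

```python
CD_CAPACITY_BYTES = 700 * 1000**2  # assumed capacity of one CD-R


def export_size_bytes(records, bytes_per_record):
    """Uncompressed size of an export, in bytes."""
    return records * bytes_per_record


def cds_needed(total_bytes, capacity=CD_CAPACITY_BYTES):
    """Number of CDs needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // capacity)


def max_record_bytes_for_one_cd(records, capacity=CD_CAPACITY_BYTES):
    """Largest whole-byte record size at which all records fit on one CD."""
    return capacity // records


if __name__ == "__main__":
    records = 25_000_000
    for size in (2000, 100, 20):  # 2KB full record vs. two trimmed guesses
        total = export_size_bytes(records, size)
        print(f"{size} bytes/record: {total / 1000**3:.1f} GB, "
              f"{cds_needed(total)} CD(s)")
    print("Largest record for a single CD:",
          max_record_bytes_for_one_cd(records), "bytes")
```

On those assumptions the full 2KB records would need dozens of CDs, while a trimmed export of a couple of dozen bytes per record fits on one, which is the whole argument for selecting fields rather than dumping the lot.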

I've done this myself and seen it work on flat file and distributed relational databases. It’s the skill in building the database query that counts. Nothing needs to be 'sanitised', you just select the bits of the records you want to 'export' and run the query. Is anyone telling me that people tasked with the maintenance of such databases don't know this? You emphatically do not transfer whole data files unless you're migrating or porting a system across to a new operating system or hardware platform.

This is all pretty basic stuff when dealing with ‘secure’ data. I could ask why this wasn’t done within HMRC, but now that the metaphorical horse has bolted, what’s the point? Brewery, a, run, couldn’t, up, a, piss, in. Such are the benefits of 20-20 hindsight, yet why were people with such an obvious ignorance of secure data handling making these decisions in the first place?

Oh sod it. Mrs S and I no longer live in the UK. Why should I care? Just because I was once an IT consultant before ageism and fast track visas put me on the scrapheap. Sod it. I care because HMRC have leaked my family's details potentially for all to see, that's why.


Paulie said...

I don't think that it's as easy to manipulate data in files as large as that - it isn't quite the same as managing a spreadsheet with a few hundred rows.

But still, it's not *that* massive a job either....

Bill Sticker said...


Biggest database I ever worked on had just under a million records, and yes, I did oversimplify the process in my little tantrum. On the big 'distributed' databases, building a simple query may require access to three or five data tables, but the general principles remain the same.

My point is, what were people who didn't understand the basic principles of the tools they use doing with them in the first place?



Old Man said...

Yeah, but nothing is accidental.

Please note that there is no real fuss about this, only politicians and the newspapers (and we all know where they are coming from).
The important thing is that "the People" are not complaining too much: they trust nuLabour.

Next step is the National Identity Register; this government has proved it can weather losing half the population's details. Now wait for losing the rest!
