February 8, 2018

Guest blog: AI experiment, phase 1: Helping artificial intelligence

Tomas Lehtinen

Project Manager, AI experiment, the Six City (6Aika) project in Espoo, City of Espoo

The City of Espoo and Tieto launched a unique artificial intelligence experiment last summer.

The experiment combines a huge amount of data, including social and healthcare service data as well as early childhood education data for the whole population of Espoo for the period 2002–2016. The aim of analysing this large data mass is to find new, predictive ways to target services to residents, for example in preventing social exclusion.

So is something wrong, given that no results have been achieved yet? In this blog, and in two further blogs that I will publish later this year, I will explain the challenges we have faced and how we have resolved them: in other words, how we have prepared an artificial intelligence experiment of such an enormous scale so that we can be sure it is effective and beneficial. As we are dealing with social and healthcare data on more than half a million people, there is no margin for error – we must get it right the first time.

We introduced artificial intelligence into the analysis of the data because of its efficiency and data security. Artificial intelligence can handle large amounts of data significantly faster than human beings can. It takes about a month to train artificial intelligence to carry out a task, after which the computer produces results in mere hours or even minutes. But before the computer gets to do its actual job, i.e. analysing the data, human brains are needed in the preparatory stage.

Technical and operational data security working together

Data protection is applied in the experiment by various means, including technical and operational information security measures.

Whether the purpose is experimental or productive, when personal data is processed, technical and operational information security must be applied in all environments: development, testing and production.

One-off search → pseudonymisation → anonymisation

How was the data collection organised? The source systems were searched using various query languages and supporting programming languages. Personal data, social status, information on health and other variables were collected in the searches. Data was collected only for a clearly defined period of time.
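The blog does not identify the source systems or the query tools; the snippet below is a purely hypothetical illustration (the database, table and column names are invented) of the principle of a one-off, time-bounded extraction query.

```python
import sqlite3  # stand-in for whatever the real source systems use

# Illustrative only: the actual source systems, tables and columns are
# not described in the blog. The key point is the clearly defined time
# window applied to every extraction query.
PERIOD_START, PERIOD_END = "2002-01-01", "2016-12-31"

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE service_events "
    "(personal_identity_code TEXT, social_status TEXT, "
    " health_information TEXT, event_date TEXT)"
)

rows = connection.execute(
    "SELECT personal_identity_code, social_status, health_information "
    "FROM service_events WHERE event_date BETWEEN ? AND ?",
    (PERIOD_START, PERIOD_END),
).fetchall()
```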

After the data collection, the data was pseudonymised, i.e. processed so that it could no longer be associated with a particular person without using an encryption key.

How on earth do you keep track of the data if all sensitive personal data is pseudonymised? The whole idea of identifying service pathways is to monitor how individuals have progressed through different services. The answer is: by using encryption keys. The registers used in the project were merged using hash values computed from the personal identity codes, as a given personal identity code always produces the same hash value when recomputed. We know that it is the same person, but we do not know who it is.
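As a minimal sketch of the principle only (the blog does not name the hash function or the key management used in the experiment), a keyed hash such as HMAC-SHA-256 gives the same pseudonym for the same personal identity code, which is what allows records from different registers to be linked without revealing who the person is.

```python
import hmac
import hashlib

# Hypothetical pseudonymisation key; in practice it would be managed and
# stored securely, separately from the data itself.
PSEUDONYMISATION_KEY = b"example-secret-key"

def pseudonymise(personal_identity_code: str) -> str:
    """Return a deterministic pseudonym for a personal identity code.

    The same input always yields the same hash value, so records from
    different registers can be linked without knowing who the person is.
    """
    return hmac.new(PSEUDONYMISATION_KEY,
                    personal_identity_code.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# The same (fictitious) identity code produces the same pseudonym in every register.
assert pseudonymise("010101-123X") == pseudonymise("010101-123X")
```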

In order to take data protection one step further, the pseudonymised fields were anonymised after the registers had been combined, by calculating a new hash value together with a randomly selected string. Anonymisation removes the repeatability of the hash values, so that the same hash value cannot be calculated again even from a known personal identity code.
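Again only as a sketch, and under the assumption that a single random string was generated for the whole run and then discarded, the anonymisation step could look like this:

```python
import hashlib
import secrets

# Hypothetical one-off random string, generated after the registers have
# been combined and not stored anywhere afterwards.
ANONYMISATION_SALT = secrets.token_hex(32)

def anonymise(pseudonym: str) -> str:
    """Re-hash a pseudonym together with the random string.

    Because the random string is discarded after the run, the resulting
    hash value cannot be reproduced even from a known personal identity
    code, so the repeatability of the pseudonyms is removed.
    """
    return hashlib.sha256((ANONYMISATION_SALT + pseudonym).encode("utf-8")).hexdigest()
```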

We set strict requirements for encryption. Whenever the data was raw or pseudonymised, both dormant and active data were encrypted in the data lake and in the proxy server's transfer directory to the standards set by the City of Espoo. Neither the encryption of the data nor the encryption of the data communications was removed until the data had reached the target system.
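The blog does not say which encryption tools the experiment used; purely as an illustration of the principle, a symmetric cipher keeps a record unreadable in the data lake and in the transfer directory until it is decrypted in the target system.

```python
from cryptography.fernet import Fernet  # illustrative choice of cipher library

# Hypothetical key; in the real experiment, key management would follow
# the City of Espoo's encryption standards.
key = Fernet.generate_key()
cipher = Fernet(key)

# Raw or pseudonymised data stays encrypted in the data lake and in the
# transfer directory...
encrypted_record = cipher.encrypt(b"pseudonymised record contents")

# ...and is only decrypted once it has reached the target system.
decrypted_record = cipher.decrypt(encrypted_record)
```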

People are the biggest data security risk

People, with their human errors, are usually the weakest link when data security is threatened, which is why we also prepared ourselves for this eventuality with care.

The administrators had to identify themselves when they were collecting data, their login details had to be personal, and there had to be a record of every login and extension of user rights.
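The blog does not describe the logging system itself; as an assumption-laden sketch, each login or extension of user rights would produce an audit record along these lines:

```python
import json
from datetime import datetime, timezone

# Hypothetical structure of an audit record; the blog only states that
# every login and every extension of user rights must be recorded.
def audit_record(user_id: str, event: str, details: str) -> str:
    """Build one audit log entry as a JSON line."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,        # personal, named account, not a shared one
        "event": event,            # e.g. "login" or "user_rights_extended"
        "details": details,
    })

print(audit_record("admin.firstname.lastname", "login", "data lake console"))
```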

We monitor and observe data security continuously. This is done by means of technical monitoring, i.e. automation, both in the data lake and on the transfer servers.

Simple, isn't it? It does not necessarily sound like it – except to data protection specialists. Sometimes the project manager almost gets lost, but fortunately, there is a team of professionals who understand and explain things to him. Neither artificial intelligence nor the project manager can do it alone, because innovation is a team effort. If we did not put our human brains to work together to come up with new ideas, artificial intelligence would be nothing but a useless snippet of code.

In my next blog I will tell you how the quality of the data mass is controlled technically, for example how usable our data is in terms of quality and structure.

This blog was originally published here.
