Data Protection / Data Centric Security: Security and Privacy Concerns
 
/* The following article is extracted from the "Information Security Newsletter" published by the JUCC IS Task Force. */ 
   
 
The volume of data in the world doubles every two years [1]. Data do not remain within the corporate network; they are shared with outside collaborators. Firewall protection alone is therefore not enough: the data themselves have to be protected so that they remain secure when transferred outside the local network. A data-centric approach to security is required.
 
Security Threats
 
With the rise of new technologies, new challenges have appeared, and the ways in which data are stored, accessed and processed have evolved. Businesses no longer have full authority over the data crossing their systems, so data security can no longer be limited to firewalls and network protection.
 
Traditional data security models are still in use, but they do not protect against new kinds of data breaches. They work well for small volumes of data that remain in a data center: such systems build a perimeter around the data and secure it. Unfortunately, data are no longer small and they proliferate in uncontrolled ways. Data are exchanged with a broad range of partners and through multiple channels, mobile as well as cloud. Moreover, the perimeter model does not take into account the fact that 50% of data breaches are caused by people inside the perimeter. A firewall will not prevent them from altering data.
 
Universities: Prime Targets for Data Breaches [2]
 
Universities have long been among hackers’ favorite targets. They are known for weak data security practices, which increase the risk of accidental loss and exposure. Moreover, they handle huge amounts of data, much of it sensitive, such as students’ personal and financial details.
 
Universities are not easy to protect: students bring their own laptops, many guests use the network, and researchers share information with people worldwide. The result is a large, heterogeneous organisation.
 
Efforts should focus on the most sensitive data, and controls should be proportionate to the sensitivity of each kind of data.
 
Big Data
 
Big data platforms collect data from multiple sources. Unlike traditional databases, they cannot isolate data according to type. As a result, two data sets combined may need a higher level of security than either needs individually. Data have to be protected before they move into a big data environment.
 
Cloud
 
The cloud is a distributed and decentralized environment, so not all parties can be trusted. All queries and data exchanges have to be carried out securely. Sensitive data moving into and across the cloud increase the risk of data loss or compliance violations. They should be protected at the point of creation, before leaving the local network or reaching the cloud. Access-control policies and authentication should be enforced.
 
Personal Information
 
Personal information has to be protected, and some of it should be accessible only to specific persons. Traditional security techniques address only a narrow range of threats. Attackers have to be deterred by de-identifying data at rest, in motion and in use. The security level applied should depend on the type of data.
 
Credit card payment controls are not enough either. Credit card information passes through complex, hard-to-secure environments. Since the network is hard to protect, the data themselves have to be protected.
 
According to Gartner’s analysts, more than 80% of organisations will fail to develop a consolidated data security policy across all their data silos by 2016 [3].
 
Even IT departments often do not know where sensitive data are located: 41% do not know where unstructured data are located and only 16% know where all data are [4]. Protecting the environment (network and endpoints) is therefore not sufficient; data-centric security is needed.
 
Data-Centric Security
 
A data-centric approach is needed to counter these new kinds of threats. To this end, structured and unstructured data have to be encrypted from the first point of entry, so that they remain encrypted wherever they go afterwards, even if they are later outsourced. This security architecture empowers organisations, because data proliferation no longer has to be limited. Data-centric security has to define specific rules for different data: data will be encrypted for certain users and blocked altogether for others.
 
Data-centric security has several advantages:
 
Sensitive data are protected
 
Sensitive data are masked before leaving the organisation. Thus, even if they leak, they will not be readable.
 
Analytics is enabled on protected data
 
Data are de-identified but their format is preserved. A date of birth still looks like a date of birth, so analytics can be run without deciphering the data.
 
Data are secured in motion, at rest and in use
 
Data are secured across the different environments. The data themselves are protected instead of limiting security to a predefined perimeter.
 
Data are devalued to cyber-attackers
 
In case of data breach, data will remain encrypted and it will be more expensive for hackers to succeed in their attack. If the cost is too high, hackers will be discouraged and will not even attempt an attack.
 
Reduces risk and delivers regulatory compliance
 
Data exposure is reduced in the event of a breach. Data-centric security also helps ensure compliance with data privacy regulations.
 
Data protection regulation in APAC [5]
 
Data privacy is the fastest-evolving area of regulation in Asia. Several countries are adopting new data privacy regulations closer to the European ones, and penalties for non-compliance are increasing. However, with the exception of New Zealand, their regimes remain less strict than the European one.
 
In 2012, Hong Kong adopted a series of amendments to its data privacy law, a radical change after nearly 15 years without modification. The amendments dramatically increase penalties. For instance, the maximum penalty for malicious disclosure of personal data without consent is now a fine of HKD 1 million and 5 years of imprisonment.
 
 
Advanced Persistent Threat (APT) [7]
 
An APT is an attack designed to steal sensitive data. The attacker gains access to the network and remains there for as long as he is undiscovered, which can be a long period of time. To do so, he has to hide his presence, for example by using strong encryption. He will try to obtain administrative credentials in order to gain access to all the data.
 
Firewalls and standard network security are not enough to limit the damage caused by this kind of attack. Data-centric security is needed.
 
In January 2010, Google disclosed that it had been the victim of an APT. The attack was part of Operation Aurora, a series of cyber-attacks targeting 34 companies. It began in mid-2009 and was discovered only in December 2009.
 
Data Protection Tools
 
Specific tools have to be used for data-centric security to overcome new security challenges [6]. They can be applied either as the principal security control or as a supplement to an existing security control.
 
Tokenization
 
A token is used to replace sensitive data, for instance in a credit card processing system. The token would be 16 digits long, like the credit card number, and all the original digits except the last four would be replaced by 12 random digits. There is no mathematical relationship between the token and the original value, which increases the security level. The original value is stored in a more secure database. The substitution is done in-line with the transaction system, so data are modified in motion and application functionality is preserved. Data in motion or at rest can be tokenized. The appearance of tokens is customizable, and tokenization can even be applied to complex data.
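 
The credit card example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TokenVault` class and its in-memory dictionaries are hypothetical stand-ins for the "more secure database", and the digits before the last four (twelve of them for a 16-digit number) are drawn at random so that the token has no mathematical link to the original.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping tokens to original card numbers."""

    def __init__(self):
        self._token_to_pan = {}
        self._pan_to_token = {}

    def tokenize(self, pan: str) -> str:
        """Return a 16-digit token that keeps only the last four digits of pan."""
        if len(pan) != 16 or not (pan.isascii() and pan.isdigit()):
            raise ValueError("expected a 16-digit card number")
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]  # reuse the token already issued
        while True:
            # Twelve random digits, unrelated to the original number.
            prefix = "".join(secrets.choice("0123456789") for _ in range(12))
            token = prefix + pan[-4:]
            if token != pan and token not in self._token_to_pan:
                break
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; only a party with vault access can do this."""
        return self._token_to_pan[token]
```

Because the token has the same length and character set as a card number, downstream applications can keep working on it unchanged, while recovering the original requires access to the vault.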
 
The main disadvantage of this tool is that the original value has to be stored in another repository. Moreover, tokens retain the aggregate value of the data set. Finally, they can only be created statically.
 
Masking
 
Masking also substitutes values, but in a more sophisticated manner. Instead of using a random number, it removes the sensitive value while preserving important information. For instance, a date of birth can be modified without altering the age of the person. Different masks can be applied to the same data depending on the use case. This flexibility limits risk exposure.
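 
The date-of-birth example can be illustrated with a minimal Python sketch. The function names and the choice of a reference date (`today`) are assumptions made for the illustration; the mask simply picks another date, within a year of the real one, that gives the same age in completed years.

```python
import random
from datetime import date, timedelta


def age_on(birth: date, today: date) -> int:
    """Age in completed years on `today`."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def mask_birth_date(birth: date, today: date, rng: random.Random) -> date:
    """Replace `birth` with a different date that yields the same age on `today`."""
    age = age_on(birth, today)
    candidates = []
    for offset in range(-366, 367):
        if offset == 0:
            continue  # never return the real date
        candidate = birth + timedelta(days=offset)
        if age_on(candidate, today) == age:
            candidates.append(candidate)
    return rng.choice(candidates)
```

For example, `mask_birth_date(date(1990, 3, 14), date(2015, 8, 5), random.Random())` returns some other date for which the person is still 25 on 5 August 2015, so age-based analytics remain valid on the masked value.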
 
There are two kinds of data masking: persistent data masking and dynamic masking. The first one applies masks on stored data. This allows de-identification of data for analysis or sharing outside the company. For dynamic masking, masks are applied on request, in real-time, altering delivered data instead of stored data.
 
Data masks with no mathematical relationship to the original values are completely secure. For the others, implementation should be preceded by careful analysis in order to avoid information leakage. Organisations have to check that a mask does not reveal enough information to allow reverse engineering and thus expose the original data. Only a well-reasoned use of masks can provide a high level of security.
 
Encryption
 
Encrypted data cannot be deciphered without the encryption key. Only recent technological progress has made it practical to apply encryption to individual data items: encryption ciphers were first designed for files, not for data fields. Binary-format encryption then appeared, but protected data could not be processed without being deciphered. Nowadays, encryption can be applied to all kinds of data, and the encrypted data can be processed by applications. To limit access, encryption keys can be issued only to specific users.
 
However, encrypted personal information is not accepted in all countries. Although encrypted data are directly compliant in the United States of America, the European Union restricts them. Unlike tokenized and masked information, some encrypted personal data are not allowed to cross EU borders. Moreover, without careful attention to key management and source legitimacy, the entire model fails.
 
Homomorphic encryption is a method that allows encrypted data to be analysed without deciphering them. Unfortunately, it is still at an early stage: it requires computing resources so large that it is not economically viable, and security is compromised during analysis.
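 
To make the idea of computing on encrypted data concrete, here is a toy Python sketch of the Paillier cryptosystem. It is not taken from the cited sources and is only partially homomorphic: it supports addition of encrypted values, not arbitrary analysis. The primes used are far too small for real use, and all function names are chosen for the illustration.

```python
import math
import secrets


def keygen(p: int = 104729, q: int = 1299709):
    """Build a toy key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # this shortcut is valid for the generator g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts yields an encryption of the sum of the plaintexts."""
    return (c1 * c2) % (n * n)
```

A party holding only the public value `n` can combine `encrypt(n, 52000)` and `encrypt(n, 61000)` with `add_encrypted` without seeing either amount; only the key holder can decrypt the total, 113000.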
 
Comparison
 
Each tool suits a specific type of data. Thus, before choosing a security tool, organisations should determine which kind of data has to be protected. The following chart summarizes which data-centric tool best suits each kind of data.
 
Deployment Model
 
These tools have to be implemented and deployed in order to ensure data-centric security. Three deployment models are available [8].
 
Gateways
 
Gateways apply security to data flows in real time. A gateway is generally located at the corporate firewall and is used to filter sensitive data before they leave the organisation's network. This technology is especially suitable for data in motion. However, to work properly, gateways have to support high-velocity input (especially if they are placed just in front of big data clusters).
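 
As a rough illustration of what such a filter might do, the Python sketch below redacts card numbers in an outgoing message. The pattern, the function names and the choice of a Luhn checksum to reduce false positives are assumptions made for the example, not a description of any particular gateway product.

```python
import re

# Sixteen digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Check the Luhn checksum used by payment card numbers."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def filter_outbound(message: str) -> str:
    """Mask all but the last four digits of any card number in the message."""

    def redact(match):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            return "*" * 12 + digits[-4:]
        return match.group()  # not a plausible card number, leave untouched

    return CARD_PATTERN.sub(redact, message)
```

A real gateway would apply many such rules to high-volume streams; the sketch only shows the principle of altering data in motion before they cross the perimeter.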
 
Hub and Spoke
 
Hub and spoke is an advanced form of traditional ETL (Extract, Transform and Load). ETL generally begins with the extraction of data from an internal database; the data are then obfuscated and finally loaded into another database. The most advanced processes can transform data from different sources to create a new data set and transfer the result to a new database or even to several data destinations. Hub and spoke is ETL with automated capabilities and a more sophisticated data management system: information is moved automatically according to policy. Unlike a gateway, it secures data remotely rather than on the stream.
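 
The extract, obfuscate and load sequence can be sketched with Python's built-in sqlite3 module. Everything specific here is an assumption made for the illustration: the `students` table, the `POLICY` dictionary, the demo key, and the use of a keyed hash (HMAC) as the obfuscation step.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical policy: how each column is treated on its way to a destination.
POLICY = {"student_id": "pseudonymize", "name": "drop", "grade": "keep"}
SECRET_KEY = b"demo-key-only"  # assumption: a real key would come from a key manager


def pseudonymize(value: str) -> str:
    """Replace a value with a keyed hash so records can still be joined."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


def transform(row: dict) -> dict:
    """Apply the policy to one extracted row."""
    out = {}
    for column, action in POLICY.items():
        if action == "keep":
            out[column] = row[column]
        elif action == "pseudonymize":
            out[column] = pseudonymize(str(row[column]))
        # "drop": the column never reaches the destination
    return out


def run_hub(source: sqlite3.Connection, spokes: list) -> None:
    """Extract from the source, obfuscate, and load into every destination."""
    source.row_factory = sqlite3.Row
    rows = [transform(dict(r)) for r in source.execute("SELECT * FROM students")]
    columns = [c for c, action in POLICY.items() if action != "drop"]
    placeholders = ", ".join("?" for _ in columns)
    for spoke in spokes:
        spoke.execute(f"CREATE TABLE students ({', '.join(columns)})")
        spoke.executemany(
            f"INSERT INTO students VALUES ({placeholders})",
            [tuple(r[c] for c in columns) for r in rows],
        )
        spoke.commit()
```

Changing the policy dictionary changes what every destination receives, which is the sense in which information is moved according to policy rather than by hand-written export scripts.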

Reverse proxy
 
Security is applied inline in the data flow. A reverse proxy sits between users and a single database and intercepts the communications passing between them. As a result, data can be modified without altering the database, so additional programming is not required. Moreover, a reverse proxy can modify results according to the recipient. The data-centric tool generally associated with this deployment model is masking, which is why the model is also called dynamic masking. Reverse proxies are mainly used to protect data dynamically.
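 
A minimal Python sketch of the idea follows, again using sqlite3 as a stand-in for the protected database. The `MaskingProxy` class, the role names and the masking rules are hypothetical; the point is only that results are altered per recipient while the stored rows stay untouched.

```python
import sqlite3

# Hypothetical per-role rules: column name -> function applied to each value.
MASKS = {
    "registrar": {},  # sees everything
    "analyst": {
        "name": lambda v: "***",
        "phone": lambda v: "*" * (len(v) - 4) + v[-4:],
    },
}


class MaskingProxy:
    """Sits between callers and the database and masks results per role."""

    def __init__(self, connection: sqlite3.Connection):
        self._connection = connection

    def query(self, role: str, sql: str, params=()):
        if role not in MASKS:
            raise PermissionError(f"role {role!r} may not query this database")
        cursor = self._connection.execute(sql, params)
        names = [description[0] for description in cursor.description]
        rules = MASKS[role]
        # Mask the delivered rows only; nothing is written back to the database.
        return [
            tuple(rules.get(name, lambda v: v)(value) for name, value in zip(names, row))
            for row in cursor
        ]
```

With a table holding `('Alice Chan', '91234567', 'A')`, a registrar querying through the proxy gets the row as stored, an analyst gets `('***', '****4567', 'A')`, and an unknown role is refused.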

Conclusion
 
With the arrival of sensors and smart tools, organisations will create more and more data, and globalization makes data more mobile. It is therefore paramount to be able to share and store them safely. Data-centric security will protect data without reducing their mobility, and it will increase global collaboration.
 
References  
  1. “Data Growth, Business Opportunities, and the IT imperatives” April 2014 Web. 4 August 2015
  2. “Universities cyber-attacks research criminals” March 2015 Web. 5 August 2015
  3. “Gartner” September 2014 Web. 3 August 2015
  4. “The State of Data Centric Security” by Ponemon institute LLC June 2014 pdf
  5. “Data Protection Regulation in the Asia Pacific: Trends and Recent Developments” November 2013 Web. 5 August 2015
  6. “Streamlining Information Protection Through a Data-centric Security Approach” by Voltage May 2013 pdf
  7. “Data-centric Security: A New Information Security Perimeter” by J. Oltsik March 2015 pdf
  8. “Trends in Data Centric Security” by Securosis September 2014 pdf