MySQL obfuscated database under online replication

Ekaterina Sinkovets

There are security restrictions on who may see the PII (Personally Identifiable Information) of clients, users, and so on, both inside and outside the company (the GDPR is one example of such a restriction). Ekaterina researched how to restrict access to this data inside the company's architecture for a MySQL database.

There are many technical solutions for hiding PII data (in other words, for obtaining obfuscated data).

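One solution is an online replication pipeline built from the MySQL binlog, Debezium, Kafka, microservices, and a separate obfuscated database.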

Another way is to make a copy of the production database every night on a schedule and then apply SQL script(s) to obfuscate the sensitive data.

Both solutions are good, but each has hidden pitfalls.

Below I try to highlight the pros and cons of each solution.

1st solution

Positive:

  • We have obfuscated data.
  • The obfuscated DB has a minimal time delay behind the real data.
  • End users work with a separate obfuscated node and have no impact on the production database.

Negative:

  • There are several components in the architecture.
  • We have to support each component in the data pipeline.
  • We have to monitor each component {MySQL BinLog, Debezium, Kafka, Microservices, Obfuscated DB}.
  • If the obfuscation rules change (for example, one column is no longer PII as of Monday), we need to destroy the obfuscated DB, create it again from scratch, and catch up to the real data.
  • Implementing this scheme requires an additional budget for servers and man-hours.
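To make the first solution more concrete, here is a minimal sketch of what a microservice in that pipeline might do with one change event before writing it to the obfuscated DB. It assumes the Debezium envelope shape (`before` and `after` row images plus a `source` block naming the table). The rule set `PII_COLUMNS`, the key `SECRET_KEY`, and both function names are hypothetical and not taken from the original research.

```python
import hashlib
import hmac

# Hypothetical rule set: table name -> columns treated as PII.
PII_COLUMNS = {"customers": {"email", "phone"}}

# Hypothetical key; in practice it would come from a secret store.
SECRET_KEY = b"replace-me"


def mask_value(value):
    """Replace a value with a stable keyed token (HMAC-SHA256, truncated)."""
    if value is None:
        return None
    digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def obfuscate_event(event):
    """Return a copy of a change event with PII columns masked.

    Non-PII columns pass through unchanged; the input event is not mutated.
    """
    table = event["source"]["table"]
    pii = PII_COLUMNS.get(table, set())
    masked = dict(event)
    for image in ("before", "after"):
        row = event.get(image)
        if row is not None:
            masked[image] = {
                col: mask_value(val) if col in pii else val
                for col, val in row.items()
            }
    return masked
```

Because the token is deterministic, the same e-mail always maps to the same masked value, so joins on masked columns still work. It also shows why a rule change is expensive here: rows that were already written with the old rules stay as they are until the obfuscated DB is rebuilt.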

2nd solution

Positive:

  • We have obfuscated data.
  • End users work with a separate obfuscated node and have no impact on the production database.
  • There are few IT components, so we can save budget.

Negative:

  • The obfuscated DB has a huge time delay behind the real data.
  • If the creation of a new obfuscated node fails at night, end users can lose time during the day.
  • The SQL script needs maintenance and testing every time the rules for transforming PII data change.
  • If the obfuscation rules change (for example, one column is no longer PII as of Monday), we need to wait for a new obfuscated DB the next day or run the process again during the day.
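For the second solution, the nightly SQL script could be generated from a small rule table instead of being edited by hand. The sketch below is only an illustration under that assumption: `RULES`, `quote_ident`, and `build_obfuscation_script` are hypothetical names, and the replacement expressions are ordinary MySQL expressions chosen as examples.

```python
# Hypothetical rules: table -> {column: MySQL expression that replaces it}.
RULES = {
    "customers": {
        "email": "CONCAT('user', id, '@example.invalid')",
        "phone": "NULL",
    },
}


def quote_ident(name):
    """Quote a MySQL identifier with backticks, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_obfuscation_script(rules):
    """Return one UPDATE statement per table, in sorted table order."""
    statements = []
    for table in sorted(rules):
        assignments = ", ".join(
            f"{quote_ident(col)} = {expr}"
            for col, expr in sorted(rules[table].items())
        )
        statements.append(f"UPDATE {quote_ident(table)} SET {assignments};")
    return "\n".join(statements)


# The generated script would be run against the nightly copy only,
# never against the production database.
print(build_obfuscation_script(RULES))
```

Keeping the rules as data does not remove the need for testing, but a rule change then means editing one entry rather than rewriting the script.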

Ekaterina suggested a different way to present PII data. The solution is built on a few concepts:

  • ANSI-SPARC Architecture
  • Virtual Tables (Database Views) with a PII transformation formula for the required columns
  • Hidden Relational Model to support PII rules for MySQL Event Scheduler
  • MySQL Event Scheduler to sync views with modified tables during a release
  • Strict grants and permissions through database roles for end users.
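The list above can be illustrated with a small sketch that builds the view definition and the grant for one table. It is an assumption about how the pieces could fit together, not the actual implementation: the schema names, the role name, and the functions `build_view` and `build_grant` are hypothetical, and in a real setup the column list and PII rules would come from the hidden relational model rather than from literals.

```python
def quote_ident(name):
    """Quote a MySQL identifier with backticks, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_view(schema, view_schema, table, columns, pii_rules):
    """Build a CREATE OR REPLACE VIEW statement that masks PII columns.

    columns   -- ordered column names of the base table
    pii_rules -- {column: MySQL expression}; other columns pass through
    """
    select_list = ",\n  ".join(
        f"{pii_rules[c]} AS {quote_ident(c)}" if c in pii_rules else quote_ident(c)
        for c in columns
    )
    return (
        f"CREATE OR REPLACE VIEW {quote_ident(view_schema)}.{quote_ident(table)} AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM {quote_ident(schema)}.{quote_ident(table)};"
    )


def build_grant(view_schema, table, role):
    """Build a GRANT for a role; the role name is assumed to be trusted input."""
    return (
        f"GRANT SELECT ON {quote_ident(view_schema)}.{quote_ident(table)} "
        f"TO '{role}';"
    )


print(build_view("prod", "obf", "customers", ["id", "email"],
                 {"email": "SHA2(`email`, 256)"}))
print(build_grant("obf", "customers", "analyst"))
```

End users get the role and therefore see only the views, while the base tables stay out of reach. When a rule changes, only the view has to be re-created, so the data stays current and nothing has to be copied or rebuilt.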

With this approach, we can simplify the topology model to the following architecture.
