Using OCI Language (text analytics) to detect PII 🤫

OCI Language AI (text analytics) can detect PII in a string of text. This is particularly useful for the following use cases:

Detecting and curating private information in user feedback 🧑

Many organizations collect user feedback through various channels such as product reviews, return requests, support tickets, and feedback forums. You can use the Language PII detection service to detect PII entities automatically, which lets applications not only proactively warn users about sharing private data, but also anonymize feedback before storing it, for example by storing a masked version.
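As a rough illustration of the "proactively warn" step, the sketch below turns a list of detected entities into a message for the user. It assumes each detected entity exposes a `type` and a confidence `score`; the `DetectedEntity` class, the `pii_warning` function and the 0.5 threshold are hypothetical, chosen only so the example runs without calling OCI, and the type labels are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DetectedEntity:
    """Hypothetical stand-in for one detected PII entity (assumed fields: type, score)."""
    type: str
    score: float


def pii_warning(entities: Iterable[DetectedEntity], min_score: float = 0.5) -> Optional[str]:
    """Return a warning listing the PII types found, or None if nothing passes min_score.

    min_score is an illustrative threshold, not a documented OCI default.
    """
    # Keep each distinct type once, sorted so the message is stable.
    types = sorted({e.type for e in entities if e.score >= min_score})
    if not types:
        return None
    return "Before posting, note that your feedback seems to contain: " + ", ".join(types)
```

Filtering on the score lets an application avoid nagging users about low-confidence matches; where to set the threshold is a product decision rather than something the service dictates.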

Scanning object storage for presence of sensitive data 💾

Cloud storage solutions such as OCI Object Storage are widely used by employees to store business documents, in locations that are either locally controlled or shared by many teams. Ensuring that such shared locations don’t store private information such as employee names, demographics and payroll information requires automatically scanning all the documents for PII. The OCI Language PII model provides a batch API for processing many text documents at scale.
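When scanning many documents, the texts have to be split into groups that each fit in one batch request. A minimal sketch of that grouping is below. The cap of 100 documents per request is an assumption, so check the OCI Language service limits for the real figure, and the `doc-N` key format and `chunk_documents` name are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed per-request document cap; verify against the OCI Language service limits.
MAX_DOCS_PER_REQUEST = 100


def chunk_documents(
    texts: Iterable[str], batch_size: int = MAX_DOCS_PER_REQUEST
) -> Iterator[List[Tuple[str, str]]]:
    """Yield lists of (key, text) pairs containing at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Tuple[str, str]] = []
    for i, text in enumerate(texts):
        # Give every document a unique key so results can be matched back to inputs.
        batch.append((f"doc-{i}", text))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # Emit any leftover documents as a final, smaller batch.
        yield batch
```

Each batch could then be turned into `oci.ai_language.models.TextDocument(key=key, text=text, language_code="en")` objects and sent in a single `batch_detect_language_pii_entities` call, in the same way as the script further down does for one string.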

Taken from – https://docs.oracle.com/en-us/iaas/language/using/pii.htm

I created a simple Python script that uses the OCI Language API to detect PII in a string of text and replace it with a placeholder (the PII type). It can be found below and on GitHub.

It analyses the text held in the texttoanalyse variable. If any PII is detected, it is replaced with the type of data it contains, and the updated string is printed to the console.

Update texttoanalyse and compartment_id before running.
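Rather than editing the compartment ID in the script each time, one option is to read it from an environment variable. In the sketch below, the variable name `OCI_COMPARTMENT_ID` and the `get_compartment_id` helper are hypothetical; the script would then pass `compartment_id=get_compartment_id()` instead of the hard-coded placeholder.

```python
import os


def get_compartment_id(env_var: str = "OCI_COMPARTMENT_ID") -> str:
    """Return the compartment OCID stored in env_var (a hypothetical variable name).

    Raises RuntimeError if the variable is missing or empty, so the script fails
    early instead of sending an invalid compartment ID to the service.
    """
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"Set {env_var} to the OCID of your compartment before running")
    return value
```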

import oci

config = oci.config.from_file()
ai_language_client = oci.ai_language.AIServiceLanguageClient(config)
texttoanalyse = "my details are brendan@brendg.co.uk, I was born in 1981" # String to analyse for PII

batch_detect_language_pii_entities_response = ai_language_client.batch_detect_language_pii_entities( # Identify PII in the string
    batch_detect_language_pii_entities_details=oci.ai_language.models.BatchDetectLanguagePiiEntitiesDetails(
        documents=[
            oci.ai_language.models.TextDocument(
                key="String1",
                text=texttoanalyse,
                language_code="en")],
        compartment_id="Compartment ID")) # Replace with the OCID of your compartment

cleansedtext = texttoanalyse # Replace the PII in the string with the type of data it is, for example e-mail address
for document in batch_detect_language_pii_entities_response.data.documents:
    for entity in document.entities:
        cleansedtext = cleansedtext.replace(entity.text, "*" + entity.type + "*")

print(cleansedtext)
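The loop above uses str.replace, which swaps out every occurrence of an entity's text, including occurrences inside other words: a short name such as "Ann" would also be replaced inside "Annual". A minimal sketch of replacing only the detected spans is below. It assumes each entity exposes `offset` and `length` (character positions in the original text) alongside `type`; the `Entity` named tuple and `replace_with_types` function are hypothetical stand-ins so the example runs without calling OCI, and the entity values are hand-written rather than real service output.

```python
from collections import namedtuple

# Hypothetical stand-in for a detected entity (assumed fields: offset, length, type).
Entity = namedtuple("Entity", ["offset", "length", "type"])


def replace_with_types(text, entities):
    """Replace each detected span with *TYPE*.

    Spans are processed from right to left, so replacing one span never shifts
    the offsets of the spans that still need to be replaced.
    """
    result = text
    for e in sorted(entities, key=lambda e: e.offset, reverse=True):
        result = result[:e.offset] + "*" + e.type + "*" + result[e.offset + e.length:]
    return result


example = "Annual report for Ann"
print(replace_with_types(example, [Entity(18, 3, "PERSON")]))
# prints: Annual report for *PERSON*
```

For comparison, `"Annual report for Ann".replace("Ann", "*PERSON*")` returns `"*PERSON*ual report for *PERSON*"`, which is why position-based replacement can be the safer choice when masking stored data.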

Here is the script in action:

[Screenshot: Using OCI Language (text analytics) to detect PII 🤫]
