
Do you know of a tool that can identify if any sensitive data exists in this data storage system?

This question comes up in various forms, for example, when an employee steals intellectual property from an employer or when medical records are breached. Most recently, a colleague asked, "Do you have any script that looks inside of a local directory for PHI or PII?" By PHI, we mean protected health information, which is sensitive information regulated by the Health Insurance Portability and Accountability Act (HIPAA). PHI is a subset of personally identifiable information (PII). In addition to PHI and PII, companies have a duty to protect their intellectual property (IP) and many other types of data.

Unfortunately, there is no magic "find sensitive data" easy button. The first challenge is that sensitive data must be characterized before it can be found, and how it is characterized depends on the nature of the data and how it is handled. The second challenge is how and where the data is stored. Sensitive data can reside on USB thumb drives, computer hard drives, and cloud storage, and it can be kept in databases, in proprietary file formats, or in encrypted form.
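
Encrypted and compressed data defeat simple text searches because the stored bytes no longer look like readable text. One common triage heuristic, sketched below in Python as an illustration rather than a feature of any particular forensic tool, is to measure a file's Shannon entropy: random-looking content approaches the maximum of 8 bits per byte. The 7.5 threshold and the function name `needs_special_handling` are assumptions chosen for illustration, not a standard.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of the data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0-255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def needs_special_handling(data: bytes, threshold: float = 7.5) -> bool:
    """Heuristic: True if the data is too random-looking to search as text.

    The 7.5 bits-per-byte threshold is an illustrative assumption.
    Encrypted and compressed content both score high.
    """
    return shannon_entropy(data) >= threshold
```

A high score only tells you that a plain-text search is unlikely to find anything in that file. ZIP archives, JPEG images, and modern Office documents are compressed and will score high too, so flagged files still need a human or a format-aware tool to decide what they are.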

Certain data, like credit card numbers, social security numbers, and medical record numbers (MRN), have a very specific format. If we know the format, we can write a regular expression (aka "regex") to describe it. Most digital forensic tools and programming languages, as well as command-line tools such as grep, support regular expressions.
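
As a minimal sketch in Python, the patterns below look for U.S. social security numbers, payment card numbers, and medical record numbers. The SSN pattern excludes a few number ranges that are never issued, and card candidates are filtered with the Luhn checksum that payment card numbers carry, which discards many random digit strings. The MRN pattern is hypothetical: each healthcare organization defines its own MRN format, so it would need to be adapted to the records in question.

```python
import re

# Illustrative patterns only. Expect both false positives and false negatives.
# U.S. SSN: area 000, 666, and 900-999, group 00, and serial 0000 are never issued.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# Payment card candidate: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Hypothetical MRN format: the label "MRN" followed by 6 to 10 digits.
MRN_RE = re.compile(r"\bMRN[:#\s]*(\d{6,10})\b", re.IGNORECASE)


def luhn_valid(number: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for every candidate in the text."""
    hits = [("SSN", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):  # drop digit runs that fail the checksum
            hits.append(("CARD", m.group()))
    hits.extend(("MRN", m.group(1)) for m in MRN_RE.finditer(text))
    return hits
```

With grep, a comparable first pass over a directory might be `grep -rEn '[0-9]{3}-[0-9]{2}-[0-9]{4}' /path/to/data` (placeholder path), which lacks the range and checksum checks and will therefore return more false positives.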


Unfortunately, not all sensitive information can be uniquely identified by regular expressions. Take, for example, a post-operative physician's note. It will contain plenty of medical jargon, medication names, and other words specific to the medical profession. A medical record number on the document makes it easier to find with a regular expression, but what if the note is a draft that does not contain an MRN? We could instead search for medical terminology or medication names. The better we understand the data we seek, the more effective our search for it will be.
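
Searching for terminology can be sketched the same way. The Python example below counts how many words in a document come from a vocabulary of medical terms and medication names. The vocabulary shown is a tiny hypothetical placeholder; a real search would use a list curated for the case. A draft note with no MRN can still stand out because of how densely it uses such terms.

```python
import re
from collections import Counter

# Hypothetical placeholder vocabulary. A real search would use a curated list
# of clinical terminology and medication names relevant to the investigation.
MEDICAL_TERMS = frozenset({
    "post-operative", "postoperative", "incision", "sutures", "anesthesia",
    "ibuprofen", "oxycodone", "cefazolin", "vitals", "prognosis",
})

# Lowercase words, keeping hyphenated compounds such as "post-operative" intact.
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def term_hits(text: str, vocabulary: frozenset = MEDICAL_TERMS) -> Counter:
    """Count how many times each vocabulary term appears in the text."""
    return Counter(w for w in WORD_RE.findall(text.lower()) if w in vocabulary)


def term_density(text: str, vocabulary: frozenset = MEDICAL_TERMS) -> float:
    """Return the fraction of words in the text that are vocabulary terms."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in vocabulary) / len(words)
```

Documents could then be ranked by `term_density` so that reviewers read the most jargon-heavy files first. Any cutoff between "clinical" and "not clinical" would have to be tuned on real data, and a medical journal article will score high as well.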

A physician's note is clearly very sensitive, but how can software differentiate between a physician's note and a public article in a medical journal? This is easy for a human, but how can it be done at scale across terabytes of storage? Typically, investigators accept some false positives in the search results and sort through them manually, but those familiar with more advanced techniques can apply additional technical solutions.
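
Returning to the colleague's question, a script that walks a local directory and reports every hit for manual review might look like the Python sketch below. It accepts false positives by design, in keeping with the review-the-results approach described above, and it only reads files as text, so proprietary, compressed, and encrypted formats will mostly be missed. The patterns, the function name `scan_directory`, and the 10 MB size limit are illustrative assumptions.

```python
import re
from pathlib import Path

# Illustrative patterns; looser than a production tool would use,
# so expect false positives that a reviewer must sort through.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE),  # hypothetical format
}


def scan_directory(root: str, max_bytes: int = 10_000_000):
    """Yield (path, line_number, label, matched_text) for every pattern hit.

    Each file is decoded as UTF-8 with undecodable bytes dropped, so binary,
    proprietary, compressed, and encrypted formats will mostly be missed.
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            if path.stat().st_size > max_bytes:
                continue  # skip very large files in this simple sketch
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in PATTERNS.items():
                for m in pattern.finditer(line):
                    yield str(path), lineno, label, m.group()
```

Calling `for hit in scan_directory("/path/to/evidence"): print(*hit, sep="\t")` with a placeholder path prints a tab-separated list of file, line number, label, and matched text that a reviewer can sort through in a spreadsheet.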

At Lucid Truth Technologies, we create innovative technical solutions to these types of problems. We have used natural language processing (NLP) techniques, often powered by artificial intelligence, to perform document classification, document summarization, named entity recognition, and indexing in order to identify the data most relevant to your case or investigation.
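
To give a flavor of document classification, the Python sketch below trains a multinomial naive Bayes classifier, a simple statistical technique and not necessarily the approach Lucid Truth Technologies uses, to separate clinical notes from journal articles. The four training sentences and the class names are hypothetical toy examples; a usable classifier would need a substantial set of labeled documents drawn from the kind of data under investigation.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.doc_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.doc_counts.items():
            # log P(label) + sum over words of log P(word | label)
            score = math.log(n_label / n_docs)
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[label][w]
                    score += math.log((count + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, for illustration only.
TRAIN_DOCS = [
    "Patient tolerated procedure well. Incision clean and dry. Follow up in two weeks.",
    "Pt reports mild pain, vitals stable, continue current medication, recheck tomorrow.",
    "In this randomized controlled trial we evaluated outcomes across 400 participants.",
    "Abstract. Methods. Results suggest a statistically significant reduction in mortality.",
]
TRAIN_LABELS = ["clinical_note", "clinical_note", "journal_article", "journal_article"]

model = NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)
```

Each word adds the log of its smoothed frequency within a class, so a handful of clinical words such as "incision" and "vitals" is enough to tip a short document toward the clinical label. This is exactly why the quality and size of the training set matter so much in practice.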

Lucid Truth Technologies is here to help if you face a challenge like this, regardless of where the data may be stored. Contact us today!



Lucid Truth Technologies is a registered trademark of Kenneth G. Hartman Consulting, LLC
©2025. Lucid Truth Technologies.