
Do you know of a tool that can identify if any sensitive data exists in this data storage system?

This question comes up in various forms, for example, when an employee steals intellectual property from their employer or when medical records are breached. Most recently, a colleague asked, "Do you have any script that looks inside of a local directory for PHI or PII?" By PHI, we mean protected health information, which is sensitive information regulated by the Health Insurance Portability and Accountability Act (HIPAA). PHI is a subset of personally identifiable information (PII). In addition to PHI and PII, companies have a duty to protect their intellectual property (IP) and many other types of data.

Unfortunately, there is no magic "find sensitive data" easy button. Discovering sensitive data is challenging because the data must first be characterized, and that process depends on the nature of the data and how it is handled. The other challenge is related to how and where the data is stored. Sensitive data can be stored on USB thumb drives, on computer hard drives, and in cloud storage. It can reside in databases or proprietary file formats, and it may be encrypted.

Certain data, like credit card numbers, social security numbers, and medical record numbers (MRN), have a very specific format. If we know the format, we can create a regular expression (aka "regex") to describe it. Most digital forensic tools, programming languages, and command line tools, such as grep, support regular expressions.
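As a rough illustration of the regular expression approach, here is a minimal Python sketch that looks for two well-known formats in the files under a local directory. The patterns, the names `PATTERNS`, `scan_text`, and `scan_directory`, and the decision to read every file as plain text are all assumptions made for this example. No MRN pattern is included because MRN formats are specific to each organization.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumptions for this sketch). Real-world
# formats vary, and these will produce false positives.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}


def scan_text(text):
    """Return a list of (label, matched_string) pairs found in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


def scan_directory(root):
    """Walk root recursively and return {file_path: hits} for files with matches."""
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Treat every file as text; undecodable bytes are ignored.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; skip it
        hits = scan_text(text)
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_directory("/path/to/evidence")` (a placeholder path) returns a dictionary of file paths and the matches found in each. A sketch like this only sees plain text; it will not look inside databases, proprietary file formats, or encrypted containers.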


Unfortunately, not all sensitive information can be uniquely identified by regular expressions. Take, for example, a post-operative physician's note. It will contain lots of medical jargon, medication names, and other words that are specific to the medical profession. Putting a medical record number on the document will make it easier to find using a regular expression, but what if it is a draft of the note and does not contain an MRN? We could search for medical terminology or medication names instead. The better we understand the data we seek, the more effective our search will be.
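A search for medical terminology could be sketched along the following lines. The tiny vocabulary, the names `MEDICAL_TERMS`, `medical_term_hits`, and `looks_medical`, and the threshold of three distinct terms are hypothetical choices for illustration; a real search would rely on a much larger, curated term list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny vocabulary (an assumption for this sketch).
MEDICAL_TERMS = {
    "post-operative",
    "incision",
    "sutures",
    "ibuprofen",
    "diagnosis",
    "prognosis",
}


def medical_term_hits(text, terms=MEDICAL_TERMS):
    """Count how often each known term appears in text (case-insensitive)."""
    # Tokens are runs of letters, optionally joined by hyphens.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(token for token in tokens if token in terms)


def looks_medical(text, min_distinct=3):
    """Flag text that contains at least min_distinct different known terms."""
    return len(medical_term_hits(text)) >= min_distinct
```

Requiring several distinct terms, rather than a single match, is one simple way to avoid flagging every document that happens to mention a common medication once.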

A physician's note is very sensitive, but how can software differentiate between a physician's note and a public article in a medical journal? This is easy for a human, but how can it be done at scale across terabytes of storage? Typically, one will accept some false positives in the search results and manually sort through them, but additional technical solutions may be leveraged by those familiar with them.
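For formatted data, one widely used way to trim false positives before manual review is a checksum test. Payment card numbers carry a Luhn check digit, so a 16-digit string that fails the Luhn test is unlikely to be a real card number. The sketch below is an example of this kind of filter, not a technique described in the original post, and the function name `luhn_valid` is an assumption.

```python
def luhn_valid(candidate):
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    # Payment card numbers are generally 13 to 19 digits long.
    if len(digits) < 13 or len(digits) > 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

For example, `luhn_valid("4111 1111 1111 1111")` (a common test number) returns `True`, while `luhn_valid("1234 5678 9012 3456")` returns `False`. A passing checksum does not prove the number is a live card; it only removes candidates that cannot be one.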

At Lucid Truth Technologies, we create innovative technical solutions to solve these types of problems. We have used natural language processing (NLP) techniques, often powered by artificial intelligence, to perform document classification, document summarization, named entity recognition, and indexing to identify the data most relevant to your case or investigation.

Lucid Truth Technologies is here to help if you face a challenge like this, regardless of where the data may be stored. Contact us today!


Lucid Truth Technologies is a registered trademark of Kenneth G. Hartman Consulting, LLC
©2025. Lucid Truth Technologies.