By Isaac Madan, Co-Founder and CEO of Nightfall AI. Nightfall AI were shortlisted for the Best Security Solution for Data Management / Data Protection, and Best Use of AI in a Cloud Security Solution awards at The 2024 Cloud Security Awards.
Enterprises rely on hundreds of SaaS apps and cloud services to manage the flow of work. That means there are many opportunities for information to be shared where it shouldn’t be, resulting in a sprawl of sensitive data.
For example, consider an employee who sends his or her database credentials to a colleague in a messaging app. While the company may deem the messaging app secure, the employee’s database credentials now exist in a new location. If a threat actor accesses the messaging app and finds the shared credentials, they can easily access customer data in the database.
While sharing sensitive information in a Slack message, a Zendesk ticket or a GitLab repo frequently creates headlines, it’s just one of many contributors to sensitive data sprawl. Sprawled data can also be leaked to shadow IT and third-party large language models (LLMs), which can compromise compliance with frameworks like PCI-DSS, HIPAA, SOC 2 and ISO 27001.
According to Verizon’s 2024 Data Breach Investigations Report, the share of breaches caused specifically by end-user error has skyrocketed from 20% to nearly 90%. Legacy approaches to controlling data sprawl simply can’t keep up with the amount of data modern enterprises manage, or with the intricate web of applications and cloud services they rely on. In the age of cloud and AI, companies must be able to detect, protect and secure their sensitive data, and AI is the best way to do this at scale.

Leveraging AI for Sensitive Data Detection
One of the primary reasons legacy security solutions struggle to prevent sensitive data sprawl is that it’s difficult to achieve control, or even visibility, of data across the ever-increasing number of applications and environments. Many tools were designed when companies managed a handful of apps, not hundreds.
As a result, they don’t provide comprehensive visibility into data stored in cloud environments, mobile devices or other remote locations, and they can’t effectively detect or protect sensitive data as it spreads across disparate cloud systems and repositories. It’s nearly impossible in the modern landscape to enforce consistent security policies and implement adequate access controls, leaving sensitive data vulnerable to unauthorized sharing.
An AI-powered security solution can address the visibility challenge by quickly detecting the presence of sensitive data, whether at rest or in use. This is especially important when you consider that enterprises must identify and track data movement across disparate SaaS-based apps like Slack and Teams, Google Drive, JIRA, GitHub and email — not to mention third-party LLMs like ChatGPT. With AI, companies can more quickly and accurately understand where sensitive data is and make the necessary remediations, should the data be shared in a manner outside of the organization’s policies or compliance frameworks.
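To make the detection step concrete, here is a minimal rule-based sketch in Python. It is not the AI-based detection described above; it is the kind of pattern-matching baseline that such tools build on or replace. The detector names, the `find_sensitive` function and the two patterns (a Luhn-checked card number and a commonly used heuristic for AWS access key IDs) are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns for illustration; real detectors cover far more types.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13-19 digits, optional separators
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")          # widely used heuristic, not an official spec


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (detector_name, matched_text) pairs found in a message."""
    findings = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        # The checksum filters out most digit runs that merely look like card numbers.
        if luhn_valid(digits):
            findings.append(("card_number", m.group()))
    for m in AWS_KEY_ID.finditer(text):
        findings.append(("aws_access_key_id", m.group()))
    return findings


if __name__ == "__main__":
    message = "card 4111 1111 1111 1111 and key AKIAIOSFODNN7EXAMPLE"
    for name, value in find_sensitive(message):
        print(name, value)
```

A checksum reduces false positives for card numbers, but patterns like these cannot tell a test credential from a live one or read text inside an image, which is the gap context-aware models aim to close.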

Preventing Data Exfiltration
One of the biggest gaps in modern protection is the ability to adapt to dynamic, distributed environments. Modern work happens across SaaS apps, email, endpoints, custom apps and GenAI. The problem is that traditional DLP solutions weren’t built to effectively monitor the sharing of sensitive information across all of these applications, and it’s hard for them to keep up as such applications evolve. Consider OpenAI’s GPT-4o release, which now leverages multimodal AI. To prevent sharing sensitive data with this and subsequent large language models, DLP solutions must support use cases where image, video and audio files are involved.
Legacy tools often rely on outdated threat intelligence or struggle to keep up with emerging cyber threats and attack vectors, making them less effective at preventing sensitive data sprawl. Defending against cybercriminals is a never-ending game of Whac-A-Mole: their methods shift as soon as a new attack vector emerges (like video and audio in LLMs) or we find a way to stop their latest attack scheme.
AI algorithms and advanced analytics can combat new attacks by quickly identifying, adapting to and responding to emerging cyber threats and attack vectors in real time. By continuously analyzing and learning from vast amounts of data, AI can identify new patterns and indicators of compromise, enabling organizations to stay ahead of potential data breaches.
Furthermore, AI-powered security solutions can provide seamless integrations and interoperability with various systems and platforms, including SaaS-based business apps. These solutions can leverage APIs and connectors to gather and analyze data from different sources, ensuring comprehensive protection of sensitive data across distributed environments. By understanding these environments’ unique nuances and intricacies, AI algorithms can identify vulnerabilities, enforce security policies and prevent sensitive data from being shared — whether accidentally or as part of a nefarious endeavor.

Encrypting Data to Prevent Sensitive Data Sprawl
Redacting or anonymizing sensitive data in SaaS apps, email and custom apps isn’t always practical. There are instances where that information must be shared, such as when a new remote employee submits a photo of his or her Social Security card to HR to verify employment eligibility. However, sending this information over email or Slack can make it vulnerable to compromise if those systems get hacked. That’s where encryption comes in.
While data encryption isn’t new, traditional approaches don’t meet a modern enterprise’s needs. They often have low detection accuracy (that is, many false positives) and lack automation, both of which interrupt productivity across the organization. Encryption tools that leverage AI use context-aware discovery to scan outgoing and at-rest emails for sensitive data in text, images or files.
Security teams can leverage these tools and create company-specific policies to scan emails based on user categories, user groups, domains, detection rules and other specifications to get granular control over how and when sensitive data is shared.
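As a rough illustration of how such company-specific policies might be evaluated, the Python sketch below checks an email against an ordered list of rules. The `Policy` and `Email` types, their field names and the action strings are hypothetical and do not reflect any particular product's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    sender_groups: frozenset[str]    # groups the policy applies to; empty means everyone
    trusted_domains: frozenset[str]  # recipient domains exempt from the action
    detectors: frozenset[str]        # finding types that trigger the policy
    action: str                      # hypothetical actions: "encrypt", "block"


@dataclass(frozen=True)
class Email:
    sender_groups: frozenset[str]
    recipient: str
    findings: frozenset[str]         # detector names that fired on the message


def decide(email: Email, policies: list[Policy]) -> str:
    """Return the action of the first matching policy, or "allow"."""
    domain = email.recipient.rsplit("@", 1)[-1].lower()
    for policy in policies:
        if policy.sender_groups and not (policy.sender_groups & email.sender_groups):
            continue  # sender is outside the policy's scope
        if domain in policy.trusted_domains:
            continue  # recipient domain is exempt
        if policy.detectors & email.findings:
            return policy.action
    return "allow"


if __name__ == "__main__":
    policies = [
        Policy("hr-pii", frozenset({"hr"}), frozenset({"example.com"}),
               frozenset({"ssn"}), "encrypt"),
    ]
    msg = Email(frozenset({"hr"}), "new.hire@example.org", frozenset({"ssn"}))
    print(decide(msg, policies))
```

First-match-wins ordering keeps the outcome easy to reason about; a real deployment would also need to decide how overlapping policies are prioritized.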

Meeting Data Compliance Regulations
Regulations like GDPR, CCPA and HIPAA require stringent data protection measures to demonstrate control and accountability over sensitive data. Traditional DLP solutions aren’t enough to protect companies that must comply with these standards because they are unable to adapt to dynamic, distributed environments and have weak data encryption features, as detailed above. Just as importantly, they often overwhelm security and compliance teams with false positives because they can’t understand the context surrounding PII or filter out non-patient-related health documents.
By leveraging advanced analytics and machine learning, AI-powered security solutions can provide real-time insights into data usage and access patterns, helping organizations proactively identify and mitigate compliance risks. Furthermore, AI can generate audit trails and reports documenting compliance efforts and data protection measures, enabling organizations to demonstrate accountability and control over sensitive data to regulatory authorities.
Modern data challenges demand modern solutions. AI-powered security tools are no longer a luxury but a logical step forward in protecting against sensitive data sprawl across distributed cloud environments. By embracing this technological shift, companies can minimize the risk of data breaches and safeguard their valuable assets while confidently using the business tools they need to remain productive.

