By Kumar Goswami, CEO and Co-founder of Komprise, shortlisted for Best Cloud Data Management Solution at The Cloud Awards 2022-2023

The biggest buzz in the tech world in late 2022 was an application called ChatGPT by OpenAI. ChatGPT, which responds to natural language requests and can quickly create articles or answer complex questions, has been anecdotally helping students write essays, drafting letters of recommendation for professors and even solving coding issues. AI tools like ChatGPT, while still nascent, may soon revolutionize the way we create content or solve technical challenges – with astonishing accuracy and relevance in minutes.

This kind of technological innovation, while experimental and unproven, is mind-boggling, to say the least. In business, AI has also been making impressive inroads over the past two years. Take this example from McKinsey:

“In one metals manufacturing plant, an AI scheduling agent was able to reduce yield losses by 20 to 40 percent while significantly improving on-time delivery for customers.” These are results that can give a big boost to the top line while cutting costs and improving customer retention.

Rapid AI progress leads to some observations on the future of data management. With the right solutions, organizations can save at least 60% on storage, backups and disaster recovery by managing unstructured (file and object) data more efficiently across hybrid cloud storage, using analytics and intelligent data tiering. But this is only the beginning. The enormous opportunity at hand is to fully leverage unstructured data – files and objects – for use in AI and ML engines.

Why this focus on unstructured data?

Traditional business intelligence (BI) tools have relied upon the structured data in data warehouses to do their analysis, but most of today’s data is semi-structured or unstructured: think PDFs, productivity files, research data, email and text, images, video and audio files and sensor data. It is this massive data set—at least 80% of the estimated 97 zettabytes (ZB) of data created worldwide in 2022—that’s needed to fuel AI and ML applications.

There is still much to understand about the potential of AI and its impact not only on work and economic output but on our personal lives. Enterprises need to be ready for this wave of change – and it starts by first getting a complete picture of the unstructured data often locked in storage silos and disconnected file systems across the enterprise.

New data management technologies and strategies will enable the creation of automated ways to index, segment, curate, tag and move unstructured data continuously to feed AI and ML tools. Unforeseen changes to society, fueled by AI, are coming soon and you don’t want to be caught flat-footed. Is your organization ready?

To take advantage of the AI/ML innovation landscape, here are a few key practices to start you on your unstructured data management infrastructure journey.

Get full visibility so you can optimize and leverage your data

Organizations often don’t have the full picture of their unstructured data, which means most data behind the firewall goes unused, much less leveraged for competitive gain. IT leaders and other data stakeholders don’t know which data is the most valuable in terms of access frequency or ownership, or where there are hidden silos of unused data eating up expensive storage. For example, a large percentage of data could be moved to cheaper storage based on usage.

Organizations typically actively use only 20 percent of the data they have in storage. Therefore, the rest of it can go to deep archives or to warm tiers in the cloud that are less expensive. Of course, some of it can be deleted altogether. With an analytics approach to data management, IT leaders can start implementing a nuanced strategy that considers current and future data value. The first step is to recognize your current situation and find ways to move from a storage-centric to a data-centric approach.
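The data-centric approach described above can be sketched as a simple tiering rule keyed to how long a file has sat untouched. This is a minimal illustration, assuming hypothetical thresholds and tier names; real policies vary by organization and are set through analytics, not hard-coded constants:

```python
import time
from dataclasses import dataclass

# Hypothetical thresholds -- real policies differ per organization.
WARM_AFTER_DAYS = 90    # untouched for 90+ days -> cheaper warm cloud tier
COLD_AFTER_DAYS = 365   # untouched for a year+  -> deep archive

@dataclass
class FileRecord:
    path: str
    last_access: float  # epoch seconds of last access

def tier_for(record, now=None):
    """Map a file to a storage tier based on how long it has sat untouched."""
    now = time.time() if now is None else now
    idle_days = (now - record.last_access) / 86400
    if idle_days >= COLD_AFTER_DAYS:
        return "cold"
    if idle_days >= WARM_AFTER_DAYS:
        return "warm"
    return "hot"
```

Run against a storage scan, a rule like this surfaces the roughly 80 percent of data that can safely leave expensive primary storage.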

If you aren’t indexing your data today, that’s a problem

A fundamental barrier to data analytics is finding the precise data you need to mine. Most people in “data” jobs — data analysts, data scientists, researchers, marketers — spend most of their time looking for the data that will fit a project’s requirements. One of our customers told us how their researchers from one location used to call those in another to find the data needed for experiments. This doesn’t scale.

Data indexing is a powerful way to categorize all your unstructured data across your enterprise and make it searchable by key metadata such as file size, file extension, date of file creation, date of last access, and custom (user-created) metadata such as project name or keyword (such as an experiment name or instrument ID). Creating a global data index gives central IT and departmental IT teams and data researchers the equivalent of Google Search across your enterprise. This way, you don’t have to physically move your data; silos aren’t the issue as long as you can look across them from your data center to the cloud to find and use what you need.
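To make the idea concrete, here is a minimal in-memory sketch of such a metadata index, searchable by file extension, size and custom tags. The class and field names are assumptions for illustration, not any product’s API; a real global index spans billions of files across silos and lives in a scalable store:

```python
import os
from dataclasses import dataclass, field

@dataclass
class Entry:
    path: str
    size: int            # bytes
    ext: str             # file extension, lowercased
    mtime: float         # last-modified time, epoch seconds
    tags: set = field(default_factory=set)  # custom metadata, e.g. project name

class FileIndex:
    """Toy metadata index: one searchable catalog across storage locations."""

    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def scan(self, root):
        """Walk a directory tree and index basic file metadata."""
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                st = os.stat(full)
                self.add(Entry(full, st.st_size,
                               os.path.splitext(name)[1].lower(),
                               st.st_mtime))

    def search(self, ext=None, min_size=0, tag=None):
        """Google-Search-style lookup by metadata, regardless of silo."""
        return [e for e in self.entries
                if (ext is None or e.ext == ext)
                and e.size >= min_size
                and (tag is None or tag in e.tags)]
```

A researcher can then ask for, say, every `.tiff` tagged with an experiment name without knowing which file server or cloud bucket holds it.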

Make new uses of data while still being cost-efficient

Now that your data is indexed, users can find precisely the data sets they need and create policies to automate the movement of queried data to the location of choice—such as a cloud data lake for AI analysis. This requires automation and a simple way to connect the dots so you can deliver the right data to the right place (and to the right people or applications) for action. Imagine creating custom workflows that enrich and optimize your data. For example: what if you could tag and automatically tier instrument data to low-cost cloud storage as it is created? Cloud AI and ML tools can then ingest the data for analysis. Once the analysis is complete, an unstructured data management solution can automatically move the data to a colder, cheaper tier. Meanwhile, all of this happens automatically and at significantly lower cost to IT.
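A policy like the instrument-data example above can be sketched as a query plus a destination: match files by metadata, then plan the moves. Everything here is an illustrative assumption — the query shape, the `.raw` extension and the destination tier name are hypothetical, not a vendor interface:

```python
# Sketch: turn a metadata query into automated data-movement actions.
def plan_moves(files, policy):
    """Return (path, destination_tier) actions for files matching the policy.

    `files` is a list of dicts with 'path', 'ext' and 'idle_days' keys;
    `policy` pairs a match predicate with a destination tier.
    """
    actions = []
    for f in files:
        if policy["match"](f):
            actions.append((f["path"], policy["dest_tier"]))
    return actions

# Example policy: instrument output untouched for 30+ days goes to a
# low-cost cloud bucket where AI/ML tools can still ingest it.
archive_instrument_data = {
    "match": lambda f: f["ext"] == ".raw" and f["idle_days"] >= 30,
    "dest_tier": "s3://cold-research-tier",   # hypothetical destination
}
```

Running such a policy on a schedule is what makes the workflow hands-off: the query re-evaluates continuously, and matching data flows to the right tier without IT intervention.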

Collaborate with departments on data needs

Another critical piece to the puzzle is giving users and departments more insight into their data assets so they can work with IT on creating the best data management policies that support analytics initiatives. If departmental end users can interactively monitor data usage metrics and data trends, tag and search data and identify datasets for analytics, tiering and deletion—without IT intervention—it can lead to a more efficient, business-aligned and agile data management practice. Not only does this bridge the gap between IT and departments on data management decisions but both parties benefit: IT meets savings and governance goals while departments regain control over the data they need to protect and mine for future value.

Ultimately, modernization fuels data monetization by curating data so it can be easily and cost-effectively used by analytics applications on-premises or in the cloud. There are many requirements here, and solutions and tactics will surely evolve in the coming months with better, smarter automation as IT organizations zero in on unstructured data management and AI and ML initiatives.

The first half of 2023 required budgetary caution and smart spending in a roller-coaster economic environment. IT organizations need to institute further cost controls to stem wasteful spending and they need to think more about sustainability in all their practices to cope with a global energy and supply chain crisis. They need to do all of this while keeping their eyes on the prize: getting their data and data infrastructure ready for the AI age, which is just around the corner.