A common scenario is playing out in companies around the world: the pressure to “do something” with data and keep up with the latest hype. As that pressure builds, leaders might feel pushed to purchase the latest, shiniest tools or to implement overly complex solutions that drain resources while bringing no value.
With Levi9’s help, a sustainable housing solutions company avoided the hype trap, focusing instead on bringing sustainability into their data management first. Rather than stacking data indiscriminately and chasing trending platforms, Levi9 concentrated on starting small and building up towards a data strategy that aligns with business goals and allows flexibility for growth.
Hoarding data is not enough
Our client approached us with a request we often hear: they wished to gain more insights from their data. The company, which creates custom analysis and recommendations for more eco-friendly homes, manages a wide range of data, such as events, location, telemetry, product quotes, web app user input, energy supply and demand, subsidy information, etc. When their new CTO tried to map the data landscape, he discovered that his teams were accessing separate datasets from various sources.
“They had several teams working on data products and advanced analytics,” remembers Iulian Prodan, Data Tech Lead at Levi9. “Teams worked in isolated data environments without a unified strategy or governance.”
This type of data fragmentation is a common challenge in many companies: multiple teams using data in their own silos, each accessing similar information in different ways and analyzing it separately, without standardization or proper governance. Faced with the growing complexity of managing and steering such solutions, and of trusting their accuracy and reliability, some companies find themselves thrust into a whirlpool of marketing promises and never-ending checklists.
Start small, scale later
Levi9 chose a more strategic approach, beginning by understanding the client’s business context and data landscape in depth. “We discovered their main goals and challenges by having in-depth conversations about their technical capabilities, data usage, and business model,” recalls Eliza Enache, Levi9 Data Engineer. “They needed a scalable data platform that would enable collaboration across teams, support self-service analytics, and help improve their statistical analysis and ML abilities.”
The team chose to “start small and scale” instead of trying to develop everything at once. They began with a high-value use case: analysing the customer journey. Until then, data had been exported and processed manually whenever it was needed, after passing through multiple systems and being stored in different formats, a process that could take hours. “We identified the most valuable data and the methods used to process and analyze it, and proposed centralizing access to make it directly available,” says Iulian.
In collaboration with the customer’s team, Levi9 proposed implementing a Data Lakehouse using Databricks and Azure storage as a “single source of truth.” Apart from its technical merits, the solution directly addressed the client’s requirements for an accessible and collaborative platform that would enable self-service analytics, underlines Eliza. “The Spark architecture’s fast processing capabilities, combined with Databricks’ features, provided a scalable base for an inclusive data platform that would satisfy both current and emerging business needs, as well as allow exploration and play.”
The impact was quickly visible. The solution automatically refreshed data during off-hours, so that it was up to date each morning. Data teams could now access information directly and in fewer steps, which enabled them to quickly create draft reports and conduct preliminary analyses. Levi9 now had the trust of the client and could move forward with a higher-level approach.
The team implemented a medallion architecture, where the structure and quality of data are progressively improved through bronze, silver, and gold layers. Levi9 first brought the data to a bronze level and then built on top of it, reaching silver level for some data types.
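To make the layering concrete, here is a minimal sketch in plain Python. It is not the client's pipeline: the sample records, the field names (id, kwh, ts), and the cleaning rules are assumptions chosen for illustration, and on Databricks each layer would normally be a table rather than an in-memory list.

```python
from datetime import datetime, timezone

# Hypothetical raw readings, as they might arrive from a source system.
raw_events = [
    {"id": "1", "kwh": "12.5", "ts": "2024-01-01T06:00:00"},
    {"id": "2", "kwh": "n/a", "ts": "2024-01-01T07:00:00"},   # unparseable value
    {"id": "1", "kwh": "12.5", "ts": "2024-01-01T06:00:00"},  # exact duplicate
    {"id": "3", "kwh": "7.5", "ts": "2024-01-02T06:00:00"},
]


def to_bronze(records):
    """Bronze: keep the raw records as they are, only add load metadata."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    return [{**r, "_loaded_at": loaded_at} for r in records]


def to_silver(bronze):
    """Silver: enforce types, drop unparseable rows, remove duplicates."""
    seen, silver = set(), []
    for r in bronze:
        try:
            kwh = float(r["kwh"])
        except ValueError:
            continue  # skip rows whose reading is not a number
        key = (r["id"], r["ts"])
        if key in seen:
            continue  # skip rows already seen with the same id and timestamp
        seen.add(key)
        silver.append({"id": int(r["id"]), "kwh": kwh, "day": r["ts"][:10]})
    return silver


def to_gold(silver):
    """Gold: aggregate into a reporting-ready shape (total kWh per day)."""
    totals = {}
    for r in silver:
        totals[r["day"]] = totals.get(r["day"], 0.0) + r["kwh"]
    return totals


bronze = to_bronze(raw_events)
silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping bronze untouched means the silver and gold rules can be changed later and replayed against the original data.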
GDPR-compliant while extracting insights
The next phase introduced Unity Catalog, a data governance solution that enables unified governance across workspaces, roles, and teams. In addition to implementing security policies down to object level, Unity Catalog can serve teams working with complex data from multiple sources by allowing them to easily track its lineage and have a clearer overview of its content and properties, as well as to document and classify it. One challenge the team solved along the way was “exploding” the data: transforming complex nested JSON data into multiple, normalized database tables, which made the data more accessible.
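As a rough illustration of what “exploding” means, the sketch below splits one nested record into flat rows across a parent and a child table. It is plain Python under stated assumptions, not the project's code: the explode helper, the sample quote, and the table names are hypothetical, every record is assumed to carry an id, and only one level of nested objects is flattened. On Spark the same idea is usually expressed with DataFrame operations instead.

```python
def explode(record, table, tables, parent_key=None):
    """Split one nested record into flat rows across several tables.

    Scalars stay on the current row, one level of nested objects is
    flattened into prefixed columns, and each list of objects becomes
    rows in a child table that point back to the parent via parent_id.
    """
    row = {}
    if parent_key is not None:
        row["parent_id"] = parent_key
    children = []
    for key, value in record.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                row[f"{key}_{sub_key}"] = sub_value
        elif isinstance(value, list):
            children.append((key, value))
        else:
            row[key] = value
    tables.setdefault(table, []).append(row)
    for child_name, items in children:
        for item in items:
            explode(item, f"{table}_{child_name}", tables,
                    parent_key=record.get("id"))
    return tables


# Hypothetical product quote with a nested object and a nested list.
quote = {
    "id": "q-1",
    "customer": {"city": "Utrecht", "segment": "residential"},
    "products": [
        {"id": "p-1", "name": "heat pump", "price": 4200},
        {"id": "p-2", "name": "solar panel", "price": 1800},
    ],
}

tables = explode(quote, "quotes", {})
```

After the call, tables holds a "quotes" table with one row and a "quotes_products" table with two rows, each linked to the quote through parent_id.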
Building on this data management method, Levi9 helped the company become GDPR compliant and secure personal information. At the same time, the company improved its capacity for advanced analytics and machine learning, because it could keep using all the relevant data it had gathered so far. One of the key technical challenges involved encrypting custom fields within deeply nested JSON source files at the ingestion phase. The custom-built encryption system was able to handle personally identifiable information (PII) within these complex data structures.
“The encryption mechanism masks sensitive information while keeping the rest of the information accessible,” explains Iulian. “Using this approach, we can manage data availability programmatically, according to security policies and compliance requirements.” When the retention period expires, the encryption keys are deleted instead of the data being purged. Personal information is forgotten, but anonymous patterns are preserved.
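The two ideas above, encrypting selected fields inside nested JSON and “forgetting” by deleting keys, can be sketched as follows. This is an illustration only, not Levi9's custom-built system: the dotted field paths, the in-memory key_store, and the sample record are all hypothetical. The cipher is a standard-library stand-in (an HMAC-SHA256 keystream) so that the example runs without dependencies; a real system would use a vetted authenticated cipher and a managed key vault.

```python
import hashlib
import hmac
import secrets


def _keystream(key, nonce, length):
    """Derive `length` bytes from key and nonce (HMAC-SHA256, counter mode)."""
    out, counter = b"", 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key, text):
    """Stand-in cipher for illustration only; not for production use."""
    nonce = secrets.token_bytes(16)
    data = text.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return (nonce + bytes(a ^ b for a, b in zip(data, stream))).hex()


def decrypt(key, token):
    raw = bytes.fromhex(token)
    nonce, cipher = raw[:16], raw[16:]
    stream = _keystream(key, nonce, len(cipher))
    return bytes(a ^ b for a, b in zip(cipher, stream)).decode("utf-8")


def transform_paths(node, paths, fn, prefix=""):
    """Copy nested JSON, applying fn to leaf values at the given dotted paths."""
    if isinstance(node, dict):
        return {k: transform_paths(v, paths, fn, f"{prefix}.{k}" if prefix else k)
                for k, v in node.items()}
    if isinstance(node, list):
        return [transform_paths(item, paths, fn, prefix) for item in node]
    return fn(node) if prefix in paths else node


# Hypothetical per-customer key store and source record.
key_store = {"customer-42": secrets.token_bytes(32)}
PII_PATHS = {"profile.name", "profile.address.street", "contacts.email"}
record = {
    "customer_id": "customer-42",
    "profile": {"name": "Jane Doe",
                "address": {"street": "Main St 1", "city": "Utrecht"}},
    "contacts": [{"email": "jane@example.com", "channel": "web"}],
    "energy_demand_kwh": 3200,
}

# Ingestion: only the PII fields are encrypted, everything else stays readable.
stored = transform_paths(record, PII_PATHS,
                         lambda v: encrypt(key_store["customer-42"], v))

# While the key exists, authorised code can restore the original values.
readable = transform_paths(stored, PII_PATHS,
                           lambda v: decrypt(key_store["customer-42"], v))

# Retention period expired: delete the key instead of purging the record.
del key_store["customer-42"]
```

Once the key is gone, the encrypted fields in stored can no longer be decrypted, while non-personal values such as the city or the energy demand remain available for analysis.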
How the strategic investment paid off
The combination of gradual implementation, flexible architecture, and governance resulted in a new landscape: manual processes were replaced with streamlined data flows, improving accuracy and allowing the development team to focus on higher-value tasks; data was processed, cleansed, and refreshed daily, providing a single source of truth that supports timely analytics and reporting; the GDPR solution, applied to a highly complex format, allowed the client to keep more of their data and use it to improve forecasting and machine learning models.
“We also wanted to make sure that what we delivered was future-proof,” points out Eliza. Because Levi9 prioritized a code-first approach to data management, the modules they implemented could be modified and re-used by the client’s data analytics team as needed. As new tools continue to emerge, the foundation laid out by the Levi9 team supports growth and innovation. Their teams can now model new data layers, integrate additional data sources, and explore advanced machine learning applications.
“While tools enable you to hoard data, they will not necessarily help you. You might be directing your efforts disproportionately. When you have a strategy, you can distribute efforts and priorities, choose the right team, and assign team members appropriately.”
– Eliza Enache, Data Engineer, Levi9
Strategy provides sustainability
Building a sustainable data ecosystem isn’t just about choosing the right tools or jumping on the latest trend; it’s about developing a long-term strategy that enables growth alongside your business.
Tools and technologies should support, not dictate, your strategy. A well-defined strategy allows businesses to adapt to unpredictable changes, make informed investment decisions, and drive sustainable growth. This approach will ultimately inform the selection of tools, methods, and technologies by enabling a common understanding across the organization about its values and goals.