What happens to data when organizations acquire others across the globe?
In addition to data from digital assets such as websites and mobile apps, digital enterprises also generate data from their physical properties across geographies.
The charter of every CIO/CDO typically includes consolidating these assets and investment buckets within the first year, in order to generate a single view of the customer across the group.
The key driver for engineering such Big Data initiatives is understanding customer engagement throughout the group, across the buying cycle, i.e., from lead to cash. This, in turn, helps target customers effectively and facilitates the achievement of business objectives: ensuring brand stickiness and an increased top line, even during consolidation.
This feat is definitely not trivial. The key activities include:
- Ingesting data across various source systems at various velocities
- Standardizing the data sets and data model based on dictionary & metadata
- Data Quality and Cleansing
- Mastering the dimensions
- Creating reports for insights
- Creating analytical sandpits for data science and operationalization of the models
- Orchestrating and integrating with upstream systems
- Compliance & Security based on regulations
- Backing up data while keeping it searchable at low latency
- Managing all systems and data on cloud
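At a high level, these activities chain into a pipeline. The sketch below is purely illustrative; the stage functions, field names and quality rule are hypothetical stand-ins for the activities listed above:

```python
# Minimal pipeline sketch: each stage is a placeholder for an
# activity named above (ingestion, standardization, cleansing).

def ingest(feeds):
    # Pull raw records from each source system, whatever its velocity.
    return [rec for feed in feeds for rec in feed]

def standardize(records):
    # Conform field names to the group data model (here: lower-case keys).
    return [{k.lower(): v for k, v in rec.items()} for rec in records]

def cleanse(records):
    # Drop records failing a basic quality rule (here: missing id).
    return [rec for rec in records if rec.get("id")]

def run_pipeline(feeds):
    return cleanse(standardize(ingest(feeds)))

feeds = [[{"ID": 1, "Name": "Acme"}], [{"ID": None, "Name": "Ghost"}]]
clean = run_pipeline(feeds)  # only the record with a valid id survives
```

In a real platform each stage would be metadata-driven and orchestrated, not hard-coded, but the composition of stages stays the same.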
While building such a system, some of the challenges that one can encounter are:
- Difficulty in building a data dictionary across markets
- Delay due to inability of the source systems to provide data in the prescribed format
- Data cleansing of every source feed is manual, time-consuming, inaccurate and highly dependent on data stewards
- Mastering of dimensions can be altogether a new project in itself
- Life cycle management for analytical sandpits and data science modeling involves all aspects of SDLC and needs to be automated
- Building reports can lead to data proliferation
- Compliance and regulations are continuously evolving like GDPR
- Data management and life cycle becomes complicated when using cloud for backup
- Technology stacks that cannot leverage the underlying chipset, machine and cloud capabilities without compromising speed and scalability
It’s certain that such initiatives will lead to higher campaign and conversion efficacy and improve the engagement score. But within a year, a bespoke build tends to overrun costs, become people-dependent and end up as yet another one-off Big Data implementation.
Thus it's imperative for organizations to consider a platform-centric approach rather than commissioning such one-off initiatives. A DaaS (Data-as-a-Service) platform can be leveraged across the organization to (from lowest importance to greatest):
- Create backup and storage of raw data sets
- Maintain a data quality system
- Provide standardized and ad-hoc reporting
- Provision analytical sandpits for data science development
- Manage outbound consumption
The key tenets and building blocks of the platform include:
- Input Standardization: Define a format (Group Platform Format – GPF) and expose interfaces supporting queues, web services and batch. The components should support both single-record and large-volume ingestion at various velocities (batch and real-time). Create a web-based application for defining input formats, which can then be used to define the ETLs.
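A sketch of how a declared input format could drive validation at ingestion time. The GPF field definitions and record shapes below are invented for illustration; in practice they would be authored in the web-based format-definition application and fetched by the ETL:

```python
# Hypothetical GPF (Group Platform Format) declaration for one feed.
GPF_CUSTOMER_V1 = {
    "customer_id": {"type": str, "required": True},
    "market":      {"type": str, "required": True},
    "spend":       {"type": float, "required": False},
}

def validate(record: dict, gpf: dict) -> list:
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for field, rule in gpf.items():
        if field not in record:
            if rule["required"]:
                errors.append(f"missing required field: {field}")
        elif not isinstance(record[field], rule["type"]):
            errors.append(f"bad type for {field}")
    return errors

ok = validate({"customer_id": "C1", "market": "uk"}, GPF_CUSTOMER_V1)
bad = validate({"market": "uk", "spend": "12"}, GPF_CUSTOMER_V1)
```

Because validation is driven by the GPF declaration rather than per-feed code, a new market's feed only needs a new declaration, not a new ETL.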
- Dictionary First: It’s very important to define a V1 version of the dictionary at the start. This should be enforced at the metadata layer and tightly integrated with the previous step.
- Automated Data Cleansing Leveraging ML/AI: Lay the foundation of machine learning in data cleansing, i.e. matching, auto-correction, imputation and outlier detection. All such models should be integrated with the ETL/ELT process and the automated data governance process. This helps define data quality thresholds during production runs and aids the overall mastering of records.
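As a minimal illustration of the kind of automated checks meant here, the sketch below does mean imputation followed by z-score outlier flagging on one numeric column. Real pipelines would use trained models and governed thresholds; the function and threshold below are assumptions:

```python
import statistics

def impute_and_flag(values, z_threshold=3.0):
    """Fill missing values with the column mean, then flag values
    whose z-score exceeds the threshold as outliers."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    stdev = statistics.pstdev(present)  # population standard deviation
    filled = [mean if v is None else v for v in values]
    flags = [stdev > 0 and abs(v - mean) / stdev > z_threshold
             for v in filled]
    return filled, flags

values = [10.0, 12.0, None, 11.0, 500.0]
filled, flags = impute_and_flag(values, z_threshold=1.5)
# the 500.0 reading is flagged; the None is imputed with the mean
```

Wiring such checks into the ETL/ELT run (rather than a manual review) is what lets quality thresholds gate production loads automatically.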
- Model Development Life Cycle: Have a clear strategy for the model development process, including source configuration, model development and collaboration, bi-directional interfaces and model operationalization. This will define the tech stack, creation of sandpits with the relevant data sets, the transient and persistent nature of data, DevOps and a web application to configure the entire process. The foundation is critical to ensure reduced human dependence, the ability for a data scientist to develop models within a clear, well-defined process and governance, and the avoidance of data proliferation.
- Compliance by Design: GDPR and evolving regulations will change the way data is stored, masked, governed and notified. Plan for a 3-5 year horizon, as this has a huge impact on the data model design.
- Real-Time Insights Generation: While data may be ingested at various frequencies, data processing and insight generation should be real-time, leveraging existing architectural strategies such as in-memory processing, active/passive modes, ROLAP/HOLAP, and star schema/NoSQL designs. Most reports should be burst and made available in the reporting portal for anytime preview, while power users should be able to generate richer insights by digging deeper into the data. The overall framework requires robust end-to-end governance, ensuring logging and audit at all layers in order to identify long-running queries, frequently accessed reports, etc.
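The logging-and-audit requirement can be sketched as a thin timing wrapper around report queries, so that slow or frequently run reports surface in the audit trail. Names and the in-memory store below are illustrative:

```python
import time
from collections import defaultdict

# In-memory audit trail; a real platform would write to a
# centralized, durable log store instead.
audit_log = defaultdict(list)

def audited(report_name):
    """Decorator recording how often a report runs and how long each run takes."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            audit_log[report_name].append(time.perf_counter() - start)
            return result
        return inner
    return wrap

@audited("daily_sales")
def daily_sales(rows):
    return sum(rows)

daily_sales([1, 2, 3])
daily_sales([4, 5])
# reports ranked by their worst observed run time
slowest = max(audit_log, key=lambda r: max(audit_log[r]))
```

The same trail answers both governance questions raised above: which queries run long, and which reports are accessed most often.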
- Combinatorial Solutions: Choose a technology stack and machines that can exploit the native processor and chipset capabilities to deliver maximum throughput at minimal cost.
The platform-centric approach certainly helps in:
- Driving initiatives with clear outcomes
- Empowering various stakeholders based on the output they desire
- Ensuring equal responsibility between source and group organizations to make this successful
- Delineating a framework for various use cases and not just a point solution
- Lowering maintenance cost (people and software)
- Creating a platform roadmap team with clear backlogs
DaaS in the organization is viewed through two lenses: business and technology. A platform-centric approach will not only provide a flexible, useful service layer across business applications but also render a clear road-map for stakeholders to deliver the desired output to their customers. HARMAN has been at the forefront of implementing platforms that serve the various needs of the enterprise. Drop us a note here for more information.
NOTE: Each of the above-mentioned points will be covered in detail every fortnight as a part of this series.