The need for Data Governance is well established; it has become one of the key initiatives organizations focus on when it comes to managing their data. This blog discusses how Data Governance in the digital era differs from traditional Data Governance practices.
Three popular data processing architectures for big and small data on the cloud. These cover various scenarios: batch and real-time, small data and big data. The links lead to a dedicated blog post for each architecture.
Delta architecture processes new streaming records as delta (incremental) records, and the data lake is no longer an immutable data structure.
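The delta idea can be sketched as an upsert of incremental records into the current state of a table. This is a minimal illustrative sketch, not Databricks Delta Lake itself; the record shape and the merge key `id` are assumptions for the example.

```python
# Sketch: applying delta (incremental) records to a mutable table view.
# Records are plain dicts; the merge key "id" is an assumption for this example.

def apply_deltas(table, deltas, key="id"):
    """Upsert each delta record into the table, matched on `key`."""
    merged = {row[key]: row for row in table}  # current state, keyed by id
    for delta in deltas:
        # existing rows are updated in place, new keys are inserted
        merged[delta[key]] = {**merged.get(delta[key], {}), **delta}
    return list(merged.values())

current = [{"id": 1, "qty": 10}, {"id": 2, "qty": 5}]
deltas = [{"id": 2, "qty": 7}, {"id": 3, "qty": 1}]  # one update, one insert
updated = apply_deltas(current, deltas)
```

In a real lake this merge would be a `MERGE INTO` against a transactional table format, which is exactly what makes the lake mutable.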
When we look at entities in the real world, we notice complex relationships between them. Every entity type is unique and can participate in multiple kinds of relationships. Graph databases solve this problem by modeling the relationships directly in the database, which makes deriving insights simple and easy.
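To make the point concrete, here is a tiny sketch of typed relationships as a graph database would model them, using a plain adjacency list. The entities and relationship names are invented for illustration.

```python
# Sketch: entities connected by typed relationships, graph-database style.
# Nodes and relationship types below are invented for this example.

graph = {
    "Alice": [("WORKS_AT", "Acme"), ("KNOWS", "Bob")],
    "Bob":   [("WORKS_AT", "Acme")],
    "Acme":  [],
}

def neighbors(node, rel_type):
    """Return entities connected to `node` via relationships of `rel_type`."""
    return [target for rel, target in graph.get(node, []) if rel == rel_type]

# "Who works at Acme?" falls out of the relationship structure directly.
colleagues = sorted(p for p in graph if "Acme" in neighbors(p, "WORKS_AT"))
```

In a graph database such a query is a one-line pattern match over relationship types, rather than a chain of joins.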
The key steps organizations can take to cross that chasm, move past the roadblock, and lay the foundation that will enable them to move along the curve.
There are some gaps in the data management and maintenance space in Azure. Following are the two things that I feel are missing from the current Azure landscape and will hopefully be addressed soon.
Imagine a scenario where we maintain an immutable, persistent stream of data: instead of processing the data through two separate paths, we can replay the stream through the same code for any point in time. That is the premise of Kappa architecture.
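The premise can be sketched with an append-only log and a single processing function that is reused for replay. The event shape and the summing computation are assumptions chosen for brevity.

```python
# Sketch of the Kappa premise: one immutable log, one processing path,
# replayed for any time window. Event shape (timestamp, value) is assumed.

log = [  # append-only event stream
    (1, 10), (2, 20), (3, 30), (4, 40),
]

def process(events):
    """The single processing path: sum values (stands in for any computation)."""
    return sum(v for _, v in events)

def replay(log, start, end):
    """Re-run the same code over a chosen time window of the immutable log."""
    return process(e for e in log if start <= e[0] <= end)

live_total = process(log)        # processing the full stream as it arrives
window_total = replay(log, 2, 3) # replaying a past window through the same code
```

There is no separate batch codebase to keep in sync: reprocessing is just another replay of the log.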
The key reasons a good data lake structure is needed are:
1) Security: role-based security on the lake for read access.
2) Extensibility: it should be easy to extend the lake after the first round, so more systems can be added.
3) Usability: it should be easy to find and use the data in the lake, and users should not get lost.
4) Governance: it should be simple to apply governance practices to the lake in terms of quality, metadata management, and ILM (information lifecycle management).
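One way these goals show up in practice is a predictable, zone-based path convention. The sketch below is hypothetical: the zone names and path layout are assumptions for illustration, not a prescribed standard.

```python
# Hypothetical sketch: a zone-based lake layout supporting the goals above.
# Zone names and the path convention are assumptions for this example.

ZONES = ("raw", "curated", "presentation")  # coarse units for role-based security

def lake_path(zone, source_system, dataset, year, month):
    """Build a predictable path so data is easy to find (usability) and
    new source systems slot in without restructuring (extensibility)."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"/{zone}/{source_system}/{dataset}/{year:04d}/{month:02d}"

path = lake_path("raw", "crm", "customers", 2021, 7)
```

Because zones and source systems are fixed path segments, read access can be granted per zone or per system, and governance tooling can crawl the lake mechanically.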
From a technology point of view, Databricks is becoming the new normal in data processing, on both Azure and AWS. This post presents a view of Lambda architecture with Databricks front and center. Databricks has the capability to replace multiple tools, and those capabilities are described in some detail below.
Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods. This approach attempts to balance latency, throughput, and fault tolerance by using batch processing to provide comprehensive and accurate views of historical data, while simultaneously using real-time stream processing to provide views of online data.
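The serving side of this idea can be sketched as merging a comprehensive batch view with a low-latency real-time view at query time. The views and keys below are invented for illustration.

```python
# Sketch of Lambda's serving layer: combine the accurate batch view with
# the recent real-time (speed) view. View contents are invented examples.

batch_view = {"page_a": 1000, "page_b": 500}  # recomputed periodically, accurate
speed_view = {"page_a": 3, "page_c": 2}       # recent events not yet in batch

def merged_view(batch, speed):
    """Answer queries from both layers: batch counts plus recent increments."""
    out = dict(batch)
    for key, count in speed.items():
        out[key] = out.get(key, 0) + count
    return out

counts = merged_view(batch_view, speed_view)
```

When the next batch run lands, the speed view is discarded and rebuilt, which is how the architecture recovers accuracy over time.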