What Is the Data Lakehouse Pattern?
Whereas Data Warehouses typically structure and organise their data into a set relational schema as it is ingested, the Data Lake pattern involves storing data which is unstructured or semi structured and leaving decisions about how the data will be processed and analysed until later when it is consumed. Data Lakes are also a better fit for data such as images, log files or other binary files and not just relational data. In other instances, the two are integrated so that data is exchanged between the two, for instance by surfacing Lake data to Data Analysts through a relational Warehouse. This involves taking the best of the Data Warehouse and Data Lake and delivering them together with one technology solution and one copy of the data. This shared data source can be used by Data Analysts and Data Scientists who can adopt common tooling and patterns;. In practice, the Data Lakehouse pattern can be thought of as providing a layer over the top of the unstructured Lake, which makes it look like more a Data Warehouse. Data Lakehouse can sound like a marketing buzzword, but there is real substance behind it and I believe it is likely to be one of the big transformation stories for data strategy in the enterprise going forward.