A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed for analytics applications.
You can store your data as-is, without having to first structure the data, and run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning to guide better decisions. While a traditional data warehouse stores data in hierarchical dimensions and tables, a data lake uses a flat architecture to store data, primarily in files or object storage. That gives users more flexibility on data management, storage and usage.
A data lake is also defined by what it isn’t. It’s not just storage, and it’s not the same as a data warehouse. While data lakes and data warehouses all store data in some capacity, each is optimized for different uses. Consider them complementary rather than competing tools, and companies might need both. As a point of comparison, data warehouses are often ideal for the kind of repeatable reporting and analysis that’s common in business practices, such as monthly sales reports, tracking of sales per region, or website traffic.