Ayobonike O. answered 9d
Lead Data Scientist | MBA | Expert in Predictive Modeling & Analytics
A Data Lake (built on services such as Amazon S3, Azure Data Lake Storage (ADLS), or Google Cloud Storage) is a centralized repository that allows you to store all your structured and unstructured data at any scale.
The DOT feeds consist of videos and images. Traditional relational (SQL) databases are schema-on-write, meaning I must define the data structure (tables, columns, data types) before I can save anything.
A Data Lake has no such requirement: I can grab the raw video files and varied metadata (GPS, timestamps, JSON files) and store them in their native format immediately.
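As a rough sketch of what storing files "in their native format" could look like, here is a minimal Python example. It uses a local folder as a stand-in for a cloud bucket. The function name `ingest_raw`, the `raw/<camera_id>/<date>/` layout, and the metadata fields are my own assumptions for illustration, not anything defined by the DOT feeds or by a cloud SDK.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def ingest_raw(src: Path, lake_root: Path, camera_id: str, metadata: dict) -> Path:
    """Copy a raw file into the lake unchanged and write its metadata beside it.

    Hypothetical helper: lake_root is a local folder standing in for a bucket.
    """
    now = datetime.now(timezone.utc)
    # Assumed layout: raw/<camera_id>/<YYYY>/<MM>/<DD>/<original file name>
    dest_dir = lake_root / "raw" / camera_id / now.strftime("%Y/%m/%d")
    dest_dir.mkdir(parents=True, exist_ok=True)

    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # plain file copy: no transcoding, no parsing

    # Metadata goes into a JSON sidecar, so no table schema is needed up front.
    sidecar = dest.with_name(dest.name + ".json")
    record = {"camera_id": camera_id, "ingested_at": now.isoformat(), **metadata}
    sidecar.write_text(json.dumps(record, indent=2))
    return dest
```

The point of the sidecar is that each file can carry whatever fields it happens to have (GPS for one camera, none for another), and the structure only has to be decided later, when someone reads the data.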
Since the boss wants to start saving data immediately but the R&D team won't use it for a year, I will be accumulating massive amounts of high-resolution video.
Cloud object storage (the foundation of a Data Lake) is significantly cheaper per gigabyte than the high-performance storage required for an active SQL Database. I can store petabytes of cold data for a fraction of the cost.
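To make the cost argument concrete, here is a small back-of-the-envelope sketch. The daily volume and both per-gigabyte prices are made-up placeholders, not quotes from any provider, so swap in the real feed volume and current pricing before drawing conclusions.

```python
def cumulative_storage_cost(gb_per_day: float, price_per_gb_month: float,
                            months: int, days_per_month: int = 30) -> float:
    """Total storage bill over `months` while data keeps accumulating.

    Simplification: each month is billed for everything stored by its end.
    """
    total = 0.0
    stored_gb = 0.0
    for _ in range(months):
        stored_gb += gb_per_day * days_per_month
        total += stored_gb * price_per_gb_month
    return total


# Hypothetical inputs (placeholders only).
GB_PER_DAY = 100.0          # assumed volume of incoming video
OBJECT_STORE_PRICE = 0.01   # assumed $/GB-month for object storage
DATABASE_PRICE = 0.10       # assumed $/GB-month for database-grade storage

lake_cost = cumulative_storage_cost(GB_PER_DAY, OBJECT_STORE_PRICE, months=12)
db_cost = cumulative_storage_cost(GB_PER_DAY, DATABASE_PRICE, months=12)
print(f"object storage: ${lake_cost:,.2f}  database storage: ${db_cost:,.2f}")
```

Because nothing is deleted during the year, the bill grows every month, so any gap in the per-gigabyte price is paid again and again on a growing pile of data.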
Machine Learning (ML) models, specifically Computer Vision models like YOLO (You Only Look Once), need raw, unaltered data for training.
By saving the raw footage now, I preserve the maximum amount of information. If I tried to "process" it into a structured database now, I might accidentally strip out details (like background lighting or specific angles) that the future ML algorithm might actually need to distinguish between car models.