Please answer the short prompt regarding database storage solutions and structure. Which data storage solution is best?

Question

Situation: You work at an organization that operates a very large chain of used car lots. Your boss pulls you into a meeting where there is a lot of excitement about a future initiative around camera data of vehicles. In short, someone in the R&D department has the idea that if they get their hands on data from public cameras pointed at interstates, they can use machine learning algorithms to quantify the makes and models of vehicles in various regions. The gist is that, in the future, if they're able to use this technique they could churn out reports of all the vehicles that passed various camera points, and gain market insights into what vehicles are popular in that region of the country. Knowing which vehicles are being driven in certain areas could give them a competitive advantage.

After asking a lot of questions, you learn that all that is available right now is several different camera feeds provided by the Department of Transportation. Your company is interested in grabbing and storing the videos and images from those cameras so that they can analyze them in the future. The feeds have various amounts of metadata, such as timestamps, GPS locations, etc., that vary depending on the source of the feed. The R&D department anticipates that they are at least a year out from developing the algorithms they need to use this data, and cannot give you specifications about what their structure needs are or will be. However, you get the sense that your boss would very much like to start storing the video and image data immediately in order not to miss any opportunities.

What type of data storage solution would you recommend and why? For simplicity's sake, please limit suggestions to the concepts of: Unstructured (or lightly structured) Data Lake Storage, Key-Value Store, Column Store, Document Store, Graph Database, Relational Database designed for OLTP, or Relational Database designed to serve as a Data Warehouse.

Ayobonike O. · Accepted Answer

I recommend unstructured (or lightly structured) Data Lake storage. A Data Lake, built on a service such as Amazon S3, Azure Data Lake Storage (ADLS), or Google Cloud Storage, is a centralized repository that can store structured and unstructured data at any scale.

The DOT feeds consist of videos and images. Traditional relational databases (SQL) are schema-on-write, meaning I must define the data structure (tables, columns, data types) before I can save anything. R&D cannot specify that structure yet, and the metadata varies from feed to feed.
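To make the schema-on-write constraint concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `frames`, its columns, and the sample values are hypothetical and chosen only for illustration; the point is that a row carrying a field the schema did not anticipate is rejected until someone alters the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema-on-write: columns must be declared before any row can be stored.
# Table and column names here are hypothetical.
conn.execute(
    "CREATE TABLE frames (camera_id TEXT NOT NULL, captured_at TEXT NOT NULL)"
)

# A feed whose metadata matches the declared schema is accepted.
conn.execute(
    "INSERT INTO frames (camera_id, captured_at) VALUES (?, ?)",
    ("cam-001", "2024-01-02T08:00:00Z"),
)

# A feed that also carries GPS data does not fit the table as declared.
try:
    conn.execute(
        "INSERT INTO frames (camera_id, captured_at, gps_lat) VALUES (?, ?, ?)",
        ("cam-002", "2024-01-02T08:00:05Z", 41.88),
    )
    rejected = False
    error_message = ""
except sqlite3.OperationalError as exc:
    rejected = True
    error_message = str(exc)

row_count = conn.execute("SELECT COUNT(*) FROM frames").fetchone()[0]
print(rejected, row_count, error_message)
```

Every new feed with a different metadata shape would force a schema change like this, which is hard to plan for when the eventual structure is unknown.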

With a Data Lake, I can grab the raw video files and the varied metadata (GPS, timestamps, JSON files) and store them in their native format immediately, without defining a schema first.
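One way to picture this "land it as-is" approach is the sketch below. It uses a local directory as a stand-in for an object-store bucket, and the key layout (`raw/<source>/<year>/<month>/<day>/<file>`), the `.meta.json` sidecar convention, and the names `object_key` and `land_raw` are all assumptions for illustration, not part of any particular cloud service's API.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def object_key(source: str, captured_at: datetime, filename: str) -> str:
    """Build a date-partitioned key (hypothetical layout)."""
    return f"raw/{source}/{captured_at:%Y/%m/%d}/{filename}"


def land_raw(root: Path, source: str, captured_at: datetime,
             filename: str, payload: bytes, metadata: dict) -> Path:
    """Write the untouched payload plus whatever metadata the feed supplied."""
    target = root / object_key(source, captured_at, filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)  # raw bytes, no transcoding or parsing
    sidecar = target.with_name(target.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, sort_keys=True))
    return target


# Usage with made-up values: a temporary directory plays the role of the bucket.
with tempfile.TemporaryDirectory() as bucket:
    ts = datetime(2024, 1, 2, 8, 0, tzinfo=timezone.utc)
    meta = {"timestamp": ts.isoformat(), "gps": [41.88, -87.63]}
    stored = land_raw(Path(bucket), "dot-feed-a", ts, "clip.mp4", b"\x00\x01", meta)
    key = object_key("dot-feed-a", ts, "clip.mp4")
    stored_bytes = stored.read_bytes()
    stored_meta = json.loads(stored.with_name("clip.mp4.meta.json").read_text())
```

Because the metadata is stored as a free-form JSON sidecar, a feed with extra or missing fields needs no change to the storage layout.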

Since the boss wants to start saving data immediately but the R&D team won't use it for a year, I will be accumulating massive amounts of high-resolution video.

Cloud object storage (the foundation of a Data Lake) is significantly cheaper per gigabyte than the high-performance storage required for an active SQL Database. I can store petabytes of cold data for a fraction of the cost.

Machine learning (ML) models, specifically computer vision models such as YOLO (You Only Look Once), are best trained on raw, unaltered data.

By saving the raw footage now, I preserve the maximum amount of information. If I tried to "process" it into a structured database now, I might accidentally strip out details (like background lighting or specific angles) that the future ML algorithm might actually need to distinguish between car models.
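Deferring the structure this way is often called schema-on-read: the shape is imposed only when someone queries the data. A minimal sketch, assuming the metadata was kept as free-form JSON records (the field names and sample values below are made up):

```python
def read_view(record: dict, fields: tuple) -> dict:
    """Project one raw metadata record onto the fields a reader asks for.

    Fields a feed never supplied come back as None instead of failing.
    """
    return {name: record.get(name) for name in fields}


# Hypothetical metadata from two feeds with different shapes.
raw_metadata = [
    {"timestamp": "2024-01-02T08:00:00Z", "gps": [41.88, -87.63]},
    {"timestamp": "2024-01-02T08:00:05Z", "camera_name": "I-90 EB"},
]

# A year from now, R&D can decide which fields matter and apply them at read time.
view = [read_view(record, ("timestamp", "gps")) for record in raw_metadata]
print(view)
```

Nothing is discarded at ingest time, so a different set of fields can be requested later without re-collecting the data.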
