Shannoah G.

asked • 02/14/20

Data analyst sample question

The table has 4 fields: date, event_name, video_id and user_id. Write a query that will return the number of users who viewed at least one video in a given day and the number of those users who returned the next day to view at least one video. The video_play event signifies that a video was played by a user. Imagine the actual data set is much larger than shown in the sample data.




Sample Data:


date event_name video_id user_id
2018-01-01 video_play 51651561651 989189198
2018-01-01 video_play 98121651656 561884864
2018-01-01 video_play 78918918918 561884864
2018-01-01 create_video 32156541355 153215651
2018-01-01 create_video 87351531311 232135135
2018-01-02 create_video 51651561651 489846581
2018-01-02 video_play 12315315352 561884864
2018-01-02 create_video 32156541355 321351351
2018-01-02 create_video 87351531311 231513515
2018-01-02 video_play 98191891894 615616516




Desired output:

The output should have one row per day. Each row should have 3 columns, date, users_view_videos and users_view_next_day.


-      users_view_videos is the number of users who viewed at least one video in the given date

-      users_view_next_day is the subset number of users from user_view_videos who also view a video on the next day


From the sample data provided above this should be the output:


date users_view_videos users_view_next_day
2018-01-01 2 1
2018-01-02 2 0




Arvind P.

Here is a hint use having clause which ranges to the input date +1 (after grouping by the event name video)
Report

03/22/20

1 Expert Answer

By:

Jonathan K. answered • 03/25/20

Tutor
New to Wyzant

Harvard grad, philosophy MA, Wall St bank IT guy -- loves to teach!

Still looking for help? Get the right answer, fast.

Ask a question for free

Get a free answer to a quick problem.
Most questions answered within 4 hours.

OR

Find an Online Tutor Now

Choose an expert and meet online. No packages or subscriptions, pay only for the time you need.