01 Data literacy
Dataset
Kaggle link: Top Hits Spotify from 2000-2019
SQL Link: Dataset as SQL script
Learning objectives
The purpose of this project is to
Become more familiar with the logic of how raw data is structured into information (database content)
Get hands-on experience extracting specific knowledge from that information
Use this experience to formulate your own research question and learn more about the possibilities and limitations of the data (wisdom)
Delivery
Individual or in pairs
Max 5 pages as a .pdf, submitted via the hand-in link
Report content
Brief introduction & description of the dataset
How is the data structured? (E.g. highlights of: data scales, table content, data types)
For each question below:
Write a query solving the question
In your own words - how does the query work?
A single example application/usage of the data set within:
Your personal domain
A professional domain (You decide whom the "user" is)
Research topic
Formulate a question and answer it with a query (you could take inspiration from the aforementioned example applications/usages of the data set)
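As an illustration of the "query plus explanation" format asked for in the report, here is a minimal sketch using Python's built-in sqlite3 module. The table name `songs`, its columns and the three rows are made up for this example; the real schema comes from the SQL script and may well differ. The question answered here (average tempo per year) is deliberately not one of the assignment questions.

```python
import sqlite3

# In-memory toy database; table name, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE songs (artist TEXT, song TEXT, year INTEGER, tempo REAL)"
)
conn.executemany(
    "INSERT INTO songs VALUES (?, ?, ?, ?)",
    [
        ("Artist A", "Song 1", 2000, 100.0),
        ("Artist B", "Song 2", 2000, 120.0),
        ("Artist A", "Song 3", 2001, 90.0),
    ],
)

# Inspect the structure: column names and their declared data types.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(songs)")]

# Example question: what is the average tempo per year?
query = """
    SELECT year, AVG(tempo) AS avg_tempo
    FROM songs
    GROUP BY year
    ORDER BY year
"""
result = conn.execute(query).fetchall()

print(columns)
print(result)
```

In words: GROUP BY collects all rows with the same year into one group, AVG computes the mean tempo within each group, and ORDER BY sorts the resulting rows by year. An explanation at roughly this level of detail is what "in your own words" is aiming for.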
Questions
How many songs have explicit content?
What is the most danceable track from 2001?
How many ms is the longest running track from each year?
Which 3 artists have the most songs on the list, and how many songs does each have?
What year had the most popular songs on average?
Which 10 artists have the longest songs on average?
Only include artists with at least 5 songs in the top 2000
How many songs are counted in each average?
Advanced (Optional)
Which tracks have above-average danceability, no explicit content, and higher popularity than the average song?
How many artists have more than 5 top hits and use fewer than 4 different keys?
Which genre has the lowest count of tracks with above-average danceability?
To do this task, you need to redesign the database
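One possible reading of "redesign", sketched under the assumption that a single genre column holds several comma-separated genres per track (check this against the actual data): move the genres into a separate table with one row per track/genre pair. The table names `songs` and `song_genres`, the `id` column and the sample rows are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical original layout: several genres packed into one text column.
conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, song TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO songs (id, song, genre) VALUES (?, ?, ?)",
    [(1, "Song 1", "pop, rock"), (2, "Song 2", "pop"), (3, "Song 3", "hip hop, pop")],
)

# Redesigned layout: one row per (song, genre) pair.
conn.execute(
    "CREATE TABLE song_genres (song_id INTEGER REFERENCES songs(id), genre TEXT)"
)
for song_id, genre_text in conn.execute("SELECT id, genre FROM songs").fetchall():
    for genre in genre_text.split(","):
        conn.execute(
            "INSERT INTO song_genres VALUES (?, ?)", (song_id, genre.strip())
        )

# Genres can now be grouped and counted with ordinary SQL.
genre_counts = conn.execute(
    "SELECT genre, COUNT(*) FROM song_genres GROUP BY genre ORDER BY genre"
).fetchall()
print(genre_counts)
```

With the genres in their own table, a track that belongs to several genres is counted once for each of them, which is not possible when grouping on the original packed column. The split could also be done in pure SQL; doing it in Python here just keeps the sketch short.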
Hand-in date
25/9 - 2022 23:59
Hand-in link