Analysis 1
Preparation:
Data Literacy Fundamentals - Kapitel 5
Exercise 1
TEAM 1: Alisa Deniz Nicklas
TEAM 2: Chris Gustav FS Lasse
TEAM 3: Maheen William Mads
TEAM 4: Nadine Arne Azra
TEAM 5: Abdul Frederik N Hans
TEAM 6: Frederik K Klara Ammar Gustav C
TEAM 7: David Jeppe Amalie
Exercises for today & next class
A)
Notice that the 'gender' datatype is an ENUM
Research: What is an enum? How does it work?
Select the order_table id and the customers phone number for all orders of product id 3
Select product names and order time for the espresso coffees sold between January 15th 2021 and February 14th 2021
Select the product name and price and order time for all orders from females in February 2021
THIS WAS CHANGED FROM JANUARY
From the products table, select the name for all products that have a price greater than 30 or a coffee origin of Sri Lanka
How many male customers don't have a phone number entered into the customers table
Halfway there ! Take a break
B)
From the products table, select the name and price of all produts with a coffee origin equal to Colombia or Sverige. Ordered by name from A-Z
From the orders table, select all the orders from February 2021 for customers with id's 2, 4, 6 or 8
From the customers table, select the first name and phone number of all customers who's last name containing the pattern "ar"
From the customers table, select the distinct last names and order alphabetically from A-Z
From the orders table, select the first 3 orders placed by customer with id 1 in February 2021
From the products table, select the name, price and cofee origin but rename the price to retail_price in the results set
Exercise 2: Study Groups
A)
Consider the following dataset: https://www.kaggle.com/datasets/paradisejoy/top-hits-spotify-from-20002019
Answer the following questions:
How representative of a sample is the dataset of the full population of songs on spotify?
Why? How did you conclude that?
If this dataset should inform the analyst about songs & music in general what is the bias of the dataset?
B)
Our startup conducted a customer satisfaction survey.
We have sold 872.329 selfie sticks that is our only product.
Each selfie stick costs 10€ which is a total revenue of 8.723.290€.
We have counted 567.098 unique customer adresses.
And counted 688.909 unique registrated e-mail adresses.
We selected 250.000 random customers to receive a customer satisfaction questionaire.
34.958 Answers was returned.
What number represents the population?
What number represents the sample?
How representitive was the sample?
Did we reach census?
What analytic approach (descriptive, inferential, diagnostic) should be used if we want to understand:
Is the dataset representative and does it include large amount of biases?
What is the revenue pr. sold product?
How many selfie sticks were sold to people living in Denmark?
Exercise 3: Individual - Write the query
Q: what is the average speed of all pokemon?
A: 68.9338
Q: How many pokemon are in the dataset?
A: 151
Q: What is the maximum primary type?
A: Water
Q: What is the minimum primary type?
A: Bug
Q: Why? Make an informed guess
Exercise 4: Individual - Write the query
Q: What is the max speed of all pokemon?
A: 140
Q: What is the average speed of all pokemon with 'Ground' as primary type?
A: 58.1250
Q: How many pokemon are of primary or secondary type 'Ground'
A: 14
Advanced (Optional):
Q: How many pokemon have a higher attack than the average?
A: 72
Q: How many pokemon have a higher defence than the average ground pokemon?
A: 20
Hint: https://www.mysqltutorial.org/mysql-subquery/
Last updated