Analysis 1

Preparation:

Data Literacy Fundamentals - Kapitel 5

Aggregation

MySQL Max

MySQL Average

MySQL Count

Exercise 1

TEAM 1: Alisa Deniz Nicklas

TEAM 2: Chris Gustav FS Lasse

TEAM 3: Maheen William Mads

TEAM 4: Nadine Arne Azra

TEAM 5: Abdul Frederik N Hans

TEAM 6: Frederik K Klara Ammar Gustav C

TEAM 7: David Jeppe Amalie

Exercises for today & next class

Notice that the 'gender' datatype is an ENUM
- Research: What is an enum? How does it work?
Select the order_table id and the customers phone number for all orders of product id 3
Select product names and order time for the espresso coffees sold between January 15th 2021 and February 14th 2021
Select the product name and price and order time for all orders from females in February 2021
- THIS WAS CHANGED FROM JANUARY
From the products table, select the name for all products that have a price greater than 30 or a coffee origin of Sri Lanka
How many male customers don't have a phone number entered into the customers table

Halfway there ! Take a break

From the products table, select the name and price of all produts with a coffee origin equal to Colombia or Sverige. Ordered by name from A-Z
From the orders table, select all the orders from February 2021 for customers with id's 2, 4, 6 or 8
From the customers table, select the first name and phone number of all customers who's last name containing the pattern "ar"
From the customers table, select the distinct last names and order alphabetically from A-Z
From the orders table, select the first 3 orders placed by customer with id 1 in February 2021
From the products table, select the name, price and cofee origin but rename the price to retail_price in the results set

Exercise 2: Study Groups

Consider the following dataset: https://www.kaggle.com/datasets/paradisejoy/top-hits-spotify-from-20002019

Answer the following questions:

How representative of a sample is the dataset of the full population of songs on spotify?
- Why? How did you conclude that?
If this dataset should inform the analyst about songs & music in general what is the bias of the dataset?

Our startup conducted a customer satisfaction survey.

We have sold 872.329 selfie sticks that is our only product.

Each selfie stick costs 10€ which is a total revenue of 8.723.290€.

We have counted 567.098 unique customer adresses.

And counted 688.909 unique registrated e-mail adresses.

We selected 250.000 random customers to receive a customer satisfaction questionaire.

34.958 Answers was returned.

What number represents the population?
What number represents the sample?
How representitive was the sample?
Did we reach census?

What analytic approach (descriptive, inferential, diagnostic) should be used if we want to understand:

Is the dataset representative and does it include large amount of biases?
What is the revenue pr. sold product?
How many selfie sticks were sold to people living in Denmark?

Exercise 3: Individual - Write the query

Q: what is the average speed of all pokemon?
- A: 68.9338
Q: How many pokemon are in the dataset?
- A: 151
Q: What is the maximum primary type?
- A: Water
Q: What is the minimum primary type?
- A: Bug
Q: Why? Make an informed guess

Exercise 4: Individual - Write the query

Q: What is the max speed of all pokemon?
- A: 140
Q: What is the average speed of all pokemon with 'Ground' as primary type?
- A: 58.1250
Q: How many pokemon are of primary or secondary type 'Ground'
- A: 14

Advanced (Optional):

Q: How many pokemon have a higher attack than the average?
- A: 72
Q: How many pokemon have a higher defence than the average ground pokemon?
- A: 20

Hint: https://www.mysqltutorial.org/mysql-subquery/

PreviousJoining Tables NextAnalysis 2 & Spotify kick-off

Last updated 3 years ago

hashtagExercise 1

hashtagExercise 2: Study Groups

hashtagExercise 3: Individual - Write the query

hashtagExercise 4: Individual - Write the query

Exercise 1

Exercise 2: Study Groups

Exercise 3: Individual - Write the query

Exercise 4: Individual - Write the query