Analysis 1

Preparation:

Data Literacy Fundamentals - Kapitel 5

Aggregationarrow-up-right

MySQL Maxarrow-up-right

MySQL Averagearrow-up-right

MySQL Countarrow-up-right

Exercise 1

TEAM 1: Alisa Deniz Nicklas

TEAM 2: Chris Gustav FS Lasse

TEAM 3: Maheen William Mads

TEAM 4: Nadine Arne Azra

TEAM 5: Abdul Frederik N Hans

TEAM 6: Frederik K Klara Ammar Gustav C

TEAM 7: David Jeppe Amalie

Exercises for today & next class

A)

  • Notice that the 'gender' datatype is an ENUM

    • Research: What is an enum? How does it work?

  • Select the order_table id and the customers phone number for all orders of product id 3

  • Select product names and order time for the espresso coffees sold between January 15th 2021 and February 14th 2021

  • Select the product name and price and order time for all orders from females in February 2021

    • THIS WAS CHANGED FROM JANUARY

  • From the products table, select the name for all products that have a price greater than 30 or a coffee origin of Sri Lanka

  • How many male customers don't have a phone number entered into the customers table

Halfway there ! Take a break

B)

  • From the products table, select the name and price of all produts with a coffee origin equal to Colombia or Sverige. Ordered by name from A-Z

  • From the orders table, select all the orders from February 2021 for customers with id's 2, 4, 6 or 8

  • From the customers table, select the first name and phone number of all customers who's last name containing the pattern "ar"

  • From the customers table, select the distinct last names and order alphabetically from A-Z

  • From the orders table, select the first 3 orders placed by customer with id 1 in February 2021

  • From the products table, select the name, price and cofee origin but rename the price to retail_price in the results set

Exercise 2: Study Groups

A)

Consider the following dataset: https://www.kaggle.com/datasets/paradisejoy/top-hits-spotify-from-20002019

Answer the following questions:

  • How representative of a sample is the dataset of the full population of songs on spotify?

    • Why? How did you conclude that?

  • If this dataset should inform the analyst about songs & music in general what is the bias of the dataset?

B)

Our startup conducted a customer satisfaction survey.

We have sold 872.329 selfie sticks that is our only product.

Each selfie stick costs 10€ which is a total revenue of 8.723.290€.

We have counted 567.098 unique customer adresses.

And counted 688.909 unique registrated e-mail adresses.

We selected 250.000 random customers to receive a customer satisfaction questionaire.

34.958 Answers was returned.

  • What number represents the population?

  • What number represents the sample?

  • How representitive was the sample?

  • Did we reach census?

What analytic approach (descriptive, inferential, diagnostic) should be used if we want to understand:

  • Is the dataset representative and does it include large amount of biases?

  • What is the revenue pr. sold product?

  • How many selfie sticks were sold to people living in Denmark?

Exercise 3: Individual - Write the query

  • Q: what is the average speed of all pokemon?

    • A: 68.9338

  • Q: How many pokemon are in the dataset?

    • A: 151

  • Q: What is the maximum primary type?

    • A: Water

  • Q: What is the minimum primary type?

    • A: Bug

  • Q: Why? Make an informed guess

Exercise 4: Individual - Write the query

  • Q: What is the max speed of all pokemon?

    • A: 140

  • Q: What is the average speed of all pokemon with 'Ground' as primary type?

    • A: 58.1250

  • Q: How many pokemon are of primary or secondary type 'Ground'

    • A: 14

Advanced (Optional):

  • Q: How many pokemon have a higher attack than the average?

    • A: 72

  • Q: How many pokemon have a higher defence than the average ground pokemon?

    • A: 20

Hint: https://www.mysqltutorial.org/mysql-subquery/

Last updated