Published by yalulaiwi18, 2022-09-07 04:59:50

TM351 Final 05-01-2019 Secured

Eng. Ahmed Samy TM351 Final 05-01-2019

Faculty of Computer Studies

TM351
Data management and analysis

Final Examination Q&A

Spring – 2018-2019
Date: 05/01/2019

Number of Exam Pages: (6)
Time Allowed: ( ) Hours

Question Type                              Max Mark
Part I: Multiple Choice Questions (MCQ)    20
Part II: Long Questions                    80
Total grade                                100

Instructions:

1. Answer all questions.
2. Mobiles and calculators are not allowed.

Note: no calculations in this exam require the use of a calculator.
Only simple calculations may be required.


Part I: [20 marks]

This part contains multiple-choice questions covering parts 1 – 10. The questions shown here are only samples; the actual questions could range over all the covered content.

Please select the most appropriate answer and write it in the answer booklet.

1. In SQL, a scalar subquery always returns --------------------------------.

a. Single row and single column    b. Multiple columns and a single row    c. Single column and multiple rows    d. none of the above

2. In the ----------- approach to recommender systems, recommendations are made based on what people with similar tastes and preferences have liked in the past.

a. Content-based recommendation    b. Collaborative filtering    c. non-collaborative filtering    d. none of the above

3. The --------------- approach for increasing the capacity of a relational DBMS implies using many small servers and distributing the data among them with redundancy.

a. Scaling up    b. Scaling out    c. centralization    d. none of the above

4. The ---------------- tells us whether the difference between (1) the observed numbers of items in each of several categories or combinations of categories and (2) how many items there should be is significant.

a. Spearman's    b. χ² test    c. Pearson's R²    d. Ratio test

5. ----------- is a technique that increases the capacity and performance of a database by splitting the dataset over several servers, with each server then responsible for storing, updating, and returning the appropriate data on request.

a. Resiliency    b. Replication    c. Sharding    d. Availability

6. The CAP theorem states that distributed systems can at best satisfy only two of the three properties (consistency, availability, or -----------------).

a. Partition tolerance    b. Productivity    c. Performance    d. Payoff

7. In data warehousing, ---------------------- measures can be combined along any dimension.

a. Additive    b. Semi-additive    c. Non-additive    d. None of the above

8. ----------------------- is concerned with segmenting a diverse group of data into a number of similar sub-groups.

a. Clustering    b. Correlation    c. Combination    d. Regression

9. In the K-NN algorithm, misclassification can occur when there are many more training data instances from one class than from others. One way to compensate for this is to use ----------------- as a weight for each of the K-nearest data instances.

a. The distance to the centroid    b. The inverse of the distance to the centroid    c. The number of points near the centroid    d. The average distance of the centroid

10. In the cosine similarity measure of document similarity, if the angle between the vectors representing documents A and B is smaller than the angle between documents A and C, then document A is considered to be -------------------------- document C.

a. More similar to    b. Less similar to    c. As similar as    d. none of the above


Part Two: [80 marks]

Question One: [30 marks]

Use the hospital database below to answer the following questions:

Patient (Patient_id, Patient_name, gender, height, age, weight, staff_no, ward_no)
Ward (Ward_no, hospital_no, Ward_name, number_of_beds)
Treatment (Staff_no, Patient_no, start_date, start_day_of_week, reason)
Doctor (Staff_no, Doctor_name, position, specialization, hospital_no)
Hospital (hospital_no, hospital_name, number_of_operation_rooms, city, city_population, telephone_no)

a. Write an SQL statement to list the names of all doctors whose specialization is radiology and who work in the hospital with hospital_no = 'h102'.

Answer:
Select Doctor_name
From Doctor
Where specialization = 'radiology' AND hospital_no = 'h102'

b. Write an SQL statement to list the names and positions of all doctors who work in a hospital in London.

Answer:
Select Doctor_name, position
From Doctor, Hospital
Where Hospital.hospital_no = Doctor.hospital_no AND Hospital.city = 'London'

Another solution:
Select Doctor_name, position
From Hospital INNER JOIN Doctor ON Hospital.hospital_no = Doctor.hospital_no
Where Hospital.city = 'London'
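
The two forms of the join in answer b can be checked against each other on a small in-memory database. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names follow the exam schema, while the hospital names, doctor names, and other row values are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows; only table and column names come from the exam schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Hospital (hospital_no TEXT PRIMARY KEY, hospital_name TEXT, city TEXT);
    CREATE TABLE Doctor (staff_no TEXT PRIMARY KEY, doctor_name TEXT, position TEXT,
                         specialization TEXT, hospital_no TEXT);
    INSERT INTO Hospital VALUES ('h101', 'North General', 'Leeds'),
                                ('h102', 'City Central', 'London');
    INSERT INTO Doctor VALUES ('s1', 'Dr. Adams', 'Consultant', 'radiology', 'h102'),
                              ('s2', 'Dr. Baker', 'Registrar', 'cardiology', 'h101'),
                              ('s3', 'Dr. Clark', 'Registrar', 'radiology', 'h102');
""")

# Join written as a comma-separated FROM list with the join condition in WHERE.
implicit_join = conn.execute("""
    SELECT doctor_name, position
    FROM Doctor, Hospital
    WHERE Hospital.hospital_no = Doctor.hospital_no AND Hospital.city = 'London'
    ORDER BY doctor_name
""").fetchall()

# The same join written with INNER JOIN ... ON.
explicit_join = conn.execute("""
    SELECT doctor_name, position
    FROM Hospital INNER JOIN Doctor ON Hospital.hospital_no = Doctor.hospital_no
    WHERE Hospital.city = 'London'
    ORDER BY doctor_name
""").fetchall()

print(implicit_join)
print(implicit_join == explicit_join)
```

Both queries select from the same joined rows, so on this sample data they return the same two London doctors.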

c. Write an SQL statement to list each city and the total number of operation rooms in the hospitals of that city.

Answer:
Select city, SUM(number_of_operation_rooms)
From Hospital
Group By city

d. Write an SQL statement to find the total number of beds in Queen Charlotte's and Chelsea hospital in London.

Answer:
Select Hospital_name, SUM(number_of_beds)
From Hospital, Ward
Where Hospital.hospital_no = Ward.hospital_no AND Hospital.city = 'London' AND
Hospital.hospital_name IN ('Queen Charlotte''s', 'Chelsea')
Group By Hospital_name


Another solution:
Select Hospital_name, SUM(number_of_beds)
From Hospital Inner Join Ward ON Hospital.hospital_no = Ward.hospital_no
Where Hospital.city = 'London' AND Hospital.hospital_name IN ('Queen Charlotte''s', 'Chelsea')
Group By Hospital_name
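
A detail that is easy to miss in answer d is the apostrophe in the hospital name: in standard SQL, an apostrophe inside a string literal is written as two single quotes. The sketch below runs the join and grouping in SQLite from Python. It assumes, as the answer does, that the two hospitals are stored as separate rows; the ward rows and bed counts are hypothetical.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the exam schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Hospital (hospital_no TEXT PRIMARY KEY, hospital_name TEXT, city TEXT);
    CREATE TABLE Ward (ward_no TEXT PRIMARY KEY, hospital_no TEXT, number_of_beds INTEGER);
    INSERT INTO Hospital VALUES ('h1', 'Queen Charlotte''s', 'London'),
                                ('h2', 'Chelsea', 'London'),
                                ('h3', 'Riverside', 'Leeds');
    INSERT INTO Ward VALUES ('w1', 'h1', 20), ('w2', 'h1', 15),
                            ('w3', 'h2', 30), ('w4', 'h3', 40);
""")

# Join wards to hospitals, keep the two named London hospitals, sum beds per hospital.
beds = conn.execute("""
    SELECT hospital_name, SUM(number_of_beds)
    FROM Hospital INNER JOIN Ward ON Hospital.hospital_no = Ward.hospital_no
    WHERE Hospital.city = 'London'
      AND Hospital.hospital_name IN ('Queen Charlotte''s', 'Chelsea')
    GROUP BY hospital_name
    ORDER BY hospital_name
""").fetchall()

print(beds)
```

When the query is issued from application code rather than written by hand, passing the names as bound parameters avoids the escaping problem altogether.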

e. Write an SQL statement to update the population of London by adding 10,000 to it.
Answer:
Update Hospital
Set city_population = city_population + 10000
Where city = 'London'
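
The update can be tried out in SQLite as a quick check. Note that the numeric literal has to be written without a thousands separator. This is a minimal sketch with hypothetical population figures; it also shows that, with city_population stored in the Hospital table, every London hospital row is changed by the same statement.

```python
import sqlite3

# Hypothetical rows: two hospitals in London, one in Leeds.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Hospital (hospital_no TEXT PRIMARY KEY, city TEXT, city_population INTEGER);
    INSERT INTO Hospital VALUES ('h101', 'Leeds', 800000),
                                ('h102', 'London', 9000000),
                                ('h103', 'London', 9000000);
""")

# Add 10000 to the population recorded against every London hospital.
cursor = conn.execute(
    "UPDATE Hospital SET city_population = city_population + 10000 WHERE city = 'London'"
)
rows_changed = cursor.rowcount

populations = conn.execute(
    "SELECT hospital_no, city_population FROM Hospital ORDER BY hospital_no"
).fetchall()

print(rows_changed)
print(populations)
```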

f. To which normal form would the hospital relation conform if we know that city determines city_population? Explain why you reached this conclusion and normalize the hospital relation to the next higher normal form.

Answer:
The Hospital relation is in second normal form (2NF): its primary key, hospital_no, is a single attribute, so every non-key attribute is fully functionally dependent on the whole key rather than on part of it. It violates third normal form (3NF) because city_population depends on city, and both are non-key attributes, so city_population depends on the primary key only transitively.

Normalization:

Hospital (hospital_no, hospital_name, number_of_operation_rooms, city, telephone_no)
City (city, city_population)
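
The effect of the decomposition can be illustrated with a short SQLite sketch. It assumes the two relations shown in the normalization, trimmed to the columns that matter here, with hypothetical hospital names and population figures. After the split, the population of a city is stored once, so a population change touches a single row, and the original view is recovered with a join.

```python
import sqlite3

# Hypothetical data for the decomposed (3NF) design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City (city TEXT PRIMARY KEY, city_population INTEGER);
    CREATE TABLE Hospital (hospital_no TEXT PRIMARY KEY, hospital_name TEXT,
                           city TEXT REFERENCES City(city));
    INSERT INTO City VALUES ('Leeds', 800000), ('London', 9000000);
    INSERT INTO Hospital VALUES ('h101', 'North General', 'Leeds'),
                                ('h102', 'City Central', 'London'),
                                ('h103', 'Riverside', 'London');
""")

# The population now lives in exactly one row per city.
cursor = conn.execute(
    "UPDATE City SET city_population = city_population + 10000 WHERE city = 'London'"
)
rows_changed = cursor.rowcount

# Joining the two relations reproduces the columns of the original Hospital relation.
joined = conn.execute("""
    SELECT hospital_no, Hospital.city, city_population
    FROM Hospital INNER JOIN City ON Hospital.city = City.city
    ORDER BY hospital_no
""").fetchall()

print(rows_changed)
print(joined)
```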

Question 2


a. Name the key(s) that have a list of other documents as their value.
Answer:
Grades {Date, Grade, Score}

b. Name the key(s) that have a subdocument as their value.
Answer:
Address {Building, Coord, Street, ZipCode}

c. Write a MongoDB query to find all documents in the restaurants collection.
Answer:
db.restaurants.find()

d. Write a MongoDB query to find the names of cities that have at least one restaurant in the collection.

Answer:
db.restaurants.find({}, { City: 1 })

e. Write a MongoDB query to find the names of all restaurants in Paris.
Answer:
db.restaurants.find({ "City": "Paris" }, { name: 1 })

f. Write a MongoDB query to find any one 5-star restaurant in Paris.
Answer:
db.restaurants.findOne({ "City": "Paris", "stars": 5 }, { name: 1 })

g. Write a MongoDB query to find out how many restaurants there are in the collection.
Answer:
db.restaurants.find().count()


Question 3

Below is a depiction of a data cube for student graduation figures at AOU as a table. Examine this data cube and answer the questions afterwards:

                        Branch
Semester   Programme    Kuwait   KSA   Egypt   Jordan
Fall       Business     400      500   350     300
Fall       IT           300      400   250     200
Fall       English      200      300   150     100
Spring     Business     350      450   300     250
Spring     IT           250      350   200     150
Spring     English      150      250   100     50
Summer     Business     100      200   150     50
Summer     IT           75       150   100     30
Summer     English      25       35    20      15

a) Slice the cube by branch and semester, showing only the data for the IT programme, and show the resulting cube as a table.

            Branch
Semester    Kuwait   KSA   Egypt   Jordan
Fall        300      400   250     200
Spring      250      350   200     150
Summer      75       150   100     30

b) Dice the cube by programme, showing only the data for the summer in KSA.

             Branch
Programme    KSA
Business     200
IT           150
English      35
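
The slice and dice operations in parts a and b amount to filtering the cube cells on fixed dimension values. A minimal sketch in plain Python follows. The figures are taken from the exam table; the dictionary layout keyed by (semester, programme, branch) is only one possible representation and is an assumption made for the illustration.

```python
# Figures from the exam table; one row per (semester, programme).
BRANCHES = ["Kuwait", "KSA", "Egypt", "Jordan"]
ROWS = [
    ("Fall", "Business", [400, 500, 350, 300]),
    ("Fall", "IT", [300, 400, 250, 200]),
    ("Fall", "English", [200, 300, 150, 100]),
    ("Spring", "Business", [350, 450, 300, 250]),
    ("Spring", "IT", [250, 350, 200, 150]),
    ("Spring", "English", [150, 250, 100, 50]),
    ("Summer", "Business", [100, 200, 150, 50]),
    ("Summer", "IT", [75, 150, 100, 30]),
    ("Summer", "English", [25, 35, 20, 15]),
]

# One cell per (semester, programme, branch) combination.
cube = {
    (semester, programme, branch): count
    for semester, programme, counts in ROWS
    for branch, count in zip(BRANCHES, counts)
}

# Part a: fix programme = IT, keep the semester and branch dimensions.
it_slice = {(s, b): n for (s, p, b), n in cube.items() if p == "IT"}

# Part b: fix semester = Summer and branch = KSA, keep the programme dimension.
summer_ksa = {p: n for (s, p, b), n in cube.items() if s == "Summer" and b == "KSA"}

print(it_slice[("Fall", "Kuwait")], it_slice[("Summer", "Jordan")])
print(summer_ksa)
```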

c) Roll up the data cube to show only the numbers of graduated students per branch.

Branch
Kuwait   KSA    Egypt   Jordan
1850     2635   1620    1145
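
The roll-up totals are simply the column sums of the original cube table, aggregating away the semester and programme dimensions. As a quick check, the sketch below sums each branch column in Python; the list-of-rows layout is an assumption made for the illustration, and the figures are those of the exam table.

```python
# Graduation figures per row of the cube table, in branch order.
BRANCHES = ["Kuwait", "KSA", "Egypt", "Jordan"]
COUNTS = [
    [400, 500, 350, 300],  # Fall, Business
    [300, 400, 250, 200],  # Fall, IT
    [200, 300, 150, 100],  # Fall, English
    [350, 450, 300, 250],  # Spring, Business
    [250, 350, 200, 150],  # Spring, IT
    [150, 250, 100, 50],   # Spring, English
    [100, 200, 150, 50],   # Summer, Business
    [75, 150, 100, 30],    # Summer, IT
    [25, 35, 20, 15],      # Summer, English
]

# zip(*COUNTS) transposes rows into branch columns; sum each column.
totals = dict(zip(BRANCHES, (sum(column) for column in zip(*COUNTS))))

print(totals)
```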


d) Drill down this data cube by adding a learning center dimension. Only show the structure of the resulting table with all headings (i.e. show an empty table without any numbers). Assume that KSA has two centers and Jordan has two centers; all others have only 1 center.

                        Branch
                        Kuwait      KSA                     Egypt       Jordan
                        learning    learning    learning    learning    learning    learning
Semester   Programme    center      center 1    center 2    center      center 1    center 2
Fall       Business
Fall       IT
Fall       English
Spring     Business
Spring     IT
Spring     English
Summer     Business
Summer     IT
Summer     English


Question 4

a) Calculate the centroid for the following 4 data points.
P1 = [2, 3, 4]
P2 = [5, 6, 7]
P3 = [8, 9, 10]
P4 = [11, 12, 13]

Centroid X = (2 + 5 + 8 + 11) / 4 = 26 / 4 = 6.5

Centroid Y = (3 + 6 + 9 + 12) / 4 = 30 / 4 = 7.5

Centroid Z = (4 + 7 + 10 + 13) / 4 = 34 / 4 = 8.5

Centroid = [6.5, 7.5, 8.5]
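
The centroid is the coordinate-wise mean of the points. A minimal Python sketch of the same calculation, using the four points from the question:

```python
# The four data points from the question.
points = [
    [2, 3, 4],
    [5, 6, 7],
    [8, 9, 10],
    [11, 12, 13],
]

# zip(*points) groups the X values, the Y values and the Z values;
# each coordinate of the centroid is the mean of one group.
centroid = [sum(coordinate) / len(points) for coordinate in zip(*points)]

print(centroid)  # [6.5, 7.5, 8.5]
```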


b) In a silhouette analysis, the average distances from point Pi, which is in cluster A, to the members of each of the 3 existing clusters (A, B, C) were found to be:

Cluster   Average distance from Pi to each cluster member
A         5
B         25
C         10

Calculate the silhouette coefficient of point Pi. Explain your steps.

• As Pi has been placed in Cluster A, a(i) is the average distance from Pi to each of the other points in Cluster A; in this case, a(i) = 5.

• b(i) is the average distance from Pi to each member of the nearest other cluster. The average distance to Cluster B is 25 and the average distance to Cluster C is 10, so the nearest cluster to Pi is Cluster C and b(i) = 10.

Since a(i) < b(i):

S(i) = 1 - a(i)/b(i) = 1 - 5/10 = 0.5
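
The same steps can be sketched in Python. The function below is a hypothetical helper written for this illustration, not part of any library; it uses the general form (b - a) / max(a, b), which reduces to 1 - a/b whenever a(i) < b(i), and it assumes the average distance given for the point's own cluster already excludes the point itself.

```python
def silhouette_coefficient(own_cluster, average_distances):
    """Silhouette coefficient of one point, given its average distance to each cluster.

    own_cluster is the label of the cluster the point belongs to;
    average_distances maps each cluster label to the average distance from the point.
    """
    # a: average distance to the other members of the point's own cluster.
    a = average_distances[own_cluster]
    # b: smallest average distance to any other cluster (the nearest cluster).
    b = min(d for label, d in average_distances.items() if label != own_cluster)
    if max(a, b) == 0:
        return 0.0  # degenerate case: all distances are zero
    return (b - a) / max(a, b)


# Values from the question: Pi is in cluster A.
s_i = silhouette_coefficient("A", {"A": 5, "B": 25, "C": 10})
print(s_i)  # 0.5
```

A value of 0.5 lies between 0 (on the boundary between two clusters) and 1 (well inside its own cluster), so Pi is reasonably, though not strongly, assigned to Cluster A.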

