Eng. Ahmed Samy TM351 Final 05-01-2019
Faculty of Computer Studies
TM351
Data management and analysis
Final Examination Q&A
Spring – 2018-2019
Date: 05/01/2019
Number of Exam Pages: (6)    Time Allowed: ( ) Hours
Question Type                              Max Mark
Part I: Multiple Choice Questions (MCQ)    20
Part II: Long Questions                    80
Total grade                                100
Instructions:
1. Answer all questions.
2. Mobiles and calculators are not allowed.
Note: no calculations in this exam require the use of a calculator.
Only simple calculations may be required.
Part I: [20 marks]
This part contains MCQ questions covering parts 1 – 10. The questions shown here are
only samples, and the actual questions could range over all the covered content.
Please select the most appropriate answer and write it in the answer booklet.
1. In SQL, a scalar subquery always returns --------------------------------.
a. Single row and single column b. Multiple columns and a single row c. Single column and multiple rows d. None of the above
2. In the ----------- approach to recommender systems, recommendations are made based on what people with
similar tastes and preferences have liked in the past.
a. Content-based recommendation b. Collaborative filtering c. Non-collaborative filtering d. None of the above
3. The --------------- approach for increasing the capacity of a relational DBMS implies using many small
servers and distributing the data among them with redundancy.
a. Scaling up b. Scaling out c. Centralization d. None of the above
4. The ---------------- tells us whether the difference between (1) the observed numbers of items in each of
several categories or combinations of categories and (2) how many items there should be is significant.
a. Spearman's test b. χ² test c. Pearson's R² d. Ratio test
5. ----------- is a technique that increases the capacity and performance of a database by splitting
the dataset over several servers, with each server then responsible for storing, updating, and
returning the appropriate data on request.
a. Resiliency b. Replication c. Sharding d. Availability
6. The CAP theorem states that distributed systems can at best satisfy only two of the three
properties (consistency, availability, or -----------------).
a. Partition tolerance b. Productivity c. Performance d. Payoff
7. In data warehousing, ---------------------- measures can be combined along any dimension.
a. Additive b. Semi-additive c. Non-additive d. None of the above
8. ----------------------- is concerned with segmenting a diverse group of data into a number of similar
sub-groups.
a. Clustering b. Correlation c. Combination d. Regression
9. In the K-NN algorithm, misclassification can occur when there are many more training data instances from
one class than from the others. One way to compensate for this is to use ----------------- as a weight for
each of the K nearest data instances.
a. The distance to the centroid b. The inverse of the distance to the centroid c. The number of points near the centroid d. The average distance of the centroid
10. In the cosine similarity measure of document similarity, if the angle between the vectors representing
documents A and B is smaller than the angle between documents A and C, then document A is considered
to be -------------------------- document C.
a. More similar to b. Less similar to c. As similar as d. None of the above
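The relationship between angle and similarity in question 10 can be checked numerically. The following is a minimal sketch, assuming three made-up term-count vectors (doc_a, doc_b and doc_c are hypothetical and not part of the exam): the smaller the angle between two vectors, the larger the cosine of that angle.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical term-count vectors, invented for illustration only.
doc_a = [3, 1, 0]
doc_b = [6, 2, 1]
doc_c = [0, 1, 4]

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)

# Convert each cosine back to an angle; clamp to 1.0 to guard against
# floating-point values marginally above 1.
angle_ab = math.degrees(math.acos(min(1.0, sim_ab)))
angle_ac = math.degrees(math.acos(min(1.0, sim_ac)))

print(f"A vs B: cosine = {sim_ab:.3f}, angle = {angle_ab:.1f} degrees")
print(f"A vs C: cosine = {sim_ac:.3f}, angle = {angle_ac:.1f} degrees")
```

With these vectors the angle between A and B is the smaller one and its cosine is the larger one.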
Part Two: [80 marks]
Question One: [30 marks]
Use the hospital database below to answer the following questions:
Patient (Patient_id, Patient_name, gender, height, age, weight, staff_no, ward_no)
Ward (Ward_no , hospital_no, Ward_name, number_of_beds)
Treatment (Staff_no , Patient_no, start_date ,start_day_of_week, reason)
Doctor (Staff_no , Doctor_name , position, specialization, hospital_no)
Hospital(hospital_no, hospital_name, number_of_operation_rooms, city,
city_population, telephone_no)
a. Write an SQL statement to list the names of all doctors whose specialization is radiology and who
work in the hospital with hospital_no = 'h102'.
Answer:
Select Doctor_name
From Doctor
Where specialization = 'radiology' AND hospital_no = 'h102'
b. Write an SQL statement to list the names and positions of all doctors who work in a hospital in
London.
Answer:
Select Doctor_name, position
From Doctor, Hospital
Where Hospital.hospital_no = Doctor.hospital_no AND Hospital.city = 'London'
Another solution:
Select Doctor_name, position
From Hospital INNER JOIN Doctor ON Hospital.hospital_no = Doctor.hospital_no
Where Hospital.city = 'London'
c. Write an SQL statement to list each city and the total number of operation rooms in the hospitals of that
city.
Answer:
Select City , SUM(number_of_operation_rooms)
From Hospital
Group By City
d. Write an SQL statement to find the total number of beds in Queen Charlotte's and Chelsea
hospital in London.
Answer:
Select Hospital_name, SUM(Number_Of_Beds)
From Hospital, Ward
Where Hospital.Hospital_no = Ward.Hospital_no AND Hospital.City = 'London' AND
Hospital.Hospital_name IN ('Queen Charlotte''s', 'Chelsea')
Group By Hospital_name
Another solution:
Select Hospital_name, SUM(Number_Of_Beds)
From Hospital Inner Join Ward ON Hospital.Hospital_no = Ward.Hospital_no
Where Hospital.City = 'London' AND Hospital.Hospital_name IN ('Queen Charlotte''s', 'Chelsea')
Group By Hospital_name
e. Write an SQL statement to update the population of London by adding 10,000 to it.
Answer:
Update Hospital
Set City_Population = City_Population + 10000
Where City = 'London'
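The statements in parts a to e can be tried out against an in-memory SQLite database. The following is only a sketch under stated assumptions: every row below (hospital names, doctors, wards, populations, telephone numbers) is invented for illustration, and only the three tables the queries touch are created. Note that a string literal containing an apostrophe doubles it ('Queen Charlotte''s') and that a numeric literal is written without a thousands separator (10000).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Hospital (hospital_no TEXT PRIMARY KEY, hospital_name TEXT,
    number_of_operation_rooms INTEGER, city TEXT, city_population INTEGER,
    telephone_no TEXT);
CREATE TABLE Doctor (staff_no TEXT PRIMARY KEY, doctor_name TEXT,
    position TEXT, specialization TEXT, hospital_no TEXT);
CREATE TABLE Ward (ward_no TEXT PRIMARY KEY, hospital_no TEXT,
    ward_name TEXT, number_of_beds INTEGER);
-- Hypothetical sample rows, invented for this sketch.
INSERT INTO Hospital VALUES
    ('h101', 'Chelsea', 4, 'London', 9000000, '000-0001'),
    ('h102', 'Queen Charlotte''s', 6, 'London', 9000000, '000-0002'),
    ('h103', 'City General', 3, 'Leeds', 800000, '000-0003');
INSERT INTO Doctor VALUES
    ('s1', 'Dr. Adams', 'Consultant', 'radiology', 'h102'),
    ('s2', 'Dr. Baker', 'Registrar', 'cardiology', 'h101'),
    ('s3', 'Dr. Clark', 'Consultant', 'radiology', 'h103');
INSERT INTO Ward VALUES
    ('w1', 'h101', 'North', 20),
    ('w2', 'h102', 'East', 30),
    ('w3', 'h102', 'West', 25),
    ('w4', 'h103', 'South', 40);
""")

# a. Radiologists working in hospital h102.
radiologists = cur.execute(
    "SELECT doctor_name FROM Doctor "
    "WHERE specialization = 'radiology' AND hospital_no = 'h102'").fetchall()

# b. Names and positions of doctors working in a London hospital.
london_doctors = cur.execute(
    "SELECT doctor_name, position FROM Hospital INNER JOIN Doctor "
    "ON Hospital.hospital_no = Doctor.hospital_no "
    "WHERE Hospital.city = 'London' ORDER BY doctor_name").fetchall()

# c. Total number of operation rooms per city.
rooms_per_city = cur.execute(
    "SELECT city, SUM(number_of_operation_rooms) FROM Hospital "
    "GROUP BY city ORDER BY city").fetchall()

# d. Total beds per named London hospital.
beds = cur.execute(
    "SELECT hospital_name, SUM(number_of_beds) FROM Hospital INNER JOIN Ward "
    "ON Hospital.hospital_no = Ward.hospital_no "
    "WHERE Hospital.city = 'London' "
    "AND Hospital.hospital_name IN ('Queen Charlotte''s', 'Chelsea') "
    "GROUP BY hospital_name ORDER BY hospital_name").fetchall()

# e. Add 10,000 to the population recorded for London.
cur.execute("UPDATE Hospital SET city_population = city_population + 10000 "
            "WHERE city = 'London'")
london_population = cur.execute(
    "SELECT DISTINCT city_population FROM Hospital "
    "WHERE city = 'London'").fetchall()

for label, rows in [("a", radiologists), ("b", london_doctors),
                    ("c", rooms_per_city), ("d", beds),
                    ("e", london_population)]:
    print(label, rows)
```

The ORDER BY clauses are added here only to make the printed output deterministic; they are not required by the questions.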
f. To which normal form would the hospital relation conform if we know that city determines
city_population? Explain why you reached this conclusion and normalize the hospital relation to the
next higher normal form.
Answer:
The hospital table is in 2nd normal form: its primary key (hospital_no) is a single attribute, so every
non-key attribute is fully functionally dependent on the whole primary key and not on part of it. It
violates 3rd normal form because city_population depends on city, and both are non-key columns (a
transitive dependency).
Normalization:
Hospital (hospital_no, hospital_name, number_of_operation_rooms, city, telephone_no)
City (City ,City_Population)
a. Name the key(s) that have a list of other documents as their value.
Answer
Grades {Date, Grade, Score}
b. Name the key(s) that have a subdocument as their value.
Answer
Address {Building, Coord, Street, ZipCode}
c. Write a MongoDB query to find all documents in the restaurants collection.
Answer
db.restaurants.find()
d. Write a MongoDB query to find the names of cities that have at least one restaurant in the
collection.
Answer
db.restaurants.find({}, { "City": 1 })
e. Write a MongoDB query to find the names of all restaurants in Paris.
Answer
db.restaurants.find({ "City": "Paris" }, { name: 1 })
f. Write a MongoDB query to find any one 5-star restaurant in Paris.
Answer
db.restaurants.findOne({ "City": "Paris", "stars": 5 }, { name: 1 })
g. Write a MongoDB query to find out how many restaurants are there in the collection.
Answer
db.restaurants.find( ).count( )
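What the queries in parts c to g return can be mimicked without a MongoDB server by filtering a plain Python list of dictionaries. This is only an illustrative sketch: it does not use a MongoDB driver, and the three sample documents are hypothetical (the field names name, City and stars are taken from the answers above). It also shows that projecting the city field yields one value per document, duplicates included; collapsing them into unique names is what MongoDB's distinct() operation would do.

```python
# Hypothetical documents standing in for the restaurants collection.
restaurants = [
    {"name": "Chez Nous", "City": "Paris", "stars": 5},
    {"name": "Le Petit Coin", "City": "Paris", "stars": 3},
    {"name": "Harbour Grill", "City": "London", "stars": 5},
]

# c. All documents in the collection.
all_docs = list(restaurants)

# d. Projection on the city field: one entry per document (with duplicates),
#    then the de-duplicated city names.
cities_per_doc = [doc["City"] for doc in restaurants]
unique_cities = sorted(set(cities_per_doc))

# e. Names of all restaurants in Paris.
paris_names = [doc["name"] for doc in restaurants if doc["City"] == "Paris"]

# f. Any one 5-star restaurant in Paris (None if there is no match).
five_star_paris = next(
    (doc["name"] for doc in restaurants
     if doc["City"] == "Paris" and doc["stars"] == 5),
    None,
)

# g. Number of restaurants in the collection.
total = len(restaurants)

print(cities_per_doc, unique_cities, paris_names, five_star_paris, total)
```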
Question 3
Below is a depiction of a data cube for student graduation figures at AOU as a
table. Examine this data cube and answer the questions afterwards:
Semester   Programme   Branch
                       Kuwait   KSA   Egypt   Jordan
Fall       Business    400      500   350     300
Fall       IT          300      400   250     200
Fall       English     200      300   150     100
Spring     Business    350      450   300     250
Spring     IT          250      350   200     150
Spring     English     150      250   100     50
Summer     Business    100      200   150     50
Summer     IT          75       150   100     30
Summer     English     25       35    20      15
a) Slice the cube by branch and semester, showing only the data for the IT programme, and show the
resulting cube as a table.
Semester   Branch
           Kuwait   KSA   Egypt   Jordan
Fall       300      400   250     200
Spring     250      350   200     150
Summer     75       150   100     30
b) Dice the cube by programme, showing only the data for the summer in KSA.
Programme   Branch
            KSA
Business    200
IT          150
English     35
c) Roll up the data cube to show only the numbers of graduated students per branch.
Branch
Kuwait   KSA    Egypt   Jordan
1850     2635   1620    1145
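The slice, dice and roll-up answers can be checked with a few lines of plain Python. This is a sketch, not the course's method: the cube is held in a dictionary keyed by (semester, programme), with the cell values read from the data cube table in this question, and the variable names are illustrative.

```python
BRANCHES = ["Kuwait", "KSA", "Egypt", "Jordan"]

# (semester, programme) -> graduation counts, in BRANCHES order.
cube = {
    ("Fall", "Business"):   [400, 500, 350, 300],
    ("Fall", "IT"):         [300, 400, 250, 200],
    ("Fall", "English"):    [200, 300, 150, 100],
    ("Spring", "Business"): [350, 450, 300, 250],
    ("Spring", "IT"):       [250, 350, 200, 150],
    ("Spring", "English"):  [150, 250, 100, 50],
    ("Summer", "Business"): [100, 200, 150, 50],
    ("Summer", "IT"):       [75, 150, 100, 30],
    ("Summer", "English"):  [25, 35, 20, 15],
}

# a) Slice: fix the programme dimension to IT, keep semester and branch.
it_slice = {sem: counts for (sem, prog), counts in cube.items() if prog == "IT"}

# b) Dice: restrict semester to Summer and branch to KSA, keep programme.
ksa = BRANCHES.index("KSA")
summer_ksa = {prog: counts[ksa]
              for (sem, prog), counts in cube.items() if sem == "Summer"}

# c) Roll up: sum over semester and programme, leaving one total per branch.
branch_totals = {branch: sum(counts[i] for counts in cube.values())
                 for i, branch in enumerate(BRANCHES)}

print(it_slice)
print(summer_ksa)
print(branch_totals)
```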
d) Drill down this data cube by adding a learning center dimension. Only show the structure of the
resulting table with all headings (i.e. show an empty table without any numbers). Assume that KSA has
two centers and Jordan has two centers; all others have only one center.
Semester   Programme   Branch
                       Kuwait     KSA        KSA        Egypt      Jordan     Jordan
                       center 1   center 1   center 2   center 1   center 1   center 2
Fall       Business
Fall       IT
Fall       English
Spring     Business
Spring     IT
Spring     English
Summer     Business
Summer     IT
Summer     English
Question 4
a) Calculate the centroid for the following 4 data points.
P1 = [2 , 3, 4]
P2 = [5, 6, 7]
P3 = [8, 9, 10]
P4 = [11, 12, 13]
Centroid X = (2 + 5 + 8 + 11) / 4 = 26 / 4 = 6.5
Centroid Y = (3 + 6 + 9 + 12) / 4 = 30 / 4 = 7.5
Centroid Z = (4 + 7 + 10 + 13) / 4 = 34 / 4 = 8.5
Centroid = [6.5, 7.5, 8.5]
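The centroid is the coordinate-wise mean of the points. A minimal Python sketch of that calculation, using the four points given in the question (the variable names are illustrative):

```python
points = [
    [2, 3, 4],     # P1
    [5, 6, 7],     # P2
    [8, 9, 10],    # P3
    [11, 12, 13],  # P4
]

# zip(*points) groups all X values, all Y values and all Z values together,
# so each coordinate of the centroid is the mean of one group.
centroid = [sum(coords) / len(points) for coords in zip(*points)]

print(centroid)  # [6.5, 7.5, 8.5]
```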
b) In a silhouette analysis, the average distances from point Pi, which is in cluster A, to the members
of each of the 3 existing clusters (A, B, C) were found to be:
Cluster   Average distance from Pi to each cluster member
A         5
B         25
C         10
Calculate the silhouette coefficient of point Pi. Explain your steps.
• As Pi has been placed in Cluster A, a(i) is the average distance from Pi to each of the other
points in Cluster A; in this case, a(i) = 5.
• b(i) is the average distance from Pi to each member of the nearest other cluster.
The average distance to Cluster B is 25 and the average distance to Cluster C is 10,
so the nearest cluster to Pi is Cluster C and b(i) = 10.
• Since a(i) < b(i):
S(i) = 1 - a(i)/b(i) = 1 - 5/10 = 0.5
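The same steps can be written as a short Python sketch. It uses the general form of the silhouette coefficient, S(i) = (b(i) - a(i)) / max(a(i), b(i)), which reduces to 1 - a(i)/b(i) when a(i) < b(i). The function and variable names are illustrative; the distances are the ones given in the question.

```python
def silhouette(a, b):
    """Silhouette coefficient from the mean intra-cluster distance a and
    the mean distance b to the nearest other cluster."""
    return (b - a) / max(a, b)

# Average distance from Pi to the members of each cluster.
avg_distance = {"A": 5, "B": 25, "C": 10}
own_cluster = "A"

a_i = avg_distance[own_cluster]
# b(i) is the smallest average distance to any cluster other than Pi's own.
b_i = min(dist for cluster, dist in avg_distance.items()
          if cluster != own_cluster)
s_i = silhouette(a_i, b_i)

print(a_i, b_i, s_i)  # 5 10 0.5
```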