EpicUUID: 5143F74D-DB01-4131-AAA3-52F4753E50D4

Write a query that returns the name of each patient that has a mother documented. Also return the patient's mother's name.

* The PATIENT table has one row per patient
  * The primary key is PAT_ID and holds the patient's ID
  * The MOTHER_PAT_ID column is a foreign key to the PATIENT table and holds the ID of the patient's mother (if documented)

We have two entities and a relationship: patient and patient's mother. However, each entity is represented by the PATIENT table. Using the information from this lesson but without aliasing, we may try the following:

    SELECT PATIENT.PAT_NAME "Patient Name",
           PATIENT.PAT_NAME "Mother Name"
    FROM PATIENT
    INNER JOIN PATIENT
        ON PATIENT.MOTHER_PAT_ID = PATIENT.PAT_ID

However, when we attempt to run this query, we get an error because any reference to a column from the PATIENT table is ambiguous: is it the patient or the patient's mother? The column aliases will ultimately be useful for output, but they don't drive which rows and columns are in the query result.

The solution is to alias the tables.

    SELECT pat.PAT_NAME "Patient Name",
           mom.PAT_NAME "Mother Name"
    FROM PATIENT pat
    INNER JOIN PATIENT mom
        ON pat.MOTHER_PAT_ID = mom.PAT_ID

Join Tables 6 • 25 RPT101i SQL I 151
Write a query that returns each encounter's CSN, visit provider's name, and general primary care provider's name, where the two providers were identified but not the same.

* The PAT_ENC table has one row per encounter
  * The PAT_ENC_CSN_ID column holds the encounter CSN
  * The VISIT_PROV_ID column is a foreign key to CLARITY_SER and holds the visit provider's ID
  * The PCP_PROV_ID column is a foreign key to CLARITY_SER and holds the general primary care provider's ID

We have three entities (encounter, visit provider, and PCP) and two relationships (encounter to visit provider, and encounter to PCP). However, two of the entities are represented by the CLARITY_SER table. Using the information from this lesson but without aliasing, we may try the following:

    SELECT PAT_ENC.PAT_ENC_CSN_ID "CSN",
           CLARITY_SER.PROV_NAME "Visit Provider",
           CLARITY_SER.PROV_NAME "PCP"
    FROM PAT_ENC
    INNER JOIN CLARITY_SER
        ON PAT_ENC.VISIT_PROV_ID = CLARITY_SER.PROV_ID
    INNER JOIN CLARITY_SER
        ON PAT_ENC.PCP_PROV_ID = CLARITY_SER.PROV_ID
    WHERE PAT_ENC.VISIT_PROV_ID <> PAT_ENC.PCP_PROV_ID

However, when we attempt to run this query, we get an error because any reference to a column from the CLARITY_SER table is ambiguous: is it the visit provider or the PCP? The column aliases will ultimately be useful for output, but they don't drive which rows and columns are in the query result.

The solution is to alias the tables.

    SELECT enc.PAT_ENC_CSN_ID "CSN",
           visit.PROV_NAME "Visit Provider",
           pcp.PROV_NAME "PCP"
    FROM PAT_ENC enc
    INNER JOIN CLARITY_SER visit
        ON enc.VISIT_PROV_ID = visit.PROV_ID
    INNER JOIN CLARITY_SER pcp
        ON enc.PCP_PROV_ID = pcp.PROV_ID
    WHERE enc.VISIT_PROV_ID <> enc.PCP_PROV_ID

There are additional scenarios when you need to use table aliasing, but they are outside the scope of this lesson. Subqueries, covered in the RPT121i SQL II self-study, are one such context.
General Rules to Follow

While it is necessary to take care in choosing the proper joins for a query, many joins are simple and common enough that we don't need detailed scrutiny every time we want to add a new table to a query. This section has some general rules to follow: rules that don't apply to every situation, but are followed often enough that we should acknowledge their existence.

A good use of this section is to check your understanding of the topics covered thus far in this lesson. With these general rules, we'll look at both examples and counterexamples.

For N Entities There Will be N-1 Relationships

For all of our entities to be included in one query, the entities need to be related to each other in some way. If we have N entities, then the minimum number of relationships needed to unite them is N-1.

If we have fewer than N-1 relationships, then there are at least two entities that are not connected by any network of relationships. This will result in a Cartesian product, so additional relationships should be identified.

If we have more than N-1 relationships, then there are at least two entities that are connected by multiple networks of relationships. This will likely result in additional comparisons in the join or the WHERE clause.

You need to write a query about patients and their providers. There are two entities (patient and provider) so N=2. Therefore, you need to define N-1 = 2-1 = 1 relationship: the relationship between patients and providers you want to query.

Entities are Tables

A major concept behind many relational database models is that entities are represented by tables.

You need to write a query about patients and their providers. The PATIENT table has one row per patient. The CLARITY_SER table has one row per provider. So, the entities are indeed tables.

However, the entities we need to query may not match perfectly with tables in our data model.
Therefore, we may need to further modify our query (such as adding criteria) to get our desired result from the data model. Depending on the join types and desired results, these additional modifications may need to be included in the join conditions.
You need to write a query about patients and their current providers. The PatientDim table has one row per patient for a date range. The ProviderDim table has one row per provider for a date range. The entities you need to query don't match our data model perfectly, so you'll need to include some criteria in your query to get only patients' current providers and current data about those providers.

You need to list each patient encounter. For each patient encounter, you also need to list the primary encounter diagnosis.

* The PAT_ENC table has one row per patient encounter
  * The primary key is PAT_ENC_CSN_ID
* The PAT_ENC_DX table has one row per encounter diagnosis
  * The PRIMARY_DX_YN column has a value of 'Y' for the primary diagnosis
  * The PAT_ENC_CSN_ID column is a foreign key to the PAT_ENC table

The entity "patient encounter" is represented by the PAT_ENC table. The entity "primary encounter diagnosis" is represented by the PAT_ENC_DX table with the additional condition of PRIMARY_DX_YN = 'Y'.

Relationships are Foreign Keys

A major concept behind many relational database models is that entities are represented by tables and relationships are represented by foreign keys. Therefore, the join condition is often of this form:

    FOREIGN_KEY_COLUMN = PRIMARY_KEY_COLUMN
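Returning to the primary encounter diagnosis example above, the following is a minimal sketch of how the foreign key condition and the extra PRIMARY_DX_YN condition might sit together in one join. It assumes a LEFT OUTER JOIN is wanted so that encounters without a primary diagnosis are still listed, and it assumes PAT_ENC_DX has a DX_ID column holding the diagnosis ID; that column name is not given in this lesson and is used here for illustration only.

```sql
-- Sketch only: one row per patient encounter, with its primary diagnosis if one exists.
SELECT PAT_ENC.PAT_ENC_CSN_ID "CSN",
       PAT_ENC_DX.DX_ID "Primary Diagnosis ID"   -- DX_ID is an assumed column name
FROM PAT_ENC
LEFT OUTER JOIN PAT_ENC_DX
    ON PAT_ENC_DX.PAT_ENC_CSN_ID = PAT_ENC.PAT_ENC_CSN_ID   -- foreign key = primary key
    AND PAT_ENC_DX.PRIMARY_DX_YN = 'Y'                      -- narrows the table to the "primary encounter diagnosis" entity
```

Because the PRIMARY_DX_YN condition is part of the join condition rather than the WHERE clause, encounters with no primary diagnosis are kept in the result with NULL in the diagnosis column.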
You need to list patients and the diagnoses on their problem list.

* One patient can have multiple diagnoses on their problem list
* One diagnosis can be on multiple patients' problem lists

Therefore, this patient-diagnosis relationship is many-to-many.

* The PatientDim table has one row per patient for a date range
  * The ProblemComboKey column has a value that corresponds to a group of patients that have the same diagnoses
* The DiagnosisDim table has one row per diagnosis
* The DiagnosisBridge table has one row per "diagnosis in a combination"
  * The primary key is DiagnosisComboKey and DiagnosisKey
  * The DiagnosisComboKey column has a value that corresponds to a group of patients that have the same diagnoses
  * The DiagnosisKey column is a foreign key to the DiagnosisDim table

In this case, DiagnosisBridge is a table designed specifically for handling this many-to-many relationship in an efficient way. The relationship between DiagnosisBridge and DiagnosisDim is a foreign key, so the join condition may look like this:

    DiagnosisBridge.DiagnosisKey = DiagnosisDim.DiagnosisKey

However, against the general rule, the relationship between DiagnosisBridge and PatientDim is not a foreign key, though the join condition does look similar:

    PatientDim.ProblemComboKey = DiagnosisBridge.DiagnosisComboKey

For foreign keys with multiple columns, the conditions are combined with AND:

    FOREIGN_KEY_COLUMN1 = PRIMARY_KEY_COLUMN1
    AND FOREIGN_KEY_COLUMN2 = PRIMARY_KEY_COLUMN2
    /* AND ... */

In general, relationships may involve multiple sets of columns and they all need to be included in the join condition.

Exercise 7: Multi-Column Foreign Key

In this exercise, you will practice joining database objects on a two-column foreign key.

Background

* The ROI_AUDIT_TRAIL table has one row per action performed on a release of information
  * The ROI_ID column holds the release of information ID
    * This is primary key column 1 of 2
  * The LINE column holds the chronological action number (1, 2, 3, ...) of the action
    * This is primary key column 2 of 2
  * The DEST_PRINTER column holds the printer name to which copies were sent
* The V_ROI_STATUS_HISTORY view returns one row per status change of a release of information
  * The ROI_ID column returns the release of information ID
    * This is foreign key column 1 of 2 to ROI_AUDIT_TRAIL
  * The END_LINE column returns the chronological action number that changed the release of information status
    * This is foreign key column 2 of 2 to ROI_AUDIT_TRAIL
  * The END_DTTM column returns the date and time of the status change
  * The END_USER_NM_WID column returns the name and ID of the user who changed the status
* Only some actions change the status, but all status changes are driven by actions.

Task

Write a query that returns the release of information ID (ROI), date and time, user, and printer of every release of information status change.

Hint: every status change will have a corresponding action, so which join type should be used?
Inner join

Join in the ...-to-1 Direction

Foreign keys written in the order "table with a foreign key to another table" are inherently "...-to-1" relationships, but this general rule can be extended to other relationships.

The general rule "Join in the ...-to-1 Direction" is based on the concept of granularity, introduced in the Granularity section of the Logical Expressions lesson. In a ...-to-1 relationship, the left table is at least as granular as the right table. So, this general rule can be summarized as "Join in order of decreasing granularity."

In general, joining tables can change the granularity of the result, but joining in order of decreasing granularity can make the query writing process simpler.
You want to create a list of patients and their providers.

* The PATIENT table has one row per patient
  * The primary key is PAT_ID
  * The CUR_PCP_PROV_ID column is a foreign key to CLARITY_SER with a "Zero or More to Zero or One" relationship
* The CLARITY_SER table has one row per provider
  * The primary key is PROV_ID

Assume you'll join the tables ON PATIENT.CUR_PCP_PROV_ID = CLARITY_SER.PROV_ID. In this example, the granularity of...

* the PATIENT table is PAT_ID or "patient"
* the CLARITY_SER table is PROV_ID or "provider"
* a query with an INNER JOIN would be PAT_ID or "patient"
* a query with PATIENT LEFT OUTER JOIN CLARITY_SER would be PAT_ID or "patient": each patient has at most one provider, so the join to CLARITY_SER won't change the number of rows or granularity
* a query with CLARITY_SER LEFT OUTER JOIN PATIENT would be "PROV_ID and PAT_ID" or "provider-patient combo": the granularity isn't PROV_ID because some providers will have multiple rows, and the granularity isn't PAT_ID because some rows will have a provider but no patient

This last case is interesting in that the choice of join type and table order resulted in a more detailed granularity.

It can be helpful to think of the FROM clause as an iterative process.

1. First, there is one table and the query result is a table with the same granularity.
2. Then, another table is added to the FROM clause with a join. The query result is a table with a potentially new granularity.
3. Then, another table is added to the FROM clause with a join. The query result is the previous query result (as a table) joined with the additional table. Thus, the query result is another table with a potentially new granularity.

This process is repeated with each additional table added to the FROM clause. Each time, we're just joining one table (the previous result) with the additional table to produce a new result. Each time, the granularity of the result may change.
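The two LEFT OUTER JOIN cases from the granularity list above can be sketched side by side. This is an illustration only, using the PATIENT and CLARITY_SER columns already described in this lesson; the column aliases are arbitrary.

```sql
-- Granularity: patient.
-- Each patient has at most one PCP, so each patient appears on exactly one row.
-- Patients without a documented PCP still appear, with NULL for the provider name.
SELECT PATIENT.PAT_ID "Patient ID",
       CLARITY_SER.PROV_NAME "PCP"
FROM PATIENT
LEFT OUTER JOIN CLARITY_SER
    ON PATIENT.CUR_PCP_PROV_ID = CLARITY_SER.PROV_ID;

-- Granularity: provider-patient combination.
-- A provider who is the PCP for several patients appears on several rows,
-- and a provider with no patients appears once with NULL for the patient ID.
SELECT CLARITY_SER.PROV_ID "Provider ID",
       PATIENT.PAT_ID "Patient ID"
FROM CLARITY_SER
LEFT OUTER JOIN PATIENT
    ON PATIENT.CUR_PCP_PROV_ID = CLARITY_SER.PROV_ID;
```

The join condition is identical in both statements; only the table order differs, which is what changes the granularity of the result.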
Joining in the "...-to-many" direction changes the granularity of the query result. In particular, the granularity increases by a factor of whatever the "many" represents. This therefore changes the cardinality of the other relationships that haven't been added to the query yet. The result can be a Cartesian product. This can be avoided by joining in the "...-to-1" direction. Joining in the "...-to-1" direction does not change the granularity of the result.

Most Granular Table First

A consequence of the general rule "Join in the ...-to-1 Direction" is that the table with the highest granularity should be listed first in your FROM clause. This is often the table that has the same granularity as your desired query result.

If your most granular table isn't granular enough, this may indicate that you need to add a more granular table to your query.

If your most granular table is more granular than your desired result, then additional adjustments need to be made to your query. This could be adding a condition. Making your query less granular can also be accomplished by adding grouping, covered in the Introduction to Grouping lesson.

If multiple tables appear to be the most granular, then you may have a Cartesian product. Determine what you want the granularity of the query result to be and revisit your entities and relationships accordingly.
You need to write a query that, for each patient encounter, lists the reasons for visit and encounter diagnoses.

* The PAT_ENC table has one row per patient encounter
* The PAT_ENC_RSN_VISIT table has one row per encounter reason for visit
  * This encounter-reason relationship is 1-to-many
* The PAT_ENC_DX table has one row per encounter diagnosis
  * This encounter-diagnosis relationship is 1-to-many
  * An encounter diagnosis is associated with the encounter, not with a reason for visit

Both the PAT_ENC_RSN_VISIT and PAT_ENC_DX tables are the most granular, but they don't have the same granularity. In other words, there is a many-to-many relationship between these two tables. This may be a time to revisit the query specifications, which in this case aren't specific enough to determine what the granularity of our query result should be. Either of these potential statements would resolve this issue because it would redefine one of our entities and therefore the relationships.

* Only list the first reason for visit, represented by PAT_ENC_RSN_VISIT.LINE = 1
* Only list the primary diagnosis, represented by PAT_ENC_DX.PRIMARY_DX_YN = 'Y'

If a granularity of "reason for visit or encounter diagnosis" is required, SQL syntax outside the scope of this lesson is required. Joining tables essentially merges two tables in a way that considers the relationship between the rows of those tables. The UNION operator can be used to merge two tables by stacking the tables vertically: one table's rows followed by the other table's rows. UNIONs are covered in the RPT121i SQL II self-study.

INNER JOINs Then LEFT OUTER JOINs

Tables need to appear in the FROM clause in an order that allows for adding on each relationship, one by one. Within these bounds, as long as the INNER JOINs are listed first, the order of the INNER JOINs doesn't influence the query result.
Moreover, if all of the INNER JOINs are listed first, then the order of the LEFT OUTER JOINs to follow won't influence the query result either.

Suppose a table is on the right side of a LEFT OUTER JOIN in the FROM clause. Recall that any logical expressions on the right side table of the join need to be included in the JOIN condition (not the WHERE clause) for the LEFT OUTER JOIN to be respected. We run into the same problem if the table is referenced again as part of an INNER JOIN: an INNER JOIN using a column of a table that has been added to a query via a LEFT OUTER JOIN will negate the LEFT OUTER JOIN.

In summary, listing all of the INNER JOINs first makes the query easier to interpret and maintain.

Exercise 8: Using General Rules

In this exercise you'll practice using the general rules listed in this section to write a query using multiple joins.

Background

* The EncounterFact table has one row per patient encounter
  * The EncounterKey column is the primary key
    * This will be greater than 0 for valid encounters
  * The EncounterEpicCsn column holds the CSN of the encounter
  * The PatientKey column is a foreign key to PatientDim
  * The ProviderKey column is a foreign key to ProviderDim
* The PatientDim table has one row per patient per date range
  * The PatientKey column is the primary key
  * The PrimaryCareProviderKey column is a foreign key to ProviderDim
* The ProviderDim table has one row per provider per date range
  * The ProviderKey column is the primary key
    * This will be greater than 0 for valid providers
* All foreign key columns are populated

Task

For each patient encounter that has a valid provider, return the

* Patient's Name
* CSN
* Encounter Provider's Name
* PCP's (Primary Care Provider's) Name (regardless of whether or not the PCP is a valid provider)

Step-by-Step

We have four entities:

* Patient encounter
* Patient
* Encounter Provider
* PCP
Therefore, there will be three relationships:

* A patient encounter is for one patient
* A patient encounter has one encounter provider
* A patient has one PCP at a time

For each entity, we have a corresponding table (one table shows up twice, so we'll give it an alias right away):

* EncounterFact
* PatientDim
* ProviderDim "pcp"
* ProviderDim "encprov"

For each relationship, we have a corresponding foreign key that we can use in our joins:

* EncounterFact.PatientKey = PatientDim.PatientKey
* EncounterFact.ProviderKey = encprov.ProviderKey
* PatientDim.PrimaryCareProviderKey = pcp.ProviderKey

Each of the relationships listed above is in the many-to-one direction, so we'll try to join the tables in that order:

* EncounterFact before PatientDim
* EncounterFact before "encprov"
* PatientDim before "pcp"

It looks like EncounterFact will be our first table based on:

* the order of the tables in each relationship listed above
* the fact that it is the most granular table
* the fact that its granularity (patient encounter) matches that of our desired query output

Based on the cardinalities of the relationships and the fact that all of the foreign keys will be populated, no left outer joins are necessary.

To return only valid encounters and encounter providers, filter on the EncounterKey and the ProviderKey columns.
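Putting the steps of Exercise 8 together, one possible query is sketched below. The join conditions, aliases, and filters come from the steps above. The Name columns on PatientDim and ProviderDim are assumptions made for illustration, because this exercise does not list the name columns; substitute the actual column names from your data model.

```sql
-- Sketch only: one row per valid patient encounter that has a valid encounter provider.
SELECT PatientDim.Name "Patient Name",              -- Name is an assumed column
       EncounterFact.EncounterEpicCsn "CSN",
       encprov.Name "Encounter Provider",           -- Name is an assumed column
       pcp.Name "PCP"                               -- Name is an assumed column
FROM EncounterFact                                  -- most granular table first
INNER JOIN PatientDim
    ON EncounterFact.PatientKey = PatientDim.PatientKey
INNER JOIN ProviderDim encprov
    ON EncounterFact.ProviderKey = encprov.ProviderKey
INNER JOIN ProviderDim pcp
    ON PatientDim.PrimaryCareProviderKey = pcp.ProviderKey
WHERE EncounterFact.EncounterKey > 0                -- valid encounters only
    AND encprov.ProviderKey > 0                     -- valid encounter providers only
```

No filter is placed on pcp.ProviderKey, since the task asks for the PCP regardless of whether the PCP is a valid provider.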
Reviewing the Chapter

Review Questions

1. Which join type is typically the most efficient? Choose only ONE answer.
   A. Cross join
   B. Inner join
   C. Left outer join

2. For which join type does the order you list the tables in the FROM clause matter? Choose only ONE answer.
   A. Cross join
   B. Inner join
   C. Left outer join

3. For which relationship cardinalities would you expect the same query result regardless of join type (inner vs. left outer) or join direction?
Review Key

1. Which join type is typically the most efficient? Choose only ONE answer.
   A. Cross join
   B. Inner join
   C. Left outer join
   Answer: B. Inner join.

2. For which join type does the order you list the tables in the FROM clause matter? Choose only ONE answer.
   A. Cross join
   B. Inner join
   C. Left outer join
   Answer: C. Left outer join.

3. For which relationship cardinalities would you expect the same query result regardless of join type (inner vs. left outer) or join direction?
   * One to Exactly One
   * One to One or More
   * One or More to One
   * One or More to One or More
Study Checklist

Make sure you can define the following key terms:

☐ Table alias
☐ Join
☐ Join type
☐ LEFT OUTER JOIN keyword
☐ INNER JOIN keyword
☐ Cartesian product
☐ Cardinality of a relationship

Make sure you can perform the following tasks:

☐ Use table aliases in a SQL query
☐ Recognize which join type must be used
☐ Recognize the need for a left outer join
☐ Recognize the need for an inner join
☐ Use a left outer join
☐ Use an inner join
☐ Recognize the need for multiple join criteria
☐ Use multiple join criteria
☐ Use multiple joins

Make sure you fully understand and can explain the following concepts:

☐ How joining tables may change the granularity of a SQL query result
☐ A table alias must be used when two or more of the same table appear in the FROM clause
☐ Joins using multi-column foreign keys require multiple join criteria
☐ Left outer joins restricting the results of only the right table of a left outer join require multiple join criteria
☐ Why a Cartesian product should be avoided
Introduction to Grouping

Introduction
Grouping Example
Counting
  Exercise 1: Counting Rows without Aggregate Functions
  Exercise 2: Counting Rows with Aggregate Functions
  Exercise 3: Counting Rows and Filtering
Summing
  Exercise 4: Summing Rows
Grouping
  Exercise 5: Attempting to Return Detail and Aggregated Data
  Exercise 6: Counting and Summing
  Exercise 7: Grouping and Filtering
Selecting Ungrouped Columns
  Exercise 8: Attempting to Select an Ungrouped Column
Adding an Extra Layer of Grouping
  Exercise 9: Group by ID and Name
Using the MAX Aggregate Function
  Exercise 10: SELECT MAX( Name )
Additional Exercises
  Exercise 11: Grouping on Multiple Columns
  Exercise 12: Grouping and Ordering
Reviewing the Chapter
Introduction

Some queries require searching through detailed information but only need to return summary information. This lesson explores how aggregate functions and the GROUP BY clause can meet these needs.

By the End of This Lesson, You Will Be Able To...

* Write a query that returns the count of rows
* Write a query that returns the sum of a column's values across rows
* Group query results
Grouping Example

Hospital accounts in the database:

HAR    Provider ID   Provider Name      Balance
301    TRN080        WHITECOAT, WALT    8759.56
715    TRN061        MCQUEENIE, DIANA    171.00
852    TRN080        WHITECOAT, WALT    3831.36
853    TRN082        STITCH, MARTIN     8427.51
4690   TRN080        WHITECOAT, WALT    3256.66
5359   TRN061        MCQUEENIE, DIANA    802.24

Hospital accounts summarized:

Provider ID   Provider Name      Balance Sum   Count
TRN061        MCQUEENIE, DIANA        973.24   2
TRN080        WHITECOAT, WALT       15847.58   3
TRN082        STITCH, MARTIN         8427.51   1
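As a preview of the syntax covered later in this lesson, the summary above could be produced from the detail rows with a query along the following lines. This is a self-contained sketch: the table name HSP_ACCOUNT_EXAMPLE and its column names are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table holding the six detail rows from the example.
CREATE TABLE HSP_ACCOUNT_EXAMPLE (
    HAR        INTEGER PRIMARY KEY,
    PROV_ID    VARCHAR(10),
    PROV_NAME  VARCHAR(50),
    BALANCE    DECIMAL(10, 2)
);

INSERT INTO HSP_ACCOUNT_EXAMPLE VALUES (301,  'TRN080', 'WHITECOAT, WALT',  8759.56);
INSERT INTO HSP_ACCOUNT_EXAMPLE VALUES (715,  'TRN061', 'MCQUEENIE, DIANA',  171.00);
INSERT INTO HSP_ACCOUNT_EXAMPLE VALUES (852,  'TRN080', 'WHITECOAT, WALT',  3831.36);
INSERT INTO HSP_ACCOUNT_EXAMPLE VALUES (853,  'TRN082', 'STITCH, MARTIN',   8427.51);
INSERT INTO HSP_ACCOUNT_EXAMPLE VALUES (4690, 'TRN080', 'WHITECOAT, WALT',  3256.66);
INSERT INTO HSP_ACCOUNT_EXAMPLE VALUES (5359, 'TRN061', 'MCQUEENIE, DIANA',  802.24);

-- One row per provider, with the sum of balances and the number of accounts.
SELECT PROV_ID "Provider ID",
       PROV_NAME "Provider Name",
       SUM( BALANCE ) "Balance Sum",
       COUNT( * ) "Count"
FROM HSP_ACCOUNT_EXAMPLE
GROUP BY PROV_ID, PROV_NAME
ORDER BY PROV_ID;
```

The final SELECT returns three rows, one per provider, instead of the six detail rows.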
Counting

Exercise 1: Counting Rows without Aggregate Functions

In this exercise, you'll review how you, as the individual running a SQL query, can see a count of rows.

Background

* The PAT_ENC_HSP table has one row per hospital encounter

Task

Write a query that returns one row for each hospital encounter. Run the query.

1. How many hospital encounters are there in the database?
   Answers vary depending on when and in which database the query is run
2. Where on the screen did you find this information?
   Lower right-hand corner, below the results

This is the end of the exercise.

When running a query in SQL Server Management Studio or Oracle SQL Developer, the number of rows returned is displayed on the screen along with the query result. However, if the purpose of a query is to return the number of rows, then this query has failed on two counts:

* The number of rows is not actually part of the query results, so it may not be easily processed outside of SQL Server Management Studio or Oracle SQL Developer.
* The query is returning more information than it needs to, so it is not as efficient as it could be.

The COUNT function is a solution to this problem.

Exercise 2: Counting Rows with Aggregate Functions

In this exercise, you'll learn how to use the COUNT function to return the number of rows as the query result.

Background

* The PAT_ENC_HSP table has one row per hospital encounter

Task

Run the following query:

    SELECT COUNT( * )
    FROM PAT_ENC_HSP

1. How many hospital encounters are there in the database?
   Answers vary depending on when and in which database the query is run, but should be the same answer as Exercise 1: Counting Rows without Aggregate Functions
2. Where on the screen did you find this information?
   In the query result
3. How many rows are there in your query result?
   One

This is the end of the exercise.

The COUNT function is an example of an aggregate function. Aggregate functions operate on a group of rows. Aggregate functions differ from the functions in the Functions lesson in that they take many rows' worth of input and return one output: they summarize a group of rows. For now, the "group" is all rows that the query would normally return.

COUNT( * ) returns the number of rows that the query would normally return. Technically, the * is all of the columns and the COUNT function counts each row where at least one of the columns is populated. At least one of the columns will be populated (in particular, a primary key column) so COUNT( * ) counts all rows. Understanding this can be useful when you learn about more aggregate functions in this lesson and in the RPT121i SQL II self-study.

Exercise 3: Counting Rows and Filtering

In this exercise, you'll practice using the COUNT function and filtering.

Background

* The CLARITY_SER table has one row per provider
  * The STAFF_RESOURCE column indicates if a row represents a Class, Person, or Resource
    * A value of 'Person' indicates that a row represents a person

Task

Write a query to return the number of providers that are people.
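One possible approach to Exercise 3 is sketched below. It combines COUNT( * ) with a WHERE clause, using only the table and column described in the exercise background; the column alias is arbitrary.

```sql
-- Sketch only: the WHERE clause filters the rows first, then COUNT( * ) counts what remains.
SELECT COUNT( * ) "Person Providers"
FROM CLARITY_SER
WHERE STAFF_RESOURCE = 'Person'
```

As in Exercise 2, the query result is a single row, this time counting only the providers that pass the filter.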
Summing

The SUM aggregate function allows you to add up the values of a column across rows. NULL values are ignored.

* The view V_ARPB_RVU_DATA returns one row per professional billing charge
  * The AMOUNT column holds the dollar amount (which could be negative for voids)
  * The RVU_TOTAL column holds the total Relative Value Units (RVUs) for the procedure

The query

    SELECT SUM( AMOUNT ) "Total Charges",
           SUM( RVU_TOTAL ) "Total RVUs"
    FROM V_ARPB_RVU_DATA

returns the total dollar amount and RVUs for all professional billing charges.

Exercise 4: Summing Rows

In this exercise, you'll practice using the SUM function.

Background

* The BillingAccountFact table has one row per hospital account
  * The TotalAccountBalance column holds the current balance
  * The TotalAdjustmentAmount column holds the total of all adjustments
  * The TotalChargeAmount column holds the total of all charges
  * The TotalPaymentAmount column holds the total of all payments

Task

Write a query that summarizes all hospital accounts by returning four grand totals, one for each of the following:

* Current balances
* Adjustments
* Charges
* Payments
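One possible approach to Exercise 4 is sketched below, following the same pattern as the V_ARPB_RVU_DATA example: one SUM per column, all in a single SELECT clause. The column aliases are arbitrary.

```sql
-- Sketch only: four grand totals across all hospital accounts, returned as one row.
SELECT SUM( TotalAccountBalance ) "Total Balance",
       SUM( TotalAdjustmentAmount ) "Total Adjustments",
       SUM( TotalChargeAmount ) "Total Charges",
       SUM( TotalPaymentAmount ) "Total Payments"
FROM BillingAccountFact
```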
Grouping

Exercise 5: Attempting to Return Detail and Aggregated Data

In this exercise, you'll learn the need for additional syntax to return both normal values and aggregated values.

Background

* The PATIENT table has one row per patient
  * The SEX_C column indicates the sex of a patient

Task

In an attempt to determine the number of patients of each sex, run the following query:

    SELECT SEX_C, COUNT( * )
    FROM PATIENT

What error is returned?
Column 'PATIENT.SEX_C' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

The problem essentially lies in the fact that these two queries return a different number of rows:

    SELECT SEX_C FROM PATIENT        -- Returns 1 row per patient
    SELECT COUNT( * ) FROM PATIENT   -- Returns 1 row

The result of a SQL query must be a table with a well-defined number of rows. SQL can't reconcile this with the clauses we've used so far. What we actually want is a hybrid: a query that returns one row per unique value of SEX_C, with one column returning that SEX_C value and another column returning a count of each row that has that value. You will explore the solution to this problem and the resolution to this error in this section.

This is the end of the exercise.

The GROUP BY clause allows you to specify the granularity of a query result, thereby reducing the granularity of a query. Review The SELECT Statement section of the Write a Basic Query lesson as a reminder of where the GROUP BY clause appears and when it is processed in relation to the other clauses.
* The PATIENT table has one row per patient
  * The CITY column holds the name of the city in which a patient resides

You need to write a query to return one row per distinct city in the PATIENT table. The following query returns the required data, but has one row per patient, not one row per city:

    SELECT CITY
    FROM PATIENT

You can change the granularity of the query by adding the GROUP BY clause:

    SELECT CITY
    FROM PATIENT
    GROUP BY CITY

The main benefit of reducing the granularity of your query result is that the rows collapsed into each group can be summarized.

* The PATIENT table has one row per patient
  * The CITY column holds the name of the city in which a patient resides

You need to write a query to return one row per distinct city in the PATIENT table, along with how many patients reside in that city.

    SELECT CITY, COUNT( * ) "Count"
    FROM PATIENT
    GROUP BY CITY

In this case, we don't get the error you encountered in Exercise 5: Attempting to Return Detail and Aggregated Data because each column in the SELECT clause is an aggregate function (COUNT) or is also in the GROUP BY clause (CITY).

Exercise 6: Counting and Summing

In this exercise, you'll practice grouping, counting, and summing.

Background

* The BillingAccountFact table has one row per billing account
  * The TotalAccountBalance column holds the current balance
  * The HasOpenDenial column indicates whether the account has an open denial
    * 0 indicates there is no open denial
    * 1 indicates there is an open denial
    * NULL indicates this is a professional account

Task

Write a query that returns the total balance and number of billing accounts with and without denials. The query should return two rows: one for accounts with denials and one for accounts without denials. Professional accounts should not be in the results.

This is the end of the exercise.

When grouping data, it may be useful to sort by aggregated data. This is done in the same way as it is for any columns: in the ORDER BY clause with the option to use column aliases.

* The PATIENT table has one row per patient
  * The CITY column holds the name of the city in which a patient resides

You need to write a query to return one row per distinct city in the PATIENT table. Sort with the most popular city listed first.

    SELECT CITY
    FROM PATIENT
    GROUP BY CITY
    ORDER BY COUNT( * ) DESC

If you want to both return and sort by COUNT( * ), you can give it an alias and reference it in the ORDER BY clause.

    SELECT CITY, COUNT( * ) "Count"
    FROM PATIENT
    GROUP BY CITY
    ORDER BY "Count" DESC

The GROUP BY clause is not limited to one column: it allows for a comma-delimited list of columns. The combination of the columns defines the new granularity of the query result.
The PATIENT table has one row per patient
    The CITY column holds the name of the city in which a patient resides
    The STATE_C column holds the ID of the state in which a patient resides

The query from a previous example needs to be updated to account for city names that are the same across state lines. You need to write a query to return one row per distinct city/state in the PATIENT table along with the state ID and how many patients reside in that city.

    SELECT CITY, STATE_C, COUNT( * ) "Count"
    FROM PATIENT
    GROUP BY CITY, STATE_C

In this case, we don't get the error you encountered in Exercise 5: Attempting to Return Detail and Aggregated Data because each column in the SELECT clause is an aggregate function (COUNT) or is also in the GROUP BY clause (CITY and STATE_C).

The term group (and its variations grouped, grouping, grouper, etc.) is popular in the world of reporting. Group therefore may have a different definition in different contexts. For example, in Crystal Reports, grouping means sorting the data by a field's value: detailed data can still be shown, and aggregated data can be shown any time the field's value changes between one row and the next. In contrast, SQL grouping only allows aggregated data to be returned. When communicating with others outside the context of SQL, be clear about what is meant by the term group.

Exercise 7: Grouping and Filtering

In this exercise, you'll practice grouping on multiple columns. You'll also explore the interaction of grouping and filtering together.

Background

The PATIENT table has one row per patient
    The CITY column holds the name of the city in which a patient resides
    The STATE_C column holds the ID of the patient's state
        A value of '50' indicates Wisconsin
    The SEX_C column holds the ID of the patient's sex
        A value of '1' indicates Female
Task

Write a query to return one row per city where a patient resides, along with the state ID and how many patients reside in that city.

Now attempt to modify your query to answer the following questions:

1. Can you filter by STATE_C in your WHERE clause? Why or why not?

   Yes. The STATE_C column is in the PATIENT table, which is in the FROM clause, which is effectively processed before the WHERE clause.

2. Can you filter by SEX_C in your WHERE clause? Why or why not? Modify your answer to the previous question if necessary.

   Yes. Same answer.

3. Can you filter by COUNT( * ) in your WHERE clause? Why or why not?

   No. The GROUP BY clause (and therefore aggregation) is processed after the WHERE clause.

This is a reminder that the GROUP BY clause is effectively processed after the WHERE clause. To apply filtering after the GROUP BY clause, you'll need to use the HAVING clause, which is covered in the RPT121i SQL II training companion.
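The answers above can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server or Oracle, and a hypothetical four-row PATIENT table whose contents are made up for illustration. The exact error text differs between databases; only the behavior (WHERE runs before the groups exist) is the point.

```python
import sqlite3

# Hypothetical miniature PATIENT table; every row is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PATIENT (PAT_ID TEXT, CITY TEXT, STATE_C TEXT)")
conn.executemany(
    "INSERT INTO PATIENT VALUES (?, ?, ?)",
    [
        ("P1", "MADISON", "50"),
        ("P2", "MADISON", "50"),
        ("P3", "VERONA", "50"),
        ("P4", "MADISON", "14"),
    ],
)

# Filtering on a table column works: WHERE removes rows before they are grouped.
wisconsin_counts = conn.execute(
    """
    SELECT CITY, STATE_C, COUNT( * ) "Count"
    FROM PATIENT
    WHERE STATE_C = '50'
    GROUP BY CITY, STATE_C
    ORDER BY CITY
    """
).fetchall()
print(wisconsin_counts)  # [('MADISON', '50', 2), ('VERONA', '50', 1)]

# Filtering on COUNT( * ) in WHERE is rejected: the groups do not exist yet.
try:
    conn.execute(
        'SELECT CITY, COUNT( * ) "Count" FROM PATIENT '
        "WHERE COUNT( * ) > 1 GROUP BY CITY"
    ).fetchall()
    where_on_count_failed = False
except sqlite3.OperationalError as error:
    where_on_count_failed = True
    print("Rejected:", error)
```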
Selecting Ungrouped Columns

Sometimes you'll need to add a column to your SELECT clause for neither grouping nor aggregation.

Exercise 8: Attempting to Select an Ungrouped Column

In this exercise, you'll practice grouping. You'll also be introduced to the problem that arises when adding an ungrouped column to the SELECT clause.

Background

The view V_PAT_FACT has one row per patient
    The column CUR_PCP_PROV_ID holds the ID of the patient's current general PCP
    The column CUR_PCP_NAME holds the name of the patient's current general PCP

Task

Write a query that returns how many patients are assigned to each current general PCP. Return the provider's ID and a count.

Follow-Up Task

You've been asked to add the provider's name to the query.

1. What happens if you add the provider's name to the SELECT clause?

   A similar error as in Exercise 5: Attempting to Return Detail and Aggregated Data:

       Column 'V_PAT_FACT.CUR_PCP_NAME' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

   Although it may be obvious to us that an ID will only have one associated name, the SELECT clause doesn't have this information. This is one of the consequences of the SELECT clause being processed after the GROUP BY clause. The GROUP BY clause changes our working result set enough that the SELECT clause is cut off from that previous information.

This is the end of the exercise.

There are two popular methods to solve this problem:
    Adding the column you want to return to the GROUP BY clause
    Using the MAX aggregate function on the column you want to return

Both methods are valid and will return the same data. Which method you choose will depend on style preference and performance, so give them both a shot!
These methods describe grouping on an ID column while attempting to return the corresponding name column. In particular, how can we modify this query...

    SELECT ID, NAME, COUNT( * ) "Count"
    /*...*/
    GROUP BY ID

...so it returns the ID, name, and count for each ID?

While this is a common scenario, these methods work just as well for any other relationship similar to ID-to-name (...-to-1). Beware that both of these methods return data even if you don't choose columns with this relationship, so know your column relationships well.

Adding an Extra Layer of Grouping

If we know that an ID value has at most one corresponding name value, then adding the name column to a GROUP BY clause should have no effect on the granularity of the query result.

    SELECT ID, NAME, COUNT( * ) "Count"
    /*...*/
    GROUP BY ID, NAME

Solved!

Exercise 9: Group by ID and Name

In this exercise, you'll practice adding an extra column to the GROUP BY clause.

Background

The view V_PAT_FACT has one row per patient
    The column CUR_PCP_PROV_ID holds the ID of the patient's current general PCP
    The column CUR_PCP_NAME holds the name of the patient's current general PCP

Task

Write a query that returns how many patients are assigned to each current general PCP. Return the provider's ID, name, and a count by using multiple columns in the GROUP BY clause.

Using the MAX Aggregate Function

Suppose an ID value has at most one corresponding name value. What would be the result of a query that...
1. Has no aggregate functions
2. Has no GROUP BY clause
3. Includes both of these columns

...look like? For all rows with the same ID value, the name value would always be the same or always NULL.

The result of a query that returns one row per patient encounter may look like this:

    Provider ID    Provider Name
    21699          MARBLE, PAT
    21699          MARBLE, PAT
    222100         MOTLEY, HARPER
    222100         MOTLEY, HARPER
    222100         MOTLEY, HARPER
    233400         NULL
    233400         NULL
    TRN080         WHITECOAT, WALT

In either case, the name column would be simple to summarize: "the name" or NULL, as appropriate. Is there an aggregate function that will do this for us? Yes: the MAX aggregate function!

The MAX (maximum) aggregate function traditionally is used to find the maximum numeric value, but for a column that's always the same value or NULL it will return that value or NULL as appropriate. (Technically, either of the MAX and MIN (minimum) aggregate functions will work equally well for this task, but MAX is the more conventional choice.)
Continuing the previous example, grouping by 'Provider ID' and adjusting the other columns would yield the following result:

    Provider ID    MAX( "Provider Name" )    COUNT( * )
    21699          MARBLE, PAT               2
    222100         MOTLEY, HARPER            3
    233400         NULL                      2
    TRN080         WHITECOAT, WALT           1

    SELECT ID, MAX( NAME ) "Name", COUNT( * ) "Count"
    /*...*/
    GROUP BY ID

Solved!

Exercise 10: SELECT MAX( Name )

In this exercise, you'll practice using the MAX function to avoid adding an extra column to the GROUP BY clause.

Background

The view V_PAT_FACT has one row per patient
    The column CUR_PCP_PROV_ID holds the ID of the patient's current general PCP
    The column CUR_PCP_NAME holds the name of the patient's current general PCP

Task

Write a query that returns how many patients are assigned to each current general PCP. Return the provider's ID, name, and a count by using multiple aggregate functions in the SELECT clause.
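The two methods can be compared side by side with a small sketch. It runs on Python's built-in sqlite3 module rather than SQL Server or Oracle, and the ENC_PROV table name is hypothetical; its rows copy the provider example above. The last step adds one deliberately inconsistent row (a made-up second spelling of a name) to illustrate the earlier warning that both methods still return data when the ID-to-name relationship does not hold.

```python
import sqlite3

# Hypothetical one-row-per-encounter table holding the example provider data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ENC_PROV (PROV_ID TEXT, PROV_NAME TEXT)")
conn.executemany(
    "INSERT INTO ENC_PROV VALUES (?, ?)",
    [
        ("21699", "MARBLE, PAT"),
        ("21699", "MARBLE, PAT"),
        ("222100", "MOTLEY, HARPER"),
        ("222100", "MOTLEY, HARPER"),
        ("222100", "MOTLEY, HARPER"),
        ("233400", None),
        ("233400", None),
        ("TRN080", "WHITECOAT, WALT"),
    ],
)

EXTRA_GROUPING = (
    'SELECT PROV_ID, PROV_NAME, COUNT( * ) "Count" FROM ENC_PROV '
    "GROUP BY PROV_ID, PROV_NAME ORDER BY PROV_ID, PROV_NAME"
)
MAX_NAME = (
    'SELECT PROV_ID, MAX( PROV_NAME ) "Name", COUNT( * ) "Count" FROM ENC_PROV '
    "GROUP BY PROV_ID ORDER BY PROV_ID"
)

# With at most one name per ID, both methods return the same four rows.
by_extra_grouping = conn.execute(EXTRA_GROUPING).fetchall()
by_max_name = conn.execute(MAX_NAME).fetchall()
print(by_extra_grouping == by_max_name)  # True

# Break the ID-to-name relationship with a made-up second spelling.
conn.execute("INSERT INTO ENC_PROV VALUES ('21699', 'MARBLE, P')")
rows_extra_grouping = len(conn.execute(EXTRA_GROUPING).fetchall())
rows_max_name = len(conn.execute(MAX_NAME).fetchall())
print(rows_extra_grouping, rows_max_name)  # 5 4
```

Neither query reports a problem after the inconsistent row is added: the extra layer of grouping silently splits the ID into two rows, while MAX silently keeps one of the names.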
Additional Exercises

Exercise 11: Grouping on Multiple Columns

In this exercise, you'll practice grouping on multiple columns.

Background

The PatientDim table has one row per patient per date range
    The City column holds the name of the city in which a patient resides
    The StateOrProvince column holds the name of the state in which a patient resides
    The IsCurrent column stores 1 on any row that stores current patient data, and 0 otherwise

Task

Write a query that counts patients by their current city. Return the city name, the state name, and the number of patients that live in that city. Include patients that don't have a city and/or state listed.

Save your query so you can build on it in Exercise 12: Grouping and Ordering.

Exercise 12: Grouping and Ordering

In this exercise, you'll practice grouping and sorting. This exercise is a reminder that the GROUP BY clause is processed before the ORDER BY clause.

Task

Build on your solution for Exercise 11: Grouping on Multiple Columns and sort the results with the most popular city at the top.
Reviewing the Chapter

Review Questions

1. How many rows will be returned by a query that has no GROUP BY clause but has a SELECT clause with aggregate functions? Choose only ONE answer.
   A. Zero
   B. One
   C. One for each unique combination of values of the columns used by the aggregate functions
   D. The same number of rows as if the SELECT clause had no aggregate functions

2. How many rows will be returned by a query that has a GROUP BY clause?

3. You need to write a query to group by an ID column and then return the corresponding name column along with a count. What are two options for returning the name column?
Review Key

1. How many rows will be returned by a query that has no GROUP BY clause but has a SELECT clause with aggregate functions? Choose only ONE answer.
   A. Zero
   B. One
   C. One for each unique combination of values of the columns used by the aggregate functions
   D. The same number of rows as if the SELECT clause had no aggregate functions

   B. One.

2. How many rows will be returned by a query that has a GROUP BY clause?

   One row for each unique combination of values of the columns listed in the GROUP BY clause.

3. You need to write a query to group by an ID column and then return the corresponding name column along with a count. What are two options for returning the name column?

   * GROUP BY ID, NAME
   * SELECT MAX( NAME )
Study Checklist

Make sure you can define the following key terms:
☐ Aggregate function
☐ COUNT( * )
☐ SUM aggregate function
☐ GROUP BY clause

Make sure you can perform the following tasks:
☐ Count the total number of rows in a SQL query result
☐ Group the results of a SQL query
☐ Count the number of rows in each group in a SQL query
☐ Sum a column value for all rows in a SQL query result
☐ Sum a column value for each group in a SQL query
☐ Return aggregated data
☐ Return columns after grouping
☐ Return non-grouped columns using MAX or MIN
☐ Return columns by adding an extra layer of grouping
☐ Sort by aggregated data

Make sure you fully understand and can explain the following concepts:
☐ What aggregation does
☐ The GROUP BY clause is processed after the WHERE clause but before the SELECT and ORDER BY clauses
☐ Why the SELECT clause can only contain aggregate functions or columns in the GROUP BY clause
☐ How the GROUP BY clause changes the granularity of a SQL query result
Performance

Introduction
Improving SQL Query Performance Workflow
Use Fewer Rows While Developing
    SELECT TOP (SQL Server)
    WHERE ROWNUM (Oracle)
    Exercise 1: SELECT 100 Rows
Use Indexes Wisely
    Determine Indexed Columns
        SP_HELPINDEX (SQL Server)
        ALL_IND_COLUMNS (Oracle)
        Exercise 2: Determine Indexes
    Use Indexed Columns
    Avoid Functions
    Keep it Simple
Reviewing the Chapter
Introduction

By now you've experienced how to use the main parts of a SELECT statement. Armed with this information, you can tackle many relational database querying needs.

Have you noticed that there may be several valid ways of writing a query that will produce the result you want? Some examples:
    <= AND >= operators vs. BETWEEN operator
    CASE statement testing for NULL vs. COALESCE function
    Concatenation operator vs. CONCAT function
    TableOne INNER JOIN TableTwo vs. TableTwo INNER JOIN TableOne
    GROUP BY Name vs. SELECT MAX( Name )

So, what makes one query better than another if they produce the same result? Style and maintainability are considerations, but your database administrator will tell you, "Performance." You want your queries to perform efficiently so that they finish faster and your database is freed up to process other queries. This lesson introduces some concepts you should consider when thinking about query performance.

By the End of This Lesson, You Will Be Able To...
    Limit the performance impact during query development
    Identify indexed columns
Improving SQL Query Performance Workflow

Recall that SQL is a declarative programming language, meaning that you use it to describe the results you seek without actually specifying how to go about getting those results. When implemented, the topics covered in this lesson will give the database an opportunity to improve performance, but there is no guarantee that performance will actually improve. Ultimately, the way to get the best performing query is to:

1. Write a query
2. Rewrite the query in several different ways
3. Compare how each of the queries performs
4. Choose the query that performs best

Previous lessons focus on the first step. This lesson focuses on the second step: how to rewrite a query so that performance might improve. Formally comparing two queries is beyond the scope of this lesson, though informally you'll be able to make an educated guess about which of two queries will perform better.
Use Fewer Rows While Developing

While developing and testing a query, returning a complete result set may be unnecessary. Reducing the number of rows returned by a query during testing is good practice because it may reduce the impact on the database, freeing up resources for other queries and processes.

The method to limit the number of rows returned by a query is different between SQL Server and Oracle. Review the Microsoft SQL Server and Oracle section of the Introduction lesson for a reminder of what is expected on the assessments.

SELECT TOP (SQL Server)

In SQL Server, the SELECT TOP statement can be used to limit results to a number of rows or a percentage of total rows. After the keyword TOP, include a number and optionally follow that with the keyword PERCENT. Then add what you would normally add to the SELECT clause and the rest of the SELECT statement.

    SELECT TOP 5 *
    FROM PATIENT

would return five rows from the PATIENT table whereas

    SELECT TOP 1 PERCENT *
    FROM PATIENT

would return one percent of the rows in the PATIENT table.

The SELECT and SELECT TOP statements are technically different statements. However, it can be useful to think of these as the same statement where the optional TOP keyword is evaluated after the ORDER BY clause.

The PATIENT table has one row per patient
    The BIRTH_DATE column holds the patient's birth date
    The PAT_MRN_ID column holds the patient's medical record number (MRN)

The query

    SELECT TOP 5 PAT_MRN_ID, BIRTH_DATE
    FROM PATIENT
    ORDER BY BIRTH_DATE DESC

would return the MRN and birth date of the five youngest patients.

WHERE ROWNUM (Oracle)

In Oracle, the ROWNUM pseudocolumn can be used to limit results to a number of rows. In the WHERE clause, include a condition ROWNUM < N where the number of rows desired is N-1.
    SELECT *
    FROM PATIENT
    WHERE ROWNUM < 6

would return five rows from the PATIENT table.

Keep in mind that the WHERE clause is processed before the ORDER BY clause.

The PATIENT table has one row per patient
    The BIRTH_DATE column holds the patient's birth date
    The PAT_MRN_ID column holds the patient's medical record number (MRN)

The query

    SELECT PAT_MRN_ID, BIRTH_DATE
    FROM PATIENT
    WHERE ROWNUM < 6
    ORDER BY BIRTH_DATE DESC

would return the MRN and birth date of five patients, but not necessarily the five youngest patients. The patients selected will be listed in youngest-to-oldest order.

Exercise 1: SELECT 100 Rows

In this exercise, you'll practice limiting the number of rows returned by a query.

Background

The CLARITY_SER table has one row per provider
    The PROV_ID column holds the provider's ID
    The PROV_NAME column holds the provider's name

Task

Write a query to list the IDs and names of 100 providers.
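The difference between limiting after sorting (as with SELECT TOP) and limiting before sorting (as with WHERE ROWNUM) can be sketched with Python's built-in sqlite3 module. SQLite supports neither TOP nor ROWNUM, so this sketch uses LIMIT and SQLite's rowid column as stand-ins; those substitutions, and the five made-up patients, are assumptions for illustration only.

```python
import sqlite3

# Hypothetical miniature PATIENT table; MRNs and birth dates are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PATIENT (PAT_MRN_ID TEXT, BIRTH_DATE TEXT)")
conn.executemany(
    "INSERT INTO PATIENT VALUES (?, ?)",
    [
        ("M1", "1950-01-01"),
        ("M2", "1980-05-05"),
        ("M3", "2015-03-03"),
        ("M4", "2001-07-07"),
        ("M5", "1999-09-09"),
    ],
)

# Stand-in for SELECT TOP 2 ... ORDER BY: the limit is applied after sorting,
# so the two youngest patients are returned.
limit_after_sort = conn.execute(
    "SELECT PAT_MRN_ID, BIRTH_DATE FROM PATIENT "
    "ORDER BY BIRTH_DATE DESC LIMIT 2"
).fetchall()

# Stand-in for WHERE ROWNUM < 3 ... ORDER BY: the rows are cut in the WHERE
# clause first, and only those two rows are then sorted.
limit_before_sort = conn.execute(
    "SELECT PAT_MRN_ID, BIRTH_DATE FROM PATIENT "
    "WHERE rowid < 3 ORDER BY BIRTH_DATE DESC"
).fetchall()

print(limit_after_sort)   # [('M3', '2015-03-03'), ('M4', '2001-07-07')]
print(limit_before_sort)  # [('M2', '1980-05-05'), ('M1', '1950-01-01')]
```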
Use Indexes Wisely

An index is an additional data structure that is a copy of a portion of the original data structure sorted in a different order. Using an index may make a task faster and more efficient.

Think of an index at the back of a textbook: a list of words in alphabetical order along with page numbers. The words and the page numbers exist elsewhere in the original text, but they are sorted differently in the index. The index makes it easier and faster to find specific words or more information about those words in the original text.

A book's publisher needs to decide if there should be an index, what words should be in the index, and if there should be different types of indexes. The decisions are based on the book type, intended audience, expected uses, and work involved in making and storing the indexes. An 80's Greatest Hits songbook may benefit from title, first line, and artist indexes whereas a novel may only need an index of chapter names.

In SQL, an index is a set of one or more sorted columns of a table, along with a reference to the corresponding row. A column is said to be indexed if it is part of an index. Typically, only some columns of a table are indexed.

The PAT_ENC table has one row per patient encounter
    The PAT_ENC_CSN_ID column is the primary key
    The CONTACT_DATE column holds the date of the encounter
        This column is indexed

While not completely accurate, for many purposes the CONTACT_DATE index can be thought of as the result of the following query:

    SELECT CONTACT_DATE, PAT_ENC_CSN_ID
    FROM PAT_ENC
    ORDER BY CONTACT_DATE

This result is physically stored on the database, just like the original PAT_ENC table. If you later run a query that uses the CONTACT_DATE column (such as filtering by a specific date), SQL may be able to use the index to more quickly find the corresponding rows in the PAT_ENC table that should be included in your query result.

To intentionally encourage your query to use an index and therefore potentially be more efficient, you first must identify the indexed columns of the tables in your query. Then, include an indexed column in your query in a way that it may benefit from using the index.
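The effect of an index on how a filter is processed can be observed with a minimal sketch. It uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement, which are stand-ins for the SQL Server and Oracle tooling in this lesson. The index name IX_PAT_ENC_CONTACT_DATE, the ISO date strings, and the four encounter rows are all made up for illustration, and other databases report their plans differently.

```python
import sqlite3

# Hypothetical miniature PAT_ENC table; every row is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PAT_ENC ("
    "PAT_ENC_CSN_ID INTEGER PRIMARY KEY, CONTACT_DATE TEXT, CHECKIN_TIME TEXT)"
)
conn.executemany(
    "INSERT INTO PAT_ENC VALUES (?, ?, ?)",
    [
        (1001, "2016-11-30", "2016-11-30 09:15"),
        (1002, "2016-12-01", "2016-12-01 10:30"),
        (1003, "2016-12-01", "2016-12-01 14:00"),
        (1004, "2016-12-09", "2016-12-09 08:45"),
    ],
)


def plan(sql):
    """Return the human-readable steps of SQLite's plan for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = (
    "SELECT PAT_ENC_CSN_ID, CHECKIN_TIME FROM PAT_ENC "
    "WHERE CONTACT_DATE = '2016-12-01'"
)

# Before the index exists, the only option is to read every row.
plan_without_index = plan(query)
rows_without_index = conn.execute(query).fetchall()

# After the index exists, the same query text can be answered by searching it.
conn.execute("CREATE INDEX IX_PAT_ENC_CONTACT_DATE ON PAT_ENC (CONTACT_DATE)")
plan_with_index = plan(query)
rows_with_index = conn.execute(query).fetchall()

print(plan_without_index)
print(plan_with_index)
```

The query text and its result are identical in both runs; only the way the database chooses to find the rows changes.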
Determine Indexed Columns

Primary key columns are inherently indexed, but for other columns we need to look up whether they are indexed.

The method to determine indexed columns of a table is different between SQL Server and Oracle. Review the Microsoft SQL Server and Oracle section of the Introduction lesson for a reminder of what is expected on the assessments.

SP_HELPINDEX (SQL Server)

In SQL Server, the SP_HELPINDEX stored procedure can be used to list the indexes of a table. Run

    SP_HELPINDEX table_name

where table_name is the name of the table for which you want to list the indexes.

The result of the stored procedure is a table that includes three columns:
    index_name holds the name of the index
    index_description holds the type of index and other information about how the index is implemented
    index_keys holds the list of columns that are included in the index

For the scope of this lesson, what's important is to be able to determine whether or not a column is indexed. A column is indexed if and only if it is listed in the index_keys column returned by the SP_HELPINDEX stored procedure.

Because calling a stored procedure is different from running a SELECT statement, you may need to switch databases before calling the stored procedure rather than as part of the call. If you're using a USE statement to switch databases, run it separately or follow it with a GO command. For example,

    USE Clarity_Feb
    GO
    SP_HELPINDEX ORDER_MED
The ORDER_MED table has one row per medication order

Running

    SP_HELPINDEX ORDER_MED

yields a result with several indexes. Here are three of the rows:

    index_name                index_description                                    index_keys
    EIX_ORDER_MED_ORD_INST    nonclustered located on PRIMARY                      ORDER_INST
    EIX_ORDER_MED_PAID_CMP    nonclustered located on PRIMARY                      PAT_ID, PAT_ENC_DATE_REAL
    PK_ORDER_MED              clustered, unique, primary key located on PRIMARY    ORDER_MED_ID

ORDER_INST, PAT_ID, PAT_ENC_DATE_REAL, and ORDER_MED_ID are all indexed columns.

Many of the Caboodle database objects referenced in this training companion are actually views in the FullAccess schema. Many of these views reference tables of the same name in the dbo schema. Because views may not have as many indexes as the underlying tables, running SP_HELPINDEX on the table in the dbo schema may be more useful. Use the syntax

    SP_HELPINDEX 'Schema.TableName'

where Schema is the desired schema. For example,

    USE Caboodle_Feb
    GO
    SP_HELPINDEX 'dbo.AddressDim'

ALL_IND_COLUMNS (Oracle)

In Oracle, the ALL_IND_COLUMNS table can be used to list the indexes of a table. Run the query

    SELECT *
    FROM ALL_IND_COLUMNS
    WHERE TABLE_NAME = 'table_name'

where table_name is the name of the table for which you want to list the indexes.

The result of running the query is a table that includes three columns:
    INDEX_NAME holds the name of the index
    COLUMN_POSITION holds which position the column is in the index (1, 2, 3, ...)
    COLUMN_NAME holds the name of the column that is included in the index

For the scope of this lesson, what's important is to be able to determine whether or not a column is indexed. A column is indexed if and only if it is listed in the COLUMN_NAME column when querying the ALL_IND_COLUMNS table and filtering appropriately by TABLE_NAME.

The ORDER_MED table has one row per medication order

The query

    SELECT *
    FROM ALL_IND_COLUMNS
    WHERE TABLE_NAME = 'ORDER_MED'

yields a result with several indexes. Here are three of the columns and four of the rows:

    INDEX_NAME                COLUMN_POSITION    COLUMN_NAME
    EIX_ORDER_MED_ORD_INST    1                  ORDER_INST
    EIX_ORDER_MED_PAID_CMP    1                  PAT_ID
    EIX_ORDER_MED_PAID_CMP    2                  PAT_ENC_DATE_REAL
    PK_ORDER_MED              1                  ORDER_MED_ID

ORDER_INST, PAT_ID, PAT_ENC_DATE_REAL, and ORDER_MED_ID are all indexed columns.

Exercise 2: Determine Indexes

In this exercise, you'll practice looking up indexes for a table.

Background

The ROI_AUDIT_TRAIL table has one row per status change to a release of information request

Task

Determine the indexed columns of the ROI_AUDIT_TRAIL table.

Use Indexed Columns
Including an indexed column in your query gives SQL a chance to use that index. This is more obviously true for clauses that inherently involve sorting, but it is also true for the other clauses.

By including an indexed column in the ORDER BY clause, the database may be spared from directly processing the ORDER BY clause because the result may already be sorted.

The HSP_ACCT_CDSTS_HX table has one row per hospital account coding status change
    The HSP_ACCOUNT_ID column holds the hospital account receivable ID (HAR)
    The LINE column holds the change number for the hospital account (1, 2, 3, ...)
    The CDSTS_HX_INST column holds the date and time the change was made
    The CDSTS_HX_STS_C column holds a value that indicates the coding status

You are tasked with writing a query to return the coding status values sorted by HAR. For each HAR, the coding status changes must be in chronological order. Assuming that ordering by LINE...

    SELECT HSP_ACCOUNT_ID, CDSTS_HX_STS_C
    FROM HSP_ACCT_CDSTS_HX
    ORDER BY HSP_ACCOUNT_ID, LINE

...and CDSTS_HX_INST...

    SELECT HSP_ACCOUNT_ID, CDSTS_HX_STS_C
    FROM HSP_ACCT_CDSTS_HX
    ORDER BY HSP_ACCOUNT_ID, CDSTS_HX_INST

...both produce satisfactory results, which one will likely perform better? A check on the indexed columns...

    SP_HELPINDEX HSP_ACCT_CDSTS_HX

...shows that LINE is an indexed column but CDSTS_HX_INST is not. Therefore, the query with LINE in the ORDER BY clause is more likely to perform better.

When the database processes the GROUP BY clause, it may need to sort the data by the grouped column(s) to aggregate the rows. By including an indexed column in the GROUP BY clause, the database may use the index, in which case the data would already be sorted. By allowing the database to skip the sorting step, the query will likely perform better.
The CLARITY_DEP table has one row per department
    The SPECIALTY column holds the name of the specialty of the department
    The SPECIALTY_DEP_C column holds a value that represents the specialty of the department

You are tasked with writing a query that lists how many departments there are of each specialty. Assume that returning the name of the specialty...

    SELECT SPECIALTY, COUNT( * ) "Count"
    FROM CLARITY_DEP
    GROUP BY SPECIALTY

...and the value of the specialty...

    SELECT SPECIALTY_DEP_C, COUNT( * ) "Count"
    FROM CLARITY_DEP
    GROUP BY SPECIALTY_DEP_C

...are both ok. Which query is likely to perform better? A check on the indexed columns...

    SP_HELPINDEX CLARITY_DEP

...shows that SPECIALTY is not indexed but SPECIALTY_DEP_C is. Therefore, SPECIALTY_DEP_C is the preferred column to use in the GROUP BY clause.

When the database encounters a logical expression in a query, it needs to search for values in the specified columns. Searching is faster if the values are sorted, so searching is faster if the columns are indexed. Therefore, using indexed columns in logical expressions may improve the performance of the query.
The ED_IEV_PAT_INFO table has one row per ED event
    Both the PAT_ENC_CSN_ID and the PAT_CSN columns hold the contact serial number (CSN) of the patient encounter
        These are foreign keys to PAT_ENC
The PAT_ENC table has one row per encounter
    The PAT_ENC_CSN_ID is the primary key

You've been tasked with writing a query that includes both encounter and ED event information. Assume that linking on PAT_ENC_CSN_ID...

    FROM ED_IEV_PAT_INFO eipi
    INNER JOIN PAT_ENC pe
        ON eipi.PAT_ENC_CSN_ID = pe.PAT_ENC_CSN_ID

...and PAT_CSN...

    FROM ED_IEV_PAT_INFO eipi
    INNER JOIN PAT_ENC pe
        ON eipi.PAT_CSN = pe.PAT_ENC_CSN_ID

...yield the same result. Which column is preferred in the join condition? A check on the indexed columns...

    SP_HELPINDEX ED_IEV_PAT_INFO

...shows that PAT_CSN is indexed but PAT_ENC_CSN_ID is not. Therefore, PAT_CSN is the preferred column to use in the join condition.
The PAT_ENC table has one row per patient encounter
    The CONTACT_DATE column just holds the encounter date
    The CHECKIN_TIME column holds the encounter date with the time the patient was checked in
    The CHECKOUT_TIME column holds the encounter date with the time the patient was checked out

You are tasked with writing a query to return information about certain encounters within a specified date range. Assuming that filtering on any of the columns produces satisfactory results, which column is preferred in the WHERE clause? A check on the indexed columns...

    SP_HELPINDEX PAT_ENC

...shows that CONTACT_DATE is indexed but the other two columns are not. Therefore, CONTACT_DATE is the preferred column to use in the WHERE clause. For a statement such as...

    WHERE '12/1/2016' <= CONTACT_DATE
        AND CONTACT_DATE < '12/8/2016'

...the database may be able to use the index to quickly find the cutoff of all encounters before 12/1/2016 and the cutoff of all encounters on or after 12/8/2016, and then return all the encounters in the middle. This might be more efficient than if the database had to check every row in PAT_ENC to see if it fell in the date range.

In cases where only indexed columns are used elsewhere in the query, even using indexed columns in the SELECT clause may be more efficient. This is more useful in subqueries, covered in the RPT121i SQL II training companion.
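Both ideas on this page, the range search and the benefit of selecting only indexed columns, can be sketched with Python's built-in sqlite3 module. SQLite and its EXPLAIN QUERY PLAN output are stand-ins for SQL Server or Oracle here; the index name, the ISO date strings (used so that text comparison matches date order), and the five encounters are made up for illustration.

```python
import sqlite3

# Hypothetical miniature PAT_ENC table with an index on CONTACT_DATE.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PAT_ENC ("
    "PAT_ENC_CSN_ID INTEGER PRIMARY KEY, CONTACT_DATE TEXT, CHECKIN_TIME TEXT)"
)
conn.execute("CREATE INDEX IX_PAT_ENC_CONTACT_DATE ON PAT_ENC (CONTACT_DATE)")
conn.executemany(
    "INSERT INTO PAT_ENC VALUES (?, ?, ?)",
    [
        (1001, "2016-11-30", "2016-11-30 09:15"),
        (1002, "2016-12-01", "2016-12-01 10:30"),
        (1003, "2016-12-01", "2016-12-01 14:00"),
        (1004, "2016-12-07", "2016-12-07 08:45"),
        (1005, "2016-12-08", "2016-12-08 11:00"),
    ],
)


def plan(sql):
    """Return the human-readable steps of SQLite's plan for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


DATE_RANGE = "'2016-12-01' <= CONTACT_DATE AND CONTACT_DATE < '2016-12-08'"

# The index locates both cutoffs; the table still supplies CHECKIN_TIME.
range_plan = plan(
    "SELECT PAT_ENC_CSN_ID, CHECKIN_TIME FROM PAT_ENC WHERE " + DATE_RANGE
)

# Only the indexed column is used, so the index alone can answer the query.
index_only_plan = plan(
    'SELECT CONTACT_DATE, COUNT( * ) "Count" FROM PAT_ENC WHERE '
    + DATE_RANGE
    + " GROUP BY CONTACT_DATE"
)

rows_in_range = conn.execute(
    "SELECT PAT_ENC_CSN_ID FROM PAT_ENC WHERE "
    + DATE_RANGE
    + " ORDER BY PAT_ENC_CSN_ID"
).fetchall()

print(range_plan)
print(index_only_plan)
print(rows_in_range)  # [(1002,), (1003,), (1004,)]
```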
The CLARITY_DEP table has one row per department
    The SPECIALTY column holds the name of the specialty of the department
    The SPECIALTY_DEP_C column holds a value that represents the specialty of the department

You are tasked with writing a query that lists how many departments there are of each specialty. Assume that returning the name of the specialty...

    SELECT MAX( SPECIALTY ) "Specialty", COUNT( * ) "Count"
    FROM CLARITY_DEP
    GROUP BY SPECIALTY_DEP_C

...and the value of the specialty...

    SELECT SPECIALTY_DEP_C, COUNT( * ) "Count"
    FROM CLARITY_DEP
    GROUP BY SPECIALTY_DEP_C

...are both ok. Which query is likely to perform better? A check on the indexed columns...

    SP_HELPINDEX CLARITY_DEP

...shows that SPECIALTY is not indexed but SPECIALTY_DEP_C is. Therefore, SPECIALTY_DEP_C is the preferred column to use in the SELECT clause. It's possible that the database will be able to return the query result just by looking at the index and not the data in the table.

Avoid Functions

Just as SQL is a so-called "black box"...
    You put stuff (a query) in
    Stuff (a query result) comes out
    You may not know exactly what happens in between

...SQL functions can be little black boxes:
    You put stuff (arguments) in
    Stuff (a value) comes out
    You may not know exactly what happens in between

In particular, a function with an indexed column as an argument may strip the database of the ability to effectively use that index. Therefore, avoid the use of functions on indexed columns if you intend for the database to use that index.
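A minimal sketch of this effect, again using Python's built-in sqlite3 module as a stand-in for SQL Server or Oracle. The two filters below select the same 2016 encounters, but one wraps the indexed column in SQLite's strftime function. The index name, ISO date strings, and rows are made up, and how a given database treats a particular function is an assumption to verify there.

```python
import sqlite3

# Hypothetical miniature PAT_ENC table with an index on CONTACT_DATE.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PAT_ENC ("
    "PAT_ENC_CSN_ID INTEGER PRIMARY KEY, CONTACT_DATE TEXT, CHECKIN_TIME TEXT)"
)
conn.execute("CREATE INDEX IX_PAT_ENC_CONTACT_DATE ON PAT_ENC (CONTACT_DATE)")
conn.executemany(
    "INSERT INTO PAT_ENC VALUES (?, ?, ?)",
    [
        (1001, "2015-06-15", "2015-06-15 09:15"),
        (1002, "2016-02-01", "2016-02-01 10:30"),
        (1003, "2016-12-01", "2016-12-01 14:00"),
        (1004, "2017-01-03", "2017-01-03 08:45"),
    ],
)


def plan(sql):
    """Return the human-readable steps of SQLite's plan for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


SELECT_LIST = "SELECT PAT_ENC_CSN_ID, CHECKIN_TIME FROM PAT_ENC WHERE "

# The bare column is compared to constants: the index can be searched.
bare_column = (
    SELECT_LIST + "'2016-01-01' <= CONTACT_DATE AND CONTACT_DATE < '2017-01-01'"
)
# The column is hidden inside a function: every row must be evaluated.
inside_function = SELECT_LIST + "strftime('%Y', CONTACT_DATE) = '2016'"

bare_column_plan = plan(bare_column)
inside_function_plan = plan(inside_function)
bare_column_rows = sorted(conn.execute(bare_column).fetchall())
inside_function_rows = sorted(conn.execute(inside_function).fetchall())

print(bare_column_plan)
print(inside_function_plan)
print(bare_column_rows == inside_function_rows)  # True
```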