OFFICIAL MICROSOFT LEARNING PRODUCT
6232B
Implementing a Microsoft® SQL Server® 2008 R2 Database
Volume 1
Information in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place or event is intended or should be inferred.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

The names of manufacturers, products, or URLs are provided for informational purposes only, and Microsoft makes no representations and warranties, either expressed, implied, or statutory, regarding these manufacturers or the use of the products with any Microsoft technologies. The inclusion of a manufacturer or product does not imply endorsement by Microsoft of the manufacturer or product. Links may be provided to third party sites. Such sites are not under the control of Microsoft, and Microsoft is not responsible for the contents of any linked site or any link contained in a linked site, or any changes or updates to such sites. Microsoft is not responsible for webcasting or any other form of transmission received from any linked site. Microsoft is providing these links to you only as a convenience, and the inclusion of any link does not imply endorsement by Microsoft of the site or the products contained therein.

© 2011 Microsoft Corporation. All rights reserved.

Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are property of their respective owners.

Product Number: 6232B
Part Number: X17-52339
Released: 03/2011
MICROSOFT LICENSE TERMS
OFFICIAL MICROSOFT LEARNING PRODUCTS - TRAINER EDITION – Pre-Release and Final Release Versions

These license terms are an agreement between Microsoft Corporation and you. Please read them. They apply to the Licensed Content named above, which includes the media on which you received it, if any. The terms also apply to any Microsoft

• updates,
• supplements,
• Internet-based services, and
• support services

for this Licensed Content, unless other terms accompany those items. If so, those terms apply.

By using the Licensed Content, you accept these terms. If you do not accept them, do not use the Licensed Content. If you comply with these license terms, you have the rights below.
1. DEFINITIONS.

a. “Academic Materials” means the printed or electronic documentation such as manuals, workbooks, white papers, press releases, datasheets, and FAQs which may be included in the Licensed Content.

b. “Authorized Learning Center(s)” means a Microsoft Certified Partner for Learning Solutions location, an IT Academy location, or such other entity as Microsoft may designate from time to time.

c. “Authorized Training Session(s)” means those training sessions authorized by Microsoft and conducted at or through Authorized Learning Centers by a Trainer providing training to Students solely on Official Microsoft Learning Products (formerly known as Microsoft Official Curriculum or “MOC”) and Microsoft Dynamics Learning Products (formerly known as Microsoft Business Solutions Courseware). Each Authorized Training Session will provide training on the subject matter of one (1) Course.

d. “Course” means one of the courses using Licensed Content offered by an Authorized Learning Center during an Authorized Training Session, each of which provides training on a particular Microsoft technology subject matter.

e. “Device(s)” means a single computer, device, workstation, terminal, or other digital electronic or analog device.

f. “Licensed Content” means the materials accompanying these license terms. The Licensed Content may include, but is not limited to, the following elements: (i) Trainer Content, (ii) Student Content, (iii) classroom setup guide, and (iv) Software. There are different and separate components of the Licensed Content for each Course.

g. “Software” means the Virtual Machines and Virtual Hard Disks, or other software applications that may be included with the Licensed Content.

h. “Student(s)” means a student duly enrolled for an Authorized Training Session at your location.

i. “Student Content” means the learning materials accompanying these license terms that are for use by Students and Trainers during an Authorized Training Session. Student Content may include labs, simulations, and courseware files for a Course.

j. “Trainer(s)” means a) a person who is duly certified by Microsoft as a Microsoft Certified Trainer and b) such other individual as authorized in writing by Microsoft and who has been engaged by an Authorized Learning Center to teach or instruct an Authorized Training Session to Students on its behalf.

k. “Trainer Content” means the materials accompanying these license terms that are for use by Trainers and Students, as applicable, solely during an Authorized Training Session. Trainer Content may include Virtual Machines, Virtual Hard Disks, Microsoft PowerPoint files, instructor notes, and demonstration guides and script files for a Course.

l. “Virtual Hard Disks” means Microsoft Software that is comprised of virtualized hard disks (such as a base virtual hard disk or differencing disks) for a Virtual Machine that can be loaded onto a single computer or other device in order to allow end-users to run multiple operating systems concurrently. For the purposes of these license terms, Virtual Hard Disks will be considered “Trainer Content”.

m. “Virtual Machine” means a virtualized computing experience, created and accessed using Microsoft Virtual PC or Microsoft Virtual Server software, that consists of a virtualized hardware environment, one or more Virtual Hard Disks, and a configuration file setting the parameters of the virtualized hardware environment (e.g., RAM). For the purposes of these license terms, Virtual Machines will be considered “Trainer Content”.

n. “you” means the Authorized Learning Center or Trainer, as applicable, that has agreed to these license terms.
2. OVERVIEW.

Licensed Content. The Licensed Content includes Software, Academic Materials (online and electronic), Trainer Content, Student Content, classroom setup guide, and associated media.

License Model. The Licensed Content is licensed on a per copy per Authorized Learning Center location or per Trainer basis.
3. INSTALLATION AND USE RIGHTS.

a. Authorized Learning Centers and Trainers: For each Authorized Training Session, you may:

i. either install individual copies of the relevant Licensed Content on classroom Devices only for use by Students enrolled in and the Trainer delivering the Authorized Training Session, provided that the number of copies in use does not exceed the number of Students enrolled in and the Trainer delivering the Authorized Training Session, OR

ii. install one copy of the relevant Licensed Content on a network server only for access by classroom Devices and only for use by Students enrolled in and the Trainer delivering the Authorized Training Session, provided that the number of Devices accessing the Licensed Content on such server does not exceed the number of Students enrolled in and the Trainer delivering the Authorized Training Session.

iii. and allow the Students enrolled in and the Trainer delivering the Authorized Training Session to use the Licensed Content that you install in accordance with (i) or (ii) above during such Authorized Training Session in accordance with these license terms.

i. Separation of Components. The components of the Licensed Content are licensed as a single unit. You may not separate the components and install them on different Devices.

ii. Third Party Programs. The Licensed Content may contain third party programs. These license terms will apply to the use of those third party programs, unless other terms accompany those programs.
b. Trainers:

i. Trainers may Use the Licensed Content that you install or that is installed by an Authorized Learning Center on a classroom Device to deliver an Authorized Training Session.

ii. Trainers may also Use a copy of the Licensed Content as follows:

A. Licensed Device. The licensed Device is the Device on which you Use the Licensed Content. You may install and Use one copy of the Licensed Content on the licensed Device solely for your own personal training Use and for preparation of an Authorized Training Session.

B. Portable Device. You may install another copy on a portable device solely for your own personal training Use and for preparation of an Authorized Training Session.

4. PRE-RELEASE VERSIONS. If this is a pre-release (“beta”) version, in addition to the other provisions in this agreement, these terms also apply:

a. Pre-Release Licensed Content. This Licensed Content is a pre-release version. It may not contain the same information and/or work the way a final version of the Licensed Content will. We may change it for the final, commercial version. We also may not release a commercial version. You will clearly and conspicuously inform any Students who participate in each Authorized Training Session of the foregoing; and, that you or Microsoft are under no obligation to provide them with any further content, including but not limited to the final released version of the Licensed Content for the Course.

b. Feedback. If you agree to give feedback about the Licensed Content to Microsoft, you give to Microsoft, without charge, the right to use, share and commercialize your feedback in any way and for any purpose. You also give to third parties, without charge, any patent rights needed for their products, technologies and services to use or interface with any specific parts of a Microsoft software, Licensed Content, or service that includes the feedback. You will not give feedback that is subject to a license that requires Microsoft to license its software or documentation to third parties because we include your feedback in them. These rights survive this agreement.

c. Confidential Information.
The Licensed Content, including any viewer, user interface, features and documentation that may be included with the Licensed Content, is confidential and proprietary to Microsoft and its suppliers.
i. Use. For five years after installation of the Licensed Content or its commercial release, whichever is first, you may not disclose confidential information to third parties. You may disclose confidential information only to your employees and consultants who need to know the information. You must have written agreements with them that protect the confidential information at least as much as this agreement.

ii. Survival. Your duty to protect confidential information survives this agreement.

iii. Exclusions. You may disclose confidential information in response to a judicial or governmental order. You must first give written notice to Microsoft to allow it to seek a protective order or otherwise protect the information. Confidential information does not include information that

• becomes publicly known through no wrongful act;
• you received from a third party who did not breach confidentiality obligations to Microsoft or its suppliers; or
• you developed independently.
d. Term. The term of this agreement for pre-release versions is (i) the date which Microsoft informs you is the end date for using the beta version, or (ii) the commercial release of the final release version of the Licensed Content, whichever is first (“beta term”).

e. Use. You will cease using all copies of the beta version upon expiration or termination of the beta term, and will destroy all copies of same in the possession or under your control and/or in the possession or under the control of any Trainers who have received copies of the pre-released version.

f. Copies. Microsoft will inform Authorized Learning Centers if they may make copies of the beta version (in either print and/or CD version) and distribute such copies to Students and/or Trainers. If Microsoft allows such distribution, you will follow any additional terms that Microsoft provides to you for such copies and distribution.
5. ADDITIONAL LICENSING REQUIREMENTS AND/OR USE RIGHTS.

a. Authorized Learning Centers and Trainers:

i. Software.

ii. Virtual Hard Disks. The Licensed Content may contain versions of Microsoft Windows XP, Microsoft Windows Vista, Windows Server 2003, Windows Server 2008, and Windows 2000 Advanced Server and/or other Microsoft products which are provided in Virtual Hard Disks.

A. If the Virtual Hard Disks and the labs are launched through the Microsoft Learning Lab Launcher, then these terms apply: Time-Sensitive Software. If the Software is not reset, it will stop running based upon the time indicated on the install of the Virtual Machines (between 30 and 500 days after you install it). You will not receive notice before it stops running. You may not be able to access data used or information saved with the Virtual Machines when it stops running and may be forced to reset these Virtual Machines to their original state. You must remove the Software from the Devices at the end of each Authorized Training Session and reinstall and launch it prior to the beginning of the next Authorized Training Session.

B. If the Virtual Hard Disks require a product key to launch, then these terms apply: Microsoft will deactivate the operating system associated with each Virtual Hard Disk. Before installing any Virtual Hard Disks on classroom Devices for use during an Authorized Training Session, you will obtain from Microsoft a product key for the operating system software for the Virtual Hard Disks and will activate such Software with Microsoft using such product key.

C. These terms apply to all Virtual Machines and Virtual Hard Disks: You may only use the Virtual Machines and Virtual Hard Disks if you comply with the terms and conditions of this agreement and the following security requirements:

• You may not install Virtual Machines and Virtual Hard Disks on portable Devices or Devices that are accessible to other networks.
• You must remove Virtual Machines and Virtual Hard Disks from all classroom Devices at the end of each Authorized Training Session, except those held at Microsoft Certified Partners for Learning Solutions locations.
• You must remove the differencing drive portions of the Virtual Hard Disks from all classroom Devices at the end of each Authorized Training Session at Microsoft Certified Partners for Learning Solutions locations.
• You will ensure that the Virtual Machines and Virtual Hard Disks are not copied or downloaded from Devices on which you installed them.
• You will strictly comply with all Microsoft instructions relating to installation, use, activation and deactivation, and security of Virtual Machines and Virtual Hard Disks.
• You may not modify the Virtual Machines and Virtual Hard Disks or any contents thereof.
• You may not reproduce or redistribute the Virtual Machines or Virtual Hard Disks.

iii. Classroom Setup Guide. You will ensure that any Licensed Content installed for use during an Authorized Training Session is installed in accordance with the classroom set-up guide for the Course.

iv. Media Elements and Templates. You may allow Trainers and Students to use images, clip art, animations, sounds, music, shapes, video clips and templates provided with the Licensed Content solely in an Authorized Training Session. If Trainers have their own copy of the Licensed Content, they may use Media Elements for their personal training use.

v. Evaluation Software. Any Software that is included in the Student Content designated as “Evaluation Software” may be used by Students solely for their personal training outside of the Authorized Training Session.
b. Trainers Only:

i. Use of PowerPoint Slide Deck Templates. The Trainer Content may include Microsoft PowerPoint slide decks. Trainers may use, copy and modify the PowerPoint slide decks only for providing an Authorized Training Session. If you elect to exercise the foregoing, you will agree or ensure the Trainer agrees: (a) that modification of the slide decks will not constitute creation of obscene or scandalous works, as defined by federal law at the time the work is created; and (b) to comply with all other terms and conditions of this agreement.

ii. Use of Instructional Components in Trainer Content. For each Authorized Training Session, Trainers may customize and reproduce, in accordance with the MCT Agreement, those portions of the Licensed Content that are logically associated with instruction of the Authorized Training Session. If you elect to exercise the foregoing rights, you agree or ensure the Trainer agrees: (a) that any of these customizations or reproductions will only be used for providing an Authorized Training Session and (b) to comply with all other terms and conditions of this agreement.

iii. Academic Materials. If the Licensed Content contains Academic Materials, you may copy and use the Academic Materials. You may not make any modifications to the Academic Materials and you may not print any book (either electronic or print version) in its entirety. If you reproduce any Academic Materials, you agree that:

• The use of the Academic Materials will be only for your personal reference or training use;
• You will not republish or post the Academic Materials on any network computer or broadcast in any media; and
• You will include the Academic Materials’ original copyright notice, or a copyright notice to Microsoft’s benefit in the format provided below:

Form of Notice: © 2010 Reprinted for personal reference use only with permission by Microsoft Corporation. All rights reserved. Microsoft, Windows, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the US and/or other countries. Other product and company names mentioned herein may be the trademarks of their respective owners.
6. INTERNET-BASED SERVICES. Microsoft may provide Internet-based services with the Licensed Content. It may change or cancel them at any time. You may not use these services in any way that could harm them or impair anyone else’s use of them. You may not use the services to try to gain unauthorized access to any service, data, account or network by any means.

7. SCOPE OF LICENSE. The Licensed Content is licensed, not sold. This agreement only gives you some rights to use the Licensed Content. Microsoft reserves all other rights. Unless applicable law gives you more rights despite this limitation, you may use the Licensed Content only as expressly permitted in this agreement. In doing so, you must comply with any technical limitations in the Licensed Content that only allow you to use it in certain ways. You may not

• install more copies of the Licensed Content on classroom Devices than the number of Students and the Trainer in the Authorized Training Session;
• allow more classroom Devices to access the server than the number of Students enrolled in and the Trainer delivering the Authorized Training Session if the Licensed Content is installed on a network server;
• copy or reproduce the Licensed Content to any server or location for further reproduction or distribution;
• disclose the results of any benchmark tests of the Licensed Content to any third party without Microsoft’s prior written approval;
• work around any technical limitations in the Licensed Content;
• reverse engineer, decompile or disassemble the Licensed Content, except and only to the extent that applicable law expressly permits, despite this limitation;
• make more copies of the Licensed Content than specified in this agreement or allowed by applicable law, despite this limitation;
• publish the Licensed Content for others to copy;
• transfer the Licensed Content, in whole or in part, to a third party;
• access or use any Licensed Content for which you (i) are not providing a Course and/or (ii) have not been authorized by Microsoft to access and use;
• rent, lease or lend the Licensed Content; or
• use the Licensed Content for commercial hosting services or general business purposes.

Rights to access the server software that may be included with the Licensed Content, including the Virtual Hard Disks, do not give you any right to implement Microsoft patents or other Microsoft intellectual property in software or devices that may access the server.
8. EXPORT RESTRICTIONS. The Licensed Content is subject to United States export laws and regulations. You must comply with all domestic and international export laws and regulations that apply to the Licensed Content. These laws include restrictions on destinations, end users and end use. For additional information, see www.microsoft.com/exporting.

9. NOT FOR RESALE SOFTWARE/LICENSED CONTENT. You may not sell software or Licensed Content marked as “NFR” or “Not for Resale.”

10. ACADEMIC EDITION. You must be a “Qualified Educational User” to use Licensed Content marked as “Academic Edition” or “AE.” If you do not know whether you are a Qualified Educational User, visit www.microsoft.com/education or contact the Microsoft affiliate serving your country.

11. TERMINATION. Without prejudice to any other rights, Microsoft may terminate this agreement if you fail to comply with the terms and conditions of these license terms. In the event your status as an Authorized Learning Center or Trainer a) expires, b) is voluntarily terminated by you, and/or c) is terminated by Microsoft, this agreement shall automatically terminate. Upon any termination of this agreement, you must destroy all copies of the Licensed Content and all of its component parts.

12. ENTIRE AGREEMENT. This agreement, and the terms for supplements, updates, Internet-based services and support services that you use, are the entire agreement for the Licensed Content and support services.

13. APPLICABLE LAW.

a. United States. If you acquired the Licensed Content in the United States, Washington state law governs the interpretation of this agreement and applies to claims for breach of it, regardless of conflict of laws principles. The laws of the state where you live govern all other claims, including claims under state consumer protection laws, unfair competition laws, and in tort.

b. Outside the United States. If you acquired the Licensed Content in any other country, the laws of that country apply.

14. LEGAL EFFECT. This agreement describes certain legal rights. You may have other rights under the laws of your country. You may also have rights with respect to the party from whom you acquired the Licensed Content. This agreement does not change your rights under the laws of your country if the laws of your country do not permit it to do so.
15. DISCLAIMER OF WARRANTY. The Licensed Content is licensed “as-is.” You bear the risk of using it. Microsoft gives no express warranties, guarantees or conditions. You may have additional consumer rights under your local laws which this agreement cannot change. To the extent permitted under your local laws, Microsoft excludes the implied warranties of merchantability, fitness for a particular purpose and noninfringement.

16. LIMITATION ON AND EXCLUSION OF REMEDIES AND DAMAGES. YOU CAN RECOVER FROM MICROSOFT AND ITS SUPPLIERS ONLY DIRECT DAMAGES UP TO U.S. $5.00. YOU CANNOT RECOVER ANY OTHER DAMAGES, INCLUDING CONSEQUENTIAL, LOST PROFITS, SPECIAL, INDIRECT OR INCIDENTAL DAMAGES. This limitation applies to

• anything related to the Licensed Content, software, services, content (including code) on third party Internet sites, or third party programs; and
• claims for breach of contract, breach of warranty, guarantee or condition, strict liability, negligence, or other tort to the extent permitted by applicable law.
It also applies even if Microsoft knew or should have known about the possibility of the damages. The above limitation or exclusion may not apply to you because your country may not allow the exclusion or limitation of incidental, consequential or other damages.

Please note: As this Licensed Content is distributed in Quebec, Canada, some of the clauses in this agreement are provided below in French.

Remarque : Ce contenu sous licence étant distribué au Québec, Canada, certaines des clauses dans ce contrat sont fournies ci-dessous en français.

EXONÉRATION DE GARANTIE. Le contenu sous licence visé par une licence est offert « tel quel ». Toute utilisation de ce contenu sous licence est à vos seuls risques et périls. Microsoft n’accorde aucune autre garantie expresse. Vous pouvez bénéficier de droits additionnels en vertu du droit local sur la protection des consommateurs, que ce contrat ne peut modifier. Là où elles sont permises par le droit local, les garanties implicites de qualité marchande, d’adéquation à un usage particulier et d’absence de contrefaçon sont exclues.

LIMITATION DES DOMMAGES-INTÉRÊTS ET EXCLUSION DE RESPONSABILITÉ POUR LES DOMMAGES. Vous pouvez obtenir de Microsoft et de ses fournisseurs une indemnisation en cas de dommages directs uniquement à hauteur de 5,00 $ US. Vous ne pouvez prétendre à aucune indemnisation pour les autres dommages, y compris les dommages spéciaux, indirects ou accessoires et pertes de bénéfices. Cette limitation concerne :

• tout ce qui est relié au contenu sous licence, aux services ou au contenu (y compris le code) figurant sur des sites Internet tiers ou dans des programmes tiers ; et
• les réclamations au titre de violation de contrat ou de garantie, ou au titre de responsabilité stricte, de négligence ou d’une autre faute dans la limite autorisée par la loi en vigueur.

Elle s’applique également, même si Microsoft connaissait ou devrait connaître l’éventualité d’un tel dommage. Si votre pays n’autorise pas l’exclusion ou la limitation de responsabilité pour les dommages indirects, accessoires ou de quelque nature que ce soit, il se peut que la limitation ou l’exclusion ci-dessus ne s’applique pas à votre égard.

EFFET JURIDIQUE. Le présent contrat décrit certains droits juridiques. Vous pourriez avoir d’autres droits prévus par les lois de votre pays. Le présent contrat ne modifie pas les droits que vous confèrent les lois de votre pays si celles-ci ne le permettent pas.
Acknowledgements

Microsoft Learning would like to acknowledge and thank the following for their contribution towards developing this title. Their efforts at various stages in the development have ensured that you have a good classroom experience.
Greg Low – Lead Developer Dr Greg Low is a SQL Server MVP, an MCT, and a Microsoft Regional Director for Australia. Greg has worked with SQL Server since version 4.2 as an active mentor, consultant, and trainer. He has been an instructor in the Microsoft SQL Server Masters certification program for several years and was one of the first two people to achieve the SQL Server 2008 Master certification. Greg is best known for his SQL Down Under podcast (at www.sqldownunder.com), where he interviews SQL Server MVPs and product team members on topics of interest to the SQL Server community. He is the CEO of SolidQ Australia, which is part of Solid Quality Mentors. He is the author of a number of whitepapers on the Microsoft MSDN and TechNet web sites and of a number of SQL Server-related books.
Herbert Albert – SolidQ Technical Reviewer Herbert Albert started his career in 1994. He works as a trainer, consultant, and author focusing on SQL Server technologies. Herbert is a mentor and Managing Director of Solid Quality Mentors Central Europe located in Vienna, Austria. He has several Microsoft certifications including being an MCT since 1997. He is a regular speaker at conferences and is a co-author of the SQL Server 2008 R2 Upgrade Technical Reference Guide and SQL Server 2005 Step-by-Step Applied Techniques. Together with Gianluca Hotz, Herbert writes a regular column at the SolidQ Journal.
Chris Barker – Technical Reviewer Chris Barker is an MCT in New Zealand and currently employed as a staff trainer at Auldhouse, one of New Zealand’s major CPLS training centers, in Wellington. He has been programming since the early 1970s—his first program was written in assembly language and debugged in binary (literally)! While his training focuses on programming (mostly .NET) and databases (mostly Microsoft SQL Server), Chris has also been an infrastructure trainer and has both Novell and Microsoft networking qualifications.
Mark Hions – Technical Reviewer Mark's passion for computing and skill as a communicator were well suited to his position as an instructor at Honeywell Canada, where he started working with minicomputers, mainframes, and mature students in 1984. He first met Microsoft SQL Server when it ran on OS/2, and has delivered training on every version since. An independent MCT and consultant for many years, he is a highly-rated presenter at TechEd, has designed SQL Server exams for Microsoft, and has delivered deep-dive courses through the Microsoft Partner Channel. Mark is now the Principal SQL Server Instructor and Consultant at DesTech, which is the largest provider of SQL Server training in the Toronto area.
Contents

Module 1: Introduction to SQL Server 2008 R2 and its Toolset
  Lesson 1: Introduction to the SQL Server Platform 1-3
  Lesson 2: Working with SQL Server Tools 1-14
  Lesson 3: Configuring SQL Server Services 1-28
  Lab 1: Introduction to SQL Server and its Toolset 1-36

Module 2: Working with Data Types
  Lesson 1: Using Data Types 2-3
  Lesson 2: Working with Character Data 2-19
  Lesson 3: Converting Data Types 2-27
  Lesson 4: Specialized Data Types 2-34
  Lab 2: Working with Data Types 2-40

Module 3: Designing and Implementing Tables
  Lesson 1: Designing Tables 3-3
  Lesson 2: Working with Schemas 3-15
  Lesson 3: Creating and Altering Tables 3-21
  Lab 3: Designing and Implementing Tables 3-32

Module 4: Designing and Implementing Views
  Lesson 1: Introduction to Views 4-3
  Lesson 2: Creating and Managing Views 4-13
  Lesson 3: Performance Considerations for Views 4-22
  Lab 4: Designing and Implementing Views 4-27

Module 5: Planning for SQL Server 2008 R2 Indexing
  Lesson 1: Core Indexing Concepts 5-3
  Lesson 2: Data Types and Indexes 5-11
  Lesson 3: Single Column and Composite Indexes 5-19
  Lab 5: Planning for SQL Server Indexing 5-24

Module 6: Implementing Table Structures in SQL Server 2008 R2
  Lesson 1: SQL Server Table Structures 6-3
  Lesson 2: Working with Clustered Indexes 6-13
  Lesson 3: Designing Effective Clustered Indexes 6-20
  Lab 6: Implementing Table Structures in SQL Server 6-26

Module 7: Reading SQL Server 2008 R2 Execution Plans
  Lesson 1: Reading SQL Server Execution Plans 7-3
  Lesson 2: Common Execution Plan Elements 7-14
  Lesson 3: Working with Execution Plans 7-24
  Lab 7: Reading SQL Server Execution Plans 7-31

Module 8: Improving Performance through Nonclustered Indexes
  Lesson 1: Designing Effective Nonclustered Indexes 8-3
  Lesson 2: Implementing Nonclustered Indexes 8-10
  Lesson 3: Using the Database Engine Tuning Advisor 8-18
  Lab 8: Improving Performance through Nonclustered Indexes 8-25

Module 9: Designing and Implementing Stored Procedures
  Lesson 1: Introduction to Stored Procedures 9-3
  Lesson 2: Working With Stored Procedures 9-11
  Lesson 3: Implementing Parameterized Stored Procedures 9-23
  Lesson 4: Controlling Execution Context 9-33
  Lab 9: Designing and Implementing Stored Procedures 9-39

Module 10: Merging Data and Passing Tables
  Lesson 1: Using the MERGE Statement 10-3
  Lesson 2: Implementing Table Types 10-14
  Lesson 3: Using TABLE Types As Parameters 10-22
  Lab 10: Passing Tables and Merging Data 10-26

Module 11: Creating Highly Concurrent SQL Server 2008 R2 Applications
  Lesson 1: Introduction to Transactions 11-3
  Lesson 2: Introduction to Locks 11-17
  Lesson 3: Management of Locking 11-28
  Lesson 4: Transaction Isolation Levels 11-38
  Lab 11: Creating Highly Concurrent SQL Server Applications 11-44

Module 12: Handling Errors in T-SQL Code
  Lesson 1: Designing T-SQL Error Handling 12-3
  Lesson 2: Implementing T-SQL Error Handling 12-13
  Lesson 3: Implementing Structured Exception Handling 12-23
  Lab 12: Handling Errors in T-SQL Code 12-31
About This Course

This section provides you with a brief description of the course, audience, suggested prerequisites, and course objectives.
Course Description

This five-day instructor-led course is intended for Microsoft® SQL Server database developers who are responsible for implementing a database on SQL Server 2008 R2. In this course, students learn the skills and best practices for using SQL Server 2008 R2 product features and tools related to implementing a database server.
Audience

This course is intended for IT professionals who want to become skilled in SQL Server 2008 R2 product features and technologies for implementing a database. To be successful in this course, the student should have knowledge of basic relational database concepts and of writing T-SQL queries.
Student Prerequisites

This course requires that you meet the following prerequisites:
• Working knowledge of Transact-SQL (ability to write Transact-SQL queries)
• Working knowledge of relational databases (database design skills)
• Core Windows Server skills
• Completed Course 2778: Writing Queries Using Microsoft SQL Server 2008 Transact-SQL
Course Objectives

After completing this course, students will be able to:
• Understand the product, its components, and basic configuration
• Work with the data types supported by SQL Server
• Design and implement tables and work with schemas
• Design and implement views and partitioned views
• Describe the concept of an index and determine the appropriate data types for indexes and composite index structures
• Identify the appropriate table structures and implement clustered indexes and heaps
• Describe and capture execution plans
• Design and implement nonclustered indexes, covering indexes, and included columns
• Design and implement stored procedures
• Implement table types, table-valued parameters, and the MERGE statement
• Describe transactions, transaction isolation levels, and application design patterns for highly concurrent applications
• Design and implement T-SQL error handling and structured exception handling
• Design and implement scalar and table-valued functions
• Design and implement constraints
• Design and implement triggers
• Describe and implement target use cases of SQL CLR integration
• Describe and implement XML data and schemas in SQL Server
• Use FOR XML and XPath queries
• Describe and use spatial data types in SQL Server
• Implement and query full-text indexes
Course Outline

This section provides an outline of the course:

Module 1, “Introduction to SQL Server 2008 R2 and its Toolset” introduces you to the entire SQL Server platform and its major tools. This module also covers editions, versions, basics of network listeners, and concepts of services and service accounts.

Module 2, “Working with Data Types” describes the data types supported by SQL Server and how to work with them.

Module 3, “Designing and Implementing Tables” describes the design and implementation of tables.

Module 4, “Designing and Implementing Views” describes the design and implementation of views.

Module 5, “Planning for SQL Server 2008 R2 Indexing” describes the concept of an index and discusses selectivity, density, and statistics. This module also covers appropriate data type choices and choices around composite index structures.

Module 6, “Implementing Table Structures in SQL Server 2008 R2” covers clustered indexes and heaps.

Module 7, “Reading SQL Server 2008 R2 Execution Plans” introduces the concept of reading execution plans.

Module 8, “Improving Performance through Nonclustered Indexes” covers nonclustered indexes, covering indexes, and included columns.

Module 9, “Designing and Implementing Stored Procedures” describes the design and implementation of stored procedures.

Module 10, “Merging Data and Passing Tables” covers table types, table-valued parameters, and the MERGE statement as used in stored procedures.

Module 11, “Creating Highly Concurrent SQL Server 2008 R2 Applications” covers transactions, isolation levels, and designing for concurrency.

Module 12, “Handling Errors in T-SQL Code” describes structured exception handling and gives solid examples of its use within the design of stored procedures.

Module 13, “Designing and Implementing User-Defined Functions” describes the design and implementation of functions, both scalar and table-valued.

Module 14, “Ensuring Data Integrity through Constraints” describes the design and implementation of constraints.

Module 15, “Responding to Data Manipulation via Triggers” describes the design and implementation of triggers.

Module 16, “Implementing Managed Code in SQL Server 2008 R2” describes the implementation of, and target use cases for, SQL CLR integration.

Module 17, “Storing XML Data in SQL Server 2008 R2” covers the XML data type, schema collections, typed and untyped columns, and appropriate use cases for XML in SQL Server.

Module 18, “Querying XML Data in SQL Server 2008 R2” covers the basics of FOR XML and XPath queries.

Module 19, “Working with SQL Server 2008 R2 Spatial Data” describes spatial data and how this data can be implemented within SQL Server.

Module 20, “Working with Full-Text Indexes and Queries” covers full-text indexes and queries.
Course Materials

The following materials are included with your kit:

• Course Handbook: A succinct classroom learning guide that provides all the critical technical information in a crisp, tightly-focused format, which is just right for an effective in-class learning experience.
  • Lessons: Guide you through the learning objectives and provide the key points that are critical to the success of the in-class learning experience.
  • Labs: Provide a real-world, hands-on platform for you to apply the knowledge and skills learned in the module.
  • Module Reviews and Takeaways: Provide improved on-the-job reference material to boost knowledge and skills retention.
  • Lab Answer Keys: Provide step-by-step lab solution guidance at your fingertips when it’s needed.

• Course Companion Content on the http://www.microsoft.com/learning/companionmoc/ site: Searchable, easy-to-navigate digital content with integrated premium online resources designed to supplement the Course Handbook.
  • Modules: Include companion content, such as questions and answers, detailed demo steps, and additional reading links, for each lesson. Additionally, they include Lab Review questions and answers and Module Reviews and Takeaways sections, which contain the review questions and answers, best practices, common issues and troubleshooting tips with answers, and real-world issues and scenarios with answers.
  • Resources: Include well-categorized additional resources that give you immediate access to the most up-to-date premium content on TechNet, MSDN®, and Microsoft Press®.

• Student Course files on the http://www.microsoft.com/learning/companionmoc/ site: Include Allfiles.exe, a self-extracting executable file that contains all the files required for the labs and demonstrations.

• Course evaluation: At the end of the course, you will have the opportunity to complete an online evaluation to provide feedback on the course, training facility, and instructor.
  • To provide additional comments or feedback on the course, send e-mail to [email protected]. To inquire about the Microsoft Certification Program, send e-mail to [email protected].
Virtual Machine Environment

This section provides the information for setting up the classroom environment to support the business scenario of the course.

Virtual Machine Configuration

In this course, you will use Microsoft Virtual Server 2005 R2 with SP1 to perform the labs. The following table shows the role of each virtual machine used in this course:

Virtual machine    Role
623XB-MIA-DC       Domain Controller
623XB-MIA-SQL1     SQL Server VM for Module 1 only
623XB-MIA-SQL      SQL Server VM for Modules 2 - 20

Software Configuration

The following software is installed on each VM:
• SQL Server 2008 R2 (on the SQL Server VMs)

Course Files

There are files associated with the labs in this course. The lab files are located in the folder D:\6232B_Labs on the student computers.

Classroom Setup

Each classroom computer will have the same virtual machine configured in the same way.

Course Hardware Level

To ensure a satisfactory student experience, Microsoft Learning requires a minimum equipment configuration for trainer and student computers in all Microsoft Certified Partner for Learning Solutions (CPLS) classrooms in which Official Microsoft Learning Product courseware is taught.
Module 1: Introduction to SQL Server 2008 R2 and its Toolset

Contents:
Lesson 1: Introduction to the SQL Server Platform 1-3
Lesson 2: Working with SQL Server Tools 1-14
Lesson 3: Configuring SQL Server Services 1-28
Lab 1: Introduction to SQL Server and its Toolset 1-36
Module Overview
Before beginning to work with SQL Server in either a development or an administration role, it is important to understand the overall SQL Server platform. In particular, it is useful to understand that SQL Server is not just a database engine; it is a complete platform for managing enterprise data. Along with a strong platform, SQL Server provides a series of tools that make the product easy to manage and a good target for application development. Individual components of SQL Server can operate within separate security contexts. Correctly configuring SQL Server services is important where enterprises operate with a policy of least privilege.
Objectives

After completing this module, you will be able to:
• Describe the SQL Server platform
• Work with SQL Server tools
• Configure SQL Server services
Lesson 1
Introduction to the SQL Server Platform
SQL Server is a platform for developing business applications that are data focused. Rather than being a single monolithic application, SQL Server is structured as a series of components. It is important to understand the use of each of the components. More than a single copy of SQL Server can be installed on a server. Each of these copies is called an instance and can be separately configured and managed. SQL Server is shipped in a variety of editions, each with a different set of capabilities. It is important to understand the target business cases for each of the SQL Server editions and how SQL Server has evolved through a series of improving versions over many years. It is a stable and robust platform.
Objectives

After completing this lesson, you will be able to:
• Describe the overall SQL Server platform
• Explain the role of each of the components that make up the SQL Server platform
• Describe the functionality provided by SQL Server instances
• Explain the available SQL Server editions
• Explain how SQL Server has evolved through a series of versions
SQL Server Architecture
Key Points

SQL Server is an integrated and enterprise-ready platform for data management that offers a low total cost of ownership.

Question: Which other database platforms have you worked with?
Enterprise Ready

While SQL Server is much more than a relational database management system, it provides a very secure, robust, and stable relational database management system. SQL Server is used to manage organizational data and to provide analysis and insights into that data. The database engine is one of the highest-performing database engines available and regularly features at the top of industry performance benchmarks. You can review industry benchmarks and scores at www.tpc.org.

High Availability

Impressive performance is necessary, but not at the cost of availability. Organizations need constant access to their data; many enterprises now require 24x7 access. The SQL Server platform was designed with the highest levels of availability in mind. As each version of the product has been released, more and more capabilities have been added to minimize potential downtime.

Security

Foremost in the minds of enterprise managers is the need to secure organizational data. Security cannot be retrofitted after an application or a product has been created, so SQL Server has been built from the ground up with the highest levels of security as a goal.
Scalability

Organizations need data management capabilities for systems of all sizes. SQL Server scales from the smallest needs to the largest via a series of editions with increasing capabilities.
Cost of Ownership

Many competing database management systems are expensive both to purchase and to maintain. SQL Server offers a very low total cost of ownership. SQL Server tooling (both management and development) builds on existing Windows knowledge, and most users tend to become familiar with the tools quite quickly. The productivity achieved when working with the tools is enhanced by the high degree of integration between them; for example, many of the SQL Server tools have links to launch and preconfigure other SQL Server tools.
SQL Server Components
Key Points

SQL Server is a very good relational database engine but, as a data platform, it offers much more than just a relational database engine.
SQL Server Components

SQL Server is a platform comprising many components, each designed to enhance data management capabilities:

• Database Engine: A relational database engine based on the SQL language.
• Analysis Services: An online analytical processing (OLAP) engine that works with analytic cubes.
• Integration Services: A tool used to orchestrate the movement of data between SQL Server components and external systems (in both directions).
• Reporting Services: A reporting engine based on web services, providing a web portal and end-user reporting tools.
• Master Data Services: Tooling and a hub for managing master or reference data.
• StreamInsight: A platform for building applications that process high-speed events.
• Data Mining: Tooling and an inference engine for deriving knowledge and insights from existing OLAP data or relational data.
• Full-Text Search: Allows building sophisticated search options into applications.
• PowerPivot: Allows end-users, power users, and business analysts to quickly analyze large volumes of data from different locations.
• Replication: Allows moving data between servers to suit data distribution needs.
Question: Which components of SQL Server have you worked with?
SQL Server Instances
Key Points

It is sometimes useful to install more than a single copy of a SQL Server component on a single server. Many SQL Server components can be installed more than once, as separate instances.
SQL Server Instances

The ability to install multiple instances of SQL Server components on a single server is useful in a number of situations:
• There may be a need to have different administrators or security environments for sets of databases. Each instance of SQL Server is separately manageable and securable.
• Applications that need to be supported by an organization may require server configurations that are inconsistent or incompatible with the server requirements of other applications. Each instance of SQL Server is separately configurable.
• Application databases might need to be supported with different levels of service, particularly in relation to availability. SQL Server instances can be used to separate workloads with differing service level agreements (SLAs) that need to be met.
• Different versions of SQL Server might need to be supported.
• Applications might require different server-level collations. While each database can have a different collation, an application might be dependent on the collation of the tempdb database when the application is using temporary objects.
Question: Why might you need to separate databases by service level agreement?

Different versions of SQL Server can often be installed side-by-side using multiple instances. This can assist when testing upgrade scenarios or performing upgrades.
Default and Named Instances

Prior to SQL Server 2000, only a single copy of SQL Server could be installed on a server system. SQL Server was addressed by the name of the server. To maintain backward compatibility, this mode of connection is still supported and is known as a "default" instance.
Additional instances of SQL Server require an instance name in addition to the server name and are known as "named" instances. Not all components of SQL Server can be installed in more than one instance. In particular, SQL Server Integration Services is installed once per server. There is no need to install the SQL Server tools and utilities more than once; a single installation of the tools can manage and configure all instances.
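You can confirm which instance a query window is connected to by querying server properties. The following is a minimal illustrative query using the documented SERVERPROPERTY function; it can be run in any query window:

```sql
SELECT SERVERPROPERTY('MachineName')  AS MachineName,
       SERVERPROPERTY('InstanceName') AS InstanceName; -- returns NULL when connected to the default instance
```

For a named instance, InstanceName returns the instance name (for example, MKTG for an instance addressed as .\MKTG); for the default instance, it returns NULL.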
SQL Server Editions
Key Points

SQL Server is available in a wide variety of editions, with different price points and different levels of capability.
SQL Server Editions

Each SQL Server edition is targeted to a specific business use case:

• Parallel Data Warehouse: Uses massively parallel processing (MPP) to execute queries against vast amounts of data quickly. Parallel Data Warehouse systems are sold as a complete "appliance" rather than via standard software licenses.
• Datacenter: Provides the highest levels of scalability for mission-critical applications.
• Enterprise: Provides the highest levels of reliability for demanding workloads.
• Standard: Delivers a reliable, complete data management and Business Intelligence (BI) platform.
• Express: A free edition for lightweight web and small server-based applications.
• Compact: A free edition for standalone and occasionally connected mobile applications, optimized for a very small memory footprint.
• Developer: Allows building, testing, and demonstrating all SQL Server functionality.
• Workgroup: Runs branch applications with secure remote synchronization and management capabilities.
• Web: Provides a secure, cost-effective, and scalable platform for public web sites and applications.
• SQL Azure: Allows building and extending SQL Server applications on a cloud-based platform.
Question: What would be a good business case example for using a cloud-based service?
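The edition that an instance is running can be checked directly from T-SQL. This illustrative query uses the documented SERVERPROPERTY function:

```sql
SELECT SERVERPROPERTY('Edition')        AS Edition,         -- e.g. Enterprise Edition
       SERVERPROPERTY('ProductVersion') AS ProductVersion,  -- e.g. 10.50.xxxx for SQL Server 2008 R2
       SERVERPROPERTY('ProductLevel')   AS ProductLevel;    -- RTM or a service pack level
```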
SQL Server Versions
Key Points

SQL Server is a platform with a rich history of innovation achieved while maintaining strong levels of stability. SQL Server has been available for many years, yet it continues to rapidly evolve new capabilities and features.
Early Versions

The earliest versions (1.0 and 1.1) were based on the OS/2 operating system. Versions 4.2 and later moved to the Windows operating system, initially on the Windows NT operating system.
Later Versions

Version 7.0 saw a significant rewrite of the product. Substantial advances were made in reducing the administration workload for the product. OLAP Services (which later became Analysis Services) was introduced.

SQL Server 2000 featured support for multiple instances and collations. It also introduced support for data mining. SQL Server Reporting Services was introduced after the product release as an add-on enhancement to the product, along with support for 64-bit processors.

SQL Server 2005 provided another significant rewrite of many aspects of the product. Key changes included:
• Support for non-relational data stored and queried as XML.
• SQL Server Management Studio, which replaced several previous administrative tools.
• SQL Server Integration Services, which replaced a former tool known as Data Transformation Services (DTS).
• Support for objects created using the Common Language Runtime (CLR).
• Substantial enhancements to the T-SQL language, including structured exception handling.
• Dynamic Management Views and Functions, which enable detailed health monitoring, performance tuning, and troubleshooting.
• Substantial high availability improvements, including the introduction of database mirroring.
• Support for column encryption.

SQL Server 2008 also provided many enhancements:
• The "SQL Server Always On" technologies were introduced to reduce potential downtime.
• Filestream support improved the handling of structured and semi-structured data.
• Spatial data types were introduced.
• Database compression and encryption technologies were added.
• Specialized date- and time-related data types were introduced, including support for timezones within datetime data.
• Full-text indexing was integrated directly within the database engine. (Previously, full-text indexing was based on interfaces to operating system level services.)
• A policy-based management framework was introduced to assist with a move to more declarative-based management practices, rather than reactive practices.
• A PowerShell provider for SQL Server was introduced.
SQL Server 2008 R2

The enhancements and additions to the product in SQL Server 2008 R2 included:
• Substantial enhancements to SQL Server Reporting Services.
• The introduction of advanced analytic capabilities with PowerPivot.
• Improved multi-server management capabilities.
• Support for managing reference data, with the introduction of Master Data Services.
• StreamInsight, which provides the ability to query data that is arriving at high speed, before storing the data in a database.
• Data-Tier applications, which assist with packaging database applications as part of application development projects.
Upcoming Versions

The next version of SQL Server has been announced. This version will enable the efficient delivery of mission-critical solutions through a highly scalable and available platform. Additional productivity tools and features will assist developers, and the reach of business intelligence tooling to end-users will be enhanced.

Question: Which versions of SQL Server have you worked with?
Lesson 2
Working with SQL Server Tools
Working effectively with SQL Server requires familiarity with the tools that are used in conjunction with SQL Server. Before any tool can connect to SQL Server, it needs to make a network connection to the server. In this lesson, you will see how these connections are made, then look at the tools that are most commonly used when working with SQL Server.
Objectives

After completing this lesson, you will be able to:
• Connect from clients and applications
• Describe the roles of software layers for connections
• Use SQL Server Management Studio
• Use Business Intelligence Development Studio
• Use Books Online
Connecting from Clients and Applications
Key Points

Client applications connect to endpoints. A variety of communication protocols are available for making connections. Also, users need to be identified before they are permitted to use the server.
Connectivity

The protocol that client applications use when connecting to the SQL Server relational database engine is known as Tabular Data Stream (TDS). It defines how requests are issued and how results are returned. Other components of SQL Server use alternate protocols; for example, clients of SQL Server Analysis Services communicate via the XML for Analysis (XML/A) protocol. However, in this course, you are primarily concerned with the relational database engine.

TDS is a high-level protocol that is transported by lower-level protocols. It is most commonly transported by the TCP/IP protocol or the Named Pipes protocol, or implemented over a shared memory connection. SQL Server 2008 R2 does support connection over the Virtual Interface Adapter (VIA) protocol, but the use of this protocol with SQL Server is now deprecated and it should not be used for new implementations.
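You can inspect the transport that each current connection is using by querying the sys.dm_exec_connections dynamic management view. A minimal illustrative query:

```sql
SELECT session_id,
       net_transport,   -- e.g. TCP, Named pipe, Shared memory
       auth_scheme      -- e.g. SQL, NTLM, KERBEROS
FROM sys.dm_exec_connections;
```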
Authentication

For the majority of applications and organizations, data must be held securely, and access to the data is based on the identity of the user attempting to access it. The process of verifying the identity of a user (or, more formally, of any principal) is known as authentication. SQL Server supports two forms of authentication. It can store the login details for users directly within its own system databases; these logins are known as SQL logins. Alternatively, it can be configured to trust a Windows authenticator (such as Active Directory). In that case, a Windows user can be granted access to the server, either directly or via his or her Windows group memberships. When a connection is made, the user is connected to a specific database, known as their "default" database.
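The two forms of authentication correspond to two forms of the CREATE LOGIN statement. The login names and password below are purely illustrative:

```sql
-- A SQL login: credentials are stored in SQL Server's own system databases
CREATE LOGIN SalesAppLogin
WITH PASSWORD = 'Pa$$w0rdExample1',
     DEFAULT_DATABASE = tempdb;  -- the database the login connects to by default

-- A Windows login: SQL Server trusts the Windows authenticator for this principal
CREATE LOGIN [ADVENTUREWORKS\HR.Managers] FROM WINDOWS;
```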
Software Layers for Connections
Key Points

Connections to SQL Server are made through a series of software layers. It is important to understand how each of these layers interacts. This knowledge will assist you when you need to perform configuration or troubleshooting.
Client Libraries

Client applications use programming libraries to simplify their access to databases such as SQL Server. Open Database Connectivity (ODBC) is a commonly used library. It operates as a translation layer that shields the application from some details of the underlying database engine. By changing the ODBC configuration, an application can be altered to work with a different database engine without the need for application changes.

OLEDB originally stood for Object Linking and Embedding for Databases; however, that meaning is no longer very relevant. OLEDB is a library that does not translate commands. When an application sends a SQL command, OLEDB passes it to the database server without modification.

The SQL Server Native Client (SNAC) is a software layer that encapsulates commands issued by libraries such as OLEDB and ODBC into commands that can be understood by SQL Server, and encapsulates results returned by SQL Server ready for consumption by these libraries. This primarily involves wrapping the commands and results in the TDS protocol.
Network Libraries

SQL Server exposes endpoints that client applications can connect to. The endpoint is used to pass commands and data to and from the database engine. SNAC connects to these endpoints via network libraries such as TCP/IP, Named Pipes, or VIA. (Please note that the use of the VIA protocol with SQL Server is now deprecated.) For client applications that are executing on the same computer as the SQL Server service, a special "shared memory" network connection is also available.
SQL Server Software Layers

SQL Server receives commands via endpoints and sends results to clients via endpoints. Clients interact with the relational engine, which in turn uses the storage engine to manage the storage of databases. SQL Server Operating System (SQL OS) is a software layer that provides a layer of abstraction between the relational engine and the available server resources.
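The SQL OS layer is visible through a family of dynamic management views whose names begin with sys.dm_os_. As one illustrative example, this query shows the waits the server has accumulated, which is a common starting point when investigating resource bottlenecks:

```sql
SELECT TOP (5) wait_type, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```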
SQL Server Management Studio
Key Points

SQL Server Management Studio (SSMS) is the primary tool supplied by Microsoft for interacting with SQL Server services.
SQL Server Management Studio

SSMS is an integrated environment that has been created within the Microsoft Visual Studio platform shell, and it shares many common features with Visual Studio. SSMS is used to execute queries and return results, but it is also capable of helping users to analyze queries. It offers rich editors for a variety of document types (.sql files, .xml files, and so on). When working with .sql files, SSMS provides IntelliSense to assist with writing queries.

While all SQL Server relational database management tasks can be performed using the T-SQL language, many users prefer graphical administration tools because they are typically easier to use than the T-SQL commands. SSMS provides graphical interfaces for configuring databases and servers.

SSMS is capable of connecting to a variety of SQL Server services, including the database engine, Analysis Services, Integration Services, Reporting Services, and SQL Server Compact Edition.
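As an illustration of the point that graphical management tasks have T-SQL equivalents, the dialog-based action of creating a database corresponds to a single statement (the database name here is purely illustrative):

```sql
CREATE DATABASE DemoDB;  -- equivalent to the New Database dialog in Object Explorer

ALTER DATABASE DemoDB SET RECOVERY SIMPLE;  -- one of many options also exposed in the Database Properties dialog
```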
Demonstration 2A: SQL Server Management Studio
Key Points

In this demonstration, you will see how to work with SQL Server Management Studio.
Demonstration Setup 1.
On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2.
In Hyper-V Manager, in the Virtual Machines pane, right-click 623XB-MIA-SQL1 and click Revert.
3.
If you are prompted to confirm that you want to revert, click Revert.
4.
If you do not already have a Virtual Machine Connection window, right-click 623XB-MIA-SQL1 and click Connect.
Demonstration Steps 1.
In the Virtual Machine, Click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
2.
In the Connect to Server window, ensure that Server Type is set to Database Engine.
3.
In the Server name text box, type (local).
4.
In the Authentication drop-down list, select Windows Authentication, and click Connect.
5.
From the View menu, click Object Explorer.
6.
In Object Explorer, expand Databases, expand AdventureWorks2008R2, and Tables. Review the database objects.
7.
Right-click the AdventureWorks2008R2 database and choose New Query.
8.
Type the query shown in the snippet below.
SELECT * FROM Production.Product ORDER BY ProductID;
1-20
Implementing a Microsoft® SQL Server® 2008 R2 Database
9. Note the use of IntelliSense while entering the query, and then click Execute on the toolbar. Note how the results are returned.
10. From the File menu, click Save SQLQuery1.sql. Note that this saves the query to a file.
11. In the Results tab, right-click the cell for ProductID 1 (first row, first cell) and click Save Results As…. In the File Name text box, type Demonstration2AResults and click Save. Note that this saves the query results to a file.
12. From the Query menu, click Display Estimated Execution Plan. Note that SSMS is capable of more than simply executing queries.
13. From the Tools menu, click Options.
14. In the Options pane, expand Query Results, expand SQL Server, and expand General. Review the available configuration options and click Cancel.
15. From the File menu, click Close. In the Microsoft SQL Server Management Studio window, click No.
16. From the File menu, click Open, and click Project/Solution.
17. In the Open Project window, open the project D:\6232B_Labs\6232B_02_PRJ\6232B_02_PRJ.ssmssln.
18. From the View menu, click Solution Explorer. Note the contents of Solution Explorer. SQL Server projects have been supplied for each module of the course and contain demonstration steps and suggested lab solutions, along with any required setup/shutdown code for the module.
19. In Solution Explorer, click the X to close it.
20. In Object Explorer, from the Connect toolbar icon, note the other SQL Server components to which connections can be made:
• Database Engine
• Analysis Services
• Integration Services
• Reporting Services
• SQL Server Compact
21. From the File menu, click New, and click Database Engine Query to open a new connection.
22. In the Connect to Database Engine window, type (local) in the Server name text box.
23. In the Authentication drop-down list, select Windows Authentication, and click Connect.
24. In the Available Databases drop-down list, click the tempdb database. Note that this will change the database that the query is executed against.
25. Right-click in the query window, click Connection, and click Change Connection…. Note that this will reconnect the query to another instance of SQL Server.
26. From the View menu, click Registered Servers.
27. In the Registered Servers window, expand Database Engine, right-click Local Server Groups, and click New Server Group….
28. In the New Server Group Properties window, type Dev Servers in the Group name text box and click OK.
29. Right-click Dev Servers and click New Server Registration….
Introduction to SQL Server 2008 R2 and its Toolset
1-21
30. In the New Server Registration window, in the Server name drop-down list, type (local) and click Save.
31. Right-click Dev Servers and click New Server Registration….
32. In the New Server Registration window, in the Server name drop-down list, type .\MKTG and click Save.
33. In the Registered Servers window, right-click the Dev Servers group and choose New Query.
34. Type the query shown in the snippet below and click the Execute toolbar icon.

SELECT @@version;
Question: When would displaying an estimated execution plan be helpful?
Business Intelligence Development Studio
Key Points
The SQL Server platform comprises a number of components. Projects for several of the Business Intelligence related components are created and modified using Business Intelligence Development Studio (BIDS).
Business Intelligence Development Studio
BIDS is a series of project templates that have been added to the Microsoft Visual Studio 2008 environment. The templates allow the creation and editing of projects for Analysis Services, Integration Services, and Reporting Services. Visual Studio does not need to be installed before SQL Server. If an existing installation of Visual Studio 2008 is present, SQL Server installation will add the project templates into that environment. If no existing Visual Studio 2008 installation is present, SQL Server installation will first install the "partner" edition of Visual Studio 2008 and then add the required project templates. The partner edition of Visual Studio 2008 is essentially an empty Visual Studio shell with a template for a "blank solution". BIDS in SQL Server 2008 R2 is based on Visual Studio 2008. If Visual Studio 2010 is already installed, SQL Server will install the partner edition of Visual Studio 2008 side by side with the existing Visual Studio 2010 installation.
Demonstration 2B: Business Intelligence Development Studio
Key Points
In this demonstration you will see how to work with SQL Server Business Intelligence Development Studio.
Demonstration Steps
1. If Demonstration 2A was not performed: Revert the 623XB-MIA-SQL1 virtual machine using Hyper-V Manager on the host system, and connect to the virtual machine.
2. In the Virtual Machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Business Intelligence Development Studio (BIDS). From the File menu, expand New, and click Project. Note the available project templates. (If other languages are installed, note how they are still present as well.)
3. In the Templates pane, click Report Server Project, and click OK.
4. In Solution Explorer, right-click Reports and click Add New Report.
5. In the Report Wizard window, click Next.
6. In the Select the Data Source window, click Edit.
7. In the Connection Properties window, type (local) for the Server name, select AdventureWorks2008R2 in the Connect to a database drop-down list, and click OK.
8. In the Select the Data Source window, click Next.
9. In the Design the Query window, in the Query string text box, type the query shown in the snippet below and click Next.
SELECT ProductID, Name, Color, Size FROM Production.Product ORDER BY ProductID;
10. In the Select the Report Type window, click Next.
11. In the Design the Table window, click Details four times, and click Finish >>|.
12. In the Completing the Wizard window, click Finish.
13. In the Report1.rdl [Design] tab, click Preview and note the report that is rendered.
14. Click the Design tab, and then from the File menu, click Exit. Note: do not save the changes.

Question: Can you suggest a situation where the ability to schedule the execution of a report would be useful?
Books Online
Key Points
Books Online (BOL) is the primary reference for SQL Server. It can be installed offline (for use when disconnected from the Internet) and can also be used online directly from the Microsoft MSDN web site (via an Internet connection).
Books Online
BOL should be regarded as the primary technical reference for SQL Server. A common mistake when installing BOL locally on a SQL Server installation is to neglect to update BOL regularly. To avoid excess download file sizes, BOL is not included in SQL Server service pack and cumulative update packages. BOL is regularly updated and a regular check should be made for updates. For most T-SQL commands, many users will find the examples supplied easier to follow than the formal syntax definition. Note that when viewing the reference page for a statement, the formal syntax is shown at the top of the page and the examples are usually at the bottom of the page. BOL is available for all supported versions of SQL Server. It is important to make sure you are working with the pages designed for the version of SQL Server that you are working with. Many pages in BOL provide links to related pages from other versions of the product.
Demonstration 2C: Books Online
Key Points
In this demonstration you will see how to work with SQL Server Books Online.
Demonstration Steps 1.
If Demonstration 2A was not performed: Revert the 623XB-MIA-SQL1 virtual machine using Hyper-V Manager on the host system, and connect to the Virtual Machine.
2. 3.
In the Virtual Machine, Click Start, click All Programs, click Microsoft SQL Server 2008 R2, click Documentation and Tutorials, and click SQL Server Books Online. In the Contents window, click SQL Server 2008 R2 Books Online. Note the basic navigation options available within BOL.
4.
In the Virtual Machine, Click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
5.
In the Connect to Server window, ensure that Server Type is set to Database Engine.
6.
In the Server name text box, type (local).
7.
In the Authentication drop-down list, select Windows Authentication, and click Connect.
8.
From the File menu, click New, and click Query with Current Connection.
9.
In the SQLQuery1.sql tab, type the query as shown in the snippet below and click Execute toolbar icon.
SELECT SUBSTRING('test string',2,7);
10. Click the name of the SUBSTRING function, and then press the F1 key to open the BOL topic for SUBSTRING.
11. In the Online Help Settings window, ensure that the Use local Help as primary source option button is selected, and click OK. Note the content of the page and scroll to the bottom to see the examples.
12. From the File menu, click Exit.
13. In the host system, open Internet Explorer, browse to the SQL Server Books Online page at http://msdn.microsoft.com/en-us/library/ms130214.aspx, and note the available online options.
Lesson 3
Configuring SQL Server Services
Each SQL Server service can be configured individually. The ability to provide individual configuration for services assists organizations that aim to minimize the permissions assigned to service accounts, as part of a policy of least privilege execution. SQL Server Configuration Manager is used to configure services, including the accounts that the services operate under, and the network libraries used by the SQL Server services. SQL Server also ships with a variety of tools and utilities. It is important to know what each of these tools and utilities is used for.
Objectives
After completing this lesson, you will be able to:
• Use SQL Server Configuration Manager
• Use SQL Server Services
• Use Network Ports and Listeners
• Create Server Aliases
• Use other SQL Server tools
SQL Server Configuration Manager
Key Points
SQL Server Configuration Manager (SSCM) is used to configure SQL Server services, to configure the network libraries exposed by SQL Server services, and to configure how client connections are made to SQL Server.
SQL Server Configuration Manager
SSCM can be used for three distinct purposes:
• Managing Services – Each service can be controlled (started or stopped) and configured.
• Managing Server Protocols – It is possible to configure the endpoints that are exposed by the SQL Server services. This includes the protocols and ports used.
• Managing Client Protocols – When client applications (such as SSMS) are installed on a server, there is a need to configure how connections from those tools are made to SQL Server. SSCM can be used to configure the protocols required and can be used to create aliases for the servers to simplify connectivity.
Question: Why would a server system need to have a client configuration node?
SQL Server Services
Key Points
SQL Server Configuration Manager can be used to configure the individual services that are provided by SQL Server. Many components provided by SQL Server are implemented as operating system services. The components of SQL Server that you choose during installation determine which of the SQL Server services are installed.
SQL Server Services
These services operate within a specific Windows identity. If there is a need to alter the assigned identity for a service, SSCM should be used to make this change. A common error is to use the Services applet in the server's administrative tools to change the service identity. While this applet will change the identity for the service, it will not update the other permissions and access control lists that are required for the service to operate correctly. When service identities are modified from within SSCM, the required permissions and access control lists are also modified.
Each service has a start mode. This mode can be set to Automatic, Manual, or Disabled. Services that are set to the Automatic start mode are automatically started when the operating system starts. Services that are set to the Manual start mode can be manually started. Services that are set to the Disabled start mode cannot be started.
Instances
Many SQL Server components are instance-aware and can be installed more than once on a single server. When SSCM lists each service, it shows the associated instance of SQL Server in parentheses after the name of the service. In the example shown in the slide, there are two instances of the database engine installed. PARTNER is the name of a named instance of the database engine. MSSQLSERVER is the default name allocated to the default instance of the SQL Server database engine.
Network Ports and Listeners
Key Points
SQL Server Configuration Manager can be used to configure both server and client protocols and ports.
Network Ports and Listeners
SSCM provides two sets of network configurations. Each network endpoint that is exposed by an instance of SQL Server can be configured. This includes the determination of which network libraries are enabled and, for each library, the configuration of the network library. Typically, this will involve settings such as protocol port numbers. You should discuss the required network protocol configuration of SQL Server with your network administrator. Many protocols provide multiple levels of configuration. For example, the configuration for the TCP/IP protocol allows for different settings on each configured IP address if required, or a general set of configurations that are applied to all IP addresses.
Client Configurations
Every computer that has the SQL Server Native Client (SNAC) installed needs the ability to configure how that library will access SQL Server services. SNAC is installed on the server as well as on client systems. When SSMS is installed on the server, it uses the SNAC library to make connections to the SQL Server services that are on the same system. The client configuration nodes within SSCM can be used to configure how those connections are made. Note that two sets of client configurations are provided. One set is used for 32-bit applications; the other set is used for 64-bit applications. SSMS is a 32-bit application, even when SQL Server is installed as a 64-bit application.
Creating Server Aliases
Key Points
Connecting to a SQL Server service can involve multiple settings such as server address, protocol, and port. To make this easier for client applications, and to provide a level of redirection, aliases can be created for servers.
Aliases
Hard-coding connection details for a specific server, protocol, and port within an application is not desirable as these might need to change over time. A server alias can be created and associated with a server, protocol, and port (if required). Client applications can then connect to the alias without being concerned about how those connections are made. Each client system that utilizes SNAC (including the server itself) can have one or more aliases configured. Aliases for 32-bit applications are configured independently of the aliases for 64-bit applications. In the example shown in the slide, the alias "Marketing" has been created for the local server ".", using the named pipes protocol ("np") with a pipe address based on the name of the computer (or the value "." for the local computer). The client then only needs to connect to the name "Marketing".
Other SQL Server Tools
Key Points
SQL Server provides a rich set of tools and utilities to make working with the product easier. The most commonly used tools are listed in the following table:

Tool – Purpose
SQL Server Profiler – Trace activity from client applications to SQL Server. Supports both the database engine and Analysis Services.
Database Engine Tuning Advisor – Design indexes and statistics to improve database performance, based on analysis of trace workloads.
Master Data Services Configuration Manager – Configure and manage SQL Server Master Data Services.
Reporting Services Configuration Manager – Configure and manage SQL Server Reporting Services.
SQL Server Error and Usage Reporting – Configure the level of automated reporting back to the SQL Server product team about errors that occur and on usage of different aspects of the product.
PowerShell Provider – Allow configuring and querying SQL Server using PowerShell.
SQL Server Management Objects (SMO) – Provide a detailed .NET-based library for working with management aspects of SQL Server directly from application code.
Demonstration 3A: SQL Server Profiler
Key Points
In this demonstration you will see how SQL Server Profiler can capture traces of statements executed.
Demonstration Steps
1. If Demonstration 2A was not performed: Revert the 623XB-MIA-SQL1 virtual machine using Hyper-V Manager on the host system, and connect to the virtual machine.
2. In the Virtual Machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
3. In the Connect to Server window, ensure that Server Type is set to Database Engine.
4. In the Server name text box, type (local).
5. In the Authentication drop-down list, select Windows Authentication, and click Connect.
6. From the Tools menu, click SQL Server Profiler.
7. In the Connect to Server window, ensure that Server Type is set to Database Engine.
8. In the Server name text box, type (local).
9. In the Authentication drop-down list, select Windows Authentication, and click Connect.
10. In the Trace Properties window, click Run. Note that this will start a new trace with the default options.
11. Switch to SSMS and click the New Query toolbar icon.
12. In the query window, type the query shown in the snippet below, and click the Execute toolbar icon.

USE AdventureWorks2008R2
GO
SELECT * FROM Person.Person ORDER BY FirstName;
GO

13. Switch to SQL Server Profiler. Note the statement trace occurring in SQL Server Profiler.
14. From the File menu, click Stop Trace.
15. In the results grid, click individual statements to see the detail shown in the lower pane.

Question: What could you use captured trace files for?
Lab 1: Introduction to SQL Server and its Toolset
Lab Setup
For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:
1. On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2. Maximize the Hyper-V Manager window.
3. In the Virtual Machines list, right-click 623XB-MIA-DC and click Start.
4. Right-click 623XB-MIA-DC and click Connect.
5. In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, then close the Virtual Machine Connection window.
6. In Hyper-V Manager, in the Virtual Machines list, right-click 623XB-MIA-SQL1 and click Start.
7. Right-click 623XB-MIA-SQL1 and click Connect.
8. In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
9. On the Action menu, click the Ctrl+Alt+Delete menu item.
10. Click Switch User, then click Other User.
11. Log on using the following credentials:
• User name: AdventureWorks\Administrator
• Password: Pa$$w0rd
12. From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
13. If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
Lab Scenario
AdventureWorks is a global manufacturer, wholesaler and retailer of cycle products. The owners of the company have decided to start a new direct marketing arm of the company. It has been created as a new company named Proseware, Inc. Even though it has been set up as a separate company, it will receive some IT-related services from the existing AdventureWorks company and will be provided with a subset of the corporate AdventureWorks data.
The existing AdventureWorks company SQL Server platform has been moved to a new server that is capable of supporting both the existing workload and the workload from the new company. In this lab, you are ensuring that the additional instance of SQL Server has been configured appropriately and making a number of additional required configuration changes.
Exercise 1: Verify SQL Server Component Installation
A new instance of SQL Server has been installed by the IT department at AdventureWorks. It will be used by the new direct marketing company. The SQL Server named instance is called MKTG. In the first exercise, you need to verify that the required SQL Server components have been installed.
The main tasks for this exercise are as follows:
1. Check that Database Engine and Reporting Services have been installed for the MKTG instance.
2. Note the services that are installed for the default instance and that Integration Services is not installed on a per-instance basis.
3. Ensure that all required services, including SQL Server Agent, are started and set to autostart for both instances.
Task 1: Check that Database Engine and Reporting Services have been installed for the MKTG instance
• Open SQL Server Configuration Manager.
• Check the installed list of services for the MKTG instance and ensure that the database engine and Reporting Services have been installed for the MKTG instance.
Task 2: Note the services that are installed for the default instance and that Integration Services is not installed on a per-instance basis
• Note the list of services that are installed for the default instance.
• Note that Integration Services has no instance name shown, as it is not installed on a per-instance basis.
Task 3: Ensure that all required services including SQL Server Agent are started and set to autostart for both instances
• Ensure that all the MKTG services are started and set to autostart. (Ignore the Full Text Filter Daemon at this time.)
• Ensure that all the services for the default instance are set to autostart. (Ignore the Full Text Filter Daemon at this time.)

Results: After this exercise, you have checked that the required SQL Server services are installed, started, and configured to autostart.
Exercise 2: Alter Service Accounts for New Instance
Scenario
The SQL Server services for the MKTG instance have been configured to execute under the AdventureWorks\SQLService service account. In this exercise, you will configure the services to execute under the AdventureWorks\PWService service account.
The main tasks for this exercise are as follows:
1. Change the service account for the MKTG database engine.
2. Change the service account for the MKTG SQL Server Agent.
3. Change the service account for the MKTG Reporting Services service.
Task 1: Change the service account for the MKTG database engine
• Change the service account for the MKTG database engine service to AdventureWorks\PWService using the properties page for the service.

Task 2: Change the service account for the MKTG SQL Server Agent
• Change the service account for the MKTG SQL Server Agent service to AdventureWorks\PWService using the properties page for the service, and then restart the service.

Results: After this exercise, you have configured the service accounts for the MKTG instance.
Exercise 3: Enable Named Pipes Protocol for Both Instances
Scenario
Client applications that are installed on the server will connect to the database engine using the named pipes protocol. In this exercise, you will enable the named pipes protocol for both database engine instances.
The main tasks for this exercise are as follows:
1. Enable the named pipes protocol for the default instance.
2. Enable the named pipes protocol for the MKTG instance.
3. Restart the database engine services for both instances.
Task 1: Enable the named pipes protocol for the default instance
• Enable the named pipes protocol for the default database engine instance using the Protocols window.

Task 2: Enable the named pipes protocol for the MKTG instance
• Enable the named pipes protocol for the MKTG database engine instance using the Protocols window.
Task 3: Restart both database engine services
• Restart the default database engine instance.
• Restart the MKTG database engine instance.
• Check to ensure that both instances have been restarted successfully.

Results: After this exercise, you should have enabled the named pipes protocol for both database engine instances.
Exercise 4: Create Aliases for AdventureWorks and Proseware
Scenario
Client applications that are installed on the server will use aliases to connect to the database engine services. In this exercise, you will configure aliases for both the default instance (AdventureWorks) and for the MKTG instance (Proseware). Both 32-bit and 64-bit aliases should be configured. You will use SQL Server Management Studio to test the aliases once they have been configured.
The main tasks for this exercise are as follows:
1. Create a 32-bit alias (AdventureWorks) for the default instance.
2. Create a 32-bit alias (Proseware) for the MKTG instance.
3. Create a 64-bit alias (AdventureWorks) for the default instance.
4. Create a 64-bit alias (Proseware) for the MKTG instance.
5. Use SQL Server Management Studio to connect to both aliases to ensure they work as expected.
Task 1: Create a 32-bit alias (AdventureWorks) for the default instance
• Create a 32-bit alias for the default instance. Call the alias AdventureWorks and connect via named pipes. Use the servername ".".

Task 2: Create a 32-bit alias (Proseware) for the MKTG instance
• Create a 32-bit alias for the MKTG instance. Call the alias Proseware and connect via named pipes. Use the servername ".\MKTG".

Task 3: Create a 64-bit alias (AdventureWorks) for the default instance
• Create a 64-bit alias for the default instance. Call the alias AdventureWorks and connect via named pipes. Use the servername ".".

Task 4: Create a 64-bit alias (Proseware) for the MKTG instance
• Create a 64-bit alias for the MKTG instance. Call the alias Proseware and connect via named pipes. Use the servername ".\MKTG".
Task 5: Use SQL Server Management Studio to connect to both aliases to ensure they work as expected
• Open SQL Server Management Studio.
• Connect to the Proseware alias.
• In Object Explorer, connect also to the AdventureWorks alias.

Results: After this exercise, you should have created and tested aliases for both database engine instances.
Challenge Exercise 5: Ensure SQL Browser is Disabled and Configure a Fixed TCP/IP Port (Only if time permits)
Scenario
Client applications will need to connect to the MKTG database engine instance via the TCP/IP protocol. As their connections will need to traverse a firewall, the port used for connections cannot be configured as a dynamic port. The port number must not change. Corporate policy at AdventureWorks is that named instances should be accessed via fixed TCP ports and the SQLBrowser service should be disabled. In this exercise, you will make configuration changes to comply with these requirements. A firewall exception has already been created for port 51550, for use with the MKTG database engine instance.
The main tasks for this exercise are as follows:
1. Configure the TCP port for the MKTG database engine instance to 51550.
2. Disable the SQLBrowser service.
Task 1: Configure the TCP port for the MKTG database engine instance to 51550
• Using the property page for the TCP/IP server protocol, configure the use of the fixed port 51550. (Make sure that you clear the dynamic port setting.)
• Restart the MKTG database engine instance.
• Ensure that the MKTG database engine instance has been restarted successfully.
Task 2: Disable the SQLBrowser service
• Stop the SQLBrowser service.
• Set the Start Mode for the SQLBrowser service to Disabled.

Results: After this exercise, you will have configured a fixed TCP port for the MKTG database engine instance and disabled the SQLBrowser service.
Module Review and Takeaways
Review Questions
1. What is the difference between a SQL Server version and an edition?
2. What is the purpose of the Business Intelligence Development Studio?
3. Does Visual Studio need to be installed before BIDS?
Best Practices
1. Ensure that developer edition licenses are not used in production environments.
2. Develop using the least privileges possible, to avoid accidentally building applications that will not run for standard users.
3. If using an offline version of Books Online, ensure it is kept up to date.
Module 2
Working with Data Types

Contents:
Lesson 1: Using Data Types
Lesson 2: Working with Character Data
Lesson 3: Converting Data Types
Lesson 4: Specialized Data Types
Lab 2: Working with Data Types
Module Overview
One of the most important decisions that will be taken when designing a database is the data types to be associated with the columns of every table in the database. The data type of a column determines the type and range of values that can be stored in the column. Other objects in SQL Server such as variables and parameters also use these same data types. A very common design error is to use inappropriate data types. As an example, while you can store a date in a string column, doing so is rarely a good idea. In this module, you will see the range of data types that are available within SQL Server and receive advice on where each should be used.
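The date-in-a-string design error mentioned above is easy to demonstrate. The following is a hypothetical sketch (the table variable name and values are illustrative, not part of the course labs):

```sql
-- Dates stored in a string column sort alphabetically, not chronologically.
DECLARE @EventLog TABLE (EventDate varchar(20));
INSERT INTO @EventLog VALUES ('02/01/2008'), ('12/31/2007'), ('1/5/2008');

-- Returns '02/01/2008', '1/5/2008', '12/31/2007' - string order,
-- not date order.
SELECT EventDate FROM @EventLog ORDER BY EventDate;
```

Declaring the column as a date data type instead would make ORDER BY return chronological order and would reject values that are not valid dates.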
Objectives
After completing this module, you will be able to:
• Work with data types
• Work with character data
• Convert between data types
• Use specialized data types
Lesson 1
Using Data Types
The most basic types of data that get stored in database systems are numbers, dates, and strings. There are a range of data types that can be used for each of these. In this lesson, you will see the available range of data types that can be used for numeric and date-related data. You will also see how to determine if a data type should be nullable or not. In the next lesson, you will see how to work with string data types.
Objectives
After completing this lesson, you will be able to:
• Understand the role of data types
• Use exact numeric data types
• Use approximate numeric data types
• Work with IDENTITY columns
• Use date and time data types
• Work with unique identifiers
• Decide on appropriate nullability of data
Introducing Data Types
Key Points
Data types determine what can be stored in locations within SQL Server such as columns, variables, and parameters. For example, a tinyint column can only store values from 0 to 255. They also determine the types of values that can be returned from expressions.
Constraining Values
Data types are a form of constraint that is placed on the values that can be stored in a location. For example, if you choose a numeric data type, you will not be able to store text in the location. As well as constraining the types of values that can be stored, data types also constrain the range of values that can be stored. For example, if you choose a smallint data type, you can only store values between -32768 and 32767.
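As a minimal sketch of this range constraint (the variable name is illustrative):

```sql
-- A smallint location can only hold values from -32768 to 32767.
DECLARE @Quantity smallint;

SET @Quantity = 32767;   -- succeeds: the value is within the smallint range
SET @Quantity = 40000;   -- fails with an arithmetic overflow error,
                         -- because 40000 is outside -32768 to 32767
```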
Query Optimization
When SQL Server knows that the value in a column is an integer, it may be able to come up with an entirely different query plan than when it knows the location is holding text values. The data type also determines which sorts of operations are permitted on that data and how those operations work.
Self-Documenting Nature
Choosing an appropriate data type provides a level of self-documentation. If all values were stored in the sql_variant (which is a data type that can store any type of value) or xml data types, it's likely that you would need to store documentation about what sort of values can be stored in the sql_variant locations.
Data Types
There are three basic sets of data types:
• System data types – SQL Server provides a large number of built-in (or intrinsic) data types. Examples of these would be integer, varchar, and date.
• Alias data types – Users can also define data types that provide alternate names for the system data types and potentially further constrain them. An example of an alias data type would be to define the name PhoneNumber as being equivalent to nvarchar(16). Alias data types can help provide consistency of data type usage across applications and databases.
• User-defined data types – With managed code via SQL Server CLR integration, entirely new data types can be created. There are two categories of these CLR types: system CLR data types (such as the geometry and geography spatial data types) and user-defined CLR data types that allow users to create their own data types. Managed code is discussed later in Module 16.

Question: Why would it be faster to compare two integer variables that are holding the values 3240 and 19704 than two varchar(10) variables that are holding the values "3240" and "19704"?
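The PhoneNumber alias type used as an example above could be created as follows. This is a sketch: the dbo schema, the NOT NULL option, and the table are assumptions added for illustration, not part of the example in the text:

```sql
-- Create an alias data type named PhoneNumber based on nvarchar(16).
CREATE TYPE dbo.PhoneNumber FROM nvarchar(16) NOT NULL;

-- The alias can then be used like any system data type,
-- for example in a table definition:
CREATE TABLE dbo.Contact
(
    ContactID int,
    Phone     dbo.PhoneNumber
);
```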
Implementing a Microsoft® SQL Server® 2008 R2 Database
Exact Numeric Data Types
Key Points
Numeric data types can be exact or approximate. Exact data types are the most common data types used in business applications.
Integer Data Types
SQL Server offers a range of integer data types that are used for storing whole numbers, based upon the size of the storage location for each:
• tinyint is stored in a single byte (that is, 8 bits) and can be used to store the values 0 to 255. Note that, unlike the other integer data types, tinyint cannot store any negative values.
• smallint is stored in two bytes (that is, 16 bits) and stores values from -32768 to 32767.
• int is stored in four bytes (that is, 32 bits) and stores values from -2147483648 to 2147483647. It is a very commonly used data type. SQL Server uses the full word "integer" as a synonym for "int".
• bigint is stored in eight bytes (that is, 64 bits) and stores very large integer values. While it is easy to refer to a 64-bit value, it is hard to comprehend how large these values are. If you placed a value of zero in a 64-bit integer location and executed a loop to simply add one to the value, on most common servers currently available, you would not reach the maximum value for many months.
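The storage sizes above can be confirmed with the DATALENGTH function, which returns the number of bytes a value occupies; a quick sketch:

```sql
DECLARE @t tinyint  = 255,
        @s smallint = 32767,
        @i int      = 2147483647,
        @b bigint   = 9223372036854775807;

SELECT DATALENGTH(@t) AS tinyint_bytes,   -- 1
       DATALENGTH(@s) AS smallint_bytes,  -- 2
       DATALENGTH(@i) AS int_bytes,       -- 4
       DATALENGTH(@b) AS bigint_bytes;    -- 8
```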
Exact Fractional Data Types
SQL Server provides a range of data types for storing exact numeric values that include decimal places:
• decimal is an ANSI-compatible data type that allows you to specify the number of digits of precision and the number of decimal places (referred to as the scale). A decimal(12,5) location can store up to 12 digits with up to 5 digits after the decimal point. decimal is the data type that should be used for monetary or currency values in most systems and for any exact fractional values such as sales quantities (where part quantities can be sold) or weights.
• numeric is a data type that is functionally equivalent to decimal.
• money and smallmoney are SQL Server-specific data types that have been present since the early days of the platform. They store currency values with a fixed precision of four decimal places.
This is often the wrong number of decimal places for many monetary applications and the data type is not a standard data type. In general, use decimal for monetary values.
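The difference in precision between decimal and money can be seen in a short sketch; the variable names are illustrative only:

```sql
DECLARE @Quantity decimal(12,5) = 1234567.12345;  -- 12 digits, 5 after the decimal point
DECLARE @Amount money = 10.12345;                 -- rounded to 4 decimal places on assignment

SELECT @Quantity AS ExactQuantity,  -- stored exactly as 1234567.12345
       @Amount   AS RoundedAmount;  -- only 4 decimal places survive
```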
bit Data Type
bit is a data type that is stored in a single bit. The storage of the bit data type is optimized: if there are 8 or fewer bit columns in a table, they are stored together in a single byte. bit values are commonly used to store the equivalent of Boolean values in higher-level languages. Note that there is no literal string format for bits in SQL Server. The string values TRUE and FALSE can be converted to bit values, as can the integer values 1 and 0: TRUE is converted to 1 and FALSE is converted to 0. Higher-level programming languages differ in how they store true values in Boolean columns: some languages store true values as 1; others store true values as -1. In 2's complement notation (which is the encoding used to store smallint, int, and bigint), a one-bit value would range from -1 to 0. To avoid any chance of a mismatch, in general when working with bits in applications, test for false values via:

IF (@InputValue = 0)

and test for true values via:

IF (@InputValue <> 0)
rather than testing for a value being equal to 1, as this provides more reliable code. Another aspect that surprises new users is that bit, along with other data types, is also nullable. That means that a bit location can be in three states: NULL, 0, or 1. Question: What would be a suitable data type for storing the value of a check box that can be 0 for unchecked, 1 for checked, or -1 for disabled?
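The conversions described above can be verified directly; a small sketch:

```sql
SELECT CAST('TRUE' AS bit)  AS FromTrueString,   -- 1
       CAST('FALSE' AS bit) AS FromFalseString,  -- 0
       CAST(-1 AS bit)      AS FromMinusOne;     -- any non-zero integer converts to 1
```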
Working with IDENTITY
Key Points
It is common to require a series of numbers to be automatically provided for an integer column. The IDENTITY property on a database column indicates that the value for the column will not be provided by an INSERT statement but should be automatically provided by SQL Server.
IDENTITY
IDENTITY is a property typically associated with int or bigint columns that provides automated generation of values during insert operations. You may be familiar with auto-numbering systems or sequences in other database engines. While not identical to these, IDENTITY columns can be used to replace the functionality from those other database engines. When specifying the IDENTITY property, you specify a seed and an increment. The seed is the starting value. The increment is how much the value goes up by each time. Both seed and increment default to a value of 1 if they are not specified. Although explicit inserts are not normally allowed on columns with an IDENTITY property, it is possible to explicitly insert values. The ability to insert into an IDENTITY column can be enabled temporarily using a connection option: SET IDENTITY_INSERT, specified with the table name, allows the user to insert values into the column with the IDENTITY property instead of having them auto-generated. Having the IDENTITY property on a column does not in itself ensure that the column is unique. Unless there is also a unique constraint on the column, there is no guarantee that values in a column with the IDENTITY property will be unique.
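The behavior described above can be sketched as follows; the dbo.Customer table is hypothetical:

```sql
CREATE TABLE dbo.Customer
( CustomerID   int IDENTITY(1,1) NOT NULL,  -- seed 1, increment 1
  CustomerName nvarchar(50) NOT NULL );

INSERT dbo.Customer (CustomerName) VALUES (N'First');   -- receives CustomerID 1
INSERT dbo.Customer (CustomerName) VALUES (N'Second');  -- receives CustomerID 2

-- Temporarily allow an explicit value to be inserted
SET IDENTITY_INSERT dbo.Customer ON;
INSERT dbo.Customer (CustomerID, CustomerName) VALUES (100, N'Explicit');
SET IDENTITY_INSERT dbo.Customer OFF;
```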
Retrieving the Inserted Identity Value
After inserting a row into a table, it is common to need to know the value that was placed into the column with the IDENTITY property. The system function @@IDENTITY returns the last identity value used within the session, in any scope. This can be a problem with triggers that perform inserts on another table with an IDENTITY column as part of an INSERT statement.
For example, if you insert a row into a customer table, the customer might be assigned a new identity value. However, if a trigger on the customer table caused an entry to be written into an audit logging table when inserts are performed, the @@IDENTITY variable would return the identity value from the logging table, rather than the one from the customer table. To deal effectively with this, the SCOPE_IDENTITY() function was introduced. It provides the last identity value within the current scope only. In the previous example, it would return the identity value from the customer table. Another complexity relates to multi-row inserts. These were introduced in SQL Server 2008. In this situation, you may want to retrieve the IDENTITY column value for more than one row at a time. Typically, this would be implemented by the use of the OUTPUT clause on the INSERT statement.
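Both approaches can be sketched as follows; the dbo.OrderHeader table is hypothetical:

```sql
CREATE TABLE dbo.OrderHeader
( OrderID      int IDENTITY(1,1) NOT NULL,
  CustomerName nvarchar(50) NOT NULL );

-- Single row: SCOPE_IDENTITY() is unaffected by any trigger activity
INSERT dbo.OrderHeader (CustomerName) VALUES (N'First');
SELECT SCOPE_IDENTITY() AS NewOrderID;

-- Multiple rows: the OUTPUT clause returns every generated value
INSERT dbo.OrderHeader (CustomerName)
OUTPUT inserted.OrderID, inserted.CustomerName
VALUES (N'Second'), (N'Third');
```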
Approximate Numeric Data Types
Key Points
SQL Server provides two approximate numeric data types. They are used more commonly in scientific applications than in business applications. A very common design error made by new developers is to use the float or real data types for storing business values such as monetary values.

Approximate Numeric Values
The real data type is a 4-byte (that is, 32-bit) numeric value that is encoded using IEEE 754 standard floating-point encoding. The float data type allows for the storage of approximate values with a defined precision, specified as the number of bits used for the mantissa. The precision values permitted are from 1 to 53 and the default is 53. Even though a range of values is provided for in the syntax, the current SQL Server implementation of the float data type is that a precision from 1 to 24 is implemented as 24 (4 bytes); for any larger value, a precision of 53 (8 bytes) is used.
Common Errors
A very common error for new developers is to use approximate numeric data types to store values that need to be stored exactly. This causes rounding and processing errors. A "code smell" for spotting new developers is columns of numbers that do not exactly add up to the displayed totals. It is common for small rounding errors to creep into calculations, for example, a total that is out by 1 cent in dollar- or euro-based currencies. The inappropriate use of approximate numeric data types can also cause processing errors. Look at the following code and decide how many times the PRINT statement would be executed:

DECLARE @Counter float;
SET @Counter = 0;
WHILE (@Counter <> 1.0)
BEGIN
    SET @Counter += 0.1;
    PRINT @Counter;
END;

It might surprise you that this query would never stop running and would need to be cancelled. After cancelling the query, if you look at the output you would see the following:

0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
…
What has happened? The problem is that the value 0.1 cannot be stored exactly in a float or real data type. Consider how you would write the answer to 1 / 3 in decimal. The answer isn't 0.3, it's 0.3333333 recurring. There is no way in decimal to write 1 / 3 as an exact decimal fraction; you have to eventually settle for an approximate value. The same problem occurs in binary fractions; it just occurs at different values. 0.1 in decimal is a non-terminating fraction in binary, so it ends up being stored as the equivalent of 0.099999 recurring. When you put the system in a loop adding 0.1 each time, the value never exactly equals 1.0, which happens to be a value that can be stored precisely.
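Storing the counter as decimal instead of float removes the problem, because 0.1 can be represented exactly in decimal; a sketch:

```sql
DECLARE @Counter decimal(3,1) = 0;

WHILE (@Counter <> 1.0)
BEGIN
    SET @Counter += 0.1;  -- 0.1 is stored exactly, so the comparison works
    PRINT @Counter;
END;                      -- the loop stops after printing 1.0
```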
Date and Time Data Types
Key Points
SQL Server supports a rich set of data types for working with date- and time-related values. It is important to be very careful when working with string literal representations of these values and with their precision (or accuracy). SQL Server also provides a large number of functions for working with dates and times.
date and time Data Types
The date data type complies with the ANSI SQL standard definition for the Gregorian calendar. The default string format is YYYY-MM-DD. This format is the same as the ISO 8601 definition for DATE. date has a range of values from 0001-01-01 to 9999-12-31 with an accuracy of one day. The time data type is aligned to the SQL standard form of hh:mm:ss, with optional decimal places up to hh:mm:ss.nnnnnnn. Note that you need to specify the number of decimal places when defining the data type, such as time(4). The format that SQL Server uses is similar to the ISO 8601 definition for TIME. The ISO 8601 standard also allows using 24:00:00 to represent midnight and a seconds value of 60 for a leap second; these are not supported in the SQL Server implementation. The datetime2 data type is a combination of a date data type and a time data type.
datetime Data Type
The datetime data type is an older data type that has a smaller range of allowed dates and a lower precision or accuracy: values are accurate only to approximately 3 milliseconds and are rounded to increments of .000, .003, or .007 seconds. A common error is to not allow for this accuracy. For example, with the datetime data type, executing the following code:

DECLARE @When datetime;
SET @When = '20101231 23:59:59.999';
would cause the value '20110101 00:00:00.000' to actually be stored.
Another problem with the datetime data type is that the way it converts strings to dates is based on language format settings. A value in the form 'YYYYMMDD' will always be converted to the correct date, but a value in the form 'YYYY-MM-DD' might end up being interpreted as 'YYYY-DD-MM', depending on the settings for the session. This behavior does not happen with the newer date (and datetime2) data types, which always interpret 'YYYY-MM-DD' correctly. It is important to understand that, as a result, the same string in the form 'YYYY-MM-DD' could be interpreted as two different dates by the date data type and the datetime data type.
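The difference can be sketched by changing the session language, which alters the date order used by datetime parsing (the exact results depend on the session's settings):

```sql
SET LANGUAGE British;  -- session date order becomes day/month/year

SELECT CAST('2010-02-03' AS datetime) AS OldBehavior,  -- interpreted as 2 March 2010
       CAST('2010-02-03' AS date)     AS NewBehavior;  -- always 3 February 2010

SET LANGUAGE us_english;
```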
Timezones
The datetimeoffset data type is a combination of a datetime2 data type and a timezone offset. Note that the data type is not timezone aware; it is simply capable of storing and retrieving timezone offset values. Note that the timezone offset values extend for more than a full day (a range of -14:00 to +14:00). A range of system functions has been provided for working with timezone values, as well as with all the date- and time-related data types. Question: Why is the specification of a date range from the year 0001 to the year 9999 based on the Gregorian calendar not entirely meaningful?
Unique Identifiers
Key Points
Globally unique identifiers (GUIDs) have become common in application development. They are used to provide a mechanism where any process can generate a number at will and know that it will not clash with a number generated by any other process.
GUIDs
Numbering systems have traditionally depended on a central source for the next value in a sequence to make sure that no two processes use the same value. GUIDs were introduced to avoid the need for anyone to function as the "number allocator". Any process on any system can generate a value and know, to a very, very high degree of probability, that it will not clash with a value generated by any other process at any time. This is achieved by using very, very large values. When discussing the bigint data type earlier, you learned that 64-bit bigint values are really large. GUIDs are 128-bit values. The magnitude of a 128-bit value is well beyond our capabilities of comprehension.
uniqueidentifier Data Type
The uniqueidentifier data type in SQL Server is typically used to store globally unique identifiers. Standard comparison operators such as =, <> (or !=), <, >, <=, and >= are supported, along with NULL and NOT NULL checks. The IDENTITY property is not used with uniqueidentifier columns. New values are not calculated by code in your process; they are calculated by calling system functions that generate a value for you. In SQL Server, this is the NEWID() function. The random nature of GUIDs has also caused significant problems in current storage subsystems. SQL Server 2005 introduced the NEWSEQUENTIALID() function to attempt to get around the randomness of the values generated by NEWID(). However, the function does so at the expense of some guarantee of uniqueness.
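A sketch of typical uniqueidentifier usage; the dbo.Document table is hypothetical:

```sql
CREATE TABLE dbo.Document
( DocumentID uniqueidentifier NOT NULL DEFAULT NEWID(),  -- generated if not supplied
  Title      nvarchar(100) NOT NULL );

-- A client layer could equally generate the GUID itself and insert it explicitly
INSERT dbo.Document (Title) VALUES (N'First document');

SELECT DocumentID, Title FROM dbo.Document;
```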
The usefulness of NEWSEQUENTIALID() is also quite limited, as the main reason for using GUIDs is to allow other layers of code to generate the values and know they can just insert them into a database without clashes. If you need to request a value from the database via NEWSEQUENTIALID(), it would usually be better to use an IDENTITY column instead. A very common development error is to store GUIDs in string columns rather than in uniqueidentifier columns. Note: uniqueidentifier columns are also commonly used by replication systems. Replication is an advanced topic beyond the scope of this course. Question: The slide mentions that a common error is to store GUIDs as strings. What would be wrong with this?
NULL or NOT NULL Columns
Key Points
Nullability determines whether or not a value must be present. Assigning inappropriate nullability to columns is another very common design error.
NULL
NULL is a state that a column is in, rather than a type of value that is stored in a column. You do not say that a value equals NULL; you say that a value is NULL. This is why, in T-SQL, you do not check whether a value is NULL with the equality operator. You do not write code that says:

WHERE Color = NULL;
Instead, you write code that says:

WHERE Color IS NULL;
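The reason is that a comparison with NULL evaluates to UNKNOWN rather than TRUE; a small sketch (assuming the default ANSI_NULLS setting):

```sql
SELECT CASE WHEN NULL = NULL  THEN 'matched' ELSE 'not matched' END AS EqualityTest,  -- 'not matched'
       CASE WHEN NULL IS NULL THEN 'matched' ELSE 'not matched' END AS IsNullTest;    -- 'matched'
```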
Common Errors
New developers often confuse NULL values with zero, blank (or space), or zero-length strings. This is exacerbated by other database engines that treat NULL and zero-length strings or zeroes as identical. NULL indicates the absence of a value. Careful consideration must be given to the nullability of a column. As well as specifying a data type for a column, you specify whether or not a value needs to be present. (Often this is referred to as whether or not a column is mandatory.) Look at the NULL and NOT NULL declarations on the slide and decide why each decision might have been made.
Question: When should a value be nullable?
Demonstration 1A: Working with Numeric Data Types
Key Points
In this demonstration you will see how to:
• Work with IDENTITY values
• Work with NULL
• Insert GUIDs into a table
Demonstration Steps
1. Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
2. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_02_PRJ\6232B_02_PRJ.ssmssln and click Open.
3. Open and execute the 00 – Setup.sql script file from within Solution Explorer.
4. Open the 11 – Demonstration 1A.sql script file.
5. Follow the instructions contained within the comments of the script file.
Lesson 2
Working with Character Data
In the last lesson, you saw that the most basic types of data that get stored in database systems today are numbers, dates, and strings. There are a range of data types that can be used for each of these. You also looked at the available range of data types that can be used for numeric and date-related data. In this lesson, you will now look at the other very common category of data: the string-related data types. Another common class of design and implementation errors relates to collations. Collations define how string data is sorted. In this lesson, you will also see how collations are defined and used.
Objectives
After completing this lesson, you will be able to:
• Explain the role of Unicode encoding
• Use character data types
• Work with collations
Understanding Unicode
Key Points
Traditionally, most computer systems stored one character per byte. This only allowed for 256 different character values, which is not enough to store characters from many languages.
Multi-byte Character Issues
Consider Asian languages such as Chinese or Japanese that need to store thousands of characters. You may not have ever considered it, but how would you type these characters on a keyboard? There are two basic ways that this is accomplished. One option is to have an English-like version of the language that can be used for entry. Japanese does in fact have a language form called Romaji that uses English-like characters for representing words. Chinese has a form called pinyin that is also somewhat English-like. If a Chinese writer enters a pinyin word like "ma" on a keyboard, the following list of options appears:
They can then enter the number beside the character to select the intended word. It might not seem important to an English-speaking person but given that the first option means "horse", the second option is like a question mark, and the third option means "mother", there is definitely a need to select the correct option!
Character Groups
An alternate way to enter the characters is via radical groupings. Please note the third character in the screenshot above. The left-hand part of that character, 女, means "woman". Rather than entering English-like characters (which could be quite unfamiliar to the writers), the writer can select a group of characters based on what is known as a radical. If you select the woman radical, you see this list:
Please note that the character representing "mother" is the first character on the second line. For this sort of keyboard entry to work, the characters must be in appropriate groups, not just stored as one large sea of characters. An additional complexity is that the radicals themselves are also in groups. You can see in the screenshot that the woman radical was part of the third group of radicals.
Unicode
In the 1980s, work was done to determine how many bytes were required to hold all characters from all languages while also storing them in their correct groupings. The answer was three bytes. You can imagine that three was not an ideal number for computing, and at the time users were mostly working with 2-byte (that is, 16-bit) computer systems. Unicode introduced a two-byte character set that attempts to fit the values from the three bytes into two bytes. Inevitably, trade-offs had to occur. Unicode allows any combination of characters that are drawn from any combination of languages to exist in a single document. There are multiple encodings for Unicode, including UTF-7, UTF-8, UTF-16, and UTF-32. (UTF stands for Unicode Transformation Format.) SQL Server currently implements double-byte characters for its Unicode implementation. For string literal values, an N prefix on a string allows the entry of double-byte characters into the string rather than just single-byte characters. (N stands for "National" in "National Character Set".) When working with character strings, the LEN function returns the number of characters (Unicode or not) whereas DATALENGTH returns the number of bytes. Question: Do you recognize either of the phrases on the slide?
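The difference between LEN and DATALENGTH is easy to see with Unicode strings; a quick sketch:

```sql
DECLARE @Single varchar(20)  = 'Hello';
DECLARE @Double nvarchar(20) = N'Hello';

SELECT LEN(@Single)        AS SingleChars,  -- 5
       DATALENGTH(@Single) AS SingleBytes,  -- 5 (one byte per character)
       LEN(@Double)        AS DoubleChars,  -- 5
       DATALENGTH(@Double) AS DoubleBytes;  -- 10 (two bytes per character)
```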
Character Data Types
Key Points
SQL Server provides a range of string data types for storing characters. They differ by length and by character encoding.
char and nchar Data Types
The char and nchar data types allow you to specify the number of characters that will be stored. It is important to realize that if you specify char(50), then 50 characters will be stored and retrieved, with shorter values padded using trailing spaces. char is for single-byte character sets and nchar is designed for double-byte Unicode characters. When retrieving values from char and nchar data, it is common to need to trim the trailing characters. By default, these characters will be spaces (or blanks). Look at the following code:

DECLARE @String1 char(10);
DECLARE @String2 char(10);
SET @String1 = 'Hello';
SET @String2 = 'There';
SELECT @String1 + @String2;

When executed, it returns "Hello     There     ". Note the trailing spaces after each value. The char and nchar data types are not very useful for data that varies in length but are ideal for short strings that are always the same length, for example, state codes in the U.S.A.
varchar and nvarchar Data Types
The varchar and nvarchar data types are the "varying" equivalents of the char and nchar data types. They are used for strings where a maximum length is specified but where the actual length varies. Rather than allocating a fixed-size location and using the whole location regardless of the length of the string, these data types incur the overhead of storing the length of the string separately from the string itself. This is
of great benefit when the length of the strings being stored varies, and it also avoids the need to trim the right-hand side of the string in most applications. The varchar and nvarchar data types are limited to 8000 and 4000 characters, respectively. This is roughly what fits in a data page in a SQL Server database.
varchar(max) and nvarchar(max) Data Types
It has become common to store even longer string values. The varchar(max) and nvarchar(max) data types are used for this. They each allow up to around 2GB of data to be stored.
text and ntext Data Types
The text and ntext data types are older data types that are now deprecated and should not be used for new work. The varchar(max) and nvarchar(max) data types should be used instead.
sysname Data Type
You will often see object names in SQL Server referred to as being of the sysname data type. sysname is an alias data type that is currently mapped to nvarchar(128). Question: Why would you use the sysname data type rather than the nvarchar(128) data type?
Understanding Collations
Key Points
Collations in SQL Server are used to control the code page that is used to store non-Unicode data and the rules that govern how SQL Server sorts and compares character values.
Code Pages
It was mentioned earlier that computer systems traditionally stored one byte per character. This allowed for 256 possible values, with a range from 0 to 255. The values from 0 to 31 were reserved for "control characters" such as backspace (character 8) and tab (character 9). Character 32 was allocated for a space, and so on, up to the Delete character, which was assigned the value 127. For values above 127, though, standards were initially not very clear. It was common to store characters such as line-drawing characters or European characters with accents or grave marks in these codes. In fact, a number of computer systems only used 7 bits to store characters instead of 8 bits. (As an example, the DEC10 system from Digital Equipment Corporation stored 5 characters of 7 bits each per 36-bit computer "word". It used the final bit as a parity check bit.) Problems arose when different vendors used the upper characters for different purposes. In the 1970s, it was not uncommon to type a character on your screen and see a different character when that document was printed, as the screen and the printer were using different characters for the values above 127. A number of standard character sets that described what should be in the upper code values did appear. The MS-DOS operating system categorized these as "code pages". What a code page really defines is which characters are used for the values from 128 to 255. Both the operating systems and SQL Server support a range of code pages.
Sorting and Comparing
Another issue that arises with character sets deals with how string values are sorted or compared. For example, is the value "mcdonald" equal to the value "McDonald"? Does the letter "á" (that is, with an accent) equal the letter "a" (without an accent)? If they are not equal, which is greater or less than the other when you sort them?
SQL Server Collations
SQL Server provides a concept of "collations" for dealing with these issues. There are two types of collations: SQL Server collations and Windows collations. SQL Server collations are retained for backward compatibility, but you are encouraged to make use of Windows collations instead. SQL Server collations have names that are in the form:

SQL_SortRules[_Pref]_CPCodePage_ComparisonStyle
The elements of this are:
• SQL: The literal string "SQL".
• SortRules: A string identifying the alphabet or language whose rules are applied when dictionary sorting is specified.
• Pref: An optional string that indicates an uppercase preference.
• CodePage: One to four digits that define the code page used by the collation. For curious historic reasons, CP1 specifies code page 1252, but for all others the number indicates the code page; for example, CP850 specifies code page 850.
• ComparisonStyle: Either BIN for binary, or a combination of case and accent sensitivity. CI is case-insensitive, CS is case-sensitive; AI is accent-insensitive, AS is accent-sensitive.

As an example, the collation SQL_Latin1_General_Pref_CP850_CI_AS indicates that it is a SQL collation, Latin1_General is the alphabet being used, there is a preference for upper case, the code page is 850, and sorting is performed case-insensitive and accent-sensitive. Windows collations have similar naming but with fewer fields. For example, the Windows collation Latin1_General_CI_AS refers to Latin1_General as the alphabet being used, case-insensitive and accent-sensitive.
Collation Issues
The main issues with collations occur when you try to compare values that are stored with different collations. It is possible to set default collations for servers, databases, and even columns. When comparing values from different collations, you need to specify which collation (which could be yet another collation) will be used for the comparison. Another use of this is shown in the example on the slide. In this case, you are forcing the query to perform a case-sensitive comparison between the string '%ball%' and the value in the column. If the column contained 'Ball', it would not match. Question: What are the code page and sensitivity values for the collation SQL_Scandinavian_Cp850_CI_AS?
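The slide example can be sketched as follows; the dbo.Product table and its Name column are hypothetical:

```sql
-- Force a case-sensitive comparison regardless of the column's own collation
SELECT Name
FROM dbo.Product
WHERE Name COLLATE Latin1_General_CS_AS LIKE '%ball%';  -- 'Ball' would not match
```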
Demonstration 2A: Working with Character Data
Key Points
In this demonstration you will see how to:
• Work with Unicode and non-Unicode data
• Work with collations

Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_02_PRJ\6232B_02_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 21 – Demonstration 2A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Lesson 3
Converting Data Types
Now that you have learned about the most common data types, you need to consider that data is not always already in an appropriate data type. For example, you may have received data from another system and you may need to convert the data from one data type to another. You can control how this is done or you can try to let SQL Server do the conversions implicitly. There are a number of issues that can arise when making conversions between data types. You will learn about these issues in this lesson.
Objectives
After completing this lesson, you will be able to:
• Use the CAST function
• Use the CONVERT function
• Allow implicit data conversion to occur
• Describe some common issues that arise during conversion
Using CAST
Key Points
Available data is not always in the data type that it is needed in. For example, you may need to return a number as a string value. This requires converting the data from one data type to another. The CAST function is used to convert data. CAST is based on the SQL standards.
CAST
You can use the CAST function to explicitly convert data from one type to another. Look at the expression:

CAST(ListPrice AS varchar(12))
This expression takes the ListPrice column (likely to be a decimal value) and casts it as a string value. Note that you are not exhibiting control over how the decimal value will be formatted as a string, only that it will be converted to a string. An error is returned if the cast is not possible or is not supported. Question: Give an example of a situation where you would need to cast a number as a string.
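A sketch of the expression in context, assuming the AdventureWorks sample database's Production.Product table is available:

```sql
-- Concatenation requires the numeric column to be cast to a string first
SELECT N'The list price is ' + CAST(ListPrice AS varchar(12)) AS PriceText
FROM Production.Product;
```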
Using CONVERT
Key Points
CAST performs basic type casting but does not allow control over how the type cast will be performed. CONVERT is a SQL Server extension to the SQL language that is more powerful than CAST.
CONVERT
While CAST is a good option wherever it can be used, as it is a SQL standard option, at times more control is needed over how a conversion is carried out than CAST allows. CONVERT allows you to specify the target data type and the source data element, but also allows you to specify a style for the conversion. For example, the expression:

CONVERT(varchar(8),SYSDATETIME(),112)

would return the current date formatted as YYYYMMDD. Style 112 specifies the format YYYYMMDD. Note that for date-related styles, removing 100 from the value will give you the equivalent style without the century. So the expression:

CONVERT(varchar(6),SYSDATETIME(),12)
would return the current date formatted as YYMMDD. Note: The style value is often assumed to just relate to character-based output but it can also be used for determining how an incoming string is parsed.
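A sketch of both directions, output formatting and input parsing (style 103 is the dd/mm/yyyy British/French style):

```sql
SELECT CONVERT(varchar(8), SYSDATETIME(), 112) AS WithCentury,  -- YYYYMMDD
       CONVERT(varchar(6), SYSDATETIME(), 12)  AS NoCentury,    -- YYMMDD
       CONVERT(date, '03/02/2010', 103)        AS ParsedInput;  -- 3 February 2010
```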
Implicit Data Conversion
Key Points
When data isn't explicitly converted between types, SQL Server attempts implicit data conversion automatically. The conversion is based on data type precedence.

Implicit Data Conversion
Not all types can be implicitly converted to all other types, and errors occur when a conversion is not possible. In general, implicit type conversions should be avoided wherever possible as they can lead to unexpected consequences. For example, what would you expect the output of the following code to be?

SELECT 1 / 2;

SQL Server has no need to convert the values here as they are both int values and division is defined for int values, so integer division is performed and the result is 0, not 0.5.

Question: Look at the slide examples. Suggest where implicit conversions are happening and from which data types to which other data types.
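A few related expressions illustrate where implicit conversions do and do not occur:

```sql
SELECT 1 / 2;      -- 0: both operands are int, so integer division is used
SELECT 1 / 2.0;    -- 0.500000: the int operand is implicitly converted to decimal
SELECT 1 + '2';    -- 3: the string is implicitly converted to int
SELECT '1' + '2';  -- '12': both operands are strings, so + is concatenation
```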
Common Conversion Issues
Key Points
Data type conversion errors are commonplace. It is important to be aware of common situations that give rise to such errors.

Example Issues

Issue: Inappropriate values for the target data type.
Comment: If the target data type is integer, you are not going to be able to convert most text strings to it.

Issue: Value is out of range for the target data type.
Comment: Each data type has a range of values that can be stored. For example, you cannot store the value 2340280923 in an int location.

Issue: Value is truncated while being converted (sometimes silently).
Comment: As an example, consider the expression CONVERT(varchar(6),SYSDATETIME(),112). Style 112 suggests returning an 8 character string. When you convert it to a 6 character string, it is silently truncated.

Issue: Value is rounded while being converted (sometimes silently).
Comment: As an example, the datetime value '20051231 23:59:59.999' is silently rounded to the value '20060101 00:00:00.000'.

Issue: Value is changed while being converted (sometimes silently).
Comment: As an example, execute the code "SELECT 5ee" and note the output.

Issue: Assumptions are made about internal storage formats for data types.
Comment: Even though you might know the internal binary format of a data type (such as datetime), it is very dangerous to write code that depends on that knowledge. The internal representation could change over time.

Issue: Some datetime conversions are nondeterministic and depend on language settings.
Comment: The string "2010-05-04" could be interpreted as 4th May 2010 or 5th April 2010 depending upon the language settings, when working with the datetime data type.

Issue: Some parsing issues are hard to understand.
Comment: The "SELECT 5ee" example given above is also a good example of this.
Note that the worst of these issues tend to occur during implicit type conversions. For this and other reasons, implicit type conversions should be avoided. Attempt to control how conversions occur by making explicit conversions.
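The truncation and rounding issues from the table above can be sketched directly:

```sql
-- Truncation: style 112 produces an 8 character string, but only 6 fit
SELECT CONVERT(varchar(8), SYSDATETIME(), 112);   -- e.g. 20100504
SELECT CONVERT(varchar(6), SYSDATETIME(), 112);   -- silently truncated, e.g. 201005

-- Rounding: datetime is only accurate to 1/300th of a second
SELECT CAST('20051231 23:59:59.999' AS datetime); -- 2006-01-01 00:00:00.000
```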
Demonstration 3A: Common Conversion Issues
Key Points
In this demonstration you will see:

• How to convert date data types explicitly
• How language settings can affect date conversions
• How data can be truncated during data type conversion
• Issues that can arise with implicit conversion

Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_02_PRJ\6232B_02_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 31 – Demonstration 3A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Lesson 4
Specialized Data Types
You have now covered the common SQL Server data types, but SQL Server also includes a number of more specialized data types that it is useful to be aware of. These data types are not used as commonly as the data types described earlier in the module, but they fill important roles in development.

Objectives
After completing this lesson, you will be able to:

• Work with the timestamp and rowversion data types
• Work with alias data types
• Describe other SQL Server data types
timestamp and rowversion
Key Points
The rowversion data type assists in creating systems that are based on optimistic concurrency. The timestamp data type has been deprecated and replaced by rowversion.

rowversion Data Type
Early versions of SQL Server provided the timestamp data type. The naming of this data type was unfortunate as it suggested that the data type had something to do with time, which was not the case. Adding to the confusion, the SQL standards made use of a value called TIMESTAMP that was time-related. For these reasons, a synonym was introduced for the timestamp data type, called rowversion. The rowversion data type should be used for all new work; the timestamp data type is now deprecated.

Implementing Optimistic Concurrency with rowversion
The rowversion data type is a special data type that automatically changes to a different value whenever the row that contains a column of this data type is modified. Consider that you are reading a customer's details from your client application. You modify your local copy and want to write the changes back to the database. How do you know whether the underlying data has already been changed by someone else? You can detect this by reading the rowversion value along with the customer's data. Then, when you write the data back, you include a predicate checking the rowversion in the WHERE clause. If no rows are updated, you know that someone else has modified the row and you roll back the database change.

Note that it isn't enough to read the rowversion value and check it before writing your changes. This could lead to a race condition where the data could be modified between checking the value and writing the changes.
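The pattern above can be sketched as follows. The table, column, and variable names here are hypothetical, chosen only to illustrate the technique:

```sql
-- Hypothetical Customer table; the rowversion column changes on every update
CREATE TABLE dbo.Customer
(
    CustomerID int NOT NULL PRIMARY KEY,
    CreditLimit decimal(18,2) NOT NULL,
    RowVer rowversion NOT NULL
);

-- Read the row, keeping the rowversion value alongside the data
DECLARE @OriginalRowVer binary(8);
SELECT @OriginalRowVer = RowVer
FROM dbo.Customer
WHERE CustomerID = 1;

-- Write the change back only if no one else has modified the row
UPDATE dbo.Customer
SET CreditLimit = 20000
WHERE CustomerID = 1
  AND RowVer = @OriginalRowVer;   -- predicate on the value read earlier

IF @@ROWCOUNT = 0
    PRINT 'The row was modified by another user; the change was not applied.';
```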
Internal Storage
rowversion holds a counter that increments across all changes in the entire database. The current rowversion value for a database can be returned from the system variable @@DBTS.
Alias Data Types
Key Points
Alias data types are names given to subtypes of existing system built-in (or intrinsic) types. The use of alias types can help promote consistency in database designs.

Alias Data Types
In earlier versions of SQL Server, an alias type was created by calling sp_addtype. Code that uses sp_addtype should be replaced by code using the CREATE TYPE statement.

In the example shown in the slide, a data type called ProductNumber has been created as equivalent to nvarchar(20) and NOT NULL. The key advantage of doing this is consistency. It avoids product numbers being created as nvarchar(20) in one part of an application and nvarchar(22) in another part of the application.

Another less-obvious advantage of using alias data types is that they can be very useful in automated code-generation. For example, if a PhoneNumber data type is defined as nvarchar(16), an automated code generator that is constructing a default user interface for an application could know that the columns using this data type should be shown in a specific way within the user interface. For example, code could be autogenerated to allow auto-dialling of any phone number.

The public database role is automatically granted REFERENCES permission on alias types created this way. Note that this is not the case for other types created by CREATE TYPE.
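As a sketch, the alias types mentioned above could be created and used like this (the table shown is hypothetical):

```sql
-- CREATE TYPE replaces the deprecated sp_addtype procedure
CREATE TYPE dbo.ProductNumber FROM nvarchar(20) NOT NULL;
CREATE TYPE dbo.PhoneNumber   FROM nvarchar(16) NULL;

-- The alias type is then used wherever the intrinsic type would be
CREATE TABLE dbo.Product
(
    ProductID int NOT NULL PRIMARY KEY,
    ProductNumber dbo.ProductNumber NOT NULL,
    SupportPhone dbo.PhoneNumber NULL
);
```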
Other Data Types
Key Points
SQL Server also offers a number of special data types. A number of other data types are shown in the table on the slide. These are important but less commonly used data types.

Other Data Types
The binary, varbinary, and varbinary(max) data types are used for storing arbitrary binary large objects (blobs) in the database. For example, you may wish to store a music clip, a video clip, or a photo.

The image data type is an older SQL Server data type and is now deprecated. You should use the varbinary(max) data type in most cases where you would have used the image data type.

The hierarchyid data type was added in SQL Server 2008. It is implemented in managed code (like the geometry and geography spatial data types that will be discussed later in the course). It is used to represent a node in a tree.

The sql_variant data type is used to store data of an unknown data type. (It should be rarely used.)

The xml data type is used to store semi-structured textual data. The xml data type will be covered in Modules 17 and 18.

The cursor data type is used to hold a reference to a cursor when constructing cursor-based code.

The table data type is used to hold an entire rowset. It will be discussed further in Module 10.

The geometry and geography data types are used to store spatial data elements and are discussed in Module 19.
Demonstration 4A: rowversion Data Type
Key Points
In this demonstration you will see:

• How to use the rowversion data type

Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_02_PRJ\6232B_02_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 41 – Demonstration 4A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Lab 2: Working with Data Types
Lab Setup
For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:

1. On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2. Maximize the Hyper-V Manager window.
3. In the Virtual Machines list, if the virtual machine 623XB-MIA-DC is not started:
   • Right-click 623XB-MIA-DC and click Start.
   • Right-click 623XB-MIA-DC and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, and then close the Virtual Machine Connection window.
4. In the Virtual Machines list, if the virtual machine 623XB-MIA-SQL is not started:
   • Right-click 623XB-MIA-SQL and click Start.
   • Right-click 623XB-MIA-SQL and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
5. In the Virtual Machine Connection window, click the Revert toolbar icon.
6. If you are prompted to confirm that you want to revert, click Revert. Wait for the revert action to complete.
7. In the Virtual Machine Connection window, if the user is not already logged on:
   • On the Action menu, click the Ctrl-Alt-Delete menu item.
   • Click Switch User, and then click Other User.
   • Log on using the following credentials:
     i. User name: AdventureWorks\Administrator
     ii. Password: Pa$$w0rd
8. From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
9. If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
10. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
11. In the Connect to Server window, type Proseware in the Server name text box.
12. In the Authentication drop-down list box, select Windows Authentication and click Connect.
13. On the File menu, click Open, and click Project/Solution.
14. In the Open Project window, open the project D:\6232B_Labs\6232B_02_PRJ\6232B_02_PRJ.ssmssln.
15. In Solution Explorer, double-click the query 00-Setup.sql. When the query window opens, click Execute on the toolbar.
Lab Scenario
A new developer has sought your assistance in deciding which data types to use for three new tables she is designing. She presents you with a list of organizational data requirements for each table. You need to decide on appropriate data types for each item.

You also need to export some data from your existing system, but while being exported, some of the columns need to be converted to alternate data types.

If you have time, there is another issue that your manager would like you to address. She is concerned about a lack of consistency in the use of data types across the organization. At present, she is concerned about email addresses and phone numbers. You need to review the existing data types being used in the MarketDev database and create new data types that can be used in applications, to avoid this inconsistency.
Supporting Documentation

Table 1: PhoneCampaign
Description:
• Which campaign this relates to
• The prospect that was contacted
• When contact was first attempted with the prospect
• Comments related to the contact that was made, if it was made
• When contact was actually made with the prospect
• Outcome of the contact: sale, later follow-up, or no interest
• Value of any sale made (up to 2 decimal places)
Table 2: Opportunity
Description:
• Name of the opportunity
• Which prospect this opportunity relates to
• Stage the sale is at: Lead, Qualification, Proposal Development, Contract Negotiations, Complete, Lost
• Date that the opportunity was raised
• Probability of success
• Rating: Cold, Warm, Hot
• Estimated closing date
• Estimated revenue
• Delivery address
Table 3: SpecialOrder
Description:
• Which prospect this order is for
• External supplier of the item
• Description of the item
• Quantity required (some quantities are whole numbers, some are fractional with up to three decimal places)
• Date of order
• Promised delivery date
• Actual delivery date
• Special requirements (any comments related to the special order)
• Quoted price per unit (up to two decimal places)
Query Requirement 1: A list of products from the Marketing.Product table that are no longer sold, that is, they have a SellEndDate. The output should show ProductID, ProductName, and SellEndDate formatted as a string in the format YYYYMMDD. The output should appear similar to the supplied sample.

Query Requirement 2: A list of products from the Marketing.Product table that have demographic information. The output should show ProductID, ProductName, and Demographics formatted as nvarchar(1000) instead of XML. The output should appear similar to the supplied sample.
Exercise 1: Choosing Appropriate Data Types
Scenario
In this exercise, a new developer has sought your assistance in deciding which data types to use for three new tables she is designing. She presents you with a list of organizational data requirements for each table. You need to decide on appropriate data types for each item.

The main tasks for this exercise are as follows:
1. Determine column names and data types.

Task 1: Determine column names and data types
• Review the supporting documentation for details of the PhoneCampaign, Opportunity, and SpecialOrder tables and determine column names and data types for each data item in the design.

Results: After this exercise, you should have determined the column names and data types for the following tables: PhoneCampaign, Opportunity, and SpecialOrder.
Exercise 2: Writing Queries with Data Type Conversions
Scenario
In this exercise, you need to export some data from your existing system. While being exported, some of the columns need to be converted to alternate data types.

The main tasks for this exercise are as follows:
1. Connect to the MarketDev database.
2. Review the first query requirement and write a SELECT statement to meet the requirement.
3. Review the second query requirement and write a SELECT statement to meet the requirement.

Task 1: Connect to the MarketDev database
• Open a new query window against the MarketDev database.

Task 2: Review the first query requirement and write a SELECT statement to meet the requirement
• Review the supporting documentation for details of the first query requirement.
• Write a SELECT statement that returns the required data. The output should look similar to the supplied sample.

Task 3: Review the second query requirement and write a SELECT statement to meet the requirement
• Review the supporting documentation for details of the second query requirement.
• Write a SELECT statement that returns the required data. The output should look similar to the supplied sample.

Results: After this exercise, you should have created two new SELECT statements as per the design requirements.
Challenge Exercise 3: Designing and Creating Alias Data Types (Only if time permits)
Scenario
In this exercise, your manager is concerned about a lack of consistency in the use of data types across the organization. At present, she is concerned about email addresses and phone numbers. You need to review the existing data types being used in the MarketDev database and create new data types that can be used in applications, to avoid this inconsistency.

The main tasks for this exercise are as follows:
1. Investigate the storage of phone numbers and email addresses.
2. Create a data type to be used to store phone numbers.
3. Create a data type to be used to store email addresses.

Task 1: Investigate the storage of phone numbers and email addresses
• Investigate how phone numbers and email addresses have been stored in the MarketDev database.

Task 2: Create a data type that stores phone numbers
• Create a data type to be used to store phone numbers.

Task 3: Create a data type that stores email addresses
• Create a data type to be used to store email addresses.

Results: After this exercise, you should have created two new data types that store phone numbers and email addresses.
Module Review and Takeaways
Review Questions
1. What is the uniqueidentifier data type commonly used for?
2. What are common errors that can occur during data type conversion?
3. What date is present in a datetime data type if a value is assigned to it that only contains a time?

Best Practices
1. Always choose an appropriate data type for columns and variables rather than using generic data types such as string or xml, except where they are necessary.
2. When defining columns, always specify the nullability rather than leaving it to the system default settings.
3. Avoid the use of any of the deprecated data types.
4. In the majority of situations, do not store currency values in approximate numeric data types such as real or float.
5. Use the Unicode-based data types where there is any chance of needing to store non-English characters.
6. Use the sysname data type in administrative scripts involving database objects rather than nvarchar(128).
Module 3
Designing and Implementing Tables

Contents:
Lesson 1: Designing Tables
Lesson 2: Working with Schemas
Lesson 3: Creating and Altering Tables
Lab 3: Designing and Implementing Tables
Module Overview
In relational database management systems (RDBMS), user and system data is stored in tables. Each table comprises a set of rows that describe entities and a set of columns that hold the attributes of an entity. For example, a Customer table would have columns such as CustomerName and CreditLimit and a row for each customer. In SQL Server, tables are contained within schemas that are very similar in concept to folders that contain files in the operating system. Designing tables is often one of the most important roles undertaken by a database developer because incorrect table design leads to the inability to query the data efficiently. Once an appropriate design has been created, it is then important to know how to correctly implement the design.
Objectives
After completing this module, you will be able to:

• Design Tables
• Work with Schemas
• Create and Alter Tables
Lesson 1
Designing Tables
The most important aspect of designing tables involves determining what data each column will hold. As all organizational data is held within database tables, it is critical to store the data with an appropriate structure. The best practices for table and column design are often represented by a set of rules known as "normalization" rules. In this lesson, you will learn the most important aspects of normalized table design along with the appropriate use of primary and foreign keys. In addition, you will learn to work with the system tables that are supplied when SQL Server is installed.
Objectives
After completing this lesson, you will be able to:

• Describe what a table is
• Normalize data
• Describe common Normalization Forms
• Explain the role of Primary Keys
• Explain the role of Foreign Keys
• Work with System tables
What is a Table?
Key Points
Relational databases store data about entities in tables that are defined by columns and rows. Rows represent entities and columns define the attributes of the entities. Tables have no predefined order and can be used as a security boundary.

Tables
Relational database management systems are not the only type of database system available, but they are the most commonly deployed type of database management system at present. In formal relational database management system terminology, tables are referred to as "relations".

Tables store data about entities such as customers, suppliers, orders, products, and sales. Each row of a table represents the details of a single entity, for example a single customer, supplier, order, product, or sale. Columns define the information that is being held about each entity. For example, a Product table might have columns such as ProductID, Size, Name, and UnitWeight. Each of these columns is defined with a specific data type. For example, the UnitWeight of a product might be allocated a decimal(18,3) data type.

Naming Conventions
Strong disagreement exists in the industry over naming conventions for tables. The use of prefixes (such as tblCustomer or tblProduct) is widely discouraged. Prefixes were widely used in higher-level programming languages before the advent of strong typing (that is, the use of strict data types) but are now rare. The main reason for this is that names should represent the entities, not how they are stored. For example, during a maintenance operation, it might become necessary to replace a table with a view or vice-versa. This could lead to views named tblProduct or tblCustomer, when trying to avoid breaking existing code.
Another area of strong disagreement relates to whether table names should be singular or plural. For example, should a table that holds the details of a customer be called Customer or Customers? Proponents of plural naming argue that the table holds the details of many customers whereas proponents of singular naming argue that it is common to expose these tables via object models in higher-level languages and that the use of plural names complicates this process. SQL Server system tables (and views) have plural names. The argument is not likely to ever be resolved either way and is not a SQL language-specific problem. For example, an array of customers in a higher-level language could sensibly be called "Customers" yet referring to a single customer via "Customers[49]" seems awkward. The most important aspect of naming conventions is that you should adopt a naming convention that you can work with and apply it consistently.
Security
Tables can be used as security boundaries in that users can be assigned permissions at the table level. Note also, though, that SQL Server supports the assignment of permissions at the column level as well as at the table level. Row-level security is not available for tables but can be implemented via a combination of views, stored procedures, and/or triggers.

Row Order
Tables are containers for rows but they do not define any order for the rows that they contain. When selecting rows from a table, a user should specify the order that the rows should be returned in, but only if the output order matters. SQL Server may have to expend additional sorting effort to return rows in a given order and it is important that this effort is only expended when necessary.
Normalizing Data
Key Points
Normalization is a systematic process that is used to improve the design of databases.

Normalization
Edgar F. Codd (August 23, 1923 – April 18, 2003) was a British scientist who is widely regarded as having invented the relational model. This model underpins the development of relational database management systems. Codd introduced the concept of normalization and helped the concept evolve over many years through a series of "normal forms". Codd introduced 1st normal form in 1970, followed by 2nd normal form, and then 3rd normal form in 1971. Since that time, higher forms of normalization have been introduced by theorists, but most database designs today are considered to be "normalized" if they are in 3rd normal form.

Intentional Denormalization
Not all databases should be normalized. It is common to intentionally denormalize databases for performance reasons or for ease of end-user analysis. As an example, dimensional models that are widely used in data warehouses (such as the data warehouses commonly used with SQL Server Analysis Services) are intentionally designed not to be normalized. Tables might also be denormalized to avoid the need for time-consuming calculations or to minimize physical database design constraints such as locking.
Common Normalization Forms
Key Points
In general, normalizing a database design leads to an improved design. Most common table design errors in database systems can be avoided by applying normalization rules.

Normalization
Normalization is used to:

• Free the database of modification anomalies
• Minimize redesign when the structure of the database needs to be changed
• Ensure the data model is intuitive to users
• Avoid any bias towards particular forms of querying
While there is disagreement on the interpretation of these rules, general agreement exists on most common symptoms of violating the rules.
1st Normal Form
Eliminate repeating groups in individual tables. Create a separate table for each set of related data. Identify each set of related data with a primary key.

No repeating groups should exist. For example, a product table should not include columns such as Supplier1, Supplier2, and Supplier3. Column values should not include repeating groups either; for example, a column should not contain a comma-separated list of suppliers.

Duplicate rows should not exist in tables. Tables should have unique keys; if unique keys are not present, duplicate rows could exist. A candidate key is a column or set of columns that can be used to uniquely identify a row in a table.

A more controversial reading of 1st Normal Form rules would disallow the use of nullable columns.
2nd Normal Form
Create separate tables for sets of values that apply to multiple records. Relate these tables with a foreign key. A common error with 2nd Normal Form would be to hold the details of products that a supplier provides in the same table as the details of the supplier's credit history. These values should be stored separately.

3rd Normal Form
Eliminate fields that do not depend on the key. Imagine a Sales table with columns OrderNumber, ProductID, ProductName, SalesAmount, and SalesDate. This table would not be in 3rd Normal Form. A candidate key for the table might be the OrderNumber column. The ProductName column depends only on the ProductID column, and not on the candidate key. The Sales table should be separated from a Product table and likely linked to it by ProductID.

Formal database terminology is precise but can be hard to follow when first encountered. In the next demonstration, you will see examples of common normalization errors.
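The Sales example above can be sketched in table definitions. These are hypothetical tables, shown only to illustrate the dependency being removed:

```sql
-- Not in 3rd Normal Form: ProductName depends on ProductID, not on the key
CREATE TABLE dbo.SalesUnnormalized
(
    OrderNumber int NOT NULL PRIMARY KEY,
    ProductID int NOT NULL,
    ProductName nvarchar(50) NOT NULL,  -- transitively dependent column
    SalesAmount decimal(18,2) NOT NULL,
    SalesDate date NOT NULL
);

-- Normalized: ProductName moves to a Product table referenced via ProductID
CREATE TABLE dbo.Product
(
    ProductID int NOT NULL PRIMARY KEY,
    ProductName nvarchar(50) NOT NULL
);

CREATE TABLE dbo.Sales
(
    OrderNumber int NOT NULL PRIMARY KEY,
    ProductID int NOT NULL REFERENCES dbo.Product (ProductID),
    SalesAmount decimal(18,2) NOT NULL,
    SalesDate date NOT NULL
);
```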
Demonstration 1A: Normalization
Key Points
In this demonstration you will see common normalization errors.

Demonstration Steps
1. Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
2. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_03_PRJ\6232B_03_PRJ.ssmssln and click Open.
3. Open and execute the 00 – Setup.sql script file from within Solution Explorer.
4. Open the 11 – Demonstration 1A.sql script file.
5. Follow the instructions contained within the comments of the script file.
Primary Keys
Key Points
A primary key is a form of constraint applied to a table. A candidate key is used to identify a column or set of columns that can be used to uniquely identify a row. A primary key is chosen from any potential candidate keys.

Primary Key
A primary key must be unique and cannot be NULL. Primary keys are a form of constraint. (Constraints are discussed later in this course.)

Consider a table that holds an EmployeeID and a NationalIDNumber, along with the employee's name and personal details. The EmployeeID and the NationalIDNumber are likely to both be possible candidate keys. In this case, the EmployeeID column would be the primary key. It may be necessary to combine multiple columns into a key before it can be used to uniquely identify a row.

In formal database terminology, no candidate key is more important than any other candidate key. When tables are correctly normalized, though, they will usually have only a single candidate key that could be used as a primary key. However, this is not always the case. Ideally, keys used as primary keys should not change over time.

Natural vs. Surrogate Keys
A surrogate key is another form of key that is used as a unique identifier within a table, but one which is not derived from "real" data. Natural keys are formed from data within the table. For example, a Customer table may have a CustomerID or a CustomerCode column containing numeric, globally unique identifier (GUID), or alphanumeric codes. A surrogate key would not be related to the other attributes of a customer. The use of surrogate keys is another topic that can lead to strong debate between database professionals.
Question: What is an advantage of using a natural key?

Question: What is a disadvantage of using a natural key?

Question: What might be an appropriate primary key for the Owner table mentioned in the previous demonstration?
Foreign Keys
Key Points A Foreign Key is used to establish references or relationships between tables.
Foreign Keys It is a common requirement to hold the value of the primary key (or another unique key) from one table as a column in another table. For example, a CustomerOrders table might include a CustomerID column. A foreign key reference is used to ensure that any CustomerID entered into the CustomerOrders table does in fact exist in the Customers table. In SQL Server, the reference is only checked if the column holding the foreign key value is not NULL.
Self-Referencing Tables A table can hold a foreign key reference to itself. For example, an Employees table might contain a ManagerID column. An employee's manager is also an employee. A foreign key reference can be made from the ManagerID column of the Employees table to the EmployeeID column in the same table.
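A minimal sketch of both forms of reference described above. The column lists are assumptions for illustration, and the first statement presumes a dbo.Customers table with a CustomerID primary key already exists.

```sql
-- Foreign key from CustomerOrders to Customers.
CREATE TABLE dbo.CustomerOrders
(
    OrderID int NOT NULL PRIMARY KEY,
    CustomerID int NOT NULL
        FOREIGN KEY REFERENCES dbo.Customers (CustomerID)
);

-- Self-referencing foreign key: a manager is also an employee.
-- ManagerID is nullable, so the reference is only checked when
-- a manager is actually assigned.
CREATE TABLE dbo.Employees
(
    EmployeeID int NOT NULL PRIMARY KEY,
    ManagerID int NULL
        FOREIGN KEY REFERENCES dbo.Employees (EmployeeID)
);
```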
Reference Checking Referenced keys cannot be updated or deleted unless options that cascade the changes to related tables are used. For example, you cannot change the ID for a customer when there are orders in a CustomerOrders table that reference that customer's ID. However, at the time you define the foreign key constraint, you can specify that changes to the referenced value are permitted and will cascade. This means that the ID for the customer would be changed in both the Customers table and in the CustomerOrders table. Tables might also include multiple foreign key references. For example, an Orders table might refer to a Customers table and a Products table.
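A cascading update can be specified when the foreign key constraint is defined, as in this sketch (the constraint and table names are illustrative):

```sql
-- Changes to Customers.CustomerID will now cascade to the
-- matching rows in CustomerOrders.
ALTER TABLE dbo.CustomerOrders
    ADD CONSTRAINT FK_CustomerOrders_Customers
    FOREIGN KEY (CustomerID)
    REFERENCES dbo.Customers (CustomerID)
    ON UPDATE CASCADE;
```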
Terminology Foreign keys are referred to as being used to "enforce referential integrity". Foreign keys are a form of constraint and will be covered in more detail in a later module. The ANSI SQL 2003 definition refers to self-referencing tables as having "recursive foreign keys". Question: What would be an example of multiple foreign keys in a table referencing the same table?
Working with System Tables
Key Points System Tables are the tables that are provided directly by the SQL Server database engine. They should not be directly modified.
System Tables in Earlier Versions If you have worked with SQL Server 2000 and earlier versions, you might be expecting databases to contain a large number of system tables. Users often modified these system tables (sometimes by accident) and this caused issues when applying service packs and updates. Worse, it could have led to unexpected behavior or failures if the data was not changed correctly. Users also often took dependencies on the format of these system tables. That made it difficult for new versions of SQL Server to have improved designs for these tables while avoiding the chance of breaking existing applications. As an example, when the syslogins table needed to be expanded, a new sysxlogins table was added instead of changing the existing table. In SQL Server 2005, these tables were hidden and replaced by a set of system views that show the contents of the system tables. These views are permission-based and would display data to a user only if the user has appropriate permission to view the data.
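For example, rather than reading hidden system tables directly, you can query the permission-based catalog views:

```sql
-- Returns the user tables in the current database.
-- Results are filtered by the permissions of the caller.
SELECT name, object_id, create_date
FROM sys.tables;
```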
System Tables in the msdb Database msdb is the database used by SQL Server Agent, primarily for organizing scheduled background tasks that are known as "jobs". A large number of system tables are still present in the msdb database. Again, while it is acceptable to query these tables, they should not be directly modified. Unless the table is documented, no dependency on its format should be taken when designing applications.
Lesson 2
Working with Schemas
SQL Server 2005 introduced a change to how schemas are used. Since that version, schemas are used as containers for objects such as tables, views, and stored procedures. Schemas can be particularly helpful in providing a level of organization and structure when large numbers of objects are present in a database. Security permissions can also be assigned at the schema level rather than individually on the objects contained within the schemas. Doing this can greatly simplify the design of system security requirements.
Objectives
After completing this lesson, you will be able to:
•	Describe the role of a Schema
•	Describe the role of Object Name Resolution
•	Create Schemas
What is a Schema?
Key Points Schemas are used to contain objects and to provide a security boundary for the assignment of permissions.
Schemas In SQL Server, schemas are essentially used as containers for objects, somewhat like a folder is used to hold files at the operating system level. Since their introduction in SQL Server 2005, schemas can be used to contain objects such as tables, stored procedures, functions, types, and views. Schemas form a part of the multi-part naming convention for objects. In SQL Server, an object is formally referred to by a name of the form: Server.Database.Schema.Object
Security Boundary Schemas can be used to simplify the assignment of permissions. An example of applying permissions at the schema level would be to assign the EXECUTE permission on a schema to a user. The user could then execute all stored procedures within the schema. This simplifies the granting of permissions as there is no need to set up individual permissions on each stored procedure.
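A sketch of this kind of schema-level permission assignment (the schema and user names here are hypothetical):

```sql
-- Grants EXECUTE on every stored procedure in the Sales schema,
-- avoiding individual grants on each procedure.
GRANT EXECUTE ON SCHEMA::Sales TO SalesUser;
```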
Upgrading Older Applications If you are upgrading applications from SQL Server 2000 and earlier versions, it is important to understand that the naming convention changed when schemas were introduced. Previously, names were of the form: Server.Database.Owner.Object Objects still have owners but the owner's name does not form a part of the multi-part naming convention from SQL Server 2005 onwards. When upgrading databases from earlier versions, SQL Server will
automatically create a schema with the same name as existing object owners, so that applications that use multi-part names will continue to work.
Object Name Resolution
Key Points It is important to use at least two-part names when referring to objects in SQL Server code such as stored procedures, functions, and views.
Object Name Resolution When object names are referred to in the code, SQL Server must determine which underlying objects are being referred to. For example, consider the following statement: SELECT ProductID, Name, Size FROM Product;
More than one Product table could exist in separate schemas. When single part names are used, SQL Server must then determine which Product table is being referred to. Most users have default schemas assigned but not all users have these. Default schemas are not assigned to Windows groups or to users based on certificates but they do apply to users created from standard Windows and SQL Server logins. Users without default schemas are considered to have the dbo schema as their default schema. When locating an object, SQL Server will first check the user's default schema. If the object is not found, SQL Server will then check the dbo schema to try to locate the object. It is important to include schema names when referring to objects instead of depending upon schema name resolution, such as in this modified version of the previous statement: SELECT ProductID, Name, Size FROM Production.Product;
Apart from rare situations, using multi-part names leads to more reliable code that does not depend upon default schema settings.
Creating Schemas
Key Points Schemas are created with the CREATE SCHEMA command. This command can also include the definition of objects to be created within the schema at the time the schema is created.
CREATE SCHEMA Schemas have both names and owners. In the first example shown in the slide, a schema named Reporting is being created. It is owned by the user Terry. Both schemas and the objects contained in them have owners, and those owners do not have to be the same; however, having different owners for schemas and the objects contained within them can lead to complex security issues.
Object Creation at Schema Creation Time Besides creating schemas, the CREATE SCHEMA statement can include options for object creation. While the second example in the slide might appear to be three statements (CREATE SCHEMA, CREATE TABLE, and GRANT), it is in fact a single statement. Both CREATE TABLE and GRANT are options that are being applied to the CREATE SCHEMA statement. Within the newly created KnowledgeBase schema, the Article table is being created and the SELECT permission on the schema is being granted to Salespeople. Statements such as the second CREATE SCHEMA example on the slide can lead to issues if the entire statement is not executed together. Question: What would be different about the outcome of the 2nd statement if the CREATE SCHEMA and the CREATE TABLE parts of the statement were executed separately?
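Based on the descriptions above, the slide examples are similar to the following sketch. The column list for the Article table is an assumption for illustration only.

```sql
-- First example: a schema owned by the user Terry.
CREATE SCHEMA Reporting AUTHORIZATION Terry;
GO

-- Second example: a single statement that creates the schema,
-- creates a table within it, and grants SELECT on the schema.
CREATE SCHEMA KnowledgeBase AUTHORIZATION dbo
    CREATE TABLE Article
    (
        ArticleID int NOT NULL PRIMARY KEY,   -- illustrative columns
        ArticleContents nvarchar(max) NOT NULL
    )
    GRANT SELECT ON SCHEMA::KnowledgeBase TO Salespeople;
GO
```

If the CREATE TABLE and GRANT clauses were executed as separate statements instead, they would no longer run in the context of the newly created schema, which is the issue the question above refers to.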
Demonstration 2A: Schemas
Key Points
In this demonstration you will see how to:
•	Create a schema
•	Create a schema with an included object
•	Drop a schema

Demonstration Steps
1.	If Demonstration 1A was not performed:
	•	Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
	•	In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_03_PRJ\6232B_03_PRJ.ssmssln and click Open.
	•	Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2.	Open the 21 – Demonstration 2A.sql script file.
3.	Follow the instructions contained within the comments of the script file.
Lesson 3
Creating and Altering Tables
Now that you understand the core concepts surrounding the design of tables, this lesson introduces you to the T-SQL syntax used when defining, modifying, or dropping tables. Temporary tables are a special form of tables that can be used to hold temporary result sets. Computed columns are used to create columns where the value held in the column is automatically calculated, either from expressions involving other columns from the table or from the execution of functions.
Objectives
After completing this lesson, you will be able to:
•	Create Tables
•	Drop Tables
•	Alter Tables
•	Use Temporary Tables
•	Work with Computed Columns
Creating Tables
Key Points Tables are created with the CREATE TABLE statement. This statement is also used to define the columns associated with the table and to identify constraints such as the primary key.
CREATE TABLE When creating tables with the CREATE TABLE statement, make sure that you supply both a schema name and a table name. If the schema name is not specified, the table will be created in the default schema of the user executing the statement. This could lead to the creation of scripts that are not robust in that they could generate different schema designs when executed by different users.
Nullability You should specify NULL or NOT NULL for each column in the table. SQL Server has defaults for this and they can be changed via the ANSI_NULL_DEFAULT setting. Scripts should always be designed to be as reliable as possible and specifying nullability in DDL scripts helps improve script reliability.
Primary Key
A primary key constraint can be specified beside the name of a column if only a single column is included in the key, or after the list of columns. It must be included after the list of columns when more than one column is included in the key, as shown in the following example where the SalesID is only unique for each SalesRegisterID:

CREATE TABLE PetStore.SalesReceipt
(
    SalesRegisterID int NOT NULL,
    SalesID int NOT NULL,
    CustomerID int NOT NULL,
    SalesAmount decimal(18,2) NOT NULL,
    PRIMARY KEY (SalesRegisterID, SalesID)
);
Primary keys are constraints and are more fully described along with other constraints in a later module. Question: In the example shown, could the OwnerName column have been used as the primary key instead of a surrogate key?
Dropping Tables
Key Points The DROP TABLE statement is used to drop tables from a database. If a table is referenced by a foreign key constraint, it cannot be dropped.
DROP TABLE When dropping a table, all permissions, constraints, indexes, and triggers that are related to the table are also dropped. Code that references the table (such as code contained within stored procedures, functions, and views) is not dropped. This can lead to "orphaned" code that refers to non-existent objects. SQL Server 2008 introduced a set of dependency views that can be used to locate code that references non-existent objects. The details of both referenced and referencing entities are available from the sys.sql_expression_dependencies view. Referenced and referencing entities are also available separately from the sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities dynamic management views. (Views are discussed later in the course). Question: Why would a reference to a table stop it from being dropped?
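The dependency view mentioned above can be queried directly to find such "orphaned" references, for example:

```sql
-- Lists which objects reference which entities; rows whose
-- referenced entity no longer exists indicate orphaned code.
SELECT OBJECT_NAME(referencing_id) AS referencing_object,
       referenced_schema_name,
       referenced_entity_name
FROM sys.sql_expression_dependencies;
```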
Altering Tables
Key Points Altering a table is useful as permissions on the table are retained along with the data in the table. If you drop and recreate the table, both the permissions on the table and the data in the table are lost. If the table is referenced by a foreign key, it cannot be dropped. However, it can be altered.
ALTER TABLE Tables are modified using the ALTER TABLE statement. This statement can be used to add or drop columns and constraints, or to enable or disable constraints and triggers. (Constraints and triggers are discussed in later modules). Note that the syntax for adding and dropping columns is inconsistent: the word COLUMN is required for DROP but is not used for ADD, where it is not even an optional keyword. If the word COLUMN is omitted in a DROP, SQL Server assumes that it is a constraint being dropped. In the slide example, the PreferredName column is being added to the PetStore.Owner table. Later, the PreferredName column is being dropped from the PetStore.Owner table. Note the difference in syntax regarding the word COLUMN.
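The slide example described above would look similar to the following sketch (the data type chosen for PreferredName is an assumption):

```sql
-- Adding a column: the word COLUMN is not used.
ALTER TABLE PetStore.Owner ADD PreferredName nvarchar(30) NULL;

-- Dropping a column: the word COLUMN is required.
ALTER TABLE PetStore.Owner DROP COLUMN PreferredName;
```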
Demonstration 3A: Working with Tables
Key Points
In this demonstration you will see how to:
•	Create tables
•	Alter tables
•	Drop tables

Demonstration Steps
1.	If Demonstration 1A was not performed:
	•	Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
	•	In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_03_PRJ\6232B_03_PRJ.ssmssln and click Open.
	•	Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2.	Open the 31 – Demonstration 3A.sql script file.
3.	Follow the instructions contained within the comments of the script file.
Question: Why should you ensure that you specify the nullability of a column when designing a table?
Temporary Tables
Key Points Temporary tables are used to hold temporary result sets within a user's session. They are deleted automatically when they go out of scope, which typically occurs when the code within which they were created completes or aborts.
Temporary Tables Temporary tables are very similar to other tables, except that they are only visible to their creator, and only within the scope (and sub-scopes) in which they were created within the session. They are automatically deleted when a session ends or when they go out of scope. Although temporary tables will be deleted when they go out of scope, they should be explicitly deleted when no longer required. Temporary tables are often created in code using the SELECT INTO statement. A table is created as a temporary table if its name has a pound (#) prefix. A global temporary table is created if the name has a double-pound (##) prefix. Global temporary tables are visible to all users and are not commonly used.
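A minimal sketch of this pattern, assuming a Production.Product base table exists (as in the sample databases used elsewhere in this course):

```sql
-- SELECT INTO creates and populates a local temporary table.
-- The single # prefix makes it visible only to this session.
SELECT ProductID, Name
INTO #ProductWork
FROM Production.Product;

-- ... work with #ProductWork here ...

-- Explicitly delete the table when it is no longer required,
-- rather than waiting for it to go out of scope.
DROP TABLE #ProductWork;
```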
Passing Temporary Tables Temporary tables are also often used to pass rowsets between stored procedures. For example, a temporary table created in a stored procedure is visible to other stored procedures executed from within the first procedure. While this use is possible, it is not considered a good practice in general. This breaks common rules of abstraction for coding and also makes it more difficult to debug or troubleshoot the nested procedures. SQL Server 2008 introduced table-valued parameters (TVPs) that can provide an alternate mechanism for passing tables to stored procedures or functions. (TVPs are discussed later in this course).
The overuse of temporary tables is a common T-SQL coding error that often leads to performance and resource issues. Extensive use of temporary tables is often an indicator of poor coding techniques, often due to a lack of set-based logic design.
Demonstration 3B: Temporary Tables
Key Points
In this demonstration you will see how to:
•	Work with temporary tables

Demonstration Steps
1.	If Demonstration 1A was not performed:
	•	Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
	•	In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_03_PRJ\6232B_03_PRJ.ssmssln and click Open.
	•	Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2.	Open the 32 – Demonstration 3B.sql script file.
3.	Follow the instructions contained within the comments of the script file.
Computed Columns
Key Points Computed columns are columns that are derived from other columns or from the result of executing functions.
Computed Columns Computed columns were introduced in SQL Server 2000. In the example shown in the slide, the YearOfBirth column is calculated by executing the DATEPART function to extract the year from the DateOfBirth column in the same table. In the example shown, you can also see the word PERSISTED added to the definition of the computed column. Persisted computed columns were introduced in SQL Server 2005. A non-persisted computed column is calculated every time a SELECT operation occurs on the column. A persisted computed column is calculated when the data in the row is inserted or updated. The data in the column is then selected like the data in any other column. The core difference between persisted and non-persisted computed columns relates to when the computational performance impact is exerted. Non-persisted computed columns work best for data that is modified regularly but selected rarely. Persisted computed columns work best for data that is modified rarely but selected regularly. In most business systems, data is read much more regularly than it is updated. For this reason, most computed columns would perform best as persisted computed columns.
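The slide example described above would be similar to the following sketch. Only the YearOfBirth and DateOfBirth columns are named in the text; the table name and remaining columns are assumptions for illustration.

```sql
CREATE TABLE PetStore.Pet
(
    PetID int NOT NULL PRIMARY KEY,
    PetName nvarchar(30) NOT NULL,
    DateOfBirth date NOT NULL,
    -- Calculated on insert/update and stored, rather than
    -- being recalculated on every SELECT.
    YearOfBirth AS DATEPART(year, DateOfBirth) PERSISTED
);
```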
Demonstration 3C: Computed Columns
Key Points In this demonstration you will see how to work with computed columns.
Demonstration Steps
1.	If Demonstration 1A was not performed:
	•	Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
	•	In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_03_PRJ\6232B_03_PRJ.ssmssln and click Open.
	•	Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2.	Open the 33 – Demonstration 3C.sql script file.
3.	Follow the instructions contained within the comments of the script file.
Lab 3: Designing and Implementing Tables
Lab Setup
For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:
1.	On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2.	Maximize the Hyper-V Manager window.
3.	In the Virtual Machines list, if the virtual machine 623XB-MIA-DC is not started:
	•	Right-click 623XB-MIA-DC and click Start.
	•	Right-click 623XB-MIA-DC and click Connect.
	•	In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, and then close the Virtual Machine Connection window.
4.	In the Virtual Machines list, if the virtual machine 623XB-MIA-SQL is not started:
	•	Right-click 623XB-MIA-SQL and click Start.
	•	Right-click 623XB-MIA-SQL and click Connect.
	•	In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
5.	In the Virtual Machine Connection window, click the Revert toolbar icon.
6.	If you are prompted to confirm that you want to revert, click Revert. Wait for the revert action to complete.
7.	In the Virtual Machine Connection window, if the user is not already logged on:
	•	On the Action menu, click the Ctrl-Alt-Delete menu item.
	•	Click Switch User, and then click Other User.
	•	Log on using the following credentials:
		i.	User name: AdventureWorks\Administrator
		ii.	Password: Pa$$w0rd
8.	From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
9.	If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
10.	In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
11.	In the Connect to Server window, type Proseware in the Server name text box.
12.	In the Authentication drop-down list box, select Windows Authentication and click Connect.
13.	In the File menu, click Open, and click Project/Solution.
14.	In the Open Project window, open the project D:\6232B_Labs\6232B_03_PRJ\6232B_03_PRJ.ssmssln.
15.	In Solution Explorer, double-click the query 00-Setup.sql. When the query window opens, click Execute on the toolbar.
Lab Scenario A business analyst from your organization has provided you with a first-pass at a schema design for some new tables being added to the MarketDev database. You need to provide an improved schema design based on good design practices and an appropriate level of normalization. The business analyst was also confused about when data should be nullable. You need to decide about nullability for each column in your improved design. The new tables need to be isolated in their own schema. You need to create the required schema DirectMarketing. The owner of the schema should be dbo. When the schema has been created, if you have available time, you need to create the tables that have been designed.
Supporting Documentation
Proposed Schema

Table1: Competitor

Name                       Data Type
CompetitorCode             nvarchar(6)
Name                       varchar(30)
Address                    varchar(max)
Date_Entered               varchar(10)
Strength_of_competition    varchar(8)
Comments                   varchar(max)
Table2: TVAdvertisement

Name                            Data Type
TV_Station                      nvarchar(15)
City                            nvarchar(25)
CostPerAdvertisement            float
TotalCostOfAllAdvertisements    float
NumberOfAdvertisements          varchar(4)
Date_Of_Advertisement_1         varchar(12)
Time_Of_Advertisement_1         int
Date_Of_Advertisement_2         varchar(12)
Time_Of_Advertisement_2         int
Date_Of_Advertisement_3         varchar(12)
Time_Of_Advertisement_3         int
Date_Of_Advertisement_4         varchar(12)
Time_Of_Advertisement_4         int
Date_Of_Advertisement_5         varchar(12)
Time_Of_Advertisement_5         int
Table3: Campaign_Response

Name                    Data Type
ResponseOccurredWhen    datetime
RelevantProspect        int
RespondedHow            varchar(8) (phone, email, fax, letter)
ChargeFromReferrer      float
RevenueFromResponse     float
ResponseProfit          float (revenue less charge)
Exercise 1: Improve the Design of Tables
Scenario
A business analyst from your organization has provided you with a first-pass at a schema design for some new tables being added to the MarketDev database. You need to provide an improved schema design based on good design practices and an appropriate level of normalization. The business analyst was also confused about when data should be nullable. You need to decide about nullability for each column in your improved design.
The main tasks for this exercise are as follows:
1.	Review the supplied design
2.	Suggest an improved design

Task 1: Review the supplied design
•	Review the supplied design in the supporting documentation for the exercise

Task 2: Suggest an improved design
•	Provide recommendations on how to improve the schema design

Results: After this exercise, you should have provided an improved schema design based on good design practices.
Exercise 2: Create a Schema
Scenario
The new tables need to be isolated in their own schema. You need to create the required schema DirectMarketing. The owner of the schema should be dbo.
The main tasks for this exercise are as follows:
1.	Connect to the MarketDev database
2.	Create a schema named DirectMarketing

Task 1: Connect to the MarketDev Database
•	Connect to the MarketDev database

Task 2: Create a schema named DirectMarketing
•	Create a schema named DirectMarketing with dbo as the owner

Results: After this exercise, you should have created a new DirectMarketing schema.
Challenge Exercise 3: Create the Tables (Only if time permits)
Scenario
You need to create the tables that have been designed.
The main tasks for this exercise are as follows:
1.	Create the tables that you designed in Exercise 1.

Task 1: Create the tables
•	Create the tables that were designed in Exercise 1. Take into consideration the nullability of each column, and ensure that each table has a primary key. At this point there is no need to create CHECK or FOREIGN KEY constraints.

Results: After this exercise, you should have created the tables that were designed in Exercise 1.
3-39
Module Review and Takeaways
Review Questions
1.	What is a primary key?
2.	What is a foreign key?
3.	What is meant by the term "referential integrity"?

Best Practices
1.	All tables should have primary keys.
2.	Foreign keys should be declared within the database in almost all circumstances. Often developers will suggest that the application will ensure referential integrity. Experience shows that this is a poor option. Databases are often accessed by multiple applications. Bugs are also easy to miss when they first start to occur.
Module 4
Designing and Implementing Views

Contents:
Lesson 1: Introduction to Views	4-3
Lesson 2: Creating and Managing Views	4-13
Lesson 3: Performance Considerations for Views	4-22
Lab 4: Designing and Implementing Views	4-27
Module Overview
Views are a type of virtual table because the result set of a view is not usually saved in the database. Views can simplify the design of database applications by abstracting the complexity of the underlying objects. Views can also provide a layer of security. Users can be given permission to access a view without permission to access the objects that the view is constructed on.
Objectives
After completing this module, you will be able to:
•	Explain the role of views in database development
•	Implement views
•	Describe the performance related impacts of views
Lesson 1
Introduction to Views
In this lesson, you will gain an understanding of views and how they are used. You will also investigate the system views that are supplied by the SQL Server engine. A view is effectively a named SELECT query. Unlike ordinary tables (base tables) in a relational database, a view is not part of the physical schema — it is a dynamic, virtual table computed or collated from data in the database. Effective use of views in database system design helps improve performance and manageability. In this module you will learn about views, the different types of views, and how to use them.
Objectives
After completing this lesson, you will be able to:
•	Describe views
•	Describe the different types of view provided by SQL Server
•	Explain the advantages offered by views
•	Work with system views
•	Work with dynamic management views
What is a View?
Key Points A view can be thought of as a named virtual table that is defined through a SELECT statement. To an application, a view behaves very similarly to a table. Question: Have you ever used views in designing Microsoft® SQL Server® database systems? If so, why did you use them?
Views The data accessible through a view is not stored in the database as a distinct object, except in the case of indexed views. (Indexed views are described later in this module). What is stored in the database is the SELECT statement. The data tables referenced by the SELECT statement are known as the base tables for the view. As well as being based on tables, views can reference other views. Queries against views are written the same way that queries are written against tables.
Filtering Via Views Views can filter the base tables vertically, horizontally, or in both ways. Vertical filtering is used to limit the columns returned by the view. For example, consider a drop-down list of employee names that is displayed in the user interface of an application. While this data could be retrieved from the Employee table, many of the columns in the Employee table might be private and should not be returned to all users. An EmployeeLookup view could be provided to return only the columns that general users are permitted to view. Horizontal filtering is used to limit the rows returned by the view. For example, a Sales table might hold details of the sales for the entire organization. Sales staff might only be permitted to view sales for their
own region or state. A view could be created that limits the rows returned to those for a particular state or region. Question: Why would you limit which columns are returned by a view?
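Both kinds of filtering can be sketched as follows. The table and column names are illustrative assumptions based on the examples above, not the actual course database schema.

```sql
-- Vertical filtering: only non-private columns are exposed.
CREATE VIEW dbo.EmployeeLookup
AS
SELECT EmployeeID, FirstName, LastName
FROM dbo.Employee;
GO

-- Horizontal filtering: only rows for one state are exposed.
CREATE VIEW dbo.CaliforniaSales
AS
SELECT SalesID, SalesDate, SalesAmount
FROM dbo.Sales
WHERE StateCode = 'CA';
GO
```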
Types of Views
Key Points There are four basic types of view: standard views, system views (including dynamic management views), indexed views and partitioned views (including distributed partitioned views).
Standard Views Standard views combine data from one or more base tables (or views) into a new virtual table. From the base tables (or views), particular columns and rows can be returned. Any computations, such as joins or aggregations, are done during query execution for each query referencing the view, if the view is not indexed.
System Views System views are provided with SQL Server and show details of the system catalog or aspects of the state of SQL Server. Dynamic Management Views (DMVs) were introduced in SQL Server 2005 and have been enhanced in every version since. DMVs provide dynamic information about the state of SQL Server, such as information about the current sessions or the queries those sessions are executing.
Indexed Views Indexed views materialize the view through the creation of a clustered index on the view. This is usually done to improve query performance. Complex joins or lengthy aggregations can be avoided at execution time by pre-calculating the results. Indexed views are discussed in Module 6.
Partitioned Views
Partitioned views form a union of data from multiple tables into a single view. Distributed partitioned views are formed when the tables that are being combined by a union operation are located on separate instances of SQL Server.

Note: Indexed views and partitioned views are described later in this module.

Question: What advantages would you assume that views would provide?
Advantages of Views
Key Points
Views are generally used to focus, simplify, and customize the perception each user has of the database.

Advantages of Views
Views can provide a layer of abstraction in database development. They can allow a user to focus on a subset of data that is relevant to them, or that they are permitted to work with. Users do not need to deal with the complex queries that might be involved within the view. They are able to query the view as they would query a table.

Views can be used as security mechanisms by allowing users to access data through the view, without granting the users permission to directly access the underlying base tables of the view.

Many external applications are unable to execute stored procedures or T-SQL code but can select data from tables or views. Creating a view allows isolating the data that is needed for these export functions.

Views can be used to provide a backward-compatible interface that emulates a table that previously existed but whose schema has changed. For example, if a Customer table has been split into two tables, CustomerGeneral and CustomerCredit, a Customer view could be created over the two new tables to allow existing applications to query the data without requiring the applications to be altered.

Reporting applications often need to execute complex queries to retrieve the report data. Rather than embedding this logic in the reporting application, a view could be created to supply the data required by the reporting application in a much simpler format.

Question: If tables can be replaced by views (and vice-versa) during maintenance, what does that suggest to you about the naming of views and tables?
Working with System Views
Key Points
SQL Server provides information about its configuration via a series of system views. These views also provide metadata describing both the objects you create in the database and those provided by SQL Server.

System Views
Catalog views are primarily used to retrieve metadata about tables and other objects in databases. While it would be possible to retrieve much of this information directly from system tables, the use of catalog views is the supported mechanism for doing this.

Earlier versions of SQL Server provided a set of virtual tables that were exposed as system views. For backwards compatibility, a set of "compatibility" views have been provided. These views, however, are deprecated and should not be used for new development work.

The International Organization for Standardization (ISO) publishes standards for the SQL language. As each database engine vendor uses different methods of storing and accessing metadata, a standard mechanism was designed. This interface is provided by the views in the INFORMATION_SCHEMA schema. The most commonly used INFORMATION_SCHEMA views are shown in the following list:

Common INFORMATION_SCHEMA Views
INFORMATION_SCHEMA.CHECK_CONSTRAINTS
INFORMATION_SCHEMA.COLUMNS
INFORMATION_SCHEMA.PARAMETERS
INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS
INFORMATION_SCHEMA.ROUTINE_COLUMNS
INFORMATION_SCHEMA.ROUTINES
INFORMATION_SCHEMA.TABLE_CONSTRAINTS
INFORMATION_SCHEMA.TABLE_PRIVILEGES
INFORMATION_SCHEMA.TABLES
INFORMATION_SCHEMA.VIEW_COLUMN_USAGE
INFORMATION_SCHEMA.VIEW_TABLE_USAGE
INFORMATION_SCHEMA.VIEWS

Question: Give an example of why you would want to interrogate a catalog view.
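As a brief sketch of how these views are queried, the following statements list the base tables in a database and then describe the columns of one table. The schema and table names used in the second query are illustrative:

```sql
-- List all base tables using the ISO-standard INFORMATION_SCHEMA views.
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';

-- Describe the columns of one table.
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'HumanResources'
  AND TABLE_NAME = 'Employee'
ORDER BY ORDINAL_POSITION;
```

Because the INFORMATION_SCHEMA interface is standards-based, queries like these are more portable across database engines than queries against the SQL Server-specific catalog views.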
Dynamic Management Views
Key Points
Dynamic Management Views are commonly just called DMVs. These views provide a relational method for querying the internal state of a SQL Server instance.

Dynamic Management Views
SQL Server 2005 introduced the concept of Dynamic Management Objects (DMOs). These objects include Dynamic Management Views (DMVs) and Dynamic Management Functions (DMFs). Each of the objects is used to return internal state information from SQL Server. Many of the objects provide very detailed information about the internal operation of SQL Server. DMOs are prefixed by sys.dm_*. The difference between DMVs and DMFs is that DMFs have parameters passed to them.

You can see the list of current DMVs by looking down the list of System Views within Object Explorer in SQL Server Management Studio. Similarly, you can see the list of current DMFs by looking down the list of System Functions within Object Explorer.

DMOs can be used to view and monitor the internal health and performance of a server along with aspects of its configuration. They also have an important role in assisting with troubleshooting problems (such as blocking issues) and with performance tuning.

Question: What sort of information about how SQL Server is performing and its health would it be useful to have easy access to?
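The distinction between a DMV (no parameters) and a DMF (parameters) can be seen in the following sketch, which lists the current user sessions and then the statement text of each running request:

```sql
-- DMV: queried like a table, no parameters.
SELECT session_id, login_name, status
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;

-- DMF: sys.dm_exec_sql_text takes a parameter (the sql_handle of
-- each request) and is typically invoked via CROSS APPLY.
SELECT r.session_id, t.text AS current_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t;
```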
Demonstration 1A: System and Dynamic Management Views
Key Points
In this demonstration you will see how to:
• Query system views
• Query dynamic management views

Demonstration Steps
1. Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
2. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_04_PRJ\6232B_04_PRJ.ssmssln and click Open.
3. Open and execute the 00 – Setup.sql script file from within Solution Explorer.
4. Open the 11 – Demonstration 1A.sql script file.
5. Follow the instructions contained within the comments of the script file.
Question: When are the values returned by most dynamic management views reset?
Lesson 2
Creating and Managing Views
In the previous lesson, you learned about the role of views. In this lesson you will learn how to create, drop and alter views. You will also learn how views and the objects that they are based on have owners and how this can impact the use of views. You will see how to find information about existing views and how to obfuscate the definitions of views.
Objectives
After completing this lesson, you will be able to:
• Create views
• Drop views
• Alter views
• Explain the concept of ownership chaining and how it applies to views
• List the available sources of information about views
• Work with updatable views
• Obfuscate view definitions
Creating Views
Key Points
To create a view, you must be granted permission to do so by the database owner. Creating a view involves associating a name with a SELECT statement.

CREATE VIEW
If you specify the WITH SCHEMABINDING option, the underlying tables cannot be changed in a way that would affect the view definition. If you later decide to index the view, the WITH SCHEMABINDING option must be used.

Views can be based on other views (instead of base tables) up to 32 levels of nesting. Care should be exercised in nesting views deeply as it can become difficult to understand the complexity of the underlying code and it can be difficult to troubleshoot performance problems related to the views.

Views have no natural output order. Queries that access the views should specify the order for the returned rows. The ORDER BY clause can be used in a view, but only to satisfy the needs of a clause such as the TOP clause.

Expressions that are returned as columns need to be aliased. It is common to define column aliases in the SELECT statement within the view definition, but a column list can also be provided after the name of the view. You can see this in the following code example:

CREATE VIEW HumanResources.EmployeeList (EmployeeID, FamilyName, GivenName)
AS
SELECT EmployeeID, LastName, FirstName
FROM HumanResources.Employee;
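The ORDER BY restriction described above can be sketched as follows. The table and column names are illustrative; the ORDER BY here is permitted only because it qualifies the TOP clause:

```sql
-- ORDER BY qualifies TOP (which 10 rows to keep); it still does NOT
-- guarantee the order of rows returned by queries against the view.
CREATE VIEW Sales.Top10Products
AS
SELECT TOP (10) ProductID, ProductName, ListPrice
FROM Sales.Product
ORDER BY ListPrice DESC;
GO
```

A query that needs the rows in price order must still specify its own ORDER BY when selecting from the view.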
Question: Why is the ORDER BY clause ever permitted in a view definition if it doesn’t impact the output order of the rows?
Dropping Views
Key Points
Dropping a view removes the definition of the view and all permissions associated with the view.

DROP VIEW
Even if a view is recreated with exactly the same name as a view that has been dropped, permissions formerly associated with the view are removed. It is important to record why views are created and to then drop them if they are no longer required for the purpose they were created. Retaining view definitions that are not in use adds to the work required when reorganizing the structure of databases.

If a view was created with the SCHEMABINDING option, the view will need to be dropped before changes can be made to the structure of the underlying tables.

The DROP VIEW statement supports the dropping of multiple views via a comma-delimited list, as shown in the following code sample:

DROP VIEW Sales.WASales, Sales.CTSales, Sales.CASales;
Altering Views
Key Points
After a view is defined, you can modify its definition without dropping and re-creating the view.

ALTER VIEW
The ALTER VIEW statement modifies a previously created view. (This includes indexed views, which are discussed in the next lesson.) The main advantage of using ALTER VIEW is that any permissions associated with the view are retained. Altering a view also involves less code than dropping and recreating a view.
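As a sketch, the EmployeeList view created earlier in this lesson could be redefined in place. The added MiddleName/Initial column is hypothetical, shown only to illustrate the syntax:

```sql
-- Redefine the view; permissions already granted on
-- HumanResources.EmployeeList are retained.
ALTER VIEW HumanResources.EmployeeList (EmployeeID, FamilyName, GivenName, Initial)
AS
SELECT EmployeeID, LastName, FirstName, MiddleName  -- hypothetical extra column
FROM HumanResources.Employee;
GO
```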
Ownership Chains and Views
Key Points
When querying a view, there needs to be an unbroken chain of ownership from the view to the underlying tables, unless the user executing the query also has permissions on the underlying table or tables.
Ownership Chaining
One of the key reasons for using views is to provide a layer of security abstraction, so that access is given to views and not to the underlying table or tables. For this mechanism to function correctly, an unbroken ownership chain must exist.

In the example shown in the slide, a user John has no access to a table that is owned by Nupur. If Nupur creates a view or stored procedure that accesses the table and gives John permission to the view, John can then access the view and, through it, the data in the underlying table. However, if Nupur creates a view or stored procedure that accesses a table owned by Tim and grants John access to the view or stored procedure, John would not be able to use the view or stored procedure because of the broken ownership chain, even if Nupur has access to Tim's table. Two options could be used to correct this situation:
• Tim could own the view or stored procedure instead of Nupur.
• John could be granted permission to the underlying table. (This is often undesirable.)
Ownership Chains vs. Schemas
SQL Server 2005 introduced the concept of schemas. At that point, the two-part naming for objects changed from owner.object to schema.object. There seems to be a widespread misunderstanding that since that time, objects no longer have owners. This is not true. Objects still have owners, and even schemas have owners. The configuration of security is simplified if schema owners also own the objects that are contained in their schemas.
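The unbroken-chain scenario from the slide could be sketched as follows. The schema and object names are illustrative, assuming a Nupur schema whose owner also owns both the view and its base table:

```sql
-- Grant access to the view only; nothing is granted on the
-- underlying table.
GRANT SELECT ON OBJECT::Nupur.CustomerList TO John;

-- John can now query Nupur.CustomerList. Because the view and its
-- base table share an owner, SQL Server does not re-check John's
-- permissions on the base table (unbroken ownership chain).
```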
Sources of Information About Views
Key Points
Views are queried the same way that ordinary tables are queried. However, you may also want to find out how a view is defined or about its properties.

Sources of Information About Views
You may need to see the definition of a view to understand how its data is derived from the source tables or to see the data defined by the view.

SQL Server Management Studio (SSMS) provides access to a list of views in Object Explorer. This includes both system views and user-created views. By expanding the view nodes in Object Explorer, details of the columns, triggers, indexes and statistics defined on the views can be seen.

In Transact-SQL (T-SQL), the list of views in a database can be obtained by querying the sys.views view. In earlier versions of SQL Server, object definitions (including the definitions of unencrypted views) would be located by executing the sp_helptext system stored procedure. The OBJECT_DEFINITION() function allows querying the definition of an object in a relational format. The output of the function is easier to consume in an application than the output of a system stored procedure such as sp_helptext.

If you change the name of an object referenced by a view, you must modify the view so that its text reflects the new name. Therefore, before renaming an object, display the dependencies of the object first to determine if any views are affected by the proposed change. Overall dependencies can be found by querying the sys.sql_expression_dependencies view. Column-level dependencies can be found by querying the sys.dm_sql_referenced_entities dynamic management function.
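These sources can be combined in a short sketch. The view name used here is the EmployeeList example from earlier in this lesson:

```sql
-- List the user-defined views in the current database.
SELECT name, create_date
FROM sys.views;

-- Retrieve the definition of one view (NULL if it is encrypted).
SELECT OBJECT_DEFINITION(OBJECT_ID(N'HumanResources.EmployeeList'));

-- Check what the view depends on before renaming any object.
SELECT referenced_schema_name, referenced_entity_name
FROM sys.sql_expression_dependencies
WHERE referencing_id = OBJECT_ID(N'HumanResources.EmployeeList');
```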
Updatable Views
Key Points
It is possible to update data in the base tables by updating a view.

Updatable Views
Updates that are performed on views cannot affect columns from more than one base table. (To work around this restriction, you can create INSTEAD OF triggers. These triggers are discussed in Module 15.)

While views can contain aggregated values from the base tables, these columns cannot be updated, nor can columns involved in grouping operations such as GROUP BY, HAVING or DISTINCT.

It is possible to modify a row in a view in such a way that the row would no longer belong to the view. For example, you could have a view that selected rows where the State column contained the value WA. You could then update a row and set the State column to the value CA. To avoid the chance of this, you can specify the WITH CHECK OPTION clause when defining the view. It checks during data modifications that any modified row would still be returned by the view.

Data that is modified in a base table via a view still needs to meet the restrictions on those columns, such as nullability, constraints and defaults, as if the base table was modified directly. This can be particularly challenging if not all the columns in the base table are present in the view. For example, an INSERT on the view would fail if the underlying base table required mandatory columns that were not exposed in the view and that did not have DEFAULT values.
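The WA/CA scenario above can be sketched as follows; the table and column names are illustrative:

```sql
-- Without WITH CHECK OPTION, an update through this view could set
-- State to 'CA' and the row would silently disappear from the view.
CREATE VIEW Sales.WACustomers
AS
SELECT CustomerID, CustomerName, State
FROM Sales.Customer
WHERE State = 'WA'
WITH CHECK OPTION;
GO

-- This statement now fails, because the modified row would no
-- longer be returned by the view:
-- UPDATE Sales.WACustomers SET State = 'CA' WHERE CustomerID = 1;
```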
Obfuscating View Definitions
Key Points
Database developers often want to protect the definitions of their database objects. The WITH ENCRYPTION clause can be included when defining or altering a view.

WITH ENCRYPTION
The WITH ENCRYPTION clause provides limited obfuscation of the definition of a view. It is important to keep copies of the source code for views. This is even more important when the view is created with the WITH ENCRYPTION option. Encrypted code (including the code definitions of views) makes it harder to perform problem diagnosis and query tracing and tuning.

The encryption provided is not very strong. Many third-party utilities exist that can decrypt the source, so you should not depend on this option to protect your intellectual property if doing so is critical to you.

Question: Do you think you might be deploying encrypted views in your organization?
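A minimal sketch of the clause follows; the view and table names are illustrative:

```sql
-- Obfuscates (does not strongly protect) the stored definition.
CREATE VIEW Sales.CommissionRates
WITH ENCRYPTION
AS
SELECT SalespersonID, CommissionRate
FROM Sales.Salesperson;
GO

-- OBJECT_DEFINITION returns NULL for encrypted objects:
SELECT OBJECT_DEFINITION(OBJECT_ID(N'Sales.CommissionRates'));
```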
Demonstration 2A: Implementing Views
Key Points
In this demonstration you will see how to:
• Create a view
• Query a view
• Query the definition of a view
• Use the WITH ENCRYPTION option
• Drop a view
• Generate a script for an existing view

Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_04_PRJ\6232B_04_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 21 – Demonstration 2A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: Why is the ability to script a view useful?
Lesson 3
Performance Considerations for Views
Now that you understand why views are important and know how to create them, it is important to understand the potential performance impacts of using views. In this lesson, you will see how views are incorporated directly into the execution plans of queries that they are used in. You will see the effect and potential disadvantages of nesting views and see how performance can be improved in some situations. Finally, you will see how the data from multiple tables can be combined into a single view, even if those tables are on different servers.
Objectives
After completing this lesson, you will be able to:
• Explain the dynamic resolution process for views
• List the most important considerations when working with nested views
• Create indexed views
• Describe the purpose of partitioned views
Views and Dynamic Resolution
Key Points
Standard views are expanded and incorporated into the queries that reference them. The objects that they reference are resolved at execution time.

Views and Dynamic Resolution
A single query plan is created that covers the query being executed and the definition of any views that it accesses. A separate query plan for the view is not created. Merging the view query into the outer query is called "inlining" the query. It can be very beneficial to performance, as SQL Server can eliminate unnecessary joins and table accesses from queries.

Standard views do not appear in execution plans for queries, as the views themselves are not accessed. The underlying objects that they reference will be seen in the execution plans.

The use of SELECT * in a view definition should be avoided. For example, if you add a new column to the base table, the view will not reflect the column until the view has been refreshed. You can correct this situation by executing an updated ALTER VIEW statement or by calling the sp_refreshview system stored procedure.

Question: Suggest a type of join that could easily be eliminated when views are resolved.
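The stale-metadata situation described above can be sketched as follows. The table name is the Employee example used earlier; the view name is hypothetical and assumed to have been defined with SELECT *:

```sql
-- A new column is added to the base table.
ALTER TABLE HumanResources.Employee
ADD PreferredName nvarchar(50) NULL;
GO

-- A SELECT * view over the table still reports the old column list
-- until its metadata is refreshed:
EXEC sp_refreshview N'HumanResources.EmployeeSummary';  -- hypothetical view
```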
Nested View Considerations
Key Points
While views can reference other views, careful consideration is needed when doing this.

Nested Views
Views can be nested up to 32 levels. Layers of abstraction are often regarded as desirable when designing code in any programming language, and views can help provide this. The biggest concern with nested views is that it is very easy to create code that is very difficult for the query optimizer to work with, without realizing that this is occurring. Nested views can also make it much harder to troubleshoot performance problems and more difficult to understand where complexity is arising in code.
Partitioned Views
Key Points
Partitioned views allow the data in a large table to be split into smaller member tables. The data is partitioned between the member tables based on ranges of data values in one of the columns.

Partitioned Views
Data ranges for each member table in a partitioned view are defined in a CHECK constraint specified on the partitioning column. A UNION ALL statement is used to combine selects of all the member tables into a single result set.

In a local partitioned view, all participating tables and the view reside on the same instance of SQL Server. In most cases, table partitioning should be used instead of local partitioned views. In a distributed partitioned view, at least one of the participating tables resides on a different (remote) server. Distributed partitioned views can be used to implement a federation of database servers. Good planning and testing is crucial, as major performance problems can arise if the design of the partitioned views is not appropriate.
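A minimal local partitioned view can be sketched as follows; the table and column names are illustrative. Each member table carries a CHECK constraint on the partitioning column, and the view unions the members:

```sql
CREATE TABLE Sales.Sales2010
( SalesID   int   NOT NULL,
  SalesYear int   NOT NULL CHECK (SalesYear = 2010),
  Amount    money NOT NULL,
  CONSTRAINT PK_Sales2010 PRIMARY KEY (SalesID, SalesYear)
);

CREATE TABLE Sales.Sales2011
( SalesID   int   NOT NULL,
  SalesYear int   NOT NULL CHECK (SalesYear = 2011),
  Amount    money NOT NULL,
  CONSTRAINT PK_Sales2011 PRIMARY KEY (SalesID, SalesYear)
);
GO

-- The CHECK constraints let the optimizer touch only the member
-- table that can hold the requested SalesYear.
CREATE VIEW Sales.AllSales
AS
SELECT SalesID, SalesYear, Amount FROM Sales.Sales2010
UNION ALL
SELECT SalesID, SalesYear, Amount FROM Sales.Sales2011;
GO
```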
Demonstration 3A: Views and Performance
Key Points
In this demonstration you will see how:
• Views are eliminated in query plans
• Views are expanded and integrated into the outer query before being optimized

Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_04_PRJ\6232B_04_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 31 – Demonstration 3A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Lab 4: Designing and Implementing Views
Lab Setup
For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:
1. On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2. Maximize the Hyper-V Manager window.
3. In the Virtual Machines list, if the virtual machine 623XB-MIA-DC is not started:
   • Right-click 623XB-MIA-DC and click Start.
   • Right-click 623XB-MIA-DC and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, and then close the Virtual Machine Connection window.
4. In the Virtual Machines list, if the virtual machine 623XB-MIA-SQL is not started:
   • Right-click 623XB-MIA-SQL and click Start.
   • Right-click 623XB-MIA-SQL and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
5. In the Virtual Machine Connection window, click the Revert toolbar icon.
6. If you are prompted to confirm that you want to revert, click Revert. Wait for the revert action to complete.
7. In the Virtual Machine Connection window, if the user is not already logged on:
   • On the Action menu, click the Ctrl-Alt-Delete menu item.
   • Click Switch User, and then click Other User.
   • Log on using the following credentials: User name: AdventureWorks\Administrator; Password: Pa$$w0rd
8. From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
9. If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
10. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
11. In the Connect to Server window, type Proseware in the Server name text box.
12. In the Authentication drop-down list box, select Windows Authentication and click Connect.
13. On the File menu, click Open, and click Project/Solution.
14. In the Open Project window, open the project D:\6232B_Labs\6232B_04_PRJ\6232B_04_PRJ.ssmssln.
15. In Solution Explorer, double-click the query 00-Setup.sql. When the query window opens, click Execute on the toolbar.
Lab Scenario
A new web-based stock promotion system is being rolled out. Your manager is very concerned about providing access from the web-based system directly to the tables in your database. She has requested you to design some views that the web-based system could connect to instead.

Details of organizational contacts are held in a number of tables. The relationship management system being used by the account management team needs to be able to gain access to these contacts. However, they need a single view that comprises all contacts. You need to design, implement and test the required view.

Finally, if you have time, a request has been received from the new Marketing team that the catalog description of the product models should be added to the AvailableModels view. They would appreciate you modifying the view to provide this additional column.
Supporting Documentation

View1: OnlineProducts

ViewColumn      SourceColumn
ProductID       ProductID
ProductName     ProductName
ProductNumber   ProductNumber
Color           Color (note ‘N/A’ should be returned when NULL)
Availability    Based on DaysToManufacture (0 = Instock, 1 = Overnight, 2 = Fast, Other Values = Call)
Size            Size
UnitOfMeasure   SizeUnitMeasureCode
Price           ListPrice
Weight          Weight

Note: Based on table Marketing.Product. Rows should only appear if the product has begun to be sold and is still being sold. (Derive this from SellStartDate and SellEndDate.)

View2: AvailableModels

ViewColumn      SourceColumn
ProductID       ProductID
ProductName     ProductName
ProductModelID  ProductModelID
ProductModel    ProductModel

Note: Based on tables Marketing.Product and Marketing.ProductModel. Rows should only appear if the product has at least one model, has begun to be sold and is still being sold. (Derive this from SellStartDate and SellEndDate.)

View3: Contacts

ViewColumn    SourceColumn in Prospect    SourceColumn in Salesperson
ContactID     ProspectID                  SalespersonID
FirstName     FirstName                   FirstName
MiddleName    MiddleName                  MiddleName
LastName      LastName                    LastName
ContactRole   ‘PROSPECT’                  ‘SALESPERSON’

Note: Based on the Marketing.Prospect and Marketing.Salesperson tables.
Exercise 1: Design, Implement and Test the WebStock Views

Scenario
A new web-based stock promotion system is being rolled out. Your manager is very concerned about providing access from the web-based system directly to the tables in your database. She has requested you to design some views that the web-based system could connect to instead.

The main tasks for this exercise are as follows:
1. Create the WebStock schema
2. Review the design requirements
3. Design and implement the views
4. Test the views

Task 1: Create the WebStock schema
• In the MarketDev database, create a new schema named WebStock with dbo as the owner.

Task 2: Review the design requirements
• You have been provided with the design requirements for the OnlineProducts and AvailableModels views. Review these requirements.

Task 3: Design and implement the views
• Design and implement the views.

Task 4: Test the views
• Query both views to ensure they return the required data.

Results: After this exercise, you should have created two new views: OnlineProducts and AvailableModels, both within the WebStock schema.
Exercise 2: Design and Implement the Contacts View

Scenario
Details of organizational contacts are held in a number of tables. The relationship management system being used by the account management team needs to be able to gain access to these contacts. However, they need a single view that comprises all contacts. You need to design, implement and test the required view.

The main tasks for this exercise are as follows:
1. Create the Relationship schema
2. Review the design requirements
3. Design and implement the view
4. Test the view

Task 1: Create the Relationship schema
• In the MarketDev database, create a new schema named Relationship with dbo as the owner.

Task 2: Review the design requirements
• You have been provided with the design requirements for the Contacts view. Review these requirements.

Task 3: Design and implement the view
• Design and implement the view.

Task 4: Test the view
• Query the view to ensure it returns the required data.

Results: After this exercise, you should have created a new Contacts view within a new Relationship schema.
Challenge Exercise 3: Modify the AvailableModels View (Only if time permits)

Scenario
A request has been received from the new Marketing team that the catalog description of the product models should be added to the AvailableModels view. You now need to modify the view to provide this additional column. The new column should be called CatalogDescription and should be taken from the ProductDescription table. Multiple descriptions can exist for each model. If an English description exists (based on the LanguageID ‘en’), it should be returned. If no English description exists, the invariant language description (based on a blank string for LanguageID) should be returned. If no descriptions exist, the column should be null.

The main tasks for this exercise are as follows:
1. Alter the AvailableModels view to add the CatalogDescription column
2. Test the view

Task 1: Alter the AvailableModels View
• Use the ALTER VIEW statement to change the view to suit the new requirements as described in the scenario above.

Task 2: Test the view
• Query the view to ensure it now returns the required data.

Results: After this exercise, you should have modified the AvailableModels view and it should return the new CatalogDescription column.
Module Review and Takeaways

Review Questions
1. How does SQL Server store a view in the database?
2. What is a standard (non-indexed) view?
3. What is an unbroken ownership chain?

Best Practices
1. Use views to focus data for users.
2. Avoid nesting many layers within views.
3. Avoid ownership chain problems within views.
4. Ensure consistent connection SET options when intending to index views.
Module 5
Planning for SQL Server 2008 R2 Indexing

Contents:
Lesson 1: Core Indexing Concepts 5-3
Lesson 2: Data Types and Indexes 5-11
Lesson 3: Single Column and Composite Indexes 5-19
Lab 5: Planning for SQL Server Indexing 5-24
Module Overview
An index is a collection of pages associated with a table. Indexes are used to improve the performance of queries or enforce uniqueness. Before learning to implement indexes, it is important to understand how they work, how effective different data types are when used within indexes, and how indexes can be constructed from multiple columns.
Objectives
After completing this module, you will be able to:
• Explain core indexing concepts
• Describe the effectiveness of each data type commonly used in indexes
• Plan for single column and composite indexes
Lesson 1
Core Indexing Concepts
While it is possible for SQL Server to read all the pages in a table when calculating the results of a query, doing so is often highly inefficient. Indexes can be used to point to the location of required data and to minimize the need for scanning entire tables. In this lesson, you will learn how indexes are structured and learn the key measures associated with the design of indexes. Finally, you will see how indexes can become fragmented over time.
Objectives
After completing this lesson, you will be able to:
• Describe how SQL Server accesses data
• Describe the need for indexes
• Explain the concept of B-Tree index structures
• Explain the concepts of index selectivity, density and depth
• Explain why index fragmentation occurs
How SQL Server Accesses Data
Key Points SQL Server can access data in a table by reading all the pages of the table (known as a table scan) or by using index pages to locate the required rows.
Indexes
Whenever SQL Server needs to access data in a table, it makes a decision about whether to read all the pages of the table or whether there are one or more indexes on the table that would reduce the amount of effort required to locate the required rows.

Queries can always be resolved by reading the underlying table data. Indexes are not required, but accessing data by reading large numbers of pages is usually considerably slower than methods that use appropriate indexes.

On occasion, SQL Server will create its own temporary indexes to improve query performance. However, doing so is up to the optimizer and beyond the control of the database administrator or programmer, so these temporary indexes will not be discussed in this module. Temporary indexes are only used to improve a query plan if no suitable indexing already exists.

In this module, you will consider standard indexes created on tables. SQL Server includes other types of index:
• Integrated full-text search is a special type of index that provides flexible searching of text.
• Spatial indexes are used with the GEOMETRY and GEOGRAPHY data types.
• Primary and secondary XML indexes assist when querying XML data.

Each of these other index types is discussed in later modules in this course.

Question: When might a table scan be more efficient than using an index?
The Need for Indexes
Key Points Indexes are not described in ANSI SQL definitions. Indexes are considered to be an implementation detail. SQL Server uses indexes for improving the performance of queries and for implementing certain constraints.
The Need for Indexes As mentioned in the last topic, SQL Server can always read the entire table to work out required results but doing so can be inefficient. Indexes can reduce the effort required to locate results but only if the indexes are well-designed. SQL Server also uses indexes as part of its implementation of primary key and unique constraints. When you assign a primary key or unique constraint to a column or set of columns, SQL Server automatically indexes that column or set of columns. It does so to make it fast to check whether or not a given value is already present.
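As an illustration of constraints creating indexes automatically, the following sketch uses a hypothetical dbo.Customer table (names are illustrative, not from this course's sample databases). It creates a PRIMARY KEY and a UNIQUE constraint and then lists the indexes SQL Server created to support them:

```sql
-- Hypothetical table: both constraints cause SQL Server to create
-- an index automatically.
CREATE TABLE dbo.Customer
(
    CustomerID   int           NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
    EmailAddress nvarchar(100) NOT NULL CONSTRAINT UQ_Customer_Email UNIQUE
);
GO

-- The supporting indexes are visible in the sys.indexes catalog view.
SELECT name, type_desc, is_unique, is_primary_key, is_unique_constraint
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Customer');
```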
A Useful Analogy At this point, it is useful to consider an analogy that might be easier to relate to. Consider a physical library. Most libraries store books in a given order, which is basically an alphabetical order within a set of defined categories. Note that even when you store the books in alphabetical order, there are various ways that this could be done. The order of the books could be based on the name of the book or the name of the author. Whichever option is chosen makes one form of access easy and does not help other methods of access. For example, if books were stored in book name order, how would you locate books written by a particular author? Indexes assist with this type of problem. Question: Which different ways might you want to locate books in a physical library?
Index Structures
Key Points Tree structures are well known for providing rapid search capabilities for large numbers of entries in a list.
Index Structures
Indexes in database systems are often based on tree structures. A binary tree is a simple structure where, at each level, a decision is made to navigate left or right. This style of tree can quickly become unbalanced and less useful.

SQL Server indexes are instead based on a form of self-balancing tree (B-Tree). Whereas binary trees have at most two children per node, SQL Server indexes can have a large number of children per node. This helps improve the efficiency of the indexes and avoids the need for excessive depth within an index. Depth is defined as the number of levels from the top node (called the root node) to the bottom nodes (called leaf nodes).
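The depth of an existing index can be inspected through the sys.dm_db_index_physical_stats function. A sketch, assuming the current database contains a table named dbo.Customer with at least one index:

```sql
-- index_depth is the number of levels from the root node to the leaf
-- nodes; even indexes over millions of rows commonly report only 3 or 4.
SELECT i.name AS index_name,
       ps.index_depth,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Customer'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id
WHERE ps.index_level = 0;   -- report one (leaf-level) row per index
```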
Selectivity, Density and Index Depth
Key Points When designing indexes, three core concepts are important: selectivity, density and index depth.
Selectivity, Density and Index Depth
Additional indexes on a table are most useful when they are highly selective. For example, imagine how you would locate books by a specific author in a physical library using a card file index. This would involve a process such as:
• Find the first entry for the author in the index.
• Locate the book in the bookcases based on the information in the index entry.
• Return to the index and find the next entry for the author.
• Locate the book in the bookcases based on the information in that next index entry.
• And so on.
Now imagine doing the same for a range of authors, such as one third of all the authors. You would quickly reach a point where it would be quicker to scan the whole library and ignore the author index than to run backwards and forwards between the index and the bookcases.

Density is a measure of the lack of uniqueness of the data in a table. A dense table is one that has a high number of duplicates.

Index depth is a measure of the number of levels from the root node to the leaf nodes. Users often imagine that SQL Server indexes are quite deep, but the reality is quite different. The large number of children that each node in the index can have produces a very flat index structure. Indexes with only 3 or 4 levels are very common.
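Selectivity and density can both be estimated with simple aggregate queries. A sketch against the Marketing.Prospect table used in the lab later in this module (any table and column pair would work the same way):

```sql
-- Density: on average, what fraction of the rows share a single value
-- (lower = less dense = better for indexing).
-- Selectivity of a predicate: what fraction of the rows it returns
-- (lower = more selective = an index is more likely to help).
SELECT 1.0 / COUNT(DISTINCT FirstName) AS density,
       (SELECT COUNT(*)
        FROM Marketing.Prospect
        WHERE FirstName LIKE 'Arif%') * 1.0 / COUNT(*) AS predicate_selectivity
FROM Marketing.Prospect;
```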
Index Fragmentation
Key Points Index fragmentation is the inefficient use of pages within an index. Fragmentation occurs over time as data is modified.
Index Fragmentation For operations that read data, indexes perform best when each page of the index is as full as possible. While indexes may initially start full (or relatively full), modifications to the data in the indexes can cause the need to split index pages. From our physical library analogy, imagine a fully populated library with full bookcases. What occurs when a new book needs to be added? If the book is added to the end of the library, the process is easy but if the book needs to be added in the middle of a full bookcase, there is a need to readjust the bookcase.
Internal vs. External Fragmentation Internal fragmentation is similar to what would occur if an existing bookcase was split into two bookcases. Each bookcase would then be only half full. External fragmentation relates to where the new bookcase would be physically located. It would probably need to be placed at the end of the library, even though it would "logically" need to be in a different order. That means that to read the bookcases in order, you could no longer just walk directly from bookcase to bookcase but would need to follow pointers around the library to follow a chain from one bookcase to another.
Detecting Fragmentation SQL Server provides a measure of fragmentation in the sys.dm_db_index_physical_stats dynamic management view. The avg_fragmentation_in_percent column shows the percentage of fragmentation.
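For example, a query such as the following (a sketch; the LIMITED scan mode keeps the check itself inexpensive) reports fragmentation for all indexes in the current database:

```sql
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name                    AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id
WHERE ps.page_count > 100       -- fragmentation in tiny indexes rarely matters
ORDER BY ps.avg_fragmentation_in_percent DESC;
```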
SQL Server Management Studio also provides details of index fragmentation in the properties page for each index as shown in the following screenshot from the AdventureWorks2008R2 database:
Question: Why does fragmentation affect performance?
Demonstration 1A: Viewing Index Fragmentation
Key Points
In this demonstration you will see how to identify fragmented indexes.

Demonstration Steps
1. Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
2. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_05_PRJ\6232B_05_PRJ.ssmssln and click Open.
3. Open and execute the 00 – Setup.sql script file from within Solution Explorer.
4. Open the 11 – Demonstration 1A.sql script file.
5. Follow the instructions contained within the comments of the script file.
Question: How might solid state disk drives change concerns around fragmentation?
Lesson 2
Data Types and Indexes
Not all data types work equally well as components of indexes. In this lesson, you will learn how effective a number of common data types are when used within indexes. This will assist you in choosing data types when designing indexes.
Objectives
After completing this lesson, you will be able to:
• Describe the effectiveness of numeric data when used in indexes
• Describe the effectiveness of character data when used in indexes
• Describe the effectiveness of date-related data when used in indexes
• Describe the effectiveness of GUID data when used in indexes
• Describe the effectiveness of BIT data when used in indexes
• Explain how computed columns can be indexed
Numeric Index Data
Key Points Numeric data types tend to produce highly-efficient indexes. Exact numeric types are the most efficient.
Numeric Index Data
When numeric values are used as components of indexes, a large number of entries can fit in a small number of index pages. This makes reading indexes based on numeric values very fast.

Sort operations are very common in index operations. Numeric values are fast to compare and sort. This both improves the general performance of index operations and reduces the time taken to rebuild an index, should this be required.

While numeric values are efficient in indexes, this typically applies only to the exact numeric data types. FLOAT and REAL data types are much less useful in indexes as they are larger and require more complex comparison techniques than exact numeric data types. FLOAT and REAL data types are also not precise, which can lead to unpredictable results when used with the equality (=) operator. The same situation occurs outside their use in indexes: care needs to be taken with FLOAT and REAL predicates in certain operations.

INT and BIGINT are the most efficient data types for indexing as they are relatively small and operations on them are very fast.

Question: Would you imagine that processor bit size affects the speed of comparing INT or BIGINT values?
Character Index Data
Key Points
While it might seem natural to base indexes on character data, indexes constructed on character data tend to be less efficient than those constructed on numeric data. However, character-based indexes are both common and useful.

Character Index Data
Character data values tend to be larger than numeric values. For example, a character column might hold a customer's name or address details. This means that far fewer entries can exist in a given number of index pages, which makes character-based indexes slower to seek.

Consider the operations required when comparing two string values. The complexity of these operations depends upon whether or not a strict binary comparison of the values can be undertaken. Most SQL Server systems use collations that are not based on binary comparisons. This means that every time a string value needs to be compared to another string value, a complex set of rules needs to be applied to determine the outcome of the comparison. This complexity makes character-based index operations slow in comparison with the same operations on numeric values.

Character-based indexes also tend to cause fragmentation problems, as new values are almost never ascending or descending.
Date-Related Index Data
Key Points
Date-related data types make good keys within indexes.

Date-Related Index Data
Date-related data types are only slightly less efficient than the integer data types. They are relatively small and can be compared and sorted quickly.

Dates are very important (and very commonly used) in business applications, as almost all business transactions involve a date (and possibly a time) when the transaction occurred. It is very common to need to locate all the transactions that occurred on a particular date or during a particular period (or range of dates).

SQL Server 2008 introduced the date data type. It is very effective as a component of an index and more efficient than data types that also include time components.
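A brief sketch of the typical pattern, assuming a hypothetical Sales.Orders table with an OrderDate column of the date data type:

```sql
-- date values are only 3 bytes, so many entries fit on each index page.
CREATE INDEX IX_Orders_OrderDate ON Sales.Orders (OrderDate);

-- A date-range query that can seek directly on the index:
SELECT OrderID, OrderDate
FROM Sales.Orders
WHERE OrderDate >= '20110101'
  AND OrderDate <  '20110201';
```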
GUID Index Data
Key Points While GUID values can be relatively efficient in indexes, operations on those indexes can lead to fragmentation problems and inefficiency.
GUID Index Data GUID values are reasonably efficient within indexes. There is a common misconception that they are large. They are 16 bytes long and can be compared in a binary fashion. This means that they pack quite tightly into indexes and can be compared and sorted quite quickly. Because GUID values are random in nature, significant problems arise when they are used in indexes, if those indexes need to process a large number of insert operations. Fragmentation problems are commonplace with indexes created on GUID data types and these problems are a very common cause of performance problems in SQL Server databases. In the next module, you will see how the use of GUID data types within indexes affects the performance of operations on the indexes.
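Where GUID keys must be generated on the server, one common way to reduce this fragmentation is to default the column to NEWSEQUENTIALID() rather than NEWID(), so that new values arrive in roughly ascending order. A sketch with a hypothetical table:

```sql
CREATE TABLE dbo.TrackingEvent
(
    -- NEWID() returns random values and fragments the index on insert;
    -- NEWSEQUENTIALID() returns ever-increasing values on a given server.
    EventID uniqueidentifier NOT NULL
        CONSTRAINT DF_TrackingEvent_EventID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_TrackingEvent PRIMARY KEY,
    EventTime datetime2 NOT NULL
);
```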
BIT Index Data
Key Points
BIT columns are highly efficient in indexes. There is a common misconception that they are not useful, but many valid scenarios exist for the use of the BIT data type within indexes.

BIT Index Data
There is a very common misconception that BIT columns are not useful in indexes. This stems from the fact that there are only two possible values. However, the number of values is not the issue: as discussed earlier in the module, the selectivity of queries is the most important issue.

For example, consider a transaction table that contains 100 million rows, where one of the columns (IsFinalized) indicates whether or not a transaction has been completed. There might only be 500 transactions that are not completed. An index that uses the IsFinalized column would be very useful for finding the unfinalized transactions: it would be highly selective. Note that the same index would be entirely useless for locating the finalized transactions. This difference is a good indication that the column is an ideal candidate for the creation of a filtered index. (Filtered indexes are discussed later.)
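The IsFinalized scenario can be sketched as a filtered index (a hypothetical dbo.Transactions table is assumed). Because only the roughly 500 unfinalized rows are stored, the index stays tiny while remaining highly selective for the query it serves:

```sql
-- Index only the rows that the query actually needs to find.
CREATE INDEX IX_Transactions_Unfinalized
ON dbo.Transactions (IsFinalized)
WHERE IsFinalized = 0;

-- This predicate matches the filter, so the filtered index can be used:
SELECT TransactionID
FROM dbo.Transactions
WHERE IsFinalized = 0;
```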
Indexing Computed Columns
Key Points Indexing a computed column can be highly efficient. It can also assist with improving the performance of poorly designed databases.
Indexing Computed Columns
You can only create indexes on computed columns when certain conditions are met:
• The expression must be deterministic and precise.
• The ANSI_NULLS connection SET option must be ON.
• Expressions that return the text, ntext or image data types are not permitted in the definition of the computed column.
• The NUMERIC_ROUNDABORT connection SET option needs to be OFF.

Note that SQL Server's query optimizer may ignore the index on a computed column, even if the requirements shown are met.

The requirement for determinism and precision means that for a given set of input values, the same output values would always be returned. For example, the function SYSDATETIME() returns the current date and time whenever it is called, so its output would not be considered deterministic.

You may want to create an index on a computed column when the results are queried or reported on often. For example, a retail store may want to report on sales by day of the week (Sunday, Monday, Tuesday, and so on). You can create a computed column that determines the day of the week based on the date of the sale and then index that computed column.

SQL Server 2005 introduced the ability to persist computed columns. Rather than calculating the value every time a SELECT operation is performed, the value can be calculated and stored whenever an INSERT
or UPDATE occurs. This is useful for data that is not updated frequently but is selected frequently. Indexes can be created on the persisted computed column.
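The day-of-week reporting example might be sketched as follows (a hypothetical Sales.DailySales table with a SaleDate column is assumed). Note that DATENAME(weekday, ...) depends on the session language and DATEPART(weekday, ...) on the DATEFIRST setting, so neither is deterministic; a DATEDIFF-based expression avoids this:

```sql
-- 0 = Monday ... 6 = Sunday, independent of SET LANGUAGE and SET DATEFIRST,
-- so the expression is deterministic and precise and can be persisted.
ALTER TABLE Sales.DailySales
ADD SaleDayOfWeek AS (DATEDIFF(day, 0, SaleDate) % 7) PERSISTED;

CREATE INDEX IX_DailySales_SaleDayOfWeek
ON Sales.DailySales (SaleDayOfWeek);
```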
Physical Analogy From our physical library analogy, a persisted computed column for a book could be imagined as a label that is placed on the book that records the number of pages in the book. Nothing about the book itself changes when the label is placed on it but you now don't have to pick the book up and count the number of pages in it, if you need to make a decision based on the number of pages in the book. An index could then be created based on the value on the label similarly to how an index could be created on the name of the author. Question: If a column in a database mostly held character values but occasionally (30 rows out of 50,000 rows in the table) holds a number, how could you quickly locate a row with a specific numeric value?
Lesson 3
Single Column and Composite Indexes
The indexes discussed so far have been based on data from single columns. Indexes can also be based on the data from multiple columns. Indexes can also be constructed in ascending or descending order. This lesson investigates these concepts and the effects they have on index design along with details of how SQL Server maintains statistics on the data contained within indexes.
Objectives
After completing this lesson, you will be able to:
• Describe the differences between single column and composite indexes
• Describe the differences between ascending and descending indexes
• Explain how SQL Server keeps statistics on indexes
Single Column vs. Composite Indexes
Key Points Indexes can be constructed on multiple columns rather than on single columns. Multi-column indexes are known as composite indexes.
Single Column vs. Composite Indexes
Composite indexes are often more useful than single column indexes in business applications. The advantages of composite indexes are:
• Higher selectivity
• The possibility of avoiding the need to sort the output rows

In our physical library analogy, consider a query that required the location of books by a publisher within a specific release year. While a publisher index would be useful for finding all the books released by the publisher, it would not help to narrow down the search to those books within the release year. Separate indexes on publisher and release year would not be as useful as a single index that contained both publisher and release year, which could be very selective.

Similarly, an index by topic alone would be of limited value. Once the correct topic was located, all the books on that topic would have to be searched to determine whether they were by the specified author. A better option would be an author index that also included details of each book's topic. In that case, a scan of the index pages for the author would be all that is required to work out which books need to be accessed.

In the absence of any other design criteria, you should typically index the most selective column first when constructing composite indexes.

Question: Why might an index on customer then order date be more or less effective than an index on order date then customer?
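The publisher and release year example translates directly into a composite index, sketched here against a hypothetical dbo.Book table:

```sql
-- One index covering both predicate columns; the leading column alone
-- still supports publisher-only searches.
CREATE INDEX IX_Book_Publisher_ReleaseYear
ON dbo.Book (PublisherID, ReleaseYear);

-- Can seek directly to the small matching range of index entries:
SELECT BookID, Title
FROM dbo.Book
WHERE PublisherID = 42
  AND ReleaseYear = 2010;
```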
Ascending vs. Descending Indexes
Key Points Each component of an index can be created in an ascending or descending order. For single column indexes, ascending and descending indexes are equally useful. For composite indexes, specifying the order of individual columns within the index might be useful.
Ascending vs. Descending Indexes
In general, it makes no difference whether a single column index is ascending or descending. From our physical library analogy, you could scan either the bookshelves or the indexes from either end; the same amount of effort would be required no matter which end you started from.

Composite indexes can benefit from a different order in each component. Often this is used to avoid sorts. For example, you might need to output orders by date descending within customer ascending. From our physical library analogy, imagine an author index that listed each author's books in release date order. Producing output in that order would be easier if the index was already structured this way.
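The customer and order date example might be sketched as follows (hypothetical Sales.Orders table). Because the index order matches the ORDER BY exactly, SQL Server can return the rows without a sort operator:

```sql
CREATE INDEX IX_Orders_Customer_OrderDateDesc
ON Sales.Orders (CustomerID ASC, OrderDate DESC);

-- Rows can be returned straight from the index in the requested order:
SELECT CustomerID, OrderID, OrderDate
FROM Sales.Orders
ORDER BY CustomerID ASC, OrderDate DESC;
```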
Index Statistics
Key Points SQL Server keeps statistics on indexes to assist when making decisions about how to access the data in a table.
Index Statistics
Earlier in the module, you saw that SQL Server needs to make decisions about how to access the data in a table. For each table that is referenced in a query, SQL Server might decide to read the data pages or it might decide to use an index. It is important to realize, though, that SQL Server must make this decision before it begins to execute a query. This means that it needs information that will assist it in making this determination. For each index, SQL Server keeps statistics that tell it how the data is distributed.
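The statistics associated with an index or statistics object can be examined directly. A sketch using the Marketing.Product table and the Product_Color_Stats statistics that the lab in this module creates:

```sql
-- Header, density vector and histogram for one statistics object:
DBCC SHOW_STATISTICS (N'Marketing.Product', N'Product_Color_Stats');

-- All statistics objects on the table, with their last update time:
SELECT s.name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'Marketing.Product');
```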
Physical Analogy
When discussing the physical library analogy earlier, it was mentioned that if you were looking up the books for an author, an index that is ordered by author could be useful. However, if you were locating books for a range of authors, there would be a point at which scanning the entire library would be quicker than running backwards and forwards from the index to the shelves of books. The key issue here is that you need to know, before executing the query, how selective (and therefore useful) the indexes would be. The statistics held on indexes provide this knowledge.

Question: Before starting to perform your lookup in a physical library, how would you know which way was quicker?
Demonstration 3A: Viewing Index Statistics
Key Points
In this demonstration you will see how to work with index statistics.

Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_05_PRJ\6232B_05_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 31 – Demonstration 3A.sql script file.
3. Follow the instructions contained within the comments of the script file.

Question: Why would you not always choose to use FULLSCAN for statistics?
Lab 5: Planning for SQL Server Indexing
Lab Setup
For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:
1. On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2. Maximize the Hyper-V Manager window.
3. In the Virtual Machines list, if the virtual machine 623XB-MIA-DC is not started:
   • Right-click 623XB-MIA-DC and click Start.
   • Right-click 623XB-MIA-DC and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, and then close the Virtual Machine Connection window.
4. In the Virtual Machines list, if the virtual machine 623XB-MIA-SQL is not started:
   • Right-click 623XB-MIA-SQL and click Start.
   • Right-click 623XB-MIA-SQL and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
5. In the Virtual Machine Connection window, click the Revert toolbar icon.
6. If you are prompted to confirm that you want to revert, click Revert. Wait for the revert action to complete.
7. In the Virtual Machine Connection window, if the user is not already logged on:
   • On the Action menu, click the Ctrl-Alt-Delete menu item.
   • Click Switch User, and then click Other User.
   • Log on using the following credentials:
     i. User name: AdventureWorks\Administrator
     ii. Password: Pa$$w0rd
8. From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
9. If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
10. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
11. In the Connect to Server window, type Proseware in the Server name text box.
12. In the Authentication drop-down list box, select Windows Authentication and click Connect.
13. On the File menu, click Open, and click Project/Solution.
14. In the Open Project window, open the project D:\6232B_Labs\6232B_05_PRJ\6232B_05_PRJ.ssmssln.
15. In Solution Explorer, double-click the query 00-Setup.sql. When the query window opens, click Execute on the toolbar.
Lab Scenario You have been asked to explain the concept of index statistics and selectivity to a new developer. You will explore the statistics available on an existing index and determine how selective some sample queries would be. One of the company developers has provided you with a list of the most important queries that will be executed by the new marketing management system. Depending upon how much time you have available, you need to determine the best column orders for indexes to support each query. Complete as many as possible within the allocated time. In later modules, you will consider how these indexes would be implemented. Each query is to be considered in isolation in this exercise.
Supporting Documentation Query 1: SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE ProspectID = 12553;
Query 2: SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName LIKE 'Arif%';
Query 3: SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName LIKE 'Alejandro%'
ORDER BY LastName, FirstName;
Query 4: SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName >= 'S' ORDER BY LastName, FirstName;
Query 5: SELECT LanguageID, COUNT(1) FROM Marketing.ProductDescription GROUP BY LanguageID;
Exercise 1: Explore existing index statistics

Scenario
You have been asked to explain the concept of index statistics and selectivity to a new developer. You will explore the statistics available on an existing index and determine how selective some sample queries would be.

The main tasks for this exercise are as follows:
1. Execute the following command in the MarketDev database:
   EXEC sp_helpstats 'Marketing.Product'
2. Review the results. Have any autostats been generated?
3. Create manual statistics on the Color column. Call the statistics Product_Color_Stats. Use a full scan of the data when creating the statistics.
4. Re-execute the command from task 1 to see the change.
5. Using the DBCC SHOW_STATISTICS command, review the created Product_Color_Stats statistics.
6. Answer the following questions related to the Product_Color_Stats statistics:
   a. How many rows were sampled?
   b. How many steps were created?
   c. What was the average key length?
   d. How many Black products are there?
7. Execute the following command to check how accurate the statistics that have been generated are:
   SELECT COUNT(1) FROM Marketing.Product WHERE Color = 'Black';
8. Calculate the selectivity of each of the three queries shown:
   a. SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName LIKE 'A%';
   b. SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName LIKE 'Alejandro%';
   c. SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName LIKE 'Arif%';
Task 1: Execute SQL Command
• Execute the following command in the MarketDev database:
  EXEC sp_helpstats 'Marketing.Product'

Task 2: Review the results
• Review the results.
• Check to see whether any autostats have been generated.

Task 3: Create statistics
• Create manual statistics on the Color column. Call the statistics Product_Color_Stats. Use a full scan of the data when creating the statistics.

Task 4: Re-execute the SQL command from task 1
• Re-execute the following command in the MarketDev database:
  EXEC sp_helpstats 'Marketing.Product'

Task 5: Use DBCC SHOW_STATISTICS
• Using the DBCC SHOW_STATISTICS command, review the created Product_Color_Stats statistics.

Task 6: Answer questions
• Answer the following questions related to the Product_Color_Stats statistics:
  a. How many rows were sampled?
  b. How many steps were created?
  c. What was the average key length?
  d. How many Black products are there?

Task 7: Execute SQL Command and check accuracy of statistics
• Execute the following command to check how accurate the statistics that have been generated are:
  SELECT COUNT(1) FROM Marketing.Product WHERE Color = 'Black';

Task 8: Calculate selectivity of each query
• Calculate the selectivity of each of the three queries shown:

Query 1: SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName LIKE 'A%';
Query 2: SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName LIKE 'Alejandro%';
Query 3: SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName LIKE 'Arif%';

Results: After this exercise, you will have assessed the selectivity of various queries.
Challenge Exercise 2: Design column orders for indexes (Only if time permits)

Scenario
One of the company developers has provided you with a list of the most important queries that will be executed by the new marketing management system. You need to determine the best column orders for indexes to support each query. In later modules, you will consider how these indexes would be implemented. Each query is to be considered in isolation in this exercise.

The main tasks for this exercise are as follows:
1. Determine which columns should be part of an index for Query 1 and the best order for the columns to support the query.
2. Determine which columns should be part of an index for Query 2 and the best order for the columns to support the query.
3. Determine which columns should be part of an index for Query 3 and the best order for the columns to support the query.
4. Determine which columns should be part of an index for Query 4 and the best order for the columns to support the query.
5. Determine which columns should be part of an index for Query 5 and the best order for the columns to support the query.

Task 1: Design an index
• Review the supporting documentation and determine which columns should be part of an index for Query 1 and the best order for the columns to support the query.

Task 2: Design an index
• Review the supporting documentation and determine which columns should be part of an index for Query 2 and the best order for the columns to support the query.

Task 3: Design an index
• Review the supporting documentation and determine which columns should be part of an index for Query 3 and the best order for the columns to support the query.

Task 4: Design an index
• Review the supporting documentation and determine which columns should be part of an index for Query 4 and the best order for the columns to support the query.

Task 5: Design an index
• Review the supporting documentation and determine which columns should be part of an index for Query 5 and the best order for the columns to support the query.

Results: After this exercise, you should have designed new indexes, taking selectivity into consideration.
Implementing a Microsoft® SQL Server® 2008 R2 Database
Module Review and Takeaways
Review Questions
1. Do tables need indexes?
2. Why do constraints use indexes?

Best Practices
1. Design indexes to maximize selectivity, which leads to lower I/O.
2. In the absence of other requirements, aim to have the most selective columns first in composite indexes.
Implementing Table Structures in SQL Server 2008 R2
Module 6
Implementing Table Structures in SQL Server 2008 R2

Contents:
Lesson 1: SQL Server Table Structures 6-3
Lesson 2: Working with Clustered Indexes 6-13
Lesson 3: Designing Effective Clustered Indexes 6-20
Lab 6: Implementing Table Structures in SQL Server 6-26
Implementing a Microsoft® SQL Server® 2008 R2 Database
Module Overview
One of the most important decisions to be made when designing tables in SQL Server databases relates to the structure of the table. Regardless of whether other indexes are used to locate rows, the table itself can be structured like an index or left without such a structure. In this module, you will learn how to choose an appropriate table structure. For situations where you decide to have a specific structure in place, you will learn how to create an effective structure.
Objectives
After completing this module, you will be able to:
• Explain how tables can be structured in SQL Server databases
• Work with clustered indexes
• Design effective clustered indexes
Lesson 1
SQL Server Table Structures
There are two ways that SQL Server tables can be structured: rows can be added in any order, or rows can be kept in a defined order. In this lesson, you will investigate both options and gain an understanding of how common data modification operations are affected by each option. Finally, you will see how unique clustered indexes are structured differently from non-unique clustered indexes.
Objectives
After completing this lesson, you will be able to:
• Describe how tables can be organized as heaps
• Explain how common operations are performed on heaps
• Detail the issues that can arise with forwarding pointers
• Describe how tables can be organized with clustered indexes
• Explain how common operations are performed on tables with clustered indexes
• Describe how unique clustered indexes are structured differently from non-unique clustered indexes
What is a Heap?
Key Points A heap is a table that has no enforced order for either the pages within the table or for the data rows within each page.
Heaps
The simplest table structure available in SQL Server is a heap. Data rows are added in the first available location within the table's pages that has sufficient space. If no space is available, additional pages are added to the table and the rows are placed in those pages. Even though no index structure exists for a heap, SQL Server tracks the pages that belong to the table using entries in an internal structure called an Index Allocation Map (IAM). Heaps are allocated index id zero in this map.
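As a quick check, the catalog view sys.indexes reports heaps with index_id = 0. The following query is a sketch; the table name dbo.LogData is an assumption for illustration:

```sql
-- A heap has a sys.indexes row with index_id = 0 and type_desc = 'HEAP'
SELECT OBJECT_NAME(object_id) AS table_name, index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.LogData');
```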
Physical Analogy In the physical library analogy, a heap would be represented by structuring your library so that every book is just placed in any available space found that is large enough. Without any other assistance, finding a book would involve scanning one bookcase after another. Question: Why might modifying a row cause it to need to move between pages?
Operations on Heaps
Key Points The most common operations performed on tables are INSERT, UPDATE, DELETE and SELECT operations. It is important to understand how each of these operations is affected by structuring a table as a heap.
Physical Analogy In the library analogy, an INSERT would be executed by locating any gap large enough to hold the book and placing it there. If no space that is large enough is available, a new bookcase would be allocated and the book placed into it. This would continue unless a limit existed on the number of bookcases that the library could contain. A DELETE operation could be imagined as scanning the bookcases until the book is found, removing the book and throwing it away. More precisely, it would be like placing a tag on the book to say that it is to be thrown out the next time the library is cleaned up or space on the bookcase is needed. An UPDATE operation would be represented by replacing a book with a (potentially) different copy of the same book. If the replacement book was the same (or smaller) size as the original book, it could be placed directly back in the same location as the original book. However, if the replacement book was larger, the original book would be removed and placed into another location. The new location for the book could be in the same bookcase or in another bookcase. Question: What would be involved in finding a book in a library structured as a heap? (This would simulate a SELECT operation).
Forwarding Pointers
Key Points When other indexes point to rows in a heap, data modification operations cause forwarding pointers to be inserted into the heap. This can cause performance issues over time.
Physical Analogy
Now imagine that the physical library was organized as a heap, where books were stored in no particular order. Further imagine that three additional indexes were created in the library to make it easier to find books by author, ISBN, and release date. As there was no order to the books on the bookcases, when an entry was found in the ISBN index, the entry would refer to the physical location of the book. The entry would include an address like "Bookcase 12 - Shelf 5 - Book 3". That is, there would need to be a specific address for a book. An update to the book that caused it to need to be moved to a different location would be problematic. One option for resolving this would be to locate all index entries for the book and update them with the new physical location. An alternate option would be to leave a note in the location where the book used to be, pointing to where the book has been moved. This is what a forwarding pointer is in SQL Server. It allows rows to be updated and moved without the need to update other indexes that point to them. A further challenge arises if the book needs to be moved again. There are two ways that this could be handled: either yet another note could be left pointing to the new location, or the original note could be modified to point to the new location. Either way, the original indexes would not need to be updated. SQL Server deals with this by updating the original forwarding pointer. This way, performance does not continue to degrade by having to follow a chain of forwarding pointers.
ALTER TABLE WITH REBUILD
Forwarding pointers were a common performance problem with SQL Server tables that were structured as heaps, and there were no straightforward options for "cleaning up" a heap to remove the forwarding pointers. While options existed for removing forwarding pointers, each had significant downsides. SQL Server 2008 introduced a method for dealing with this problem via the command:

ALTER TABLE SomeTable REBUILD;

Note that while options to rebuild indexes have been available in prior versions, the option to rebuild a table was not available. This command can also be used to change the compression settings for a table. (Page and row compression are an advanced topic beyond the scope of this course.)
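Before rebuilding, you can measure how many forwarding pointers a heap has accumulated. This sketch assumes a heap named dbo.LogData; note that forwarded_record_count is only populated in the SAMPLED or DETAILED scan modes:

```sql
-- Count forwarded records in the heap (index_id = 0)
SELECT forwarded_record_count, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
    DB_ID(), OBJECT_ID(N'dbo.LogData'), 0, NULL, N'DETAILED');

-- Rebuilding the heap removes the forwarding pointers
ALTER TABLE dbo.LogData REBUILD;
```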
What is a Clustered Index?
Key Points
Rather than storing the data rows of a table as a heap, tables can be designed with an internal logical ordering. This is known as a clustered index.

Clustered Index
A table with a clustered index has a predefined order for rows within a page and for pages within the table. The order is based on a key made up of one or more columns. The key is commonly called a clustering key. Because the rows of a table can only be stored in a single order, there can be only a single clustered index on a table. An Index Allocation Map entry is used to point to a clustered index. Clustered indexes always have index id = 1. There is a common misconception that pages in a clustered index are "physically stored in order". While this is possible in rare situations, it is not commonly the case. If it were true, fragmentation of clustered indexes could not exist. SQL Server tries to align physical and logical order while creating an index, but disorder can arise as data is modified. Index and data pages are linked within a logical hierarchy and are also doubly linked across all pages at the same level of the hierarchy, to assist when scanning across an index.
Physical Analogy In the library analogy, a clustered index is similar to storing all books in a specific order. An example of this would be to store books in ISBN (International Standard Book Number) order. Clearly, the library can only be in a single order.
Operations on Clustered Indexes
Key Points Earlier you saw how common operations were carried out on tables structured as heaps. It is important to understand how each of those operations is affected by structuring a table with a clustered index.
Physical Analogy
In a library that is kept in ISBN order, an INSERT operation requires a new book to be placed in exactly the correct logical ISBN order. If there is space somewhere on the bookcase that is in the required position, the book can be placed into the correct location and all other books in the bookcase moved to accommodate the new book. If there is not sufficient space, the bookcase needs to be split. Note that a new bookcase would be physically placed at the end of the library but would be logically inserted into the list of bookcases. INSERT operations would be straightforward if the books were being added in ISBN order. New books could always be added to the end of the library and new bookcases added as required. In this case, no splitting is required. When an UPDATE operation is performed, if the replacement book is the same size or smaller and the ISBN has not changed, the book can simply be replaced in the same place. If the replacement book is larger and the ISBN has not changed, and there is spare space within the bookcase, all other books in the bookcase can be slid along to allow the larger book to be replaced in the same spot. If there is insufficient space in the bookcase to accommodate the larger book, the bookcase needs to be split. If the ISBN of the replacement book was different from that of the original book, the original book would need to be removed and the replacement book treated like the insertion of a new book. A DELETE operation would involve the book being removed from the bookcase. (Again, more precisely, it would be flagged as free space but simply left in place for later removal.) When a SELECT is performed, if the ISBN is known, the required book can be quickly located by efficiently searching the library. If a range of ISBNs was requested, the books would be located by finding the first
book and continuing to collect books in order until a book is encountered that is out of range or until the end of the library is reached. Question: What sort of queries would now perform better in this library?
Unique vs. Non-Unique Clustered Indexes
Key Points SQL Server must be able to uniquely identify any row in a table. Clustered indexes can be created as unique or non-unique.
Unique vs. Non-Unique Clustered Indexes
If you do not declare a clustered index as unique, SQL Server will add another value to the clustering key to ensure that the key values are unique for each row. This value is commonly called a "uniqueifier".
Physical Analogy In the library analogy, a unique index is like a rule that says that no more than a single copy of any book can ever be stored. If an insert of a new book is attempted and another book is found to have the same ISBN (assuming that the ISBN was the clustering key), the insertion of the new book would be refused. It is important to understand that the comparison is made only on the clustering key. The book would be rejected for having the same ISBN, even if other properties of the book are different. A non-unique clustered index is similar to having a rule that allows more than a single book with the same ISBN. The issue is that it is likely to be desirable to track each copy of the book separately. The uniqueifier that is added by SQL Server would be like a "Copy Number" being added to books that can be duplicated. The uniqueifier is not visible to users.
Demonstration 1A: Rebuilding Heaps
Key Points
In this demonstration, you will see how to:
• Create a table as a heap
• Check the fragmentation and forwarding pointers for a heap
• Rebuild a heap

Demonstration Steps
1. Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
2. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_06_PRJ\6232B_06_PRJ.ssmssln and click Open.
3. Open and execute the 00 – Setup.sql script file from within Solution Explorer.
4. Open the 11 – Demonstration 1A.sql script file.
5. Follow the instructions contained within the comments of the script file.
Lesson 2
Working with Clustered Indexes
If a decision has been made to structure a table with a clustered index, it is important to be familiar with how such indexes are created, altered, and dropped. In this lesson, you will see how to perform these actions, understand how SQL Server performs them automatically in some situations, and see how to incorporate free space within indexes to improve insert performance.
Objectives
After completing this lesson, you will be able to:
• Create clustered indexes
• Drop a clustered index
• Alter a clustered index
• Incorporate free space in indexes
Creating Clustered Indexes
Key Points Clustered indexes can be created either directly using the CREATE INDEX command or automatically in some situations where a PRIMARY KEY constraint is specified on the table.
Creating Clustered Indexes It is very important to understand the distinction between a PRIMARY KEY and a clustering key. Many users confuse the two terms or attempt to use them interchangeably. A PRIMARY KEY is a constraint. It is a logical concept that is supported by an index but the index may or may not be a clustered index. The default action in SQL Server when a PRIMARY KEY constraint is added to a table is to make it a clustered PRIMARY KEY if no other clustered index already exists on the table. This action can be overridden by specifying the word NONCLUSTERED when declaring the PRIMARY KEY constraint. In the first example on the slide, the dbo.Article table is being declared. The ArticleID column has a PRIMARY KEY constraint associated with it. As there is no other clustered index on the table, the index that is created to support the PRIMARY KEY constraint will be created as a clustered PRIMARY KEY. ArticleID will be the clustering key as well as the PRIMARY KEY for the table. In the second example on the slide, the table dbo.LogData is initially created as a heap. When the PRIMARY KEY constraint is added to the table, no other clustered index is present on the table, so SQL Server will create the index to support the PRIMARY KEY constraint as a clustered index. If a table has been created as a heap, it can be converted to a clustered index structure by adding a clustered index to the table. In the fourth command shown in the examples on the slide, a clustered index named CL_LogTime is added to the dbo.LogTime table with the LogTimeID column as the clustering key. This command will not only create an index over the data; it causes the entire structure of the table to be reorganized.
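The examples described above can be sketched in T-SQL as follows. The non-key columns are assumptions added for illustration; only the table names, the constraint behavior, and the CL_LogTime index come from the text:

```sql
-- A PRIMARY KEY constraint defaults to a clustered index
-- when no other clustered index exists on the table
CREATE TABLE dbo.Article
(
    ArticleID int NOT NULL PRIMARY KEY,   -- becomes the clustering key
    Title nvarchar(100) NOT NULL          -- illustrative column
);

-- A table created as a heap, converted when the constraint is added
CREATE TABLE dbo.LogData
(
    LogID int NOT NULL,
    LogContents nvarchar(max) NULL        -- illustrative column
);
ALTER TABLE dbo.LogData
    ADD CONSTRAINT PK_LogData PRIMARY KEY (LogID);  -- clustered by default

-- Converting an existing heap by creating a clustered index directly
CREATE CLUSTERED INDEX CL_LogTime ON dbo.LogTime (LogTimeID);
```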
Question: What else would be added to your table if you added a non-unique clustered index to it?
Dropping a Clustered Index
Key Points The method used to drop a clustered index depends upon the way the clustered index was created.
Dropping a Clustered Index The DROP INDEX command can be used to drop clustered indexes that were created with the CREATE INDEX command. Indexes that are created internally to support constraints need to be removed by removing the constraint. Note in the second example on the slide that the PRIMARY KEY constraint is being dropped. This would cause a clustered index that had been created to support that key to also be dropped. When the clustered index is dropped, the data in the table is not lost. The table is reorganized as a heap. Question: How could you remove a primary key constraint that was being referenced by a foreign key constraint?
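As a sketch, using the index and constraint names from the earlier examples (the names themselves are assumptions):

```sql
-- Drop a clustered index that was created with CREATE INDEX;
-- the data is retained and the table reverts to a heap
DROP INDEX CL_LogTime ON dbo.LogTime;

-- An index created internally to support a constraint is removed
-- by dropping the constraint itself
ALTER TABLE dbo.LogData DROP CONSTRAINT PK_LogData;
```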
Altering a Clustered Index
Key Points
Minor modifications to indexes are permitted through the ALTER INDEX statement, but it cannot be used to modify the structure of the index, including the columns that make up the key.

Altering a Clustered Index
A few maintenance operations are possible with the ALTER INDEX statement. For example, an index can be rebuilt or reorganized. (Reorganizing an index only affects the leaf level of the index.) Restructuring an index is not permitted within an ALTER INDEX statement. Columns that make up the clustering key cannot be added or removed using this command, and the index cannot be moved to a different filegroup. (Filegroups are a concept that is covered in the course 6231B Maintaining a SQL Server 2008 R2 Database.)

WITH DROP_EXISTING
An option to change the structure of an index is provided while creating a replacement index. The CREATE INDEX command includes a DROP_EXISTING option that allows the statement to replace an existing index. Note that an index cannot be changed from a clustered to a non-clustered index, or back, using this command. (Non-clustered indexes are covered in Module 8.)

Disabling Indexes
While the ALTER INDEX statement includes a DISABLE option that can be applied to any index, this option is of limited use with clustered indexes. Once a clustered index is disabled, no access to the data in the table is permitted until the index is rebuilt.
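These points can be sketched as follows (index and table names are illustrative assumptions):

```sql
-- Maintenance operations permitted by ALTER INDEX
ALTER INDEX CL_LogTime ON dbo.LogTime REBUILD;
ALTER INDEX CL_LogTime ON dbo.LogTime REORGANIZE;   -- affects leaf level only

-- Changing the key requires re-creating the index with DROP_EXISTING
CREATE CLUSTERED INDEX CL_LogTime ON dbo.LogTime (LogTimeID, SessionID)
WITH (DROP_EXISTING = ON);

-- Disabling a clustered index blocks all access to the table's data
ALTER INDEX CL_LogTime ON dbo.LogTime DISABLE;
ALTER INDEX CL_LogTime ON dbo.LogTime REBUILD;      -- rebuild to re-enable
```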
Incorporating Free Space in Indexes
Key Points
The FILLFACTOR and PAD_INDEX options are used to provide free space within index pages. This can improve INSERT and UPDATE performance in some situations, but often to the detriment of SELECT operations.

FILLFACTOR and PAD_INDEX
The availability of free space in an index page can have a significant effect on the performance of index update operations. If an index record must be inserted and there is no free space, a new index page must be created and the contents of the old page split across the two pages. This can affect performance if it happens too frequently. The performance impacts of page splits can be alleviated by leaving empty space on each page when creating an index, including a clustered index. This is achieved by specifying a FILLFACTOR value. FILLFACTOR defaults to 0, which means "fill 100%". Any other value (including 100) is taken as the percentage of how full each page should be. For the example in the slide, this means 70% full and 30% free space on each page. FILLFACTOR only applies to leaf-level pages in an index. PAD_INDEX is an option that, when enabled, causes the same free space to be allocated in the non-leaf levels of the index.

Question: While you could avoid many page splits by setting a FILLFACTOR of 50, what would be the downside of doing this?

Question: When would a FILLFACTOR of 100 be useful?
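The 70 percent example from the slide can be sketched like this (the index and table names are assumptions):

```sql
-- Leave 30% free space on each leaf-level page; PAD_INDEX = ON applies
-- the same fill factor to the non-leaf levels as well
CREATE CLUSTERED INDEX CL_LogTime ON dbo.LogTime (LogTimeID)
WITH (FILLFACTOR = 70, PAD_INDEX = ON);
```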
Demonstration 2A: Clustered Indexes
Key Points
In this demonstration, you will see how to:
• Create a table with a clustered index
• Detect fragmentation in a clustered index
• Correct fragmentation in a clustered index

Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_06_PRJ\6232B_06_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 21 – Demonstration 2A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: Why was the performance of the UPDATE statement against this table much faster than the one against the heap?
Lesson 3
Designing Effective Clustered Indexes
When creating clustered indexes on tables, it is important to understand the characteristics of good clustering keys. Some data types work better for clustering keys than others. In this lesson, you will see how to design good clustering keys and also see how clustered indexes can be created on views.
Objectives
After completing this lesson, you will be able to:
• Describe the characteristics of good clustering keys
• Explain which data types are most appropriate for use in clustering keys
• Create indexed views
• Explain the considerations that apply when working with indexed views
Characteristics of Good Clustering Keys
Key Points Many different types of data can be used for clustering a table. While not every situation is identical, there is a set of characteristics that generally create the best clustering keys. Keys should be short, static, increasing and unique.
Characteristics of Good Clustering Keys
Although some designs might call for different styles of clustering key, most designs call for clustering keys with the following characteristics:
• Short – clustering keys should be short. They need to be sorted, and they are stored at the leaf level of every other index. While there is a limit of 16 columns and 900 bytes, good clustering keys are typically much, much smaller than this.
• Static – clustering keys should be based on data values that do not change. This is one reason why primary keys are often used for this purpose. A change to the clustering key means that the row needs to move. You have already seen that moving rows is generally not desirable.
• Increasing – this assists with INSERT behavior. If the key values are increasing as rows are inserted, the inserts happen directly at the logical end of the table. This minimizes fragmentation and the need to split pages, and reduces the amount of memory needed for page buffers.
• Unique – unique clustering keys do not need to have a uniqueifier column added by SQL Server. It is important to declare unique values as being unique; otherwise, SQL Server will still add a uniqueifier column to the key.
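A clustering key with all four characteristics can be sketched as follows (the table and column names are illustrative assumptions):

```sql
-- int IDENTITY: short (4 bytes), static, ever-increasing, and declared
-- unique via the clustered PRIMARY KEY, so no uniqueifier is needed
CREATE TABLE dbo.ActivityLog
(
    ActivityLogID int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_ActivityLog PRIMARY KEY CLUSTERED,
    ActivityTime datetimeoffset NOT NULL
);
```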
Appropriate Data Types for Clustering Keys
Key Points Similar to the way that some data types are generally better as components of indexes than other data types, some data types are more appropriate for use as clustering keys than others.
Appropriate Data Types for Clustering Keys
int and bigint typically make the best clustering keys in general use, particularly if they are used in conjunction with an IDENTITY constraint that causes their values to continue to increase. (Constraints are discussed in a later module.) The biggest challenge in current designs is the use (and overuse) of GUIDs stored in uniqueidentifier columns. While they are larger than the integer types, GUIDs are also random in nature and routinely cause index fragmentation through page splits when used as clustering keys. Character data types can be used for clustering keys, but the sorting performance of character data types is limited, and character values often tend to change in typical business applications. Date data is typically not unique but provides excellent advantages in size and sorting performance. It works well for the date range queries that are common in typical business applications.
Logical vs. Physical Schema
Users typically struggle with the concept that their physical data schema does not have to match their logical data schema. For example, while GUIDs might be used throughout an application layer, they do not have to be used throughout the physical implementation of the schema. One option would be to use one table to look up an int based on a GUID and have that int used everywhere else in the design.

Question: New uniqueidentifier values in SQL Server can be generated with the NEWID() function. SQL Server 2005 introduced the NEWSEQUENTIALID() function to try to address the issue of increasing values. Why doesn't this typically solve the problem of random values?
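The lookup approach mentioned above can be sketched like this (all object names are assumptions for illustration):

```sql
-- The GUID stays at the application boundary; the small, increasing
-- int key is used for clustering and for references elsewhere
CREATE TABLE dbo.CustomerKeyMap
(
    CustomerID int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_CustomerKeyMap PRIMARY KEY CLUSTERED,
    CustomerGUID uniqueidentifier NOT NULL
        CONSTRAINT DF_CustomerKeyMap_GUID DEFAULT NEWID(),
    CONSTRAINT UQ_CustomerKeyMap_GUID UNIQUE (CustomerGUID)  -- GUID lookups
);
```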
Creating Indexed Views
Key Points Clustered indexes can be created over views. A view with a clustered index is called an "indexed view". Indexed views are the closest SQL Server equivalent to "materialized views" in other databases. Indexed views can have a profound (positive) impact on the performance of queries in particular circumstances.
Creating Indexed Views
The concept of an indexed view might at first seem odd, as an index is being created over an object that is not persisted. Indexed views are very useful for maintaining precalculated aggregates or joins. When updates to the underlying data are made, SQL Server updates the data stored in the indexed view automatically. You can imagine an indexed view as a special type of table with a clustered index. The differences are that the schema of the table isn't defined directly (it is defined by the SELECT statement in the view), and that you don't modify the table directly (you modify the data in the "real" tables that underpin the view). When the data in the underlying tables is modified, SQL Server realizes that it needs to update the data in the indexed view. Indexed views have a negative impact on the performance of INSERT, DELETE, and UPDATE operations on the underlying tables, but they can also have a very positive impact on the performance of SELECT queries on the view. They are most useful for data that is regularly selected but much less frequently updated.
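A minimal sketch of an indexed view that maintains a precalculated aggregate follows. The table and column names are assumptions; the SCHEMABINDING option, two-part object names, the COUNT_BIG(*) column (required when GROUP BY is used), and a unique clustered index as the first index on the view are all genuine requirements:

```sql
CREATE VIEW Sales.OrderTotalsByProduct
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(Quantity) AS TotalQuantity,
       COUNT_BIG(*)  AS NumberOfRows    -- required with GROUP BY
FROM Sales.OrderDetail
GROUP BY ProductID;
GO

-- The first index on a view must be a unique clustered index
CREATE UNIQUE CLUSTERED INDEX CL_OrderTotalsByProduct
ON Sales.OrderTotalsByProduct (ProductID);
```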
Indexed View Considerations
Key Points The use of indexed views is governed by a set of considerations that must be met for the views to be utilized. Premium editions of SQL Server take more complete advantage of indexed views.
Indexed View Considerations Indexed views can be a challenge to set up and use. Books Online details a list of SET options that need to be in place both at creation time for the indexed view and in sessions that take advantage of the indexed views. Particular attention should be given to the CONCAT_NULL_YIELDS_NULL and QUOTED_IDENTIFIER settings. Indexes can only be built on views that are deterministic. That is, the views must always return the same data unless the underlying table data is altered. For example, an indexed view could not contain a column that returned the outcome of the SYSDATETIME() function. SCHEMABINDING is an option that the view must have been created with before an index can be created on the view. The SCHEMABINDING option prevents changes to the schema of the underlying tables while the view exists.
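As a reference sketch, the SET options that Books Online requires to be in effect both when the indexed view is created and in sessions that use it are:

```sql
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
```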
Demonstration 3A: Indexed Views
Key Points
In this demonstration, you will see how to:
• Obtain details of indexes created on views
• See if an indexed view has been used in an estimated execution plan

Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_06_PRJ\6232B_06_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 31 – Demonstration 3A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: How could you ensure that an indexed view is selected when working with Standard Edition of SQL Server?
Lab 6: Implementing Table Structures in SQL Server
Lab Setup
For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:
1. On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2. Maximize the Hyper-V Manager window.
3. In the Virtual Machines list, if the virtual machine 623XB-MIA-DC is not started:
   • Right-click 623XB-MIA-DC and click Start.
   • Right-click 623XB-MIA-DC and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, and then close the Virtual Machine Connection window.
4. In the Virtual Machines list, if the virtual machine 623XB-MIA-SQL is not started:
   • Right-click 623XB-MIA-SQL and click Start.
   • Right-click 623XB-MIA-SQL and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
5. In the Virtual Machine Connection window, click the Revert toolbar icon.
6. If you are prompted to confirm that you want to revert, click Revert. Wait for the revert action to complete.
7. In the Virtual Machine Connection window, if the user is not already logged on:
   • On the Action menu, click the Ctrl-Alt-Delete menu item.
   • Click Switch User, and then click Other User.
   • Log on using the following credentials:
     i. User name: AdventureWorks\Administrator
     ii. Password: Pa$$w0rd
8. From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
9. If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
10. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
11. In the Connect to Server window, type Proseware in the Server name text box.
12. In the Authentication drop-down list box, select Windows Authentication and click Connect.
13. In the File menu, click Open, and click Project/Solution.
14. In the Open Project window, open the project D:\6232B_Labs\6232B_06_PRJ\6232B_06_PRJ.ssmssln.
15. In Solution Explorer, double-click the query 00-Setup.sql. When the query window opens, click Execute on the toolbar.
Lab Scenario

One of the most important decisions when designing a table is choosing an appropriate table structure. In this lab, you will choose appropriate structures for some new tables required for the relationship management system.
Supporting Documentation

Table 1: Relationship.ActivityLog

Name          Data Type        Constraint
ActivityTime  datetimeoffset
SessionID     int
Duration      int
ActivityType  int
Table 2: Relationship.PhoneLog

Name                 Data Type     Constraint
PhoneLogID           int           Primary Key
SalespersonID        int
CalledPhoneNumber    nvarchar(16)
CallDurationSeconds  int
Table 3: Relationship.MediaOutlet

Name             Data Type     Constraint
MediaOutletID    int
MediaOutletName  nvarchar(40)
PrimaryContact   nvarchar(50)
City             nvarchar(50)
Table 4: Relationship.PrintMediaPlacement

Name                   Data Type      Constraint
PrintMediaPlacementID  int            Primary Key
MediaOutletID          int
PlacementDate          datetime
PublicationDate        datetime
RelatedProductID       int
PlacementCost          decimal(18,2)
Table 5:

Name           Data Type         Constraint
ApplicationID  int               IDENTITY(1,1)
ApplicantName  nvarchar(150)
EmailAddress   nvarchar(100)
ReferenceID    uniqueidentifier
Comments       nvarchar(500)
Implementing Table Structures in SQL Server 2008 R2
Exercise 1: Creating Tables as Heaps

Scenario
You need to create some new tables to support the relationship management system. You will create two tables that are structured as heaps.

The main tasks for this exercise are as follows:
1. Review the requirements.
2. Create the tables in the MarketDev database.

Task 1: Review the Requirements
•  Review the requirements in the supporting documentation for Tables 1 and 2.

Task 2: Create the Tables in the MarketDev database
•  Create a table based on the supporting documentation for Table 1.
•  Create a table based on the supporting documentation for Table 2.

Results: After this exercise, you will have created two tables that are structured as heaps.
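For reference, a heap is simply a table created without a clustered index. A minimal sketch of how Table 1 might be implemented follows; this is not the official lab answer, the Relationship schema is assumed to exist already, and column nullability is an assumption because the supporting documentation does not specify it.

```sql
-- Sketch only: nullability is assumed, as it is not specified in the documentation.
-- A table created with no clustered index (and no PRIMARY KEY) is stored as a heap.
CREATE TABLE Relationship.ActivityLog
(
    ActivityTime datetimeoffset NOT NULL,
    SessionID int NOT NULL,
    Duration int NOT NULL,
    ActivityType int NOT NULL
);
```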
Exercise 2: Creating Tables with Clustered Indexes

Scenario
The design documentation also calls for some tables with clustered indexes. You will create two tables that have clustered indexes.

The main tasks for this exercise are as follows:
1. Review the requirements.
2. Create the tables in the MarketDev database.

Task 1: Review the Requirements
•  Review the requirements in the supporting documentation for Tables 3 and 4.

Task 2: Create the Tables in the MarketDev database
•  Create a table based on the supporting documentation for Table 3.
•  Create a table based on the supporting documentation for Table 4.

Results: After this exercise, you will have created two tables that have clustered indexes.
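As a hedged sketch of one possible approach for Table 4 (not the official lab answer; nullability is assumed), declaring a PRIMARY KEY constraint on a table creates a clustered index by default, and the CLUSTERED keyword can make that explicit:

```sql
-- Sketch only: nullability is assumed. PRIMARY KEY CLUSTERED causes the table to
-- be stored as a clustered index on PrintMediaPlacementID rather than as a heap.
CREATE TABLE Relationship.PrintMediaPlacement
(
    PrintMediaPlacementID int NOT NULL PRIMARY KEY CLUSTERED,
    MediaOutletID int NOT NULL,
    PlacementDate datetime NOT NULL,
    PublicationDate datetime NOT NULL,
    RelatedProductID int NOT NULL,
    PlacementCost decimal(18,2) NOT NULL
);
```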
Challenge Exercise 3: Comparing the Performance of Clustered Indexes vs. Heaps (Only if time permits)

Scenario
A company developer has asked you to decide whether a new table should have a clustered index. Insert performance of the table is critical. You will consider the design, create a number of alternatives, and compare the performance of each against a set of test workloads.

The main tasks for this exercise are as follows:
1. Review the design for Table 5.
2. Create a table based on the design with no clustered index. Call the table Relationship.Table_Heap.
3. Create a table based on the design with a clustered index on the ApplicationID column. Call the table Relationship.Table_ApplicationID.
4. Create a table based on the design with a clustered index on the EmailAddress column. Call the table Relationship.Table_EmailAddress.
5. Create a table based on the design with a clustered index on the ReferenceID column. Call the table Relationship.Table_ReferenceID.
6. Load and execute the workload script. (Note: this may take some minutes to complete. You can check progress by viewing the Messages tab; a message is printed as each of the four sections completes. While the script is running, review the contents of the script and estimate the difference in timings you expect to see in the results.)
7. Compare the performance of each table structure.
Task 1: Review the Table Design
•  Review the table design in the supporting documentation for Table 5.

Task 2: Create the Relationship.Table_Heap Table
•  Using the supporting documentation for Table 5, create a table based on the design with no clustered index.
•  Call the table Relationship.Table_Heap.

Task 3: Create the Relationship.Table_ApplicationID Table
•  Using the supporting documentation for Table 5, create a table based on the design with a clustered index on the ApplicationID column.
•  Call the table Relationship.Table_ApplicationID.

Task 4: Create the Relationship.Table_EmailAddress Table
•  Using the supporting documentation for Table 5, create a table based on the design with a clustered index on the EmailAddress column.
•  Call the table Relationship.Table_EmailAddress.
Task 5: Create the Relationship.Table_ReferenceID Table
•  Using the supporting documentation for Table 5, create a table based on the design with a clustered index on the ReferenceID column.
•  Call the table Relationship.Table_ReferenceID.

Task 6: Load and Execute the Workload Script
•  Load and execute the workload script. (Note: this may take some minutes to complete. You can check progress by viewing the Messages tab; a message is printed as each of the four sections completes. While the script is running, review the contents of the script and estimate the difference in timings you expect to see in the results.)

Task 7: Compare Table Performance
•  Compare the performance of each table structure.

Results: After this exercise, you will have created four tables and compared insert performance between heaps and tables with clustered indexes.
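One way such a comparison could be sketched is shown below. This is illustrative only: the actual workload script is supplied with the course, and the timing loop here is an assumption, not the script's real contents.

```sql
-- Sketch only: one of the four table variants, clustered on ReferenceID.
CREATE TABLE Relationship.Table_ReferenceID
(
    ApplicationID int IDENTITY(1,1) NOT NULL,
    ApplicantName nvarchar(150) NOT NULL,
    EmailAddress nvarchar(100) NOT NULL,
    ReferenceID uniqueidentifier NOT NULL,
    Comments nvarchar(500) NULL
);
CREATE CLUSTERED INDEX CL_Table_ReferenceID
    ON Relationship.Table_ReferenceID (ReferenceID);
GO

-- Illustrative insert workload: NEWID() produces random GUIDs, which cause
-- inserts at random positions within the clustered index.
DECLARE @start datetime2 = SYSDATETIME();
DECLARE @i int = 0;
WHILE @i < 10000
BEGIN
    INSERT Relationship.Table_ReferenceID (ApplicantName, EmailAddress, ReferenceID)
    VALUES (N'Test Applicant', N'test@example.com', NEWID());
    SET @i += 1;
END;
SELECT DATEDIFF(ms, @start, SYSDATETIME()) AS ElapsedMilliseconds;
```

Running an equivalent loop against each of the four variants and comparing the elapsed times is one simple way to contrast the structures.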
Module Review and Takeaways

Review Questions
1. What is the main problem with uniqueidentifier columns used as primary keys?
2. Where are newly inserted rows placed when a table is structured as a heap?

Best Practices
1. Unless specific circumstances arise, most tables should have a clustered index.
2. The clustered index may or may not be placed on the table's primary key.
3. When using GUID primary keys in the logical data model, consider avoiding their use throughout the physical implementation of the data model.
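Where a GUID key must be retained in the physical model, NEWSEQUENTIALID() can reduce the random-insert problem associated with NEWID(). The following is a minimal sketch; the table and column names are illustrative, not part of the course design.

```sql
-- Sketch only: NEWSEQUENTIALID() generates ever-increasing GUIDs on a given
-- server, so new rows are appended at the end of the clustered index rather
-- than inserted at random positions (which causes page splits).
-- Note: NEWSEQUENTIALID() can only be used in a DEFAULT constraint.
CREATE TABLE dbo.Reference
(
    ReferenceID uniqueidentifier NOT NULL
        CONSTRAINT DF_Reference_ReferenceID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Reference PRIMARY KEY CLUSTERED,
    Description nvarchar(100) NOT NULL
);
```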
Module 7
Reading SQL Server 2008 R2 Execution Plans

Contents:
Lesson 1: Execution Plan Core Concepts
Lesson 2: Common Execution Plan Elements
Lesson 3: Working with Execution Plans
Lab 7: Reading SQL Server Execution Plans
Module Overview

In earlier modules, you have seen that one of the most important decisions SQL Server makes when executing a query is how to access the data in any of the tables involved in the query. SQL Server can read the underlying table (which might be structured as a heap or with a clustered index), but it might also choose to use another index. In the next module, you will see how to design additional indexes, but before learning this, it is important to know how to determine the outcomes of the decisions that SQL Server makes. Execution plans show how each step of a query is to be executed. In this module, you will learn how to read and interpret execution plans.

Objectives
After completing this module, you will be able to:
1. Explain the core concepts related to the use of execution plans
2. Describe the role of the most common execution plan elements
3. Work with execution plans
Lesson 1
Execution Plan Core Concepts
The first steps in working with SQL Server execution plans are to understand why they are so important and to understand the phases that SQL Server passes through when executing a query. Armed with that information, you can learn what an execution plan is, what the different types of execution plans are, and how execution plans relate to execution contexts. Execution plans can be retrieved in a variety of formats. It is also important to understand the differences between each of these formats and to know when to use each format.
Objectives
After completing this lesson, you will be able to:
1. Explain why execution plans matter
2. Describe the phases that SQL Server passes through while executing a query
3. Explain what execution plans are
4. Describe the difference between actual and estimated execution plans
5. Describe execution contexts
6. Make effective use of the different execution plan formats
Why Execution Plans Matter
Key Points Rather than trying to guess how a query is to be performed or how it was performed, execution plans allow precise answers to be obtained. Execution plans are also commonly referred to as query plans.
Why Execution Plans Matter
If you spend any time reading posts in the SQL Server forums or newsgroups, or participating in any of the SQL Server-related email distribution lists, you will notice questions that occur very regularly:
•  Why is it that my query takes such a long time to complete?
•  This query is so similar to another query that executes quickly, yet this query takes much longer to complete. Why is this happening?
•  I created an index to make access to the table fast but SQL Server is ignoring the index. Why won't it use my index?
•  I've created an index on every column in the table yet SQL Server still takes the same time to execute my query. Why is it ignoring the indexes?
These are common questions, and SQL Server provides tools to help answer them. Execution plans show how SQL Server intends to execute a query or how it did execute a query. The ability to interpret execution plans allows you to answer the questions above. Many users capture execution plans and then try to resolve the worst-performing aspects of a query. The best use of execution plans, however, is to verify that the plan you expected to be used was the plan that was actually used. This means that you need to already have an idea of how you expect SQL Server to execute your queries. You will see more information on doing this in the next module.
Query Execution Phases
Key Points SQL Server executes queries in a series of phases. A key outcome of one of the phases is an execution plan. Once compiled, a plan may be cached for later use.
T-SQL Parsing
The first phase when executing queries is to check that the statements supplied in the batch follow the rules of the language. Each statement is checked to find any syntax errors. Object names within the statements are located.
Question: What is the difference between a statement and a batch?
Object Name Resolution
In the second phase, SQL Server resolves the names of objects to their underlying object IDs. SQL Server needs to know exactly which object is being referred to. For example, consider the following statement:

SELECT * FROM Product;
While at first glance it might seem that mapping the Product table to its underlying object ID would be easy, consider that SQL Server supports more than a single object with the same name in a database, through the use of schemas. For example, note that each of the objects in the following code could be completely different in structure and that the names relate to entirely different objects:

SELECT * FROM Production.Product;
SELECT * FROM Sales.Product;
SELECT * FROM Marketing.Product;
SQL Server needs to apply a set of rules to relate the table name "Product" to the intended object.
Query Optimization
Once the object IDs have been resolved, SQL Server needs to decide how to execute the overall batch. Based on the available statistics, SQL Server will make decisions about how to access the data contained in each of the tables that are part of each query. SQL Server does not always find the best possible plan. It weighs up the cost of a plan, based on its estimate of the cost of resources required to execute the plan. The aim is to find a satisfactory plan in a reasonable period of time. The more complex a SQL batch is, the longer it could take SQL Server to evaluate all the possible plans that could be used to execute the batch. Finding the best plan might take longer than executing a less optimal plan. There is no need to consider alternative plans for DDL (Data Definition Language) statements, such as CREATE, ALTER or DROP. Many simple queries also have trivial plans that are quickly identified.
Question: Can you think of a type of query that might lead to a trivial plan?
Query Plan Execution
Once a plan is found, the execution engine and storage engine work together to execute the plan. Execution may or may not succeed, as runtime errors can occur.
Plan Caching
If the plan is considered sufficiently useful, it may be stored in the Plan Cache. On later executions of the batch, SQL Server will attempt to reuse execution plans from the Plan Cache. This is not always possible and, for certain types of query, not always desirable.
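The plan cache can be inspected directly through dynamic management views. A minimal sketch:

```sql
-- Sketch: list cached plans with their reuse counts and the associated SQL text.
-- A high usecounts value indicates a plan that is being reused successfully.
SELECT cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;
```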
What is an Execution Plan?
Key Points An execution plan is a map that details either how SQL Server would execute a query or how SQL Server did execute a query. SQL Server uses a cost-based optimizer.
Execution Plans
Execution plans show the overall method that SQL Server is using to satisfy the requirements of the query. As part of the plan, SQL Server decides the types of operations to be performed and the order in which they will be performed. Many operations relate to the choice SQL Server makes about how to access data in a table and whether or not available indexes will be used. These decisions are based on the statistics that are available to SQL Server at the time. SQL Server uses a cost-based optimizer, and each element of the query plan is assigned a cost in relation to the total cost of the batch. SSMS also calculates the relationship between the costs of the statements, which is useful where a batch contains more than one statement. The costs that are either estimated or calculated as part of the plan can be interpreted within the plan. The cost of individual elements can be compared across statements in a single batch, but comparisons should not be made between the costs of elements in different batches. Costs can only be used to determine whether an operation is cheaper or more expensive than another operation. Costs cannot be used to estimate execution time.
Question: What resources do you imagine the cost would be based upon?
Actual vs. Estimated Execution Plans
Key Points SQL Server can record the plan it used for executing a query. Before it executes a query though, it needs to create an initial plan.
Actual vs. Estimated Execution Plans
It is possible to ask SQL Server to return details of the execution plan used, along with the results returned from a query. These plans are known as "actual" execution plans. In SQL Server Management Studio, on the Query menu, there is an option to "Include Actual Execution Plan". Once the results from a query are returned, another output tab is created that shows the execution plan that was used. Another option on the Query menu is "Display Estimated Execution Plan". This asks SQL Server to calculate an execution plan for a query (or batch) based on how it would attempt to execute the query. This is calculated without actually executing the query. This type of plan is known as an "estimated execution plan". Estimated execution plans are very useful when designing queries or when debugging queries that are suffering from performance problems. Note that it is not always possible to retrieve an estimated execution plan. One common reason for this is that the batch might include statements that create objects and then access them. As the objects do not yet exist, SQL Server has no knowledge of them and cannot create a plan for processing them. You will see an example of this in the next demonstration. When SQL Server executes a plan, it may also make choices that differ from an estimated plan. This is commonly related to the available resources (or, more likely, the lack of available resources) at the time the batch is executed.
Execution plans include row counts in each data path. For estimated execution plans, these are based on estimates from the available statistics. For actual execution plans, both the estimated and actual row counts are shown.
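Both kinds of plan can also be requested with T-SQL SET options rather than through the SSMS menus. A brief sketch (note that a SET SHOWPLAN_XML statement must be the only statement in its batch):

```sql
-- Estimated plan: the statement is compiled but NOT executed.
SET SHOWPLAN_XML ON;
GO
SELECT * FROM Production.Product;   -- returns an XML plan, not rows
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: the statement IS executed and the plan is returned as well.
SET STATISTICS XML ON;
GO
SELECT * FROM Production.Product;
GO
SET STATISTICS XML OFF;
GO
```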
What is an Execution Context?
Key Points Execution plans are reentrant. This means that more than one user can be executing exactly the same execution plan at one time. Each user needs separate data related to their individual execution of the plan. This data is held in an object known as an Execution Context.
Execution Context
Execution plans detail the steps that SQL Server would take (or did take) when executing a batch of statements. When multiple users are executing the plan concurrently, there needs to be a structure that holds data related to their individual executions of the plan. Execution contexts are cached for reuse in a very similar way to the caching that occurs with execution plans. When a user executes a plan, SQL Server retrieves an execution context from cache if there is one available. To maximize performance and minimize memory requirements, execution contexts are not fully completed when they are created. Branches of the code are "fleshed out" when the code needs to move to the branch. This means that if a procedure includes a set of procedural logic statements (like the IF statement), the execution context retrieved from cache may have gone in a different logical direction and not yet have all the details required. From a caching reuse point of view, it is useful to avoid too much procedural logic in stored procedures. You should favor set-based logic instead.
Execution Plan Formats
Key Points
There are three formats for execution plans. Text-based plans are now deprecated; XML-based plans should be used instead. Graphical plans render XML-based plans for ease of use.
Execution Plan Formats
Prior to SQL Server 2005, only text-based plans were available and many tools still use this type of plan. Text-based plans can be retrieved from SQL Server by executing the statement:

SET SHOWPLAN_TEXT ON;
Text based execution plans were superseded by XML based plans in SQL Server 2005 and are now deprecated. They should not be used in new development work.
Plan Portability
SQL Server provided a graphical rendering of execution plans to make reading text-based plans easier. One challenge with this, however, was that it was very difficult to send a copy of a plan to another user for review. XML plans can be saved as a .sqlplan filetype and are entirely portable. Graphical plans can be rendered from XML plans, including plans that have been received from other users. Note that graphical plans include only a subset of the information that is available from an XML plan. While it is not easy to read XML plans directly, further information can be obtained by reading the contents of the XML plan. XML plans are also ideal for programmatic access for users creating tools and utilities, as XML is relatively easy to consume programmatically in an application.
Question: What impact does having SSMS associated with the .sqlplan filetype have?
Demonstration 1A: Viewing Execution Plans in SSMS
Key Points
In this demonstration you will see how to:
•  Show an estimated execution plan
•  Compare execution costs between two queries in a batch
•  Show an actual execution plan
•  Save an execution plan
Demonstration Steps
1. Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
2. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect.
3. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_07_PRJ\6232B_07_PRJ.ssmssln and click Open.
4. Open and execute the 00 – Setup.sql script file from within Solution Explorer.
5. Open the 11 – Demonstration 1A.sql script file. Follow the instructions contained within the comments of the script file.
Question: How do you explain that such different queries return the same plan?
Lesson 2
Common Execution Plan Elements
Now that you understand the role of execution plans and the formats they can take, it is important to learn to interpret them. Execution plans can contain a large number of different types of elements; certain elements, however, appear regularly. In this lesson, you will learn to interpret execution plans and learn about the most common execution plan elements.
Objectives
After completing this lesson, you will be able to:
•  Describe the execution plan elements for table and clustered index scans and seeks
•  Describe the execution plan elements for nested loops and lookups
•  Describe the execution plan elements for merge and hash joins
•  Describe the execution plan elements for aggregations
•  Describe the execution plan elements for filter and sort
•  Describe the execution plan elements for data modification
Table and Clustered Index Scans and Seeks
Key Points Three execution plan elements relate to reading data from a table. The particular element used depends upon the structure of the table: heap or clustered index and whether the clustered index (if present) is useful in resolving the query.
Table and Clustered Index Scans and Seeks
Question: What is the difference between a table scan and a clustered index scan?
Table scans are a problem in many queries. There is a common misconception that table scans are a problem but that clustered index scans are not. No doubt this relates to the word "index" in the name of the element. Table scans and clustered index scans are essentially identical, except that table scans apply to heaps and clustered index scans apply to tables that are structured with clustered indexes. If a query's logic is related to the clustering key for the table, SQL Server may be able to use the index that supports it to quickly locate the row or rows required. For example, if a Customer table is clustered on a CustomerID column, consider how the following query would be executed:

SELECT * FROM dbo.Customer WHERE CustomerID = 12;
SQL Server does not need to read the entire table and can use the index to quickly locate the correct customer. This is referred to as a clustered index seek.
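By contrast, a predicate on a column that is not the clustering key typically forces a scan. A brief illustrative sketch (the CustomerName column and the assumption that it carries no index are hypothetical):

```sql
-- Likely a clustered index seek: the predicate matches the clustering key.
SELECT * FROM dbo.Customer WHERE CustomerID = 12;

-- Likely a clustered index scan: CustomerName is assumed to have no supporting
-- index, so every row of the clustered index must be examined.
SELECT * FROM dbo.Customer WHERE CustomerName = N'Proseware';
```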
Nested Loops and Lookups
Key Points Nested Loops are one of the most commonly encountered operations. They are used to implement join operations and are commonly associated with RID or Key Lookup elements.
Nested Loops and Lookups
Nested loop operations are used to implement joins. Two data paths enter the nested loop element from the right-hand side: an upper (outer) input and a lower (inner) input.
For each row in the upper input, a lookup is performed against the bottom input. The difference between a RID Lookup and a Key Lookup is whether the table has a clustered index. RID Lookups apply to heaps. Key Lookups apply to tables with clustered indexes. In some earlier documentation, a Key Lookup was also referred to as a Bookmark Lookup. The Key Lookup operator was introduced in SQL Server 2005 Service Pack 2. Note also that in earlier versions of SQL Server 2005, the Bookmark Lookup was shown as a Clustered Index Seek operator with a LOOKUP keyword associated with it.
In the physical library analogy, a lookup is similar to reading through an author index and, for each book found in the index, going to collect it from the bookcases. Lookups are often expensive operations as they need to be executed once for every row of the top input source. Note that in the execution plan shown, more than half the cost of the query is accounted for by the Key Lookup operator. In the next module, you will see options for minimizing this cost in some situations. Nested Loops is the preferred choice whenever the number of rows in the top input source is small compared with the number of rows in the bottom input source.
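A query pattern that commonly produces a Key Lookup looks like the following sketch. The table dbo.CustomerOrders and its indexes are illustrative assumptions, not objects from the course databases.

```sql
-- Assumes dbo.CustomerOrders has a clustered primary key on OrderID and a
-- nonclustered index on CustomerID only. The nonclustered index can locate
-- the matching rows (an index seek), but OrderTotal is not present in that
-- index, so each matching row requires a Key Lookup into the clustered index.
SELECT OrderID, OrderTotal
FROM dbo.CustomerOrders
WHERE CustomerID = 12;
```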
Merge and Hash Joins
Key Points Merge Joins and Hash Matches are other forms of join operations. Merge Joins are more efficient than Hash Matches but require sorted inputs.
Merge Joins
Apart from Nested Loop operations, in which each row of one table is used to look up rows from another table, it is common to need to join tables where simple lookups are not possible. Imagine two piles of paper sitting on the floor of your office. One pile of paper holds details of all your customers, one customer to a sheet. The other pile of paper holds details of customer orders, one order per sheet. If you needed to merge the two piles of paper together so that each customer's sheet was adjacent to his/her orders, how would you perform the merge? The answer depends upon the order of the sheets. If the customer sheets were in customer ID order and the customer order sheets were also in customer ID order, merging the two piles would be easy. The process involved is similar to what occurs with a Merge Join operator. It can only be used when the inputs are already in the same order. Merge Joins can be used to implement a variety of join types such as left outer joins, left semi joins, left anti semi joins, right outer joins, right semi joins, right anti semi joins and unions.
Hash Matches
Now imagine how you would merge the piles of customers and customer orders if the customers were in customer ID order but the customer orders were ordered by customer order number. The same problem would occur if the customer sheets were in postal code order. These situations are similar to the problem encountered by Hash Match operations. There is no easy way to merge the piles.
Reading SQL Server 2008 R2 Execution Plans
7-19
Hash Matches use a relatively brute-force method of joining. One input is broken into a set of "hash buckets" based on an algorithm, and the other input is processed using the same algorithm. In the analogy with the piles of paper, the algorithm could be to take the first digit of the customer ID; with this algorithm, ten buckets would be created. Although it may not always be possible to avoid Hash Matches in query plans, their presence is often an indication of a lack of appropriate indexing on the underlying tables.
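As an illustrative sketch, a join between two large tables on columns with no supporting indexes is a typical candidate for a Hash Match. The table and column names below are hypothetical:

```sql
-- Assumes neither dbo.Customer.PostalCode nor dbo.Region.PostalCode is indexed.
-- With large, unordered inputs, SQL Server is likely to hash one input on
-- PostalCode and probe it with the other, producing a Hash Match operator.
SELECT c.CustomerName, r.RegionName
FROM dbo.Customer AS c
INNER JOIN dbo.Region AS r
    ON c.PostalCode = r.PostalCode;
```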
Aggregations
Key Points There are two types of Aggregate operator: Stream Aggregate and Hash Match Aggregate. Stream Aggregate operations are very efficient.
Aggregations
Imagine being asked to count how many orders are present for each customer based on a list of customer orders. How would you perform this operation? Similar to the discussion on Merge Joins and Hash Matches, the answer depends on the order that the customer orders are being held in. If the customer orders are already in customer ID order, then performing the count (or other aggregation) is very easy. This is the equivalent of a Stream Aggregate operation. However, if the aggregate being calculated is based on a different attribute of the customer orders than the attribute they are sorted by, performing the calculations is much more complex. One option would be to first sort all the customer orders by customer ID, then to count all the customer orders for each customer ID. Another alternative is to process the input via a hashing algorithm like the one used for Hash Match operations. This is what SQL Server does when using a Hash Match Aggregate operation. The presence of these operations in a query plan is often (but not always) an indication of a lack of appropriate indexing on the underlying table.
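The scenario above can be sketched with a simple GROUP BY query (the table dbo.CustomerOrders is an illustrative assumption):

```sql
-- If an index exists on CustomerID, rows arrive already grouped by CustomerID
-- and a Stream Aggregate can count them in a single ordered pass. With no such
-- index, SQL Server may instead introduce a Sort, or use a Hash Match Aggregate.
SELECT CustomerID, COUNT(*) AS OrderCount
FROM dbo.CustomerOrders
GROUP BY CustomerID;
```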
Filter and Sort
Key Points Filter operations implement WHERE clause predicates or HAVING clause predicates. Sort operations sort input data.
Filter and Sort
WHERE clauses and HAVING clauses limit the rows returned by a query. A Filter operation can be used to implement this limit. Data rows from the input are only passed to the output if they meet specified filter criteria based on the predicates in those clauses. Filter operations are typically low cost and are processed as the data passes through the element. Sort operations are often used to implement ORDER BY clauses in queries but they have other uses. For example, a Sort operation could be used to sort rows before they are passed to other operations such as Merge Joins or for performing DISTINCT or UNION operations. Sorting data rows can be an expensive operation. Unnecessary ORDER BY operations should be avoided. Not all data needs to be output in a specific order.
Question: What would affect the cost of a sort operation?
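A sketch showing where both elements can arise (the CreditLimit and CustomerName columns are illustrative assumptions):

```sql
-- The WHERE predicate may appear as a Filter element (or be pushed down into
-- the scan/seek as a predicate). The ORDER BY on a column with no supporting
-- index typically appears as a Sort element, whose cost grows with the number
-- of rows and the size of the sorted columns.
SELECT CustomerID, CustomerName
FROM dbo.Customer
WHERE CreditLimit > 10000
ORDER BY CustomerName;
```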
Data Modification
Key Points INSERT, UPDATE and DELETE operations are used to present the outcome of underlying T-SQL data modification statements. T-SQL MERGE statements can be implemented by combinations of INSERT, UPDATE and DELETE operations.
Data Modification
The purpose of these operations is usually self-evident, but what might not be obvious is the potential cost of these operations or the complexity that can be involved in them. A T-SQL INSERT, UPDATE, or DELETE statement might involve much more than the related execution plan operation.
Question: Can you think of an example where an INSERT statement in T-SQL needs to perform more than an INSERT operation in an execution plan?
Demonstration 2A: Common Execution Plan Elements
Key Points In this demonstration you will see queries that demonstrate the most common execution plan elements.
Demonstration Steps
1. If Demonstration 1A was not performed:
   •  Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   •  In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_07_PRJ\6232B_07_PRJ.ssmssln and click Open.
   •  Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 21 – Demonstration 2A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: Why is the plan for a simple delete so complex?
Lesson 3
Working with Execution Plans
Now that you understand the importance of execution plans and are familiar with common elements contained within the plans, consideration needs to be given to the different ways that the plans can be captured. In this lesson, you will see a variety of ways to capture plans and explore the criteria by which SQL Server decides whether or not to reuse plans. When working with execution plans, SQL Server exposes a number of dynamic management views (DMVs) that can be used to explore query plan reuse. You will also see how they are used.
Objectives
After completing this lesson, you will be able to:
• Implement methods for capturing plans
• Explain how SQL Server decides whether or not to reuse existing plans when re-executing queries
• Use execution plan related DMVs
Methods for Capturing Plans
Key Points
Earlier in this module you saw how to capture execution plans using SQL Server Management Studio. Other options also exist for capturing plans.
Methods for Capturing Execution Plans
SQL Server Management Studio (SSMS) can be used to obtain both estimated and actual execution plans. The same options have been added to Visual Studio 2010 (VS), which can help avoid the need to have two tools open when performing development against SQL Server.

It is not always possible, however, to load queries into SSMS or VS for analysis. Often you will need to analyze systems that are in production, or to analyze queries generated by third-party applications whose source code you cannot access. SQL Profiler has an event (Performance events > Showplan XML) that can be used to add a column to a trace; the trace will then include the actual execution plans. Use this option with caution: a very large trace output could be generated very quickly if appropriate filtering is not used, and the overall performance of the system could be degraded.

Dynamic management views (DMVs) provide information about recent expensive queries and about missing indexes that SQL Server detected when creating the plans. SQL Server Activity Monitor can display the results of querying these DMVs.

The SQL Server Data Collection system collects information from the DMVs, uploads it to a central database, and provides a series of reports based on the data. Unlike Activity Monitor, which shows only recent expensive queries, the data collection system can show historical entries. This can be very useful when a user asks about a problem that occurred last Tuesday morning rather than at the time the problem is occurring.
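As an illustration of the DMV-based approach, a query along the following lines returns the most CPU-expensive statements currently tracked in the plan cache. This is a sketch for exploration, not one of the course's demonstration scripts:

```sql
-- Top 10 cached statements by total CPU time, similar in spirit to the
-- "Recent Expensive Queries" pane in Activity Monitor.
SELECT TOP (10)
       qs.total_worker_time AS total_cpu_time,
       qs.execution_count,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                     WHEN -1 THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```

The offset arithmetic extracts the individual statement from its containing batch, since sys.dm_exec_sql_text returns the text of the whole batch.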
Demonstration 3A: Capturing Plans in Activity Monitor
Key Points
In this demonstration you will see how to use Activity Monitor to view recent expensive queries.

Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_07_PRJ\6232B_07_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 31 – Demonstration 3A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: What could cause an expensive query to be removed from the Activity Monitor window?
Re-Executing Queries
Key Points SQL Server attempts to reuse execution plans where possible. While this is often desirable, the reuse of existing plans can be counterproductive to performance.
Re-Executing Queries Reusing query plans avoids the overhead of compiling and optimizing the queries. Some queries, however, perform poorly when executed with a plan that was generated for a different set of parameters. For example, consider a query with FromCustomerID and ToCustomerID parameters. If the value of the FromCustomerID parameter was the same as the value of the ToCustomerID parameter, an index seek based on the CustomerID might be highly selective. However, a later execution of that query where a large number of customers were requested would not be selective. This means that SQL Server would perform better if it reconsidered how to execute the query, and thus generate a new plan. You will see a further discussion on this "parameter sniffing" issue in later modules.
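The scenario described above can be sketched as follows. The procedure and table names are illustrative assumptions, not objects from the course databases:

```sql
-- Hypothetical procedure: the plan compiled for the first call is cached and
-- reused for later calls, even when later parameter values would suit a
-- different plan (the "parameter sniffing" issue described above).
CREATE PROCEDURE dbo.GetOrdersByCustomerRange
    @FromCustomerID int,
    @ToCustomerID   int
AS
BEGIN
    SELECT OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID BETWEEN @FromCustomerID AND @ToCustomerID;
END;
```

If the first execution passes equal values for both parameters, a highly selective index seek plan may be cached; a later call spanning most customers then reuses that plan, even though a scan would likely be cheaper.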
Usefulness of Cached Plans
Even for cached plans, SQL Server may eventually decide to evict them from the cache and recompile the queries. The two main reasons for this are:
• Correctness (changes to SET options, schema changes, etc.)
• Optimality (data has been modified enough that a new plan should be considered)
SQL Server assigns a cost to each plan that is cached, to estimate its "value". The value is a measure of how expensive the execution plan was to generate. When memory resources become tight, SQL Server will need to decide which plans are the most useful to keep. The decision to evict a plan from memory is based on this cost value and on whether or not the plan has been reused recently.
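The cached plans and their reuse counts can be inspected directly. The following is an illustrative query against the plan cache DMVs:

```sql
-- Show cached plans with how often each has been reused and its size.
SELECT cp.usecounts,        -- number of times the plan has been reused
       cp.size_in_bytes,    -- memory consumed by the cached plan
       cp.objtype,          -- e.g. Adhoc, Prepared, Proc
       st.text AS sql_text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;
```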
Options are available to force the compilation behavior of code, but they should be used sparingly and only where necessary. You will see a further discussion of this issue in a later module.
Execution Plan Related DMVs
Key Points
Dynamic management views provide insight into the internal operations of SQL Server. Several of these views are useful when investigating execution plans. Most DMV values are reset whenever the server is restarted; some are reset more often.

View | Description
sys.dm_exec_connections | One row per user connection to the server
sys.dm_exec_sessions | One row per session, including system and user sessions
sys.dm_exec_query_stats | Query statistics
sys.dm_exec_requests | Associated with a session and providing one row per currently executing request
sys.dm_exec_sql_text() | Provides the ability to find the T-SQL code being executed for a request
sys.dm_exec_query_plan() | Provides the ability to find the execution plan associated with a request
sys.dm_exec_cached_plans | Details of cached query plans
sys.dm_exec_cached_plan_dependent_objects() | Details of dependent objects for those plans
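Several of these views and functions are designed to be combined. For example, the following sketch joins the currently executing requests to their statement text and plans:

```sql
-- For each currently executing request (other than this one), return the
-- T-SQL text and the XML execution plan (clickable in SSMS).
SELECT r.session_id,
       st.text       AS sql_text,
       qp.query_plan
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
WHERE r.session_id <> @@SPID;   -- exclude the session running this query
```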
Demonstration 3B: Viewing Cached Plans
Key Points
In this demonstration you will see how to view cached execution plans.

Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_07_PRJ\6232B_07_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 32 – Demonstration 3B.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: No matter how quickly you execute the command to check the cache after you clear it, you would not see it empty. Why?
Lab 7: Reading SQL Server Execution Plans
Lab Setup
For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:
1. On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2. Maximize the Hyper-V Manager window.
3. In the Virtual Machines list, if the virtual machine 623XB-MIA-DC is not started:
   • Right-click 623XB-MIA-DC and click Start.
   • Right-click 623XB-MIA-DC and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, and then close the Virtual Machine Connection window.
4. In the Virtual Machines list, if the virtual machine 623XB-MIA-SQL is not started:
   • Right-click 623XB-MIA-SQL and click Start.
   • Right-click 623XB-MIA-SQL and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
5. In the Virtual Machine Connection window, click the Revert toolbar icon.
6. If you are prompted to confirm that you want to revert, click Revert. Wait for the revert action to complete.
7. In the Virtual Machine Connection window, if the user is not already logged on:
   • On the Action menu, click the Ctrl-Alt-Delete menu item.
   • Click Switch User, and then click Other User.
   • Log on using the following credentials:
     i. User name: AdventureWorks\Administrator
     ii. Password: Pa$$w0rd
8. From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
9. If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
10. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
11. In the Connect to Server window, type Proseware in the Server name text box.
12. In the Authentication drop-down list box, select Windows Authentication and click Connect.
13. In the File menu, click Open, and click Project/Solution.
14. In the Open Project window, open the project D:\6232B_Labs\6232B_07_PRJ\6232B_07_PRJ.ssmssln.
15. In Solution Explorer, double-click the query 00-Setup.sql. When the query window opens, click Execute on the toolbar.
Lab Scenario
You have been learning about the design of indexes. To take this learning further, you need to have a way to view how these indexes are used. In the first exercise, you will learn to view both estimated and actual execution plans.

Execution plans can contain many types of elements. In the second exercise, you will learn to identify the most common plan elements and see how statements lead to these elements being used.

You regularly find yourself trying to decide between different ways of structuring SQL queries. You are concerned that you aren’t always choosing the highest-performing options. If time permits, you will learn to use execution plans to compare the cost of statements in multi-statement batches.
Exercise 1: Actual vs. Estimated Plans

Scenario
In the first exercise, you will learn to view both estimated and actual execution plans.

The main tasks for this exercise are as follows:
1. Load the test script.
2. Generate an estimated execution plan for script 7.1.
3. View the estimated execution plan for script 7.2 using SHOWPLAN_XML.
4. Generate the actual execution plan for script 7.3.
5. Try to generate an estimated execution plan for script 7.4.
6. Review the actual execution plan for script 7.4.
7. Review the execution plans currently cached in memory using script 7.5.

Task 1: Load the test script
• Load the 51 – Lab Exercise 1.sql script from Solution Explorer.
• Change the database context to AdventureWorks2008R2.

Task 2: Generate an estimated execution plan for script 7.1
• Generate an estimated plan for script 7.1.

Task 3: View the estimated execution plan for script 7.2 using SHOWPLAN_XML
• Execute script 7.2 in a SQL Server Management Studio query window.
• Click the returned XML and view the execution plan.
• Right-click in the whitespace in the plan.
• Choose Show Execution Plan XML.
• Briefly review the XML.
• Close the XML window and the execution plan window.

Task 4: Generate the actual execution plan for script 7.3
• Enable the option to include actual plans, then execute script 7.3. Note the returned execution plan tab and note that the plan is identical to the one from the previous task.

Task 5: Try to generate an estimated execution plan for script 7.4
• Request an estimated plan for script 7.4.
• Note that an estimated plan cannot be created; the reason is shown in the Messages tab.

Task 6: Review the actual execution plan for script 7.4
• Execute script 7.4 and note the returned plan.

Task 7: Review the execution plans currently cached in memory using script 7.5
• Execute script 7.5 to view the plans currently cached in memory.
Results: After this exercise, you will have reviewed various actual and estimated query plans.
Exercise 2: Identify Common Plan Elements

Scenario
Execution plans can contain many types of elements. You will learn to identify the most common plan elements and see how statements lead to these elements being used.

The main tasks for this exercise are as follows:
1. Load the test script.
2. Explain the actual execution plan from script 7.6.
3. Explain the actual execution plan from script 7.7.
4. Explain the actual execution plan from script 7.8.
5. Explain the actual execution plan from script 7.9.
6. Explain the actual execution plan from script 7.10.
7. Explain the actual execution plan from script 7.11.
8. Explain the actual execution plan from script 7.12.
9. Explain the actual execution plan from script 7.13.
10. Explain the actual execution plan from script 7.14.

Task 1: Load the test script
• Load the 61 – Lab Exercise 2.sql script from Solution Explorer.
• Change the database context to AdventureWorks2008R2.
• Select the option to include actual execution plans from the Query menu.
Task 2: Explain the actual execution plan from script 7.6
• Execute script 7.6.
• Explain the plan returned based upon the existing table structure.

Task 3: Explain the actual execution plan from script 7.7
• Execute script 7.7.
• Explain the plan returned based upon the existing table structure.

Task 4: Explain the actual execution plan from script 7.8
• Execute script 7.8.
• Explain the plan returned based upon the existing table structure.

Task 5: Explain the actual execution plan from script 7.9
• Execute script 7.9.
• Explain the plan returned based upon the existing table structure.

Task 6: Explain the actual execution plan from script 7.10
• Execute script 7.10.
• Explain the plan returned based upon the existing table structure.
Task 7: Explain the actual execution plan from script 7.11
• Execute script 7.11.
• Compare the plan to the one returned by script 7.10.
• Suggest a reason for the difference in plan, given that the queries are almost identical. Also note the green Missing Index warning.

Task 8: Explain the actual execution plan from script 7.12
• Execute script 7.12.
• Explain the plan returned based upon the existing table structure.

Task 9: Explain the actual execution plan from script 7.13
• Execute script 7.13.
• Compare the plan to the one returned by script 7.12.
• Suggest a reason for the difference in plan, given that the queries are very similar.

Task 10: Explain the actual execution plan from script 7.14
• Execute script 7.14.
• Note the difference in this plan from the plan for script 7.12.

Results: After this exercise, you will have analyzed the most common plan elements returned from queries.
Challenge Exercise 3: Query Cost Comparison (Only if time permits)

Scenario
You regularly find yourself trying to decide between different ways of structuring SQL queries. You are concerned that you aren’t always choosing the highest-performing options. You will learn to use execution plans to compare the cost of statements in multi-statement batches.

The main tasks for this exercise are as follows:
1. Load the test script.
2. Explain the actual execution plan from script 7.15.

Task 1: Load the test script
• Load the 71 – Lab Exercise 3.sql script from Solution Explorer.
• Change the database context to AdventureWorks2008R2.
• Select the option to include actual execution plans from the Query menu.

Task 2: Explain the actual execution plan from script 7.15
• Execute script 7.15 as a single batch (both queries should be executed together).
• Explain the execution plan that is returned. In particular, explain the relationship between the two query plans.

Results: After this exercise, you will have used execution plans to compare the cost of statements in multi-statement batches.
Module Review and Takeaways
Review Questions
1. What is the difference between a graphical execution plan and an XML execution plan?
2. Give an example of why a T-SQL DELETE statement could have a complex execution plan.

Best Practices
1. Avoid capturing execution plans for large numbers of statements when using SQL Profiler.
2. If you need to capture plans using Profiler, make sure the trace is filtered to reduce the number of events being captured.
Module 8
Improving Performance through Nonclustered Indexes

Contents:
Lesson 1: Designing Effective Nonclustered Indexes 8-3
Lesson 2: Implementing Nonclustered Indexes 8-10
Lesson 3: Using the Database Engine Tuning Advisor 8-18
Lab 8: Improving Performance through Nonclustered Indexes 8-25
Module Overview
The biggest improvements in database query performance on most systems come from appropriate use of indexing. In previous modules, you saw how to structure tables for efficiency, including the option of creating a clustered index on the table. In this module, you will see how nonclustered indexes have the potential to significantly enhance the performance of your applications and learn to use a tool that can help you design these indexes appropriately.
Objectives
After completing this module, you will be able to:
• Design effective nonclustered indexes
• Implement nonclustered indexes
• Use the Database Engine Tuning Advisor to design indexes
Lesson 1
Designing Effective Nonclustered Indexes
Before you start to implement nonclustered indexes, you need to design them appropriately. In this lesson, you will learn how SQL Server structures nonclustered indexes and how they can provide performance improvements for your applications. You will also see how to find information about the indexes that have been created.
Objectives
After completing this lesson, you will be able to:
• Describe the concept of nonclustered indexes
• Explain how SQL Server structures nonclustered indexes when the underlying table is organized as a heap
• Explain how SQL Server structures nonclustered indexes when the underlying table is organized with a clustered index
• Obtain information about indexes that have been created
What is a Nonclustered Index?
Key Points You have seen how tables can be structured as heaps or have clustered indexes. Additional indexes can be created on the tables to provide alternate ways to rapidly locate required data. These additional indexes are called nonclustered indexes.
Nonclustered Indexes
A table can have up to 999 nonclustered indexes. These indexes are assigned index IDs greater than 1. Nonclustered indexes can be defined on a table regardless of whether the table uses a clustered index or a heap, and are used to improve the performance of important queries.

Whenever key columns of a nonclustered index are updated, or the clustering keys of the base table are updated, the nonclustered indexes need to be updated as well. This impacts the data modification performance of the system. Each additional index that is added to a table increases the work that SQL Server might need to perform when modifying the data rows in the table. Care must be taken to balance the number of indexes created against the overhead that they introduce.
Ongoing Review An application's data access patterns may change over time, particularly in enterprises where ongoing development work is being performed on the applications. This means that nonclustered indexes that are created at one point in time may need to be altered or even dropped at a later point in time, to continue to achieve high performance levels.
Physical Analogy Continuing our library analogy, nonclustered indexes are indexes that point back to the bookcases. They provide alternate ways to look up the information in the library. For example, they might allow access by author, by release date, by publisher, etc. They can also be composite indexes where you could find an index by release date within the entries for each author.
Nonclustered Indexes Over Heaps
Key Points Nonclustered indexes have the same B-tree structure as clustered indexes, but in the nonclustered index, the data and the index are stored separately. When the underlying table is structured as a heap, the leaf level of a nonclustered index holds Row ID pointers instead of data. By default, no data apart from the keys is stored at the leaf level.
Nonclustered Indexes Over Heaps After traversing the structure of the nonclustered index, SQL Server obtains Row ID pointers in the leaf level of the index and uses these pointers to directly access all required data pages. Multiple nonclustered indexes can be created on a table regardless of whether the table is structured as a heap or if the table has a clustered index.
Physical Analogy Based on the library analogy, a nonclustered index over a heap is like an author index pointing to books that have been stored in no particular order within the bookcases. Once an author is found in the index, the entry in the index for each book would have an address like "Bookcase 4, Shelf 3, Book 12". Note that it would be a pointer to the exact location of the book. Question: What is an upside of having the indexes point directly to RowIDs? Question: What is the downside of having multiple indexes pointing to data pages via RowID?
Nonclustered Indexes Over Clustered Indexes
Key Points You have seen that the base table could be structured with a clustered index instead of a heap. While SQL Server could have been designed so that nonclustered indexes still pointed to Row IDs, it was not designed that way. Instead, the leaf levels of a nonclustered index contain the clustering keys for the base table.
Nonclustered Indexes Over Clustered Indexes After traversing the structure of the nonclustered index, SQL Server obtains clustering keys from the leaf level of the index. It then uses these keys to traverse the structure of the clustered index to locate the required data pages. Note that two sets of index traversal occur. If the clustered index was not a unique clustered index, the leaf level of the nonclustered index also needs to hold the uniqueifier value for the data rows.
Physical Analogy In the library analogy, a nonclustered index over a clustered index is like having an author index built over a library where the books are all stored in ISBN order. When the required author is found in the author index, the entry in the index provides details of the ISBNs for the required books. These ISBNs are then used to locate the books within the bookcases. If the bookcases need to be rearranged (for example due to other rows being modified), no changes need to be made to the author index as it is only providing keys that are used for locating the books, rather than the physical location of the books. Question: What is the downside of holding clustering keys in the leaf nodes of a nonclustered index instead of RowIDs? Question: What is the upside of holding clustering keys in the leaf nodes of a nonclustered index instead of RowIDs?
Methods for Obtaining Index Information
Key Points You might require information about existing indexes before you create, modify, or remove an index. SQL Server 2008 provides many ways to obtain information about indexes.
SQL Server Management Studio SQL Server Management Studio (SSMS) offers a variety of ways to obtain information about indexes. Object Explorer lists the indexes associated with tables. This includes indexes that have been created by users and those indexes that relate to PRIMARY KEY and UNIQUE constraints in cases where indexes have been created by SQL Server to support those constraints. Each index has a property page that details the structure of the index and of its operational, usage and physical layout characteristics. SSMS also includes a set of prebuilt reports that show the state of a database. Many of these reports relate to index structure and usage.
System Stored Procedures and Catalog Views
The sp_helpindex system stored procedure returns details of the indexes created on a specified table. SQL Server also provides a series of catalog views that provide information about indexes. Some of the more useful views are shown in the following table:

System View | Notes
sys.indexes | Index type, filegroup or partition scheme ID, and the current setting of index options that are stored in metadata.
sys.index_columns | Column ID, position within the index, type (key or nonkey), and sort order (ASC or DESC).
sys.stats | Statistics associated with a table, including statistic name and whether it was created automatically or by a user.
sys.stats_columns | Column ID associated with the statistic.
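As a sketch of how these catalog views fit together, the following query lists each index on a table along with its columns. The table name is an example from the AdventureWorks sample databases:

```sql
-- List indexes and their key/included columns for one table.
SELECT i.name  AS index_name,
       i.type_desc,                 -- HEAP, CLUSTERED, NONCLUSTERED, ...
       c.name  AS column_name,
       ic.key_ordinal,              -- 0 for included (nonkey) columns
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'Sales.SalesOrderHeader')
ORDER BY i.index_id, ic.key_ordinal;
```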
Dynamic Management Views
SQL Server provides a series of dynamic management objects with useful information about the structure and usage of indexes. Some of the most useful views and functions are shown in the following table:

View | Notes
sys.dm_db_index_physical_stats | Index size and fragmentation statistics.
sys.dm_db_index_operational_stats | Current index and table I/O statistics.
sys.dm_db_index_usage_stats | Index usage statistics by query type.
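For example, fragmentation statistics can be obtained with a call along these lines (the table name is illustrative):

```sql
-- LIMITED mode scans only the parent-level pages, making it the cheapest option.
SELECT index_id,
       index_type_desc,
       avg_fragmentation_in_percent,
       page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                               -- current database
         OBJECT_ID(N'Sales.SalesOrderHeader'),  -- target table
         NULL, NULL,                            -- all indexes, all partitions
         'LIMITED');
```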
System Functions
SQL Server provides a set of functions that return information about the structure of indexes. Some of the more useful functions are shown in the following table:

Function | Notes
INDEXKEY_PROPERTY | Index column position within the index and column sort order (ASC or DESC).
INDEXPROPERTY | Index type, number of levels, and current setting of index options that are stored in metadata.
INDEX_COL | Name of the key column of the specified index.

In the next demonstration, you will see examples of many of these methods for obtaining information on indexes.
Demonstration 1A: Obtaining Index Information
Key Points
In this demonstration you will see several ways to view information about indexes.

Demonstration Steps
1. Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
2. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_08_PRJ\6232B_08_PRJ.ssmssln and click Open.
3. Open and execute the 00 – Setup.sql script file from within Solution Explorer.
4. Open the 11 – Demonstration 1A.sql script file.
5. Follow the instructions contained within the comments of the script file.

Question: What would be another way to find information about the physical structure of indexes?
Lesson 2
Implementing Nonclustered Indexes
Now that you have learned how nonclustered indexes are structured, it is important to learn how they are implemented. In earlier modules, you saw how Lookup operations used with Nested Loops in execution plans can be very expensive. In this lesson, you will see options for alleviating those costs. You will also see how to alter or drop nonclustered indexes and how filtered indexes can reduce the overhead associated with some nonclustered indexes.
Objectives
After completing this lesson, you will be able to:
• Create nonclustered indexes
• Describe the performance impact of Lookup operations as part of Nested Loops in execution plans
• Use the INCLUDE clause to create covering indexes
• Drop or alter nonclustered indexes
• Use filtered indexes
Creating Nonclustered Indexes
Key Points Nonclustered indexes are created with the CREATE INDEX statement. By default, the CREATE INDEX statement creates nonclustered indexes rather than clustered indexes when you do not specify which type of index you require. Wherever possible, the clustered index (if the table needs one) should be created prior to the nonclustered indexes. Otherwise SQL Server needs to rebuild all nonclustered indexes while creating the clustered index.
Creating Nonclustered Indexes Creating a Nonclustered index requires supplying a name for the index, the name of the table to be indexed and the columns that need to be used to create the index key. It is important to choose an appropriate naming scheme for indexes. Many standards for naming indexes exist, along with strong opinions on which of the standards is best. The important thing is to choose a standard and follow it consistently. If an index is created only to enhance performance, rather than as part of the initial schema of an application, one suggested standard is to include in the name of the index the date of creation and a reference to documentation that describes why the index was created. Database administrators are often hesitant to remove indexes when they do not know why those indexes were created. Keeping documentation that explains why indexes were created avoids that confusion.
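A minimal example, using illustrative names from the course's library analogy rather than a course database:

```sql
-- NONCLUSTERED is the default when no index type is specified,
-- but stating it makes the intent explicit.
CREATE NONCLUSTERED INDEX IX_Book_AuthorID
ON dbo.Book (AuthorID);
```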
Composite Nonclustered Indexes A composite index specifies more than one column as the key value. Query performance can be enhanced by using composite indexes, especially when users regularly search for information in more than one way. However, wide keys increase the storage requirements of an index. The majority of useful nonclustered indexes in business applications are composite indexes. A common error is to create single column indexes on many columns of a table. These indexes are rarely useful.
In composite indexes, the ordering of key columns is important; in the absence of any other requirements, the most selective column should be specified first. Each column that makes up the key can be specified as ASC (ascending) or DESC (descending). Ascending is the default order.
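A composite index with mixed sort orders might be sketched as follows (names continue the library analogy):

```sql
-- The key is sorted by AuthorID ascending, then by ReleaseDate descending
-- within each author, supporting "latest books by this author" queries.
CREATE NONCLUSTERED INDEX IX_Book_AuthorID_ReleaseDate
ON dbo.Book (AuthorID ASC, ReleaseDate DESC);
```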
Performance Impact of Lookups in Nested Loops
Key Points
Nonclustered indexes can be very useful for finding specific data based on the key columns of the index. However, for each entry found, SQL Server then needs to use the values from the leaf level of the index (either clustering keys or Row IDs) to look up the data rows in the base table. This lookup process can be very expensive.

Performance Impact of Lookups in Nested Loops
In the example shown on the slide, note the percentage cost breakdown: the key lookups are estimated at 95% of the cost of executing the query. In the library analogy, this is equivalent to looking up an author in an index and, for each entry found, running over to the bookcase to retrieve the books pointed to by the index. There is a point at which this effort is not worthwhile and it is quicker to scan the entire library.

Question: How selective would you imagine a query needs to be before SQL Server will decide to ignore the index and just scan the data?

Question: Is there any situation where there is no need for the lookups?
INCLUDE Clause
Key Points In earlier versions of SQL Server (prior to 2005), it was common for DBAs or developers to create indexes with a large number of columns, to attempt to "cover" important queries. Covering a query avoids the need for lookup operations and can greatly increase the performance of queries. The INCLUDE clause was introduced to make the creation of covering indexes easier.
INCLUDE Clause Adding columns to the key of an index adds a great deal of overhead to the index structure. For example, in the library analogy, if an index was constructed on PublisherID, ReleaseDate and Title, the index would internally be sorted by Title for no benefit. A further issue is the limitation of 16 columns and 900 bytes for an index, as this limits the ability to add columns to index keys when trying to cover queries. SQL Server 2005 introduced the ability to include one or more columns (up to 1024 columns) only at the leaf level of the index. The index structure in other levels is unaffected by these included columns. They are included, only to help with covering queries. If more than one column is listed in an INCLUDE clause, the order of the columns within the clause is not relevant. Question: For an index to cover a single table query, which columns would need to be present in the index?
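As a sketch of this idea, the Title column from the library analogy could be carried only at the leaf level rather than being added to the index key (the object names here are illustrative assumptions):

```sql
-- Title is included at the leaf level only; the nonleaf levels of the
-- index are keyed and sorted on PublisherID and ReleaseDate alone.
CREATE NONCLUSTERED INDEX IX_Book_PublisherID_ReleaseDate_Incl
ON Library.Book (PublisherID, ReleaseDate)
INCLUDE (Title);
```

A query that references only PublisherID, ReleaseDate, and Title can then be satisfied entirely from this index, with no lookups against the base table.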
Performance Impacts Covering indexes can have a very positive performance impact on the queries that they are designed to support. However, while it would be possible to create an index to cover most queries, doing so could be counterproductive. Each index that is added to a table can negatively impact the performance of data modifications on the table. For this reason, it is important to decide which queries are most important and to aim to cover only those queries.
Dropping or Altering Nonclustered Indexes
Key Points Only indexes created via CREATE INDEX can be dropped via DROP INDEX. If an index has been created by SQL Server to support a PRIMARY KEY or UNIQUE constraint, it must be removed by dropping the constraint instead.
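As a hedged sketch (the object names are hypothetical), the two cases look like this:

```sql
-- An index created directly with CREATE INDEX can be dropped directly.
DROP INDEX IX_Book_PublisherID_ReleaseDate ON Library.Book;

-- An index created to support a PRIMARY KEY or UNIQUE constraint must be
-- removed by dropping the constraint; DROP INDEX would fail here.
ALTER TABLE Library.Book DROP CONSTRAINT PK_Book;
```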
Limitations on Altering Indexes While it might at first glance seem that an index could be restructured via the ALTER INDEX statement, changing the columns that make up the key, altering the sort order of the columns, or changing settings such as FILLFACTOR and PAD_INDEX is not permitted. These changes can instead be implemented by using the CREATE INDEX statement with the WITH DROP_EXISTING option. In the example shown on the slide, an index is being disabled. Once an index is disabled, it is re-enabled by rebuilding the index. The rebuild command shown on the slide uses the ONLINE = ON option, which is only supported in the Enterprise (and higher) editions of SQL Server. The ability to perform online index operations is one of the key reasons for purchasing these editions, as many organizations no longer have available time windows for offline index maintenance operations.
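The statements involved might look like the following sketch (the index and table names are illustrative):

```sql
-- Disable the index; SQL Server keeps the metadata but discards the data.
ALTER INDEX IX_Book_PublisherID_ReleaseDate ON Library.Book DISABLE;

-- Re-enable the index by rebuilding it. ONLINE = ON requires
-- Enterprise edition (or higher).
ALTER INDEX IX_Book_PublisherID_ReleaseDate ON Library.Book
REBUILD WITH (ONLINE = ON);

-- Restructure the key by recreating the index over the existing one.
CREATE NONCLUSTERED INDEX IX_Book_PublisherID_ReleaseDate
ON Library.Book (PublisherID, ReleaseDate DESC)
WITH (DROP_EXISTING = ON);
```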
Filtered Indexes
Key Points By default, SQL Server includes an entry for every row in a table at the leaf level of each index. This is not always desirable. Filtered indexes only include rows that match a WHERE predicate that is specified when the index is created.
Filtered Indexes For the example in the slide, consider a large table of transactions with one column that indicates if the transaction is finalized or not. Often only a very small number of rows will be unfinalized. An index on the finalized transactions would be pointless as it would never be sufficiently selective to be helpful. However, an index on the unfinalized transactions could be highly selective and very useful. Standard indexes created in this situation would contain an entry at the leaf level for every transaction row, even though most entries in the index would never be used. Filtered indexes only include entries for rows that match the WHERE predicate. Note that only very simple logic is permitted in the WHERE clause predicate for filtered indexes. For example, you cannot use the clause to compare two columns and you cannot reference a computed column, even if it is persisted. Question: What is the downside of having an entry at the leaf level for every transaction row, whether finalized or not?
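For the transaction example described above, a filtered index might be sketched as follows (the table and column names are assumptions for illustration):

```sql
-- Only the small number of unfinalized rows receive index entries;
-- finalized rows are excluded by the WHERE predicate.
CREATE NONCLUSTERED INDEX IX_Transaction_Unfinalized
ON dbo.AccountTransaction (TransactionDate)
WHERE IsFinalized = 0;
```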
Demonstration 2A: Nonclustered Indexes
Key Points In this demonstration you will see how to:
• Create covering indexes
• View included columns in indexes

Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_08_PRJ\6232B_08_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 21 – Demonstration 2A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: If included columns only apply to nonclustered indexes, why do you imagine that the columns in the clustered primary key also showed as included?
Lesson 3
Using the Database Engine Tuning Advisor
Designing useful indexes is considered by many people to be more of an art than a science. While there is some truth to this view, a number of tools are available to assist with learning to create useful indexes. In this lesson, you will learn how to capture activity against SQL Server using SQL Server Profiler and then how to analyze that activity using the Database Engine Tuning Advisor.
Objectives After completing this lesson, you will be able to:
• Capture traces of activity using SQL Server Profiler
• Use Database Engine Tuning Advisor to analyze trace results
SQL Server Profiler
Key Points SQL Server Profiler is an important tool when tuning the performance of SQL Server queries. It captures the activity from client applications to SQL Server and stores it in a trace. These traces can then be analyzed.
SQL Server Profiler SQL Server Profiler captures data when events occur. Only events that have been selected are captured. A variety of information (shown as a set of columns) is available when each event occurs. The trace created contains only the selected columns for the selected events. Rather than needing to select events and columns each time you run SQL Server Profiler, a set of existing templates is available. You can also save your own selections as a new template. The captured traces are useful when tuning the performance of an application and when diagnosing specific problems that are occurring. When using traces for diagnosing problems, log data from the Windows Performance Monitor tool can also be loaded. This allows relationships to be identified between system resource usage and the execution of queries in SQL Server. The traces can also be replayed. The ability to replay traces is useful for load testing systems or for ensuring that upgraded versions of SQL Server can be used with existing applications. SQL Server Profiler also allows you to step through queries when diagnosing problems.
SQL Trace SQL Server Profiler is a graphical tool and it is important to realize that it can have significant performance impacts on the server being traced, depending upon the options chosen. SQL Trace is a library of system stored procedures that can be used for tracing when minimizing the performance impacts of the tracing is necessary.
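A minimal server-side trace built with the SQL Trace procedures might look like the following sketch; the file path is an assumption, and the event and column IDs (12 = SQL:BatchCompleted, 1 = TextData) should be verified against Books Online before use:

```sql
DECLARE @TraceID int, @on bit = 1;

-- Create a stopped trace writing to a file (.trc is appended automatically).
EXEC sys.sp_trace_create @TraceID OUTPUT, 0, N'C:\Traces\MyTrace';

-- Capture the TextData column for the SQL:BatchCompleted event.
EXEC sys.sp_trace_setevent @TraceID, 12, 1, @on;

-- Set the trace status to started (1).
EXEC sys.sp_trace_setstatus @TraceID, 1;
```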
The Extended Events system that was introduced in SQL Server 2008 also provides capabilities for tracing SQL Server activity and resources. Both SQL Trace and Extended Events are outside the scope of this course. Question: Where would the ability to replay a trace be useful?
Demonstration 3A: SQL Server Profiler
Key Points In this demonstration you will see how to use SQL Server Profiler.
Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_08_PRJ\6232B_08_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 31 – Demonstration 3A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: When so many statements were executed, why was there only one entry in the trace?
Database Engine Tuning Advisor
Key Points The Database Engine Tuning Advisor utility analyzes the performance effects of workloads run against one or more databases. Typically these workloads are obtained from traces captured by SQL Server Profiler. After analyzing the effects of a workload on your databases, Database Engine Tuning Advisor provides recommendations for improving the performance of your system.
Database Engine Tuning Advisor In SQL Server 2000 and earlier, a previous version of this tool, called the "Index Tuning Wizard", was supplied. In SQL Server 2005, the name was changed as the tool evolved to provide a broader range of recommendations. Database Engine Tuning Advisor was further enhanced in SQL Server 2008 with improved workload parsing, integrated tuning, and the ability to tune multiple databases concurrently.
Workloads A workload is a set of Transact-SQL statements that executes against databases that you want to tune. The workload source can be a file containing Transact-SQL statements, a trace file generated by SQL Profiler, or a table of trace information, again generated by SQL Profiler. SQL Server Management Studio also has the ability to launch Database Engine Tuning Advisor to analyze an individual statement.
Recommendations The recommendations that can be produced include suggested changes to the database such as new indexes, indexes that should be dropped, and depending on the tuning options you set, partitioning recommendations. The recommendations that are produced are provided as a set of Transact-SQL statements that would implement the suggested changes. You can view the Transact-SQL and save it for later review and application, or you can choose to implement the recommended changes immediately.
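As one hedged illustration, Database Engine Tuning Advisor can also be driven from its dta command-line utility. The flags below are typical but should be verified against Books Online, and the server, database, and file names are assumptions:

```
dta -S Proseware -E -D MarketDev -if WorkloadTrace.trc -s TuningSession1 -of Recommendations.sql
```

Here -E requests Windows authentication, -if names the workload file, -s names the tuning session, and -of writes the recommended Transact-SQL to a script for later review.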
Be careful of applying changes to a database without detailed consideration, especially in production environments. Also, ensure that any analysis that you perform is based on appropriately sized workloads so that recommendations are not made based on partial information. Question: Why is it important to tune an entire workload rather than individual queries?
Demonstration 3B: Database Engine Tuning Advisor
Key Points In this demonstration you will see how to use Database Engine Tuning Advisor.
Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_08_PRJ\6232B_08_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 32 – Demonstration 3B.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: Should you immediately apply the recommendations to your server?
Lab 8: Improving Performance through Nonclustered Indexes
Lab Setup For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:
1. On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2. Maximize the Hyper-V Manager window.
3. In the Virtual Machines list, if the virtual machine 623XB-MIA-DC is not started:
   • Right-click 623XB-MIA-DC and click Start.
   • Right-click 623XB-MIA-DC and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, and then close the Virtual Machine Connection window.
4. In the Virtual Machines list, if the virtual machine 623XB-MIA-SQL is not started:
   • Right-click 623XB-MIA-SQL and click Start.
   • Right-click 623XB-MIA-SQL and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
5. In the Virtual Machine Connection window, click the Revert toolbar icon.
6. If you are prompted to confirm that you want to revert, click Revert. Wait for the revert action to complete.
7. In the Virtual Machine Connection window, if the user is not already logged on:
   • On the Action menu, click the Ctrl-Alt-Delete menu item.
   • Click Switch User, and then click Other User.
   • Log on using the following credentials:
     i. User name: AdventureWorks\Administrator
     ii. Password: Pa$$w0rd
8. From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
9. If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
10. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
11. In the Connect to Server window, type Proseware in the Server name text box.
12. In the Authentication drop-down list box, select Windows Authentication and click Connect.
13. On the File menu, click Open, and click Project/Solution.
14. In the Open Project window, open the project D:\6232B_Labs\6232B_08_PRJ\6232B_08_PRJ.ssmssln.
15. In Solution Explorer, double-click the query 00-Setup.sql. When the query window opens, click Execute on the toolbar.
Lab Scenario The marketing system includes a query that is constantly executed and is performing too slowly. It retrieves 5000 web log entries beyond a given starting time. Previously, a non-clustered index was created on the SessionStart column. When 100 web log entries were being retrieved at a time, the index was being used. The developer is puzzled that changing the request to 5000 entries at a time has caused SQL Server to ignore the index he built. You need to investigate the query and suggest the best non-clustered index to support the query. You will then test your suggestion. After you have created the new index, the developer noted the cost of the sort operation and tried to create another index that would eliminate the sort. You need to explain to him why SQL Server has decided not to use this index. Later, you will learn to set up a basic query tuning trace in SQL Server Profiler and to analyze the captured trace using Database Engine Tuning Advisor. If time permits, you will design a required nonclustered index.
Supporting Documentation

Query 1: Query to test

DECLARE @StartTime datetime2 = '2010-08-30 16:27';

SELECT TOP(5000) wl.SessionID, wl.ServerID, wl.UserName
FROM Marketing.WebLog AS wl
WHERE wl.SessionStart >= @StartTime
ORDER BY wl.SessionStart, wl.ServerID;
Query 2: Index Design

CREATE INDEX IX_WebLog_Perf_20100830_B
ON Marketing.WebLog (ServerID, SessionStart)
INCLUDE (SessionID, UserName);
Query 3: Query to review

SELECT PostalCode, Country
FROM Marketing.PostalCode
WHERE StateCode = 'KY'
ORDER BY StateCode, PostalCode;
Exercise 1: Nonclustered index usage review
Scenario
The marketing system includes a query that is constantly executed and is performing too slowly. It retrieves 5000 web log entries beyond a given starting time. Previously, a non-clustered index was created on the SessionStart column. When 100 web log entries were being retrieved at a time, the index was being used. The developer is puzzled that changing the request to 5000 entries at a time has caused SQL Server to ignore the index he built. You need to investigate the query and suggest the best non-clustered index to support the query. You will then test your suggestion.
The main tasks for this exercise are as follows:
1. Review the query.
2. Review the existing index and table structures.
3. Design a more appropriate index.
4. Test your design.

Task 1: Review the query
• Review Query 1 in the supporting documentation.

Task 2: Review the existing index and table structures
• Review the existing index and table structures.

Task 3: Design a more appropriate index
• Design a more appropriate index.

Task 4: Test your design
• Use Query 1 in the supporting documentation to test your new index.

Results: After this exercise, you should have created a non-clustered index.
Exercise 2: Improving nonclustered index designs
Scenario
After you created the new index, the developer noted the cost of the sort operation and tried to create another index that would eliminate the sort. You need to explain why SQL Server has decided not to use this index.
The main tasks for this exercise are as follows:
1. Review the index design.
2. Implement the index.
3. Test the design and explain why the index was not used.

Task 1: Review the index design
• In Query 2 in the supporting documentation, review the index design.

Task 2: Implement the index
• Create the index as per the index design.

Task 3: Test the design and explain why the index was not used
• Enable Include Actual Execution Plan.
• Execute the query.
• Review the execution plan and explain why the index was not used.

Results: After this exercise, you should understand why some indexes are not appropriate in some scenarios.
Exercise 3: SQL Server Profiler and Database Engine Tuning Advisor
Scenario
Query 3 is another important query. You need to investigate the query and suggest the best nonclustered index to support the query. You will then test your suggestion.
The main tasks for this exercise are as follows:
1. Review the query.
2. Review the existing index and table structures.
3. Design a more appropriate index by following the Missing Index suggestion.
4. Create a better index that removes the sort operation. If you create another index, confirm that SQL Server selects it.

Task 1: Review the query
• Review Query 3 in the supporting documentation.

Task 2: Review the existing index and table structures
• Review the existing index and table structures.

Task 3: Design a more appropriate index by following the Missing Index suggestion
• Review and implement the Missing Index that SQL Server has suggested.
• Test to ensure that the new index is being used.

Task 4: Create a better index that removes the sort operation. If you create another index, confirm that SQL Server selects it
• Create a new index that will remove the Sort operation.
• Test to ensure that the new index is being used.

Results: After this exercise, you should have created a better index that will remove the sort operation.
Challenge Exercise 4: Nonclustered index design (Only if time permits)
Scenario
You will learn to set up a basic query tuning trace in SQL Server Profiler and to analyze the captured trace using Database Engine Tuning Advisor.
The main tasks for this exercise are as follows:
1. Open SQL Server Profiler, then configure and start a trace.
2. Load and execute the workload file.
3. Stop and analyze the trace using DTA.

Task 1: Open SQL Server Profiler, then configure and start a trace
• Open SQL Server Profiler.
• Configure it to use the following:
  a. Template: Tuning
  b. Save To File: selected, with any file name provided for a file on the desktop
  c. Enable file rollover: not selected
  d. Maximum file size: 500 MB
  e. Filter: DatabaseName LIKE MarketDev
• Start the SQL Server Profiler trace.
• Disable AutoScroll from the Window menu.

Task 2: Load and execute the workload file
• Load and execute the workload file 81 – Lab Exercise 4.sql.

Task 3: Stop and analyze the trace using DTA
• Stop the SQL Server Profiler trace.
• Analyze the trace results using DTA.
• Review the recommendations provided by the Database Engine Tuning Advisor.

Results: After this exercise, you should have created a SQL Server Profiler trace and analyzed the recommendations from the Database Engine Tuning Advisor.
Module Review and Takeaways
Review Questions
1. What is a covering index?
2. Can a clustered index be a covering index?

Best Practices
1. Never apply Database Engine Tuning Advisor recommendations without first reviewing what is being suggested.
2. Record details of why and when you create any indexes. DBAs are hesitant to remove indexes without this knowledge.
3. When DETA suggests new statistics, take this as a hint to investigate the indexing structure of the table.
4. If using an offline version of Books Online, ensure that it is kept up to date.
Module 9
Designing and Implementing Stored Procedures

Contents:
Lesson 1: Introduction to Stored Procedures — 9-3
Lesson 2: Working With Stored Procedures — 9-11
Lesson 3: Implementing Parameterized Stored Procedures — 9-23
Lesson 4: Controlling Execution Context — 9-33
Lab 9: Designing and Implementing Stored Procedures — 9-39
Module Overview
Stored procedures allow the creation of T-SQL logic that is stored and executed at the server. This logic might enforce business rules or data consistency. In this module, you will see the potential advantages of using stored procedures, along with guidelines for creating them.
Objectives After completing this module, you will be able to:
• Describe the role of stored procedures and the potential benefits of using them
• Work with stored procedures
• Implement parameterized stored procedures
• Control the execution context of a stored procedure
Lesson 1
Introduction to Stored Procedures
SQL Server provides a number of stored procedures and users can also create stored procedures. In this lesson, you will see the role of stored procedures and the potential benefits of using them. System stored procedures provide a large amount of pre-built functionality that you can take advantage of when building applications. It is also important to realize when designing stored procedures that not all T-SQL statements are permitted within stored procedures.
Objectives After completing this lesson, you will be able to:
• Describe the role of stored procedures
• Identify the potential benefits of using stored procedures
• Work with system stored procedures
• Identify statements that are not permitted within the body of a stored procedure declaration
What is a Stored Procedure?
Key Points A stored procedure is a named collection of Transact-SQL statements that is stored on the server within the database itself. Stored procedures are a method of encapsulating repetitive tasks; they support user-declared variables, conditional execution, and other powerful programming features.
T-SQL Code and Logic Reuse When applications interact with SQL Server, there are two basic ways that they can send commands to the server. The application could send each batch of T-SQL commands to the server to be executed, and resend the same commands if the same function needs to be executed again later. Alternatively, a stored procedure could be created at the server level to encapsulate all the T-SQL statements required. Stored procedures are given names by which they can be referred to. The application can then simply ask to execute the stored procedure each time it needs that same functionality, rather than sending all the statements that would otherwise be required.
Stored Procedures Stored procedures are similar to procedures, methods and functions in high-level languages. They can have input and output parameters and a return value. As a side effect of executing the stored procedure, rows of data can also be returned from the stored procedure. In fact, multiple rowsets can be returned from a single stored procedure. Stored procedures can be created in either T-SQL or in managed .NET code and are executed by the EXECUTE T-SQL statement. The creation of stored procedures in managed code will be discussed in Module 16. Question: Why might it be useful to return multiple rowsets from a stored procedure?
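A sketch of these ideas, using the Marketing.WebLog table that appears in the course labs (treat the procedure name itself as hypothetical), might look like this:

```sql
-- A procedure with an input parameter, an output parameter, a return
-- value, and a returned rowset.
CREATE PROCEDURE Marketing.GetServerSessions
    @ServerID int,
    @SessionCount int OUTPUT
AS
BEGIN
    SELECT @SessionCount = COUNT(*)
    FROM Marketing.WebLog
    WHERE ServerID = @ServerID;

    SELECT SessionID, UserName, SessionStart   -- rowset returned to the caller
    FROM Marketing.WebLog
    WHERE ServerID = @ServerID;

    RETURN 0;                                  -- return value
END;
GO

DECLARE @Count int;
EXECUTE Marketing.GetServerSessions @ServerID = 1, @SessionCount = @Count OUTPUT;
```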
Benefits of Stored Procedures
Key Points The use of stored procedures offers a number of benefits over issuing T-SQL code directly from an application.
Security Boundary Stored procedures can be part of a scheme that helps increase application security. They can be treated as a security boundary. Users can be given permission to execute a stored procedure without being given permission to access the objects that the stored procedure accesses. For example, you can give a user (or set of users via a role) permission to execute a stored procedure that updates a table without granting the user any permissions at all directly on the table.
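As a sketch (the procedure and role names are hypothetical), permission to execute can be granted without any permissions on the underlying table:

```sql
-- Members of SalesRole can run the procedure even though they hold
-- no UPDATE permission on the table the procedure modifies.
GRANT EXECUTE ON OBJECT::Sales.UpdateCustomerAddress TO SalesRole;
```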
Modular Programming Code reuse is important. Stored procedures help by allowing logic to be created once and then called many times and from many applications. Maintenance is easier because, if a change is needed, you often only need to change the procedure, without needing to change the application code at all. Changing a stored procedure could avoid the need to change the data access logic in a group of applications.
Delayed Binding It is possible to create a stored procedure that accesses (or references) a database object that does not yet exist. This can be helpful in simplifying the order in which database objects need to be created. This is referred to as deferred name resolution.
Performance Sending the name of a stored procedure to be executed, rather than hundreds or thousands of lines of executable T-SQL code, can offer a significant reduction in the level of network traffic.
Before T-SQL code is executed, it needs to be compiled. When a stored procedure is compiled, in many cases SQL Server will attempt to retain (and reuse) the query plan that it previously generated, to avoid the cost of recompiling the code. While reuse of execution plans for ad-hoc T-SQL code issued by applications is possible, SQL Server favors the reuse of stored procedure execution plans. Query plans for ad-hoc T-SQL statements are amongst the first items removed from memory when memory pressure occurs. The rules governing the reuse of query plans for ad-hoc T-SQL are largely based on matching the text of the queries exactly. Any difference at all (e.g., whitespace or casing) will cause a different query plan to be used, unless the difference is only a value that SQL Server decides must be the equivalent of a parameter. Stored procedures have a much higher chance of achieving query plan reuse. Question: Stored procedures can be created in any order. What could cause the tables that are referenced by the stored procedures to need to be created in a specific order?
Working with System Stored Procedures
Key Points SQL Server ships with a large amount of pre-built functionality supplied within system stored procedures and system extended stored procedures.
Types of System Stored Procedures There are two basic types of system stored procedure: system stored procedures and system extended stored procedures. Both are supplied pre-built with SQL Server. The core difference between the two is that the code for system stored procedures is written in T-SQL and is supplied in the master database installed with SQL Server, whereas the code for system extended stored procedures is written in unmanaged native code (typically C++) and supplied via a DLL (dynamic link library). Note that since SQL Server 2005, the procedures are actually located in a hidden resource database rather than directly in the master database, but the effect is the same. Originally, there was a basic distinction in the naming, where system stored procedures had an sp_ prefix and system extended stored procedures had an xp_ prefix. Over time, the need to maintain backwards compatibility has caused a mixture of these prefixes to appear in both types. Now, most system stored procedures have an sp_ prefix and most system extended stored procedures have an xp_ prefix.
System Stored Procedures System stored procedures are "special" in that they can be executed from within any database without needing to specify the master database as part of their name. They are typically used for administrative tasks relating to configuring servers, databases and objects or for retrieving information about them. System stored procedures are created within the sys schema. Examples of system stored procedures are sys.sp_configure, sys.sp_addmessage, sys.sp_executesql.
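For example, sys.sp_executesql can be executed from the current database without referencing master, and accepts a parameterized T-SQL string (the query below is an illustrative sketch against the sys.objects catalog view):

```sql
-- Count the user tables (type 'U') in the current database.
EXEC sys.sp_executesql
    N'SELECT COUNT(*) FROM sys.objects WHERE type = @Type;',
    N'@Type char(2)',
    @Type = 'U';
```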
System Extended Stored Procedures System extended stored procedures are used to extend the functionality of the server in ways that cannot be achieved via T-SQL code alone. Examples of system extended stored procedures are sys.xp_dirtree, sys.xp_cmdshell and sys.sp_trace_create. (Note how the last example here has an sp_ prefix).
User Extended Stored Procedures While it is still possible to create user-defined extended stored procedures and attach them to SQL Server, the ability to do so is now deprecated. Extended stored procedures run directly within the memory space of SQL Server. This is not a safe place for users to be executing code. User-defined extended stored procedures are well-known to the SQL Server product support group as a source of difficult problems to resolve. Managed code stored procedures should now be used instead of user-defined extended stored procedures. The use of managed code to create stored procedures will be described in Module 16.
Statements not Permitted
Key Points Not all T-SQL statements are permitted within stored procedure declarations. The table on the slide shows the statements that cannot be used.
Statements not Permitted Most T-SQL statements can be used within the bodies of stored procedures. For the statements that are not permitted, the reason usually relates to one of the following:
• creation of other objects
• changing SET options that relate to query plans
• changing database context via the USE statement
Note that stored procedures can access objects in other databases, but those objects need to be referred to by name, not by attempting to change the database context to another database; i.e., the USE statement cannot be used within the body of a stored procedure in the way that it can be used in a T-SQL script.
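The difference can be sketched as follows (the database and table names are hypothetical):

```sql
CREATE PROCEDURE dbo.GetArchiveRowCount
AS
BEGIN
    -- USE ArchiveDB;                    -- not permitted inside a procedure
    SELECT COUNT(*)
    FROM ArchiveDB.dbo.WebLogArchive;    -- three-part name works instead
END;
```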
Demonstration 1A: Working with System Stored Procedures and Extended Stored Procedures
Key Points In this demonstration you will see how to:
1. Execute system stored procedures
2. Execute system extended stored procedures

Demonstration Steps
1. Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
2. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_09_PRJ\6232B_09_PRJ.ssmssln and click Open.
3. Open and execute the 00 – Setup.sql script file from within Solution Explorer.
4. Open the 11 – Demonstration 1A.sql script file.
5. Follow the instructions contained within the comments of the script file.
Question: What does the mismatch of prefixes in system stored procedure and system extended stored procedure names suggest?
Lesson 2
Working with Stored Procedures
Now that you have an understanding of why stored procedures are important, you need to gain an understanding of the practicalities involved in working with stored procedures.
Objectives After completing this lesson, you will be able to:
• Create a stored procedure
• Execute stored procedures
• Alter a stored procedure
• Drop a stored procedure
• Identify stored procedure dependencies
• Explain guidelines for creating stored procedures
• Obfuscate stored procedure definitions
Creating a Stored Procedure
Key Points The T-SQL CREATE PROCEDURE statement is used to create new procedures.
CREATE PROC CREATE PROCEDURE is commonly abbreviated to CREATE PROC. A procedure cannot be replaced by using the CREATE PROC statement. It needs to be explicitly altered using an ALTER PROC statement, or dropped and then recreated.

The CREATE PROC statement must be the only statement in the T-SQL batch. All statements from the keyword AS until the end of the script, or until the end of the batch (using a batch separator such as GO), become part of the body of the stored procedure.

Creating a stored procedure requires both the CREATE PROCEDURE permission in the current database and the ALTER permission on the schema in which the procedure is being created.

It is important to keep connection settings such as QUOTED_IDENTIFIER and ANSI_NULLS consistent when working with stored procedures. The settings associated with the stored procedure are taken from the settings in the session where it is created.

Stored procedures are always created in the current database, with the single exception of stored procedures created with a # prefix in their name. The # prefix on a name indicates that it is a temporary object. As such, it is created in the tempdb database and removed when the session that created it ends.
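As a minimal sketch of the points above (the procedure and table names follow the AdventureWorks-style samples used elsewhere in this module and are illustrative only):

```sql
-- CREATE PROC must be the only statement in its batch,
-- so it is delimited with GO.
CREATE PROC Sales.GetRecentOrders
AS
BEGIN    -- wrapping the body in BEGIN...END is good practice
    SELECT SalesOrderID, OrderDate
    FROM Sales.SalesOrderHeader
    WHERE OrderDate >= DATEADD(MONTH, -1, SYSDATETIME());
END;
GO
```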
Debugging Stored Procedures A good practice when working with stored procedures is to first write and test the T-SQL statements that you want to include in your stored procedure and if you receive the results you expected, then wrap the T-SQL statements as the body of a stored procedure.
Note: Even though wrapping the body of a stored procedure with a BEGIN…END block is not required, doing so is considered a good practice.
Executing Stored Procedures
Key Points The T-SQL EXECUTE statement is used to execute stored procedures. The EXECUTE statement is commonly abbreviated as EXEC.
EXECUTE Statement The EXECUTE statement is mostly used to execute stored procedures but can also be used to execute other commands, such as dynamic SQL statements. As mentioned in the first lesson, system stored procedures can be executed within the master database without having to explicitly refer to that database. That does not apply to other stored procedures.
Two-Part Naming on Referenced Objects It is very important when creating stored procedures to use at least two-part names for objects referenced by the stored procedure. If you refer to a table by both its schema name and its table name, you avoid any ambiguity about which table you are referring to, and you maximize the chance of SQL Server being able to reuse query execution plans for the stored procedure. If you use only the name of a table, SQL Server will first search your default schema for the table and then, if it does not locate a table with that name, it will search the dbo schema for a table with that name. This minimizes SQL Server's options for query plan reuse because it does not know until the moment the stored procedure is executed which objects it needs to refer to, as different users can have different default schemas.
Two-Part Naming When Creating Stored Procedures If you create a stored procedure by only supplying the name of the procedure (and not the schema name as well), SQL Server will attempt to create the stored procedure in your default schema. Scripts that create stored procedures this way tend to be fragile as the location of the created stored procedure would depend upon the default schema of the user executing the script.
Two-Part Naming When Executing Stored Procedures When you execute a stored procedure, you should also supply the name of both the schema and the stored procedure. If you supply only the name of the stored procedure, SQL Server can end up trying to find the stored procedure in a number of places:
• If the stored procedure name starts with sp_ (not recommended for user stored procedures), SQL Server also looks in the master database in the sys schema for the stored procedure.
• SQL Server then looks in the default schema for the user executing the stored procedure.
• SQL Server then looks in the dbo schema in the current database for the stored procedure.
Having SQL Server perform unnecessary steps to locate a stored procedure reduces performance for no reason.
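To illustrate the difference, the calls below show the recommended and the fragile forms, using a hypothetical procedure name:

```sql
-- Recommended: schema-qualified, so no search steps are needed
EXEC Sales.GetRecentOrders;

-- Works, but forces SQL Server to probe the caller's default schema
-- and then the dbo schema before locating the procedure
EXEC GetRecentOrders;
```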
Altering a Stored Procedure
Key Points The T-SQL ALTER PROCEDURE statement is used to replace an existing procedure. ALTER PROCEDURE is often abbreviated to ALTER PROC.
ALTER PROC The main reason for using the ALTER PROC statement is to retain any existing permissions on the procedure while it is being changed. Users may have been granted permission to execute the procedure. If you drop the procedure and recreate it, those permissions that had been granted to the users would be removed when the procedure was dropped.
Procedure Type Note that the type of procedure cannot be changed. For example, a T-SQL procedure cannot be changed to a managed code procedure via an ALTER PROCEDURE statement or vice-versa.
Connection Settings The connection settings such as QUOTED_IDENTIFIER and ANSI_NULLS that will be associated with the modified stored procedure will be those taken from the session that makes the change, not from the original stored procedure so it is important to keep these consistent when making changes.
Complete Replacement Note that when you alter a stored procedure, you need to supply again any options (such as WITH ENCRYPTION) that were supplied while creating the procedure. None of these options are retained and they are replaced by whatever options are supplied in the ALTER PROC statement.
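For example, if a procedure was originally created WITH ENCRYPTION, an ALTER PROC statement that omits the option will silently remove it, so the option must be restated (names are illustrative):

```sql
ALTER PROC Sales.GetRecentOrders
WITH ENCRYPTION   -- must be supplied again, or the option is lost
AS
BEGIN
    SELECT SalesOrderID, OrderDate, Status
    FROM Sales.SalesOrderHeader
    WHERE OrderDate >= DATEADD(MONTH, -1, SYSDATETIME());
END;
GO
```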
Dropping a Stored Procedure
Key Points Dropping a stored procedure is straightforward. The DROP PROCEDURE statement is used to drop a stored procedure and is commonly abbreviated as DROP PROC.
sys.procedures System View You can see a list of existing procedures in the current database by querying the sys.procedures view.
Permissions Dropping a procedure requires either ALTER permission on the schema that the procedure is part of or CONTROL permission on the procedure itself.
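A brief sketch of checking for and then dropping a procedure (the procedure name is illustrative):

```sql
-- List procedures in the current database
SELECT SCHEMA_NAME(schema_id) AS SchemaName, name
FROM sys.procedures;

-- Drop one of them (requires ALTER on the schema
-- or CONTROL on the procedure)
DROP PROC Sales.GetRecentOrders;
```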
Stored Procedure Dependencies
Key Points It is a good idea before dropping a stored procedure to check for any other objects that are dependent upon the stored procedure.
sp_depends Earlier versions of SQL Server used the sp_depends system stored procedure to return details of dependencies between objects. It was known to report incomplete information because of problems with deferred name resolution.
sys.sql_expression_dependencies Use of the sys.sql_expression_dependencies view replaces the previous use of the sp_depends system stored procedure. sys.sql_expression_dependencies provides one row per name dependency on user-defined entities in the current database. sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities provide more targeted views over the data provided by sys.sql_expression_dependencies. You will see an example of these dependency views being used in the next demonstration.
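For example, before dropping a procedure you could list the objects that reference it; the procedure name below is hypothetical:

```sql
SELECT OBJECT_SCHEMA_NAME(referencing_id) AS ReferencingSchema,
       OBJECT_NAME(referencing_id) AS ReferencingObject
FROM sys.sql_expression_dependencies
WHERE referenced_id = OBJECT_ID(N'Sales.GetRecentOrders');
```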
Guidelines for Creating Stored Procedures
Key Points There are a number of important guidelines that should be considered when creating stored procedures.
Qualify names inside of stored procedures Earlier in this lesson, the importance of using at least two-part naming when referring to objects within stored procedures was described. This applies both to the creation of stored procedures and to their execution.
Keep consistent SET Options The Database Engine saves the settings of both SET QUOTED_IDENTIFIER and SET ANSI_NULLS when a Transact-SQL stored procedure is created or altered. These original settings are used when the stored procedure is executed.
Apply consistent naming conventions It is recommended that you do not create any stored procedures using sp_ as a prefix. SQL Server uses the sp_ prefix to designate system stored procedures, and the name you choose may conflict with some future system procedure.

It is important to have a consistent way of naming your stored procedures. For example, some people use a naming convention of a table name followed by an action; however, this does not work well for more complex procedures that affect multiple tables. Others use an action verb followed by a description of the action to be performed. There is no right or wrong way to do this in all situations, but you should decide on a method for naming to be used in your applications and apply it consistently.

It is possible via Policy-Based Management (first introduced in SQL Server 2008 and beyond the scope of this course) or via DDL triggers (first introduced in SQL Server 2005 and also beyond the scope of this course) to enforce naming conventions on most objects.
Use @@nestlevel to see current nesting level Stored procedures are nested when one stored procedure calls another or executes managed code by referencing a CLR routine, type, or aggregate. You can nest stored procedures and managed code references up to 32 levels. You can use @@NESTLEVEL to check the nesting level of the current stored procedure execution on the local server.
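A brief sketch, assuming a dbo.InnerProc procedure already exists:

```sql
CREATE PROC dbo.OuterProc
AS
BEGIN
    PRINT @@NESTLEVEL;   -- 1 when OuterProc is called directly
    EXEC dbo.InnerProc;  -- inside InnerProc, @@NESTLEVEL returns 2
END;
GO
```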
Keep one procedure per task Avoid writing "the procedure to rule them all" (with apologies to Lord of the Rings). Don't write one procedure that does an enormous number of tasks. Doing this limits reuse possibilities and can hinder performance.
Obfuscating Stored Procedure Definitions
Key Points SQL Server provides an option to obfuscate the definition of stored procedures via the WITH ENCRYPTION option. Caution needs to be exercised when using it, as it makes working with the application more difficult and likely does not achieve the aims it is targeted at.
WITH ENCRYPTION As was mentioned in an earlier module dealing with views, it is very important to understand that while SQL Server provides an option (WITH ENCRYPTION) to obfuscate the definition of your stored procedures, the encryption is not particularly strong. It is known to be relatively easy to defeat, as the encryption keys are stored in known locations within the encrypted text, and a number of third-party tools are capable of reversing the encryption. Original copies of the source need to be kept regardless of the fact that decryption might be possible; do not depend upon it. Encrypted code is also much harder to work with in terms of diagnosing and tuning performance issues. Question: Why might you want to obfuscate the definition of a stored procedure?
Demonstration 2A: Stored Procedures
Key Points In this demonstration you will see:
• How to create a stored procedure
• How to execute a stored procedure
• How to create a stored procedure that returns multiple rowsets
• How to alter a stored procedure
• How to view the list of stored procedures
Demonstration Steps
1. If Demonstration 1A was not performed:
• Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
• In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_09_PRJ\6232B_09_PRJ.ssmssln and click Open.
• Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 21 – Demonstration 2A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: How could the GetBlueProductsAndModels stored procedure be made more useful?
Lesson 3
Implementing Parameterized Stored Procedures
The stored procedures that you have seen earlier in this module have not involved parameters. They have produced their output without needing any input from the user and they have not returned any values back apart from the rows that they have returned. Stored procedures are more flexible when you include parameters as part of the procedure definition because you can create more generic application logic. Stored procedures can use both input and output parameters and return values. While the reuse of query execution plans is in general desirable, there are situations where this reuse is detrimental. You will see situations where this can occur and consider options for workarounds to avoid the detrimental outcomes.
Objectives After completing this lesson, you will be able to:
• Parameterize stored procedures
• Work with input parameters
• Work with output parameters
• Explain the issues surrounding parameter sniffing and performance and the potential workarounds
Working with Parameterized Stored Procedures
Key Points Parameterized stored procedures allow for a much higher level of code reuse. They involve three major components: input parameters, output parameters, and return values.
Input Parameters Parameters are used to exchange data between stored procedures and functions and the application or tool that called the stored procedure or function. They allow the caller to pass a data value to the stored procedure or function. To define a stored procedure that accepts input parameters, you declare one or more variables as parameters in the CREATE PROCEDURE statement. You will see an example of this in the next topic.
Output Parameters Output parameters allow the stored procedure to pass a data value or a cursor variable back to the caller. A key difference between stored procedures and user-defined functions is that user-defined functions cannot specify output parameters. (User-defined functions are discussed in a later module). To use an output parameter within Transact-SQL, you must specify the OUTPUT keyword in both the CREATE PROCEDURE and the EXECUTE statements.
Return Value Every stored procedure returns an integer return code to the caller. If the stored procedure does not explicitly set a value for the return code, the return code is 0 if no error occurs, otherwise a negative value is returned. Return values are commonly used to return a status result or an error code from a procedure and are sent by the T-SQL RETURN statement.
While it is possible to send a business-logic related value via a RETURN statement, in general, you should use OUTPUT parameters to output values rather than the RETURN value.
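A sketch of a return code used for status only, with the result captured at the call site (names are illustrative):

```sql
CREATE PROC Sales.CheckOrderExists
    @SalesOrderID int
AS
BEGIN
    IF NOT EXISTS (SELECT 1 FROM Sales.SalesOrderHeader
                   WHERE SalesOrderID = @SalesOrderID)
        RETURN 1;   -- status: order not found
    RETURN 0;       -- status: success
END;
GO

DECLARE @Result int;
EXEC @Result = Sales.CheckOrderExists @SalesOrderID = 43659;
SELECT @Result AS StatusCode;
```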
Using Input Parameters
Key Points Stored procedures can accept input parameters, similar to the way that parameters are passed to functions or methods or subroutines in higher-level languages.
Input Parameters Stored procedure parameters must have an @ prefix and must have a data type specified. The data type will be checked when a call is made. There are two ways to call a stored procedure with input parameters. One is to pass the parameter list in order. The other is to pass the list by name. You cannot combine these two options in a single EXEC call.
Default Values Provide default values for a parameter where appropriate. If a default is defined, a user can execute the stored procedure without specifying a value for that parameter. Look at the beginning of the procedure declaration from the example on the slide:

CREATE PROCEDURE Sales.OrdersByDueDateAndStatus
    @DueDate datetime,
    @Status tinyint = 5
AS
Two parameters have been defined (@DueDate and @Status). The @DueDate parameter has no default value and must be supplied when the procedure is executed. The @Status parameter has a default value of 5. If a value for the parameter is not supplied when the stored procedure is executed, then a value of 5 will be used.
Validating Input Parameters As a best practice, validate all incoming parameter values at the beginning of a stored procedure to trap missing and invalid values early. This might include checking whether the parameter is NULL.
Validating parameters early avoids doing substantial work in the procedure and then having to undo all that work.
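For example, a NULL check at the top of the procedure traps a missing value before any real work is done. This sketch is based on the slide's procedure; the error message and RETURN code are assumptions for illustration:

```sql
CREATE PROC Sales.OrdersByDueDateAndStatus
    @DueDate datetime,
    @Status tinyint = 5
AS
BEGIN
    -- Validate parameters before doing any substantial work
    IF @DueDate IS NULL
    BEGIN
        RAISERROR (N'A value for @DueDate must be supplied.', 16, 1);
        RETURN 1;
    END;

    SELECT SalesOrderID, DueDate, Status
    FROM Sales.SalesOrderHeader
    WHERE DueDate = @DueDate AND Status = @Status;
END;
GO
```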
Executing a Stored Procedure with Input Parameters Three examples of executing the stored procedure are shown on the slide. Look at the first option: EXEC Sales.OrdersByDueDateAndStatus '20050713',5;
This execution supplies a value for both @DueDate and for @Status. Note that the names of the parameter have not been mentioned. SQL Server knows which parameter is which by its position in the parameter list. Now look at the second option: EXEC Sales.OrdersByDueDateAndStatus '20050713';
In this case, a value for the @DueDate parameter has been supplied but no value for the @Status parameter has been supplied. In this case, the procedure will be executed with the @Status value set at a default value of 5. Finally, look at the third option: EXEC Sales.OrdersByDueDateAndStatus @DueDate = '20050713', @Status = 5;
In this case, the stored procedure is being called with both parameters but they are being identified by name. Note that you could have also achieved the same outcome with the parameters specified in the reverse order as they are identified by name: EXEC Sales.OrdersByDueDateAndStatus @Status = 5, @DueDate = '20050713';
Using Output Parameters
Key Points Output parameters are used very similarly to input parameters. While they are declared and used very similarly to input parameters, output parameters have a few special requirements.
Output Parameter Requirements
• The keyword OUTPUT must be specified when declaring the output parameters of the stored procedure.
• The keyword OUTPUT must also be specified in the list of parameters passed during the EXEC statement.
Look at the beginning of the procedure declaration from the example on the slide:

CREATE PROC Sales.GetOrderCountByDueDate
    @DueDate datetime,
    @OrderCount int OUTPUT
AS
In this case, the @DueDate parameter is an input parameter and the @OrderCount parameter has been specified as an output parameter. Note that in SQL Server there is no true equivalent of a .NET output parameter; SQL Server output parameters are really input/output parameters. Now look at how the procedure is called:

DECLARE @DueDate datetime = '20050713';
DECLARE @OrderCount int;
EXEC Sales.GetOrderCountByDueDate @DueDate, @OrderCount OUTPUT;
SELECT @OrderCount;
First, variables to hold the parameter values have been declared. In this case, a variable to hold a due date has been declared, along with another to hold the order count.
In the EXEC call, note that the @OrderCount parameter is followed by the OUTPUT keyword. If you do not specify the OUTPUT keyword in the EXEC statement, the stored procedure will still execute as normal, including preparing a value to return in the output parameter. However, the output parameter value would simply not be copied back into the @OrderCount variable. This is a common bug when working with output parameters. Finally, you would then use the returned value in the business logic that follows the EXEC call. Question: Why might you use output parameters in conjunction with IDENTITY columns?
Parameter Sniffing and Performance
Key Points In general, the reuse of query plans when a stored procedure is re-executed is a good thing. Sometimes however, a stored procedure would benefit from an entirely different query execution plan for different parameter values.
Parameter Sniffing It has been mentioned that SQL Server attempts to reuse query execution plans from one execution of a stored procedure to the next. While this is mostly helpful, imagine a procedure that takes a range of names as parameters. If you ask for the rows from A to A, you might need a very different query plan than when you ask for A to Z. SQL Server provides a variety of ways to deal with this problem, often called a "parameter sniffing" problem. Note that this only applies to parameters and not to variables within the batch. While the code for these looks very similar, variable values are not "sniffed" at all, and this can lead to poor execution plans regardless.
WITH RECOMPILE You can add a WITH RECOMPILE option when declaring a stored procedure. This causes the procedure to be recompiled every time it is executed.
sp_recompile System Stored Procedure If you call sp_recompile, any existing plans for the stored procedure passed to it will be marked as invalid and the procedure will be recompiled next time it is executed. You can also pass the name of a table or view to this procedure. In that case, all existing plans that reference the object will be invalidated and recompiled the next time they are executed.
EXEC WITH RECOMPILE If you add WITH RECOMPILE to the EXEC statement, SQL Server will recompile the procedure before running it and will not store the resulting plan.
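The recompilation options above can be sketched as follows (the procedure name and column are illustrative):

```sql
-- 1. Recompile on every execution (declared on the procedure)
CREATE PROC Sales.OrdersByStatus
    @Status tinyint
WITH RECOMPILE
AS
SELECT SalesOrderID
FROM Sales.SalesOrderHeader
WHERE Status = @Status;
GO

-- 2. Invalidate any cached plans; recompiled on next execution
EXEC sp_recompile N'Sales.OrdersByStatus';

-- 3. Recompile for this call only, without caching the new plan
EXEC Sales.OrdersByStatus @Status = 5 WITH RECOMPILE;
```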
OPTIMIZE FOR There is a query hint OPTION (OPTIMIZE FOR) that allows you to specify the value of a parameter that should be assumed when compiling the procedure, regardless of the actual value of the parameter. Question: How would you determine the value to assign in an OPTIMIZE FOR hint?
Demonstration 3A: Stored Procedure Parameters
Key Points In this demonstration you will see:
• How to create a stored procedure with parameters
• How to alter a stored procedure with parameters to correct a common stored procedure bug
Demonstration Steps
1. If Demonstration 1A was not performed:
• Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
• In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_09_PRJ\6232B_09_PRJ.ssmssln and click Open.
• Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 31 – Demonstration 3A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: Why do we need to treat NULL differently to other possible values?
Lesson 4
Controlling Execution Context
Stored procedures normally execute in the security context of the user calling the procedure. As long as a chain of ownership extends from the stored procedure to the objects that are referenced, the user can execute the procedure without the need for permissions on the underlying objects. Ownership chaining issues with stored procedures are identical to those that were discussed for views in Module 4. Sometimes, however, more precise control over the security context that the procedure is executing in is required.
Objectives After completing this lesson, you will be able to:
• Control execution context
• Work with the EXECUTE AS clause
• View execution context
Controlling Execution Context
Key Points The security context that a stored procedure executes in is referred to as its execution context. This context is used to establish the identity against which permissions to execute statements or perform actions are checked.
Execution Contexts An execution context is represented by a login token and a user token. The tokens identify the primary and secondary principals against which permissions are checked and the source used to authenticate the token. A login connecting to an instance of SQL Server has one login token and one or more user tokens, depending on the number of databases to which the account has access.
User and Login Security Tokens A security token for a user or login contains the following:
• One server or database principal as the primary identity
• One or more principals as secondary identities
• Zero or more authenticators
• The privileges and permissions of the primary and secondary identities
Login Token: A login token is valid across the instance of SQL Server. It contains the primary and secondary identities against which server-level permissions and any database-level permissions associated with these identities are checked. The primary identity is the login itself. The secondary identity includes permissions inherited from roles and groups.

User Token: A user token is valid only for a specific database. It contains the primary and secondary identities against which database-level permissions are checked. The primary identity is the database user itself. The secondary identity includes permissions inherited from database roles. User tokens do not contain server-role memberships and do not honor the server-level permissions granted to the identities in the token, including those that are granted to the server-level public role.
Controlling Execution Context While the default behavior of execution contexts is usually appropriate, there are times when it is desirable to execute within a different security context. In the example shown in the slide, a WITH EXECUTE AS 'Pat' clause has been added to the definition of the stored procedure. This causes the procedure to be executed with 'Pat' as the security context rather than with the default security context supplied by the caller of the stored procedure. Question: What is an authenticator?
The EXECUTE AS Clause
Key Points The EXECUTE AS clause sets the execution context of a stored procedure. It is useful when you need to override the default security context.
Explicit Impersonation SQL Server supports the ability to impersonate another principal either explicitly by using the stand-alone EXECUTE AS statement, or implicitly by using the EXECUTE AS clause on modules. The stand-alone EXECUTE AS statement can be used to impersonate server-level principals, or logins, by using the EXECUTE AS LOGIN statement. The stand-alone EXECUTE AS statement can also be used to impersonate database level principals, or users, by using the EXECUTE AS USER statement. To execute as another user, you must first have IMPERSONATE permission on that user. Any login in the sysadmin role has IMPERSONATE permission on all users.
Implicit Impersonation Implicit impersonations are performed through the WITH EXECUTE AS clause on modules and are used to impersonate the specified user or login at the database or server level. This impersonation depends on whether the module is a database-level module, such as a stored procedure or function, or a server-level module, such as a server-level trigger. When impersonating a principal by using the EXECUTE AS LOGIN statement, or within a server-scoped module by using the EXECUTE AS clause, the scope of the impersonation is server-wide. This means that after the context switch, any resource within the server that the impersonated login has permissions on can be accessed. However, when impersonating a principal by using the EXECUTE AS USER statement, or within a database-scoped module by using the EXECUTE AS clause, the scope of impersonation is restricted to the database by default. This means that references to objects outside the scope of the database will return an error.
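A brief sketch of both forms, assuming a database user named Pat exists and the caller has IMPERSONATE permission on it (the procedure and table names are hypothetical):

```sql
-- Explicit impersonation of a database user
EXECUTE AS USER = 'Pat';
SELECT USER_NAME();   -- now reports Pat
REVERT;               -- switch back to the original context
GO

-- Implicit impersonation via the clause on a module
CREATE PROC dbo.GetRestrictedData
WITH EXECUTE AS 'Pat'   -- statements run in Pat's security context
AS
SELECT SomeColumn FROM dbo.RestrictedTable;
GO
```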
Viewing Execution Context
Key Points You may wish to programmatically query the current security context details. These details are provided by the sys.login_token and sys.user_token system views.
sys.login_token System View The sys.login_token system view shows all tokens associated with the login. This will include the login itself and the roles that the user is a member of. In the example shown in the slide, the Windows login GREGW701\Greg has associated server tokens for the public and sysadmin roles.
sys.user_token System View The sys.user_token system view shows all tokens associated with the user within the database. In the example shown in the slide, the user is a member of the dbo role for the current database.
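The two views can be queried directly from any session:

```sql
SELECT * FROM sys.login_token;  -- server-level identities for this login
SELECT * FROM sys.user_token;   -- database-level identities in the current database
```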
Demonstration 4A: Viewing Execution Context
Key Points In this demonstration you will see:
• How to view details of execution context
• How to change execution context for a session
• How to use the WITH EXECUTE AS clause in a stored procedure
Demonstration Steps
1. If Demonstration 1A was not performed:
• Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
• In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_09_PRJ\6232B_09_PRJ.ssmssln and click Open.
• Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 41 – Demonstration 4A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Lab 9: Designing and Implementing Stored Procedures
Lab Setup
For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:
1. On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2. Maximize the Hyper-V Manager window.
3. In the Virtual Machines list, if the virtual machine 623XB-MIA-DC is not started:
   • Right-click 623XB-MIA-DC and click Start.
   • Right-click 623XB-MIA-DC and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, and then close the Virtual Machine Connection window.
4. In the Virtual Machines list, if the virtual machine 623XB-MIA-SQL is not started:
   • Right-click 623XB-MIA-SQL and click Start.
   • Right-click 623XB-MIA-SQL and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
5. In the Virtual Machine Connection window, click the Revert toolbar icon. If you are prompted to confirm that you want to revert, click Revert.
6. Wait for the revert action to complete.
7. In the Virtual Machine Connection window, if the user is not already logged on:
   • On the Action menu, click the Ctrl-Alt-Delete menu item.
   • Click Switch User, and then click Other User.
   • Log on using the following credentials:
     i. User name: AdventureWorks\Administrator
     ii. Password: Pa$$w0rd
8. From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
9. If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
10. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
11. In the Connect to Server window, type Proseware in the Server name text box.
12. In the Authentication drop-down list box, select Windows Authentication and click Connect.
13. In the File menu, click Open, and click Project/Solution.
14. In the Open Project window, open the project D:\6232B_Labs\6232B_09_PRJ\6232B_09_PRJ.ssmssln.
15. In Solution Explorer, double-click the query 00-Setup.sql. When the query window opens, click Execute on the toolbar.
Lab Scenario
You need to create a set of stored procedures to support a new reporting application. The procedures will be created within a new Reports schema.

Supporting Documentation

Stored Procedure: Reports.GetProductColors
Input Parameters: None
Output Parameters: None
Output Columns: Color (from Marketing.Product)
Output Order: Color
Notes: Colors should not be returned more than once in the output. NULL values should not be returned.
Stored Procedure: Reports.GetProductsAndModels
Input Parameters: None
Output Parameters: None
Output Columns: ProductID, ProductName, ProductNumber, SellStartDate, SellEndDate and Color (from Marketing.Product), ProductModelID (from Marketing.ProductModel), EnglishDescription, FrenchDescription, ChineseDescription
Output Order: ProductID, ProductModelID
Notes: For descriptions, return the Description column from the Marketing.ProductDescription table for the appropriate language. The LanguageID for English is 'en', for French is 'fr' and for Chinese is 'zh-cht'. If no specific language description is available, return the invariant language description if it is present. The LanguageID for the invariant language is a blank string ''. Where neither the specific language nor the invariant language description exists, return the ProductName instead.
Stored Procedure: Reports.GetProductsByColor
Input Parameters: @Color (same datatype as the Color column in the Marketing.Product table)
Output Parameters: None
Output Columns: ProductID, ProductName, ListPrice (returned as a column named Price), Color, Size and SizeUnitMeasureCode (returned as a column named UnitOfMeasure) (from Marketing.Product)
Output Order: ProductName
Notes: The procedure should return products that have no Color if the parameter is NULL.
Exercise 1: Create stored procedures

Scenario
In this exercise, you will create two stored procedures to support one of the new reports.

The main tasks for this exercise are as follows:
1. Review the Reports.GetProductColors stored procedure specification
2. Design, create and test the Reports.GetProductColors stored procedure
3. Review the Reports.GetProductsAndModels stored procedure specification
4. Design, create and test the Reports.GetProductsAndModels stored procedure
Task 1: Review the Reports.GetProductColors stored procedure specification
• Review the Reports.GetProductColors specification in the supporting documentation.

Task 2: Design, create and test the Reports.GetProductColors stored procedure
• Design, create and test the stored procedure based on the specification given in the supporting documentation for the exercise.

Task 3: Review the Reports.GetProductsAndModels stored procedure specification
• Review the second specification (Reports.GetProductsAndModels) in the supporting documentation.

Task 4: Design, create and test the Reports.GetProductsAndModels stored procedure
• Design, create and test the stored procedure based on the second specification given in the supporting documentation for the exercise.

Results: After this exercise, you should have created two new stored procedures. Tests should have shown that they are working as expected.
Exercise 2: Create a parameterized stored procedure

Scenario
In this exercise, you will create another stored procedure that takes parameters.

The main tasks for this exercise are as follows:
1. Review the Reports.GetProductsByColor stored procedure specification
2. Design, create and test the Reports.GetProductsByColor stored procedure

Task 1: Review the Reports.GetProductsByColor stored procedure specification
• Review the Reports.GetProductsByColor specification in the supporting documentation.

Task 2: Design, create and test the Reports.GetProductsByColor stored procedure
• Design, create and test the Reports.GetProductsByColor stored procedure based on the specification given in the supporting documentation for the exercise.

Results: After this exercise, you should have created a new stored procedure that takes parameters. Tests should have shown that it is working as expected.
Challenge Exercise 3: Alter the execution context of stored procedures (Only if time permits)

Scenario
In this exercise, you will alter the stored procedures to use a different execution context.

The main tasks for this exercise are as follows:
1. Alter the Reports.GetProductColors stored procedure to execute as OWNER.
2. Alter the Reports.GetProductsAndModels stored procedure to execute as OWNER.
3. Alter the Reports.GetProductsByColor stored procedure to execute as OWNER.

Task 1: Alter the Reports.GetProductColors stored procedure to execute as OWNER
• Alter the Reports.GetProductColors stored procedure to execute as OWNER and test that the procedure still works.

Task 2: Alter the Reports.GetProductsAndModels stored procedure to execute as OWNER
• Alter the Reports.GetProductsAndModels stored procedure to execute as OWNER and test that the procedure still works.

Task 3: Alter the Reports.GetProductsByColor stored procedure to execute as OWNER
• Alter the Reports.GetProductsByColor stored procedure to execute as OWNER and test that the procedure still works.

Results: After this exercise, you should have altered the stored procedures to execute as OWNER. Tests should have shown that they are working as expected.
Module Review and Takeaways
Review Questions
1. What does the WITH RECOMPILE option do when used with a CREATE PROC statement?
2. What does the WITH RECOMPILE option do when used with an EXECUTE statement?

Best Practices
1. Use the EXECUTE AS clause to override the execution context of stored procedures that use dynamic SQL, rather than granting permissions on the underlying tables to users.
2. Design procedures to perform individual tasks. Avoid designing procedures that perform a large number of tasks, unless those tasks are performed by executing other stored procedures.
3. Keep consistent ownership of stored procedures, views, tables and other objects within databases.
Module 10
Merging Data and Passing Tables

Contents:
Lesson 1: Using the MERGE Statement          10-3
Lesson 2: Implementing Table Types           10-14
Lesson 3: Using TABLE Types As Parameters    10-22
Lab 10: Passing Tables and Merging Data      10-26
Module Overview
Each time a client application makes a call to a SQL Server system, considerable delay is encountered at the network layer. The basic delay is unrelated to the amount of data being passed. It relates to the latency of the network. For this reason, it is important to minimize the number of times that a client needs to call a server for a given amount of data that must be passed between them. Each call is termed a "roundtrip". In this module you will review the techniques that provide the ability to process sets of data rather than individual rows. You will then see how these techniques can be used in combination with TABLE parameter types to minimize the number of required stored procedure calls in typical applications.
Objectives
After completing this module, you will be able to:
• Use the MERGE statement
• Implement table types
• Use TABLE types as parameters
Lesson 1
Using the MERGE Statement
A very common requirement when coding in T-SQL is to update a row if it exists, but to insert the row if it does not already exist. SQL Server 2008 introduced the MERGE statement, which provides this ability plus the ability to process entire sets of data, rather than processing row by row or in several separate set-based statements. This leads to much more efficient execution and simplifies the required coding. In this lesson, you will investigate the use of the MERGE statement and the most common options associated with it.
Objectives
After completing this lesson, you will be able to:
• Explain the role of the MERGE statement
• Describe how to use the WHEN MATCHED clause
• Describe how to use the WHEN NOT MATCHED BY TARGET clause
• Describe how to use the WHEN NOT MATCHED BY SOURCE clause
• Explain the role of the OUTPUT clause and $action
• Describe MERGE determinism and performance
MERGE Statement
Key Points
The MERGE statement is most commonly used to insert data that does not already exist and to update the data if it does exist. It can operate on entire sets of data rather than just on single rows, and can perform alternate actions such as deletes.
MERGE
A common requirement is to update data if it already exists but to insert it if it does not. Some other database engines (not SQL Server) provide an UPSERT statement for this purpose. The MERGE statement provided by SQL Server is a more capable replacement for such statements in other database engines and is based on the ANSI SQL standard, together with some Microsoft extensions to the standard. A typical situation where the need for the MERGE statement arises is in the population of data warehouses from data in source transactional systems. For example, consider a data warehouse holding details of a customer. When a customer row is received from the transactional system, it needs to be inserted into the data warehouse. When later updates to the customer are made, the data warehouse then needs to be updated.
Atomicity
While such statements in other languages typically operate on single rows, the MERGE statement in SQL Server can operate on entire sets of data in a single statement execution. It is important to realize that the MERGE statement functions as an atomic operation: all of its inserts, updates and deletes occur, or none of them occur.
Source and Target
The MERGE statement uses two table data sources. The target table is the table that is being modified and is specified first in the MERGE statement. Any inserts, updates or deletes are applied only to the target table.
The source table provides the rows that need to be matched to the rows in the target table. You can think of the source table as the incoming data. It is specified in a USING clause. The source table does not have to be an actual table; it can be any other type of expression that returns a table, such as:
• A view
• A sub-select (or derived table) with an alias
• A common table expression (CTE)
• A VALUES clause with an alias
The source and target are matched together as the result of an ON clause. This can involve one or more columns from both tables.
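Putting these pieces together, a minimal sketch of the overall statement shape might look as follows. The table and column names (dbo.Employee as the target, a VALUES list as the source) are illustrative assumptions, not taken from a specific slide:

```sql
MERGE INTO dbo.Employee AS e                 -- target: the only table modified
USING (VALUES (1, N'Terry Adams', N'Active'),
              (2, N'Dan Park',    N'Inactive')
      ) AS eu (EmployeeID, FullName, EmploymentStatus)  -- source with an alias
ON e.EmployeeID = eu.EmployeeID              -- how source rows match target rows
WHEN MATCHED THEN
    UPDATE SET e.FullName = eu.FullName,
               e.EmploymentStatus = eu.EmploymentStatus
WHEN NOT MATCHED BY TARGET THEN
    INSERT (EmployeeID, FullName, EmploymentStatus)
    VALUES (eu.EmployeeID, eu.FullName, eu.EmploymentStatus);
```

The USING clause here happens to use a VALUES list with an alias as its source, one of the source types listed above; a view, derived table or CTE would slot into the same position.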
WHEN MATCHED
Key Points
The WHEN MATCHED clause defines the action to be taken when a row in the source is matched to a row in the target.
WHEN MATCHED
The ON clause is used to match source rows to target rows. The WHEN MATCHED clause specifies the action that needs to occur when a source row matches a target row. In most cases, this will involve an UPDATE statement, but it could alternatively involve a DELETE statement. In the example shown in the slide, rows in the EmployeeUpdate table are being matched to rows in the Employee table based upon the EmployeeID. When a source row matches a target row, the FullName and EmploymentStatus columns in the target table are updated with the values of those columns in the source. Note that only the target table can be updated. If an attempt is made to modify any other table, a syntax error is returned.
Multiple Clauses
It is also possible to include two WHEN MATCHED clauses, as shown in the following code block:

WHEN MATCHED AND s.Quantity > 0 ...
WHEN MATCHED ...

No more than two WHEN MATCHED clauses can be present. When two clauses are used, the first clause must have an AND condition. If the source row matches the target and also satisfies the AND condition, then the action specified in the first WHEN MATCHED clause is performed. Otherwise, if the source row
matches the target but does not satisfy the AND condition, the condition in the second WHEN MATCHED clause is evaluated instead. When two WHEN MATCHED clauses are present, one action must specify an UPDATE and the other action must specify a DELETE. Question: What is different about the UPDATE statement in the example shown, compared to a normal UPDATE statement?
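The two-clause rules described above can be sketched as follows, assuming hypothetical dbo.StockItem (target) and dbo.StockUpdate (source) tables with a Quantity column:

```sql
MERGE INTO dbo.StockItem AS t
USING dbo.StockUpdate AS s
ON t.StockItemID = s.StockItemID
WHEN MATCHED AND s.Quantity > 0 THEN   -- the first clause must carry an AND condition
    UPDATE SET t.Quantity = s.Quantity -- one of the two actions must be an UPDATE...
WHEN MATCHED THEN
    DELETE;                            -- ...and the other must be a DELETE
```

A matched row with a positive quantity is updated; any other matched row falls through to the second clause and is deleted.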
WHEN NOT MATCHED BY TARGET
Key Points
The WHEN NOT MATCHED BY TARGET clause specifies the action that needs to be taken when a row in the source cannot be matched to a row in the target.
WHEN NOT MATCHED
The next clause in the MERGE statement to consider is WHEN NOT MATCHED BY TARGET. As mentioned in the last topic, the most common action performed by a WHEN MATCHED clause is to update the existing row in the target table. The most common action performed by a WHEN NOT MATCHED BY TARGET clause is to insert a new row into the target table. In the example shown in the slide, when a row from the EmployeeUpdate table cannot be found in the Employee table, a new employee row is added to the Employee table. With a standard INSERT statement in T-SQL, the inclusion of a column list is considered a best practice and avoids issues related to changes to the underlying table, such as the reordering of columns or the addition of new columns. The same recommendation applies to an INSERT action within a MERGE statement. While a column list is optional, best practice suggests including one.
Syntax
The words BY TARGET are optional and are often omitted. The clause is then written simply as WHEN NOT MATCHED. Note again that no table name is included in the action statement (the INSERT statement), as modifications may only be made to the target table. The WHEN NOT MATCHED BY TARGET clause is part of the ANSI SQL standard.
WHEN NOT MATCHED BY SOURCE
Key Points
The WHEN NOT MATCHED BY SOURCE clause is used to specify an action to be taken for rows in the target that were not matched by rows from the source.
WHEN NOT MATCHED BY SOURCE
While much less commonly used than the clauses discussed in the previous topics, you can also take an action for rows in the target that did not match any incoming rows from the source. Generally, this will involve deleting the unmatched rows in the target table, but UPDATE actions are also permitted. Note the format of the DELETE statement in the example on the slide. At first glance, it might seem quite odd, as it has no table or predicate specified. In this example, all rows in the Employee table that were not matched by an incoming source row from the EmployeeUpdate table would be deleted.

Question: What would the DELETE statement look like if it only deleted rows where the date in a column called LastModified were older than a year?
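A sketch of the pattern described above, reusing the illustrative Employee tables; note that the DELETE action names no table and has no WHERE clause, because it can only act on the unmatched target rows:

```sql
MERGE INTO dbo.Employee AS e
USING dbo.EmployeeUpdate AS eu
ON e.EmployeeID = eu.EmployeeID
WHEN MATCHED THEN
    UPDATE SET e.FullName = eu.FullName
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;   -- removes target rows with no matching source row
```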
OUTPUT Clause and $action
Key Points
The OUTPUT clause was added in SQL Server 2005 and allows a set of rows to be returned when performing data modifications. In SQL Server 2005, this applied to INSERT, DELETE and UPDATE. In SQL Server 2008 and later, the clause can also be used with the MERGE statement.
OUTPUT Clause
The OUTPUT clause was a useful addition to the INSERT, UPDATE and DELETE statements in SQL Server 2005. For example, consider the following code:

DELETE FROM HumanResources.Employee
OUTPUT deleted.BusinessEntityID, deleted.NationalIDNumber
WHERE ModifiedDate < DATEADD(YEAR, -10, SYSDATETIME());
In this example, employees are deleted when their rows have not been modified within the last ten years. As part of this modification, a set of rows is returned that provides details of the BusinessEntityID and NationalIDNumber for each row deleted. As well as returning rows to the client application, the OUTPUT clause can include an INTO sub-clause that causes the rows to be inserted into another existing table instead. Consider the following example:

DELETE FROM HumanResources.Employee
OUTPUT deleted.BusinessEntityID, deleted.NationalIDNumber
    INTO Audit.EmployeeDelete
WHERE ModifiedDate < DATEADD(YEAR, -10, SYSDATETIME());
In this example, details of the employees being deleted are inserted into the Audit.EmployeeDelete table instead of being returned to the client.
OUTPUT and MERGE
The OUTPUT clause can also be used with the MERGE statement. When an INSERT is performed, rows can be returned from the inserted virtual table. When a DELETE is performed, rows can be returned from the deleted virtual table. When an UPDATE is performed, values are available in both the inserted and deleted virtual tables. Because a single MERGE statement can perform INSERT, UPDATE and DELETE actions, it can be useful to know which action was performed for each row returned by the OUTPUT clause. To make this possible, the OUTPUT clause supports a $action virtual column that returns details of the action performed on each row. It returns the word "INSERT", "UPDATE" or "DELETE".
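A sketch of $action in use, again with the illustrative Employee tables; each returned row reports which action MERGE took on it:

```sql
MERGE INTO dbo.Employee AS e
USING dbo.EmployeeUpdate AS eu
ON e.EmployeeID = eu.EmployeeID
WHEN MATCHED THEN
    UPDATE SET e.FullName = eu.FullName
WHEN NOT MATCHED THEN
    INSERT (EmployeeID, FullName)
    VALUES (eu.EmployeeID, eu.FullName)
OUTPUT $action AS Action,                   -- 'INSERT', 'UPDATE' or 'DELETE'
       inserted.EmployeeID,                 -- new values (INSERT and UPDATE actions)
       deleted.EmployeeID AS OldEmployeeID; -- old values (UPDATE and DELETE actions)
```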
Composable SQL
In SQL Server 2008 and later, it is possible to consume the rowset returned by the OUTPUT clause more directly. The rowset cannot be used as a general-purpose table source but can be used as a table source for an INSERT SELECT statement. Consider the following example:

INSERT INTO Audit.EmployeeDelete
SELECT Mods.EmployeeID
FROM (MERGE INTO dbo.Employee AS e
      USING dbo.EmployeeUpdate AS eu
      ON e.EmployeeID = eu.EmployeeID
      WHEN MATCHED THEN
          UPDATE SET e.FullName = eu.FullName,
                     e.EmploymentStatus = eu.EmploymentStatus
      WHEN NOT MATCHED THEN
          INSERT (EmployeeID, FullName, EmploymentStatus)
          VALUES (eu.EmployeeID, eu.FullName, eu.EmploymentStatus)
      OUTPUT $action AS Action, deleted.EmployeeID) AS Mods
WHERE Mods.Action = 'DELETE';
In this example, the OUTPUT clause is being used with the MERGE statement. A row would be returned for each row either updated or deleted. However, you wish to only audit the deletion. You can treat the MERGE statement with an OUTPUT clause as a table source for an INSERT SELECT statement. The enclosed statement must be given an alias. In this case, the alias "Mods" has been assigned. The power of being able to SELECT from a MERGE statement is that you can then apply a WHERE clause. In this example, only the DELETE actions have been selected. Note that from SQL Server 2008 onwards, this level of query composability also applies to the OUTPUT clause when used in standard T-SQL INSERT, UPDATE and DELETE statements. Question: How could the OUTPUT clause be useful in a DELETE statement?
MERGE Determinism and Performance
Key Points
The actions performed by a MERGE statement are not identical to those that would be performed by separate INSERT, UPDATE or DELETE statements.
Determinism
When an UPDATE statement is executed with a join, if more than one source row matches a target row, no error is thrown. This is not permitted for an UPDATE action performed within a MERGE statement. Each source row must match only a single target row or none at all. If more than a single source row matches a target row, an error occurs and all actions performed by the MERGE statement are rolled back.
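For example, in this sketch (using the illustrative Employee table) the statement fails because two source rows match the same target row:

```sql
MERGE INTO dbo.Employee AS e
USING (VALUES (1, N'Name A'),
              (1, N'Name B')) AS eu (EmployeeID, FullName)  -- duplicate key 1
ON e.EmployeeID = eu.EmployeeID
WHEN MATCHED THEN
    UPDATE SET e.FullName = eu.FullName;
-- Raises an error because the same target row would be modified more than
-- once; the whole MERGE statement is rolled back.
```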
Performance of MERGE
The MERGE statement will often outperform code constructed from separate INSERT, UPDATE and DELETE statements and conditional logic. In particular, the MERGE statement only ever makes a single pass through the data.
Demonstration 1A: MERGE Statement
Key Points
In this demonstration you will see:
• How to use the MERGE statement
• How to use the OUTPUT clause with the MERGE statement
• How to perform optional updates with MERGE
• How to use MERGE as a composable query
• How to use the VALUES clause as a MERGE source
Demonstration Steps
1. Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
2. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and then click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect.
3. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_10_PRJ\6232B_10_PRJ.ssmssln and click Open.
4. Open and execute the 00 – Setup.sql script file from within Solution Explorer.
5. Open the 11 – Demonstration 1A.sql script file. Follow the instructions contained within the comments of the script file.

Question: What is meant by the term "composable query"?
Lesson 2
Implementing Table Types
It was mentioned earlier that reducing the number of calls between client applications and SQL Server is important. The aim is to minimize the amount of time lost through network delays and latency. SQL Server 2000 introduced the TABLE data type for use with variables. SQL Server 2008 introduced the ability to declare these as permanent (or temporary) data types and to use them as parameters. The use of table-valued parameters can significantly reduce the number of round trips needed between client application code and SQL Server.
Objectives
After completing this lesson, you will be able to:
• Explain the need to reduce round-trip overhead
• Describe previous options for passing lists as parameters
• Explain the role of the TABLE type
• Populate TABLE types with row constructors
Reducing Round-Trip Overhead
Key Points
In many applications, the time taken for commands to be sent to the server and for responses to be received can be substantial. It can often be longer than the time taken to execute the SQL command at the server. It is desirable, then, to minimize the number of times this happens in an operation.
Causes of Excessive Round Trips
Developers often aim to create code that can be reused. One common result is the creation of many small functions and procedures that each perform a single task. Performing a larger task, however, can then require calling many subtasks. Consider what is involved in inserting a new customer order into a SQL Server database. You typically need to do the following:
• Start a transaction
• Save the order header
• Save the first order detail line
• Save the second order detail line
• Save the third order detail line
• Commit the transaction
This means that performing a single action results in six separate round trips to the server.
Transaction Duration
A golden rule when designing systems to maximize concurrency is to never hold a transaction open for longer than required. In the previous example, the transaction is held open for the time taken to insert the order header and detail lines. While the transaction needs to be held open for this time, its duration is artificially increased by the time taken to make all the round trips to the server. This is not desirable.

Question: How could the number of round trips being made to the server be reduced?
Options for Passing Lists as Parameters
Key Points
One method for reducing the number of round trips from a client application to SQL Server is to pass lists of values in each trip.
Previous Options
Prior to SQL Server 2008, the available options for passing lists of values within a single procedure call were very limited. The most commonly used method was a delimited list. Mostly these were implemented as comma-delimited lists, but other delimiters (for example, the pipe character) were also used. When delimited lists were used, the entire set of values was sent as a string. There are several issues with this approach:
• No control could be exerted over the data type being passed. A string could be passed to code expecting a number.
• The structure was loose. For example, one string might contain five columns and another might contain six.
• Custom string-parsing logic needed to be written.
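To make the "custom parsing logic" point concrete, the following is one possible sketch of a list splitter, written as an inline table-valued function over a recursive CTE. The function name and approach are illustrative only, and the default recursion limit caps it at roughly 100 list items unless the caller adds an OPTION (MAXRECURSION) hint:

```sql
CREATE FUNCTION dbo.SplitList
    (@List nvarchar(max), @Delimiter nchar(1))
RETURNS TABLE
AS
RETURN
    WITH Positions AS
    (   -- anchor: the first item starts at position 1
        SELECT CAST(1 AS bigint) AS StartPos,
               CHARINDEX(@Delimiter, @List + @Delimiter) AS DelimPos
        UNION ALL
        -- recursion: each later item starts just after the previous delimiter
        SELECT DelimPos + 1,
               CHARINDEX(@Delimiter, @List + @Delimiter, DelimPos + 1)
        FROM Positions
        WHERE DelimPos < LEN(@List)
    )
    SELECT SUBSTRING(@List, StartPos, DelimPos - StartPos) AS Value
    FROM Positions;
```

A query such as SELECT Value FROM dbo.SplitList(N'12,34,56', N',') then returns one row per item. Every Value is still a string, though, so the data-type and structure problems listed above remain.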
Passing XML Another option for passing lists of values was to use XML. This became more common with SQL Server 2005 when the XML data type was introduced. Prior to SQL Server 2005, developers would also sometimes pass XML values as strings but they then needed to write very complex parsing logic. With the introduction of the XML data type, the parsing became easier but not trivial. Processing the received XML is also non-trivial and some processing methods (such as OPENXML) had memory implications, while others (such as the nodes() method) had query optimization implications. In modules 17 and 18 you will learn more about the use of XML in SQL Server.
Demonstration 2A: Passing Delimited Lists
Key Points
In this demonstration you will see:
• How to query a table-valued function that performs list parsing
• How the function allows for different delimiters
• How to use the function in a join
• How common errors can occur with delimited lists
Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and then click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_10_PRJ\6232B_10_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 21 – Demonstration 2A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: What are the basic problems with using delimited lists for parameters?
Introduction to the TABLE Type
Key Points
SQL Server 2008 introduced the ability to create user TABLE data types and record them in the system catalog. These types are very useful because they can be used both for variables and for parameters.
TABLE Type
In SQL Server 2000, it was possible to declare a variable of type TABLE. You needed to define the schema of the table when declaring the variable, as in the following code:

DECLARE @BalanceList TABLE
    (CustomerID int,
     CurrentBalance decimal(18,2));

In this example, a variable called @BalanceList is defined as being a table. The schema of the table, and the variable itself, last only for the duration of the batch in which the variable is defined. SQL Server 2008 introduced the ability to create user-defined table data type definitions. You can create table data types that can be used both for the data type of variables and for the data type of parameters. In the example shown on the slide, CustomerBalance is declared as a new data type. You can declare a variable as being of the CustomerBalance data type as follows:

CREATE TYPE dbo.CustomerBalance AS TABLE
    (CustomerID int,
     CurrentBalance decimal(18,2));
GO

DECLARE @BalanceList dbo.CustomerBalance;
A key advantage of table data types is that you can pass complex structures inside a table more easily than you could with alternatives such as comma-delimited lists. You can have multiple rows, each of two or more columns, and you can be sure of the data types that will be stored. This is useful even when only declaring variables: a table type can be created once and then used for variables throughout the database application, which reduces potential inconsistencies. Note that there is no ALTER TYPE statement that can be used to modify a TABLE type definition. Types must be dropped and then recreated when they need to be altered.
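For example (a sketch, assuming nothing still references the type), altering the CustomerBalance type means dropping and recreating it:

```sql
-- There is no ALTER TYPE for changing a table type's definition;
-- drop the type (once no objects reference it) and recreate it.
DROP TYPE dbo.CustomerBalance;
GO
CREATE TYPE dbo.CustomerBalance AS TABLE
    (CustomerID int,
     CurrentBalance decimal(18,2),
     CreditLimit decimal(18,2));   -- hypothetical new column added on recreation
```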
Populating TABLE Types with Row Constructors
Key Points
SQL Server 2008 also introduced the concept of row constructors and multi-row INSERT statements. These are useful when working with TABLE data types as well as when working with database tables.
Row Constructors for Populating TABLE Types
Versions of SQL Server prior to SQL Server 2008 permitted inserting only a single row of data at a time with an INSERT statement, unless an INSERT…SELECT or INSERT...EXEC statement was used. SQL Server 2008 introduced the concept of a row constructor. In the example shown in the slide, a variable named @Balance of type dbo.CustomerBalance has been declared. That data type is the table data type that was defined in the example from the previous topic. Three rows have then been inserted into the table variable. Note that a multi-row INSERT that inserts three rows is quite different from three separate INSERT statements: it is an atomic operation involving three rows, in that all inserts occur or none occur. Note also that multi-row INSERT statements cannot include more than 1000 rows per statement, and that multi-row INSERTs also help reduce the number of round trips from a client application to a server.

Question: What would improve the INSERT query shown in the slide example?
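A hedged reconstruction of the kind of statement the slide shows, with illustrative values:

```sql
DECLARE @Balance dbo.CustomerBalance;

-- One atomic, multi-row INSERT using a row constructor:
INSERT INTO @Balance (CustomerID, CurrentBalance)
VALUES (1, 4200.00),
       (2, 0.00),
       (3, -125.50);
```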
Demonstration 2B: TABLE Types and Row Constructors
Key Points
In this demonstration you will see:
• How to work with row constructors
• How to declare a table data type
• How to work with variables with user-defined table data types
Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and then click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_10_PRJ\6232B_10_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 22 – Demonstration 2B.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: Can other users make use of the table type that you create?
Implementing a Microsoft® SQL Server® 2008 R2 Database
Lesson 3
Using TABLE Types As Parameters
In the previous lesson, you saw how to declare TABLE data types and how to declare variables of those types. Another potential use for TABLE data types is in the declaration of parameters, particularly for stored procedures, but they can also be used with user-defined functions. In this lesson, you will see how to use TABLE input parameters with stored procedures and how this solves the round-trip problems identified in Lesson 1. You will also see how to call a stored procedure when passing a table-valued parameter.
Objectives
After completing this lesson, you will be able to:
• Describe the use of TABLE input parameters for stored procedures
• Use row constructors to populate parameters to be passed to stored procedures
TABLE Input Parameters for Stored Procedures
Key Points As well as being used for declaring variables, user-defined table data types can be used as parameter data types for stored procedures and functions. (User-defined functions will be discussed in Module 13).
Table Valued Parameters Table-valued parameters can only be used as input parameters. Note that the term READONLY must be specified and that it is the only allowable value. In the example shown in the slide, a data type called SalesDetails is being created to hold the detail lines for a customer sale. A stored procedure is then being created that takes sales header details as standard relational parameters but also takes the entire list of sales details rows as a single parameter. This allows for the creation of a stored procedure that would process an entire sale in a single call.
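A minimal sketch of this pattern follows. The SalesDetails type name comes from the slide; the column list and the procedure name dbo.SaveSale are assumptions for illustration:

```sql
CREATE TYPE SalesDetails AS TABLE
    (ProductID int, Quantity int, UnitPrice decimal(18,2));
GO

CREATE PROCEDURE dbo.SaveSale
    @CustomerID int,
    @SaleDate date,
    @Details SalesDetails READONLY   -- READONLY is required for table-valued parameters
AS
BEGIN
    -- The entire list of sales detail rows arrives as a single parameter.
    SELECT COUNT(*) AS DetailRowCount FROM @Details;
END;
```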
Reducing Round Trips In an earlier discussion, the requirements for saving a customer order were noted. Using this same technique, you could place the order header in one table and all the order rows in another table and pass both in a single call to a stored procedure that saves the order. That single stored procedure could also begin and commit the transaction so the transaction would not span even a single network round trip. What is more interesting though is that you could pass multiple order headers and their corresponding order detail rows in a single call. This means that the procedure could now save one or more orders with all their detail rows in a single round-trip to the server. Question: What would you have to do to be able to pass multiple sales and their detail lines in a single call?
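One way to sketch the multiple-order idea; the order table types, target tables, and procedure name are all hypothetical:

```sql
CREATE TYPE dbo.OrderHeaderList AS TABLE
    (OrderID int, CustomerID int, OrderDate date);
CREATE TYPE dbo.OrderDetailList AS TABLE
    (OrderID int, ProductID int, Quantity int);
GO

CREATE PROCEDURE dbo.SaveOrders
    @Headers dbo.OrderHeaderList READONLY,
    @Details dbo.OrderDetailList READONLY
AS
BEGIN
    BEGIN TRANSACTION;   -- the transaction spans no network round trips

    INSERT INTO dbo.OrderHeader (OrderID, CustomerID, OrderDate)
    SELECT OrderID, CustomerID, OrderDate FROM @Headers;

    INSERT INTO dbo.OrderDetail (OrderID, ProductID, Quantity)
    SELECT OrderID, ProductID, Quantity FROM @Details;

    COMMIT TRANSACTION;
END;
```

One or more complete orders, with all their detail rows, can now be saved in a single call.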
Using Row Constructors to Populate Parameters
Key Points In the previous topic, you saw how to declare a stored procedure that uses a table-valued parameter. The final step in using such a procedure is to then pass a table parameter in the EXEC statement.
Passing a Table Valued Parameter in an EXEC Statement As well as defining a table-valued parameter for a stored procedure, you also need to construct a table to pass to the stored procedure in the EXEC call. First, you declare a table variable with the same user-defined table data type that was used when declaring the stored procedure. Next, you populate that variable. Row constructors are ideal for populating the table variables that will be passed in an EXEC call. In the example shown in the slide, three product detail rows are being inserted into the @SalesDetails variable. Finally, the stored procedure is called via an EXEC statement and the detail rows are passed as a single parameter. You will see this in detail in the upcoming demonstration.
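A hypothetical call matching this description; the procedure name dbo.SaveSale and the SalesDetails column list are assumptions:

```sql
DECLARE @SalesDetails SalesDetails;

-- Populate the table variable with a row constructor:
INSERT INTO @SalesDetails (ProductID, Quantity, UnitPrice)
VALUES (101, 2, 14.95),
       (113, 1, 299.00),
       (120, 5, 3.49);

-- Pass all the detail rows as a single parameter:
EXEC dbo.SaveSale @CustomerID = 42,
                  @SaleDate = '20100315',
                  @Details = @SalesDetails;
```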
Demonstration 3A: Passing Tables to Stored Procedures
Key Points
In this demonstration you will see:
1. How traditional stored procedure calls often involve multiple round trips to the server
2. How to declare a table data type
3. How to use the table data type to avoid round trips
4. How to view catalog information about the table data types by querying the sys.types and sys.table_types system views
Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_10_PRJ\6232B_10_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 31 – Demonstration 3A.sql script file.
3. Follow the instructions contained within the comments of the script file.
Question: What is the purpose of the SCOPE_IDENTITY() function shown in the demonstration?
Lab 10: Passing Tables and Merging Data
Lab Setup
For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:
1. On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2. Maximize the Hyper-V Manager window.
3. In the Virtual Machines list, if the virtual machine 623XB-MIA-DC is not started:
   • Right-click 623XB-MIA-DC and click Start.
   • Right-click 623XB-MIA-DC and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, and then close the Virtual Machine Connection window.
4. In the Virtual Machines list, if the virtual machine 623XB-MIA-SQL is not started:
   • Right-click 623XB-MIA-SQL and click Start.
   • Right-click 623XB-MIA-SQL and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
5. In the Virtual Machine Connection window, click the Revert toolbar icon. If you are prompted to confirm that you want to revert, click Revert.
6. Wait for the revert action to complete.
7. In the Virtual Machine Connection window, if the user is not already logged on:
   • On the Action menu, click the Ctrl-Alt-Delete menu item.
   • Click Switch User, and then click Other User.
   • Log on using the following credentials:
     i. User name: AdventureWorks\Administrator
     ii. Password: Pa$$w0rd
8. From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
9. If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
10. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
11. In the Connect to Server window, type Proseware in the Server name text box.
12. In the Authentication drop-down list box, select Windows Authentication and click Connect.
13. In the File menu, click Open, and click Project/Solution.
14. In the Open Project window, open the project D:\6232B_Labs\6232B_10_PRJ\6232B_10_PRJ.ssmssln.
15. In Solution Explorer, double-click the query 00-Setup.sql. When the query window opens, click Execute on the toolbar.
Lab Scenario In earlier versions of SQL Server, passing lists of values to stored procedures was a challenge. SQL Server 2008 introduced the table type and table-valued parameters. In this lab, you will create a replacement stored procedure Reports.GetProductsByColorList_Test that uses a table-valued parameter to replace an existing stored procedure Reports.GetProductsByColorList that was based on passing a comma-delimited list of values. If time permits, you will then create a new procedure that processes complete rows of data and performs updates using the MERGE statement.
Supporting Documentation
Procedure Required: Marketing.SalespersonMerge

Requirements:
Input Parameters: Table of Salesperson details, including SalespersonID, FirstName, MiddleName, LastName, BadgeNumber, EmailAlias, SalesTerritoryID. The parameter should be named SalespersonDetails.
Output Parameters: None
Output Rows: For each row, return one column called Action that contains INSERT or UPDATE and another column with the SalespersonID.
Notes: The SalespersonID must be provided. If it matches an existing salesperson, that row should be updated. Only update columns that are provided. Any SalesTerritoryID that is provided must be valid as it is defined as a foreign key to the Marketing.SalesTerritory table.
Exercise 1: Create a Table Type
Scenario
In this exercise, you will create a table type to support the parameter that will later need to be passed to the replacement stored procedure.
The main tasks for this exercise are as follows:
1. Review the parameters of a stored procedure
2. Review the existing function
3. Create a new table type

Task 1: Review the parameters of a stored procedure
• Review the parameters of the existing stored procedure Reports.GetProductsByColorList.

Task 2: Review the existing function
• Review the function dbo.StringListToTable used by the existing stored procedure. Note the hardcoded length of 1000 for each component of the returned table entries.

Task 3: Create a new table type
• Create a new table type to support this type of input parameter. Call the type StringList.

Results: After this exercise, you should have created a new table type.
Exercise 2: Use a Table Type Parameter
Scenario
In this exercise, you will create a replacement stored procedure Reports.GetProductsByColorList_Test that uses the table type for its parameter.
The main tasks for this exercise are as follows:
1. Create the stored procedure
2. Test the stored procedure

Task 1: Create the stored procedure
• Create a new stored procedure that is functionally equivalent to Reports.GetProductsByColorList, except that the new procedure (call it Reports.GetProductsByColorList_Test) takes a single table @ColorList as a parameter.

Task 2: Test the new procedure
• Test the new procedure.

Results: After this exercise, you should have created a new stored procedure that uses the table type for its parameter.
Challenge Exercise 3: Use a Table Type with MERGE (Only if time permits)
Scenario
In this exercise, you will create a new stored procedure that takes a table-valued parameter and uses the MERGE statement to update a table in the marketing system. The procedure should allow for the creation or update of salespeople held in the Marketing.Salesperson table.
The main tasks for this exercise are as follows:
1. Create a new table type
2. Create a replacement procedure
3. Test the replacement procedure

Task 1: Create a new table type
• Create the required table type. (Create it in the dbo schema.)

Task 2: Create a replacement stored procedure
• Review the supporting documentation and create a replacement procedure based on the requirements.

Task 3: Test the replacement procedure
• Test the replacement procedure.

Results: After this exercise, you should have created a new stored procedure that takes a table-valued parameter and uses the MERGE statement to update a table.
Module Review and Takeaways
Review Questions
1. What is the difference between SOURCE NOT MATCHED and TARGET NOT MATCHED in a MERGE statement?
2. What is a key advantage of the MERGE statement in terms of performance?

Best Practices
1. Use multi-row inserts when the rows being inserted are related in some way, for example, the detail rows of an invoice.
2. Consider creating multiple-entity procedures instead of single-entity procedures to help minimize round trips and to reduce locking. For example, only very minor changes are required to turn a stored procedure that inserts a single sales order into one that can insert multiple sales orders.
Creating Highly Concurrent SQL Server 2008 R2 Applications
Module 11
Creating Highly Concurrent SQL Server 2008 R2 Applications
Contents:
Lesson 1: Introduction to Transactions                        11-3
Lesson 2: Introduction to Locks                               11-17
Lesson 3: Management of Locking                               11-28
Lesson 4: Transaction Isolation Levels                        11-38
Lab 11: Creating Highly Concurrent SQL Server Applications    11-44
Module Overview
It is the responsibility of an enterprise database system to provide mechanisms that ensure the physical integrity of each transaction. A transaction is a sequence of operations performed as a single logical unit of work, and Microsoft® SQL Server® provides locking facilities that preserve transaction isolation. In this module, you will learn how to manage transactions and locks. Database systems must balance consistency and concurrency, and there is often a direct trade-off between these two aims. The challenge is to use the lowest concurrency impact possible while still maintaining sufficient consistency. This module explains how to use transactions and the SQL Server locking mechanisms to meet the performance and data integrity requirements of your applications. Another aim of database management systems is, wherever possible, to give each user the illusion that they are the only user on the system. Transaction isolation levels are critical to minimizing the impact of one user on another. In this module you will also investigate transaction isolation levels.
Objectives
After completing this module, you will be able to:
• Describe the role of transactions
• Explain the role of locks
• Manage locking
• Work with transaction isolation levels
Lesson 1
Introduction to Transactions
A core capability of relational database management systems such as SQL Server is the ability to group a set of changes that need to be made and to ensure that either the entire set of changes occurs or that none of them occur. Transactions are how this requirement is met within SQL Server.
Objectives
After completing this lesson, you will be able to:
• Explain what transactions are
• Describe autocommit transactions
• Describe explicit transactions
• Describe implicit transactions
• Explain the role of transaction recovery
• Detail considerations for using transactions
What are Transactions?
Key Points
A transaction is a sequence of steps that performs a logical unit of work. Transactions must exhibit four properties that are collectively known as ACID.
Atomicity
Atomicity requires that either all the steps in the transaction succeed or none of them are performed.

Consistency
Consistency ensures that when the transaction is complete, data must be in a consistent state. A consistent state is one where the data conforms to the business rules related to it; inconsistent data violates one or more business rules. In SQL Server, the database has to be in a consistent state after each statement within a transaction, not only at the end of the transaction.

Isolation
Isolation requires that changes made by a transaction be isolated from other concurrent transactions.

Durability
Durability requires that when the transaction is complete, the changes are made permanent in the database and survive even system failures.

Transactions ensure that multiple data modifications are processed as a unit. For example, a banking transaction might credit one account and debit another. Both steps must be completed together or not at all. SQL Server supports transaction processing to manage multiple transactions.
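The banking example can be sketched as a single transaction; the dbo.Account table and its columns are assumptions:

```sql
BEGIN TRANSACTION;

UPDATE dbo.Account SET Balance = Balance - 100
WHERE AccountID = 1;   -- debit one account

UPDATE dbo.Account SET Balance = Balance + 100
WHERE AccountID = 2;   -- credit the other

COMMIT TRANSACTION;    -- both changes become permanent together
```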
Transaction Log Every transaction is recorded in a transaction log to maintain database consistency and to aid in transaction recovery. When changes are made to data in SQL Server, the database pages that have been
modified are written to the transaction log on the disk first and later written to the database. If any part of the transaction fails, all of the changes made so far are rolled back to leave the database in its original state. This system ensures that updates are complete and recoverable. Transactions use locking to prevent other users from changing or reading data in a transaction that has not completed. Locking is required in online transaction processing (OLTP) for multi-user systems. Question: Can you think of database operations in your organization where database transactions are especially critical?
Auto Commit Transactions
Key Points Autocommit mode is the default transaction management mode of the SQL Server Database Engine. Every Transact-SQL statement is committed or rolled back when it completes. If a statement completes successfully, it is committed; if it encounters any error, it is rolled back.
Autocommit Mode
A connection to an instance of the Database Engine operates in autocommit mode whenever this default mode has not been overridden by either explicit or implicit transactions. Autocommit mode is also the default mode for ADO, OLE DB, ODBC, and DB-Library. A connection to an instance of the Database Engine operates in autocommit mode until a BEGIN TRANSACTION statement starts an explicit transaction, or implicit transaction mode is set on. When the explicit transaction is committed or rolled back, or when implicit transaction mode is turned off, the connection returns to autocommit mode. If a run-time statement error (such as a constraint violation) occurs in a batch, the default behavior in the Database Engine is to roll back only the statement that generated the error. You can change this behavior with the SET XACT_ABORT statement. After SET XACT_ABORT ON is executed, any run-time statement error causes the batch to quit and, if a transaction is currently active, automatically rolls back that transaction.
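The difference can be sketched as follows; the dbo.Stock table and its CHECK constraint are assumptions:

```sql
-- Assume dbo.Stock has a constraint CHECK (Level >= 0).

SET XACT_ABORT OFF;  -- the default behavior
BEGIN TRANSACTION;
INSERT INTO dbo.Stock (ProductID, Level) VALUES (1, 10);
INSERT INTO dbo.Stock (ProductID, Level) VALUES (2, -5);  -- only this statement fails
INSERT INTO dbo.Stock (ProductID, Level) VALUES (3, 20);  -- still runs
COMMIT TRANSACTION;  -- the rows for products 1 and 3 are committed

SET XACT_ABORT ON;   -- now any run-time error ends the batch and
                     -- rolls back the whole active transaction
```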
Compile Errors
Compile errors, such as syntax errors, are not affected by SET XACT_ABORT. When working in autocommit mode, compile-time errors can cause more than one Transact-SQL statement to fail. In this mode, a batch of statements is compiled as a unit and if a compile error is found, nothing in the batch is compiled or executed. The following example shows how this might happen.

USE AdventureWorks2008;
GO
CREATE TABLE NewTable (Id INT PRIMARY KEY, Info CHAR(3));
GO
INSERT INTO NewTable VALUES (1, 'aaa');
INSERT INTO NewTable VALUES (2, 'bbb');
INSERT INTO NewTable VALUSE (3, 'ccc'); -- Syntax error.
GO
SELECT * FROM NewTable; -- Returns no rows.
GO
Because there is a typographical error (VALUSE instead of VALUES) in the third INSERT statement, the batch cannot compile, and so none of the three INSERT statements run. XACT_ABORT can convert statement-terminating errors into batch-terminating errors; it will be covered in more detail in the T-SQL Error Handling module.
Question: When might autocommit mode not be appropriate in a database application?
Explicit Transactions
Key Points An explicit transaction is one in which you explicitly define both the start and end of the transaction. You can use explicit transactions to define your own units of business logic. For example, in a bank transfer function, you might enclose the withdrawal of funds from one account and the deposit of those funds in another account within one logical unit of work. DB-Library applications and Transact-SQL scripts use the BEGIN TRANSACTION, COMMIT TRANSACTION, COMMIT WORK, ROLLBACK TRANSACTION, or ROLLBACK WORK Transact-SQL statements to define explicit transactions.
Starting a Transaction
You start a transaction by using the BEGIN TRANSACTION statement. You can specify a name for the transaction, and you can use the WITH MARK option to specify a description for the transaction to be marked in the transaction log. This transaction log mark can be used when restoring a database to indicate the point to which you want to restore. The BEGIN TRANSACTION statement is often abbreviated to BEGIN TRAN. XACT_ABORT affects explicit transactions as well as implicit transactions. By default, only the statement in error is rolled back; the batch continues to run and commits the other statements in the transaction. Therefore, error handling must be implemented (which will be discussed in a later module), or XACT_ABORT can be turned on to abort the batch and roll back the whole transaction when an error occurs.
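A marked, named transaction might look like this; the transaction name, mark description, and dbo.Product table are illustrative:

```sql
BEGIN TRANSACTION PriceUpdate
    WITH MARK 'Annual price adjustment';

UPDATE dbo.Product SET ListPrice = ListPrice * 1.05;

COMMIT TRANSACTION PriceUpdate;
-- The mark can later be used with RESTORE ... WITH STOPATMARK
-- to restore the database to this point.
```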
Committing a Transaction You can commit the work contained in a transaction by issuing the COMMIT TRANSACTION statement. Use this to end a transaction if no errors have occurred and you want the contents of the transaction to be committed to the database.
The COMMIT TRANSACTION statement is often abbreviated to just the word COMMIT. To assist users from other database platforms that are migrating to SQL Server, the statement COMMIT WORK can also be used.
Rolling Back a Transaction You can cancel the work contained in a transaction by issuing the ROLLBACK TRANSACTION statement. Use this to end a transaction if errors have occurred and you want the contents of the transaction to be undone and the database to remain in the state it was before the transaction began. The ROLLBACK TRANSACTION statement is often abbreviated to just the word ROLLBACK. To assist users from other database platforms that are migrating to SQL Server, the statement ROLLBACK WORK can also be used.
Saving a Transaction
By using savepoints, you can roll back a transaction to a named point within the transaction, instead of to the beginning of the transaction. You create a savepoint by issuing the SAVE TRANSACTION statement and specifying the name of the savepoint. You can then use the ROLLBACK TRANSACTION statement with the savepoint name to roll the changes back to that point. Use savepoints when an error is unlikely to occur and the cost of checking the data before submitting the modifications is much higher than testing for the error afterward. For example, if you do not expect stock levels to be too low to fulfill an order, you could create a trigger that raises an error when stock levels fall below zero on a stock table. In your ordering code, you can create a savepoint, submit the order, and then check for a negative stock level error from the trigger. If that error is raised, you can roll back the transaction to the savepoint and notify the customer accordingly.
Question: When might you want to use a savepoint?
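The ordering scenario above can be sketched with a savepoint; the table names and the savepoint name are illustrative:

```sql
BEGIN TRANSACTION;

INSERT INTO dbo.OrderHeader (OrderID, CustomerID) VALUES (1, 42);

SAVE TRANSACTION BeforeOrderLines;

INSERT INTO dbo.OrderDetail (OrderID, ProductID, Quantity)
VALUES (1, 101, 1000);   -- a trigger might raise a stock error here

IF @@ERROR <> 0
    ROLLBACK TRANSACTION BeforeOrderLines;  -- undo only the order lines

COMMIT TRANSACTION;      -- work done before the savepoint is kept
```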
Implicit Transactions
Key Points When a connection is operating in implicit transaction mode, the database engine automatically starts a new transaction after the current transaction is committed or rolled back. You do nothing to delineate the start of a transaction; you only commit or roll back each transaction. Implicit transaction mode generates a continuous chain of transactions.
Implicit Transactions
In most cases, it is best to work in autocommit mode and define transactions explicitly using the BEGIN TRANSACTION statement. However, for applications that were originally developed on systems other than SQL Server, implicit transaction mode can be useful. Implicit transaction mode automatically starts a transaction when you issue certain statements, and the transaction then continues until you issue a commit or rollback statement.
Setting Implicit Transaction Mode You use the SET statement to switch implicit transaction mode on and off, as shown in the following example. SET IMPLICIT_TRANSACTIONS ON; -- Do some work in implicit transaction mode. SET IMPLICIT TRANSACTIONS OFF; -- Return to autocommit mode.
By default, implicit transaction mode is off and the database works in autocommit mode.
Starting Implicit Transactions
When using implicit transaction mode, a transaction is automatically started when any of the following statements are executed:
• ALTER TABLE
• CREATE
• DELETE
• DROP
• FETCH
• GRANT
• INSERT
• OPEN
• REVOKE
• SELECT
• TRUNCATE TABLE
• UPDATE
Nested transactions (where a transaction is started within another transaction) are not allowed in implicit transaction mode. If the connection is already in a transaction, these statements do not start a new transaction.
Ending Implicit Transactions If you do not explicitly end an implicit transaction, none of the changes will be committed to the database when the user disconnects. You must use the COMMIT TRANSACTION statement to make the changes permanent or the ROLLBACK TRANSACTION statement to delete the changes and release any locks that are being held. Question: Can you think of an application in your organization where implicit transactions might be appropriate?
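A minimal sketch of the complete cycle; the dbo.OldData table is an assumption:

```sql
SET IMPLICIT_TRANSACTIONS ON;

-- This DELETE silently starts a transaction:
DELETE FROM dbo.OldData WHERE CreatedDate < '20000101';

-- Nothing is permanent until the transaction is explicitly ended:
COMMIT TRANSACTION;      -- or ROLLBACK TRANSACTION to discard the delete

SET IMPLICIT_TRANSACTIONS OFF;  -- return to autocommit mode
```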
Transaction Recovery
Key Points SQL Server automatically guarantees that all committed transactions are reflected in the database in the event of a failure. It uses the transaction log and checkpoints to do this.
Checkpoints As each Transact-SQL statement is executed, it is recorded to the transaction log on disk before it is written to the database and before the user is notified that the transaction was committed successfully. SQL Server performs checkpoints at defined intervals. Checkpoints are marked in the transaction log to identify which transactions have already been applied to the database. When a new checkpoint occurs, all data pages in memory that have been modified since the last checkpoint are written to the database.
Transaction Recovery If any errors occur during a transaction, the instance of SQL Server uses the information in the log file to roll back the transaction. This rollback does not affect the work of any other users working in the database at the same time. Usually, the error is returned to the application, and if the error indicates a possible problem with the transaction, the application issues a ROLLBACK statement. Some errors, such as a 1205 deadlock error, roll back a transaction automatically. If anything stops the communication between the client and an instance of SQL Server while a transaction is active, the instance rolls back the transaction automatically when notified of the stoppage by the network or operating system. This could happen if the client application terminates, if the client computer is shut down or restarted, or if the client network connection is broken. In all of these error conditions, any outstanding transaction is rolled back to protect the integrity of the database.
System Failures and Restarts The recovery process runs automatically every time that SQL Server starts, such as after an intended shutdown or a power failure. The automatic recovery process uses the transaction log to roll forward any
committed transactions and roll back any incomplete transactions. Recovery uses the last checkpoint as a starting marker: all transactions committed before the checkpoint are known to have been written to the database, while any transactions that started before the checkpoint but were still active must be rolled back, because some of their changes may already have been written to the data files. In the slide example:
• Transaction 1 is committed before the checkpoint, so it is reflected in the database.
• Transactions 2 and 4 were committed after the checkpoint, so they must be reconstructed from the log (rolled forward).
• Transactions 3 and 5 were not committed, so SQL Server rolls them back.
Question: A server crash occurs while two transactions are running. Transaction A is an autocommit transaction that has been written to the transaction log, but not written to the disk. Transaction B is an explicit transaction that has not been committed, though a checkpoint was written while Transaction B was running. What will happen to each transaction when the server is recovered?
Considerations for using Transactions
Key Points There are a number of general considerations that need to be kept in mind when working with transactions.
Keep transactions as short as possible
Transactions should be as short as possible. Longer transactions increase the likelihood that users will not be able to access locked data. Some methods to keep transactions short include the following:
• Do not require input from users during a transaction. Address issues that require user interaction before you start the transaction. For example, if you are updating a customer record, obtain the necessary information from the user before you begin the transaction. A golden rule here is to never hold a transaction across a user interaction.
• Do not open a transaction while browsing through data, if at all possible. Transactions should not start until all preliminary data analysis has been completed.
• INSERT, UPDATE, and DELETE should be the primary statements in a transaction, and they should be written to affect the fewest number of rows. A transaction should never be smaller than a logical unit of work.
• Access the least amount of data possible while in a transaction. This decreases the number of locked rows and reduces contention.
• Ensure appropriate indexing is in place as this reduces the number of pages that need to be accessed and locked.
Try to access resources in the same order
Accessing resources in the same order within transactions tends to naturally serialize your access to the database and can help to avoid deadlocks. However, doing this is not always possible. Note that deadlocks will be discussed later in the module.
Considerations for nested transactions
Consider the following issues regarding nested transactions:
• You should use nesting carefully, if at all, because the failure to commit or roll back a transaction leaves locks in place indefinitely.
• You can use the @@TRANCOUNT global variable to determine whether any open transactions exist and how deeply they are nested.
• While SQL Server's syntax appears to support transaction nesting, a commonly misunderstood aspect of SQL Server transaction handling is that a ROLLBACK operation that occurs in a nested transaction rolls back all current transactions, not just the current level.
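The nesting behavior can be sketched as follows:

```sql
BEGIN TRANSACTION;          -- @@TRANCOUNT is now 1
    BEGIN TRANSACTION;      -- @@TRANCOUNT is now 2
        SELECT @@TRANCOUNT AS NestingLevel;
    COMMIT TRANSACTION;     -- only decrements the count: back to 1
ROLLBACK TRANSACTION;       -- undoes ALL work and sets @@TRANCOUNT to 0
```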
Question: When would nested transactions be appropriate?
Demonstration 1A: Transactions
Key Points
In this demonstration you will see:
• How transactions work
• How blocking affects other users
Note that blocking is discussed further in the next lesson.
Demonstration Steps 1.
Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
2.
In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_11_PRJ\6232B_11_PRJ.ssmssln and click Open.
3.
Open and execute the 00 – Setup.sql script file from within Solution Explorer.
4.
Open the 11 – Demonstration 1A.sql script file.
5.
Open the 12 – Demonstration 1A 2nd Window.sql script file.
6.
Follow the instructions contained within the comments of the script file.
Lesson 2
Introduction to Locks
SQL Server (and many other database engines) makes extensive use of locks to ensure isolation between users and consistency of transactions. It is important to understand how locking works and how locking differs from blocking.
Objectives
After completing this lesson, you will be able to:
• Detail different methods of concurrency control
• Explain what locks are
• Differentiate between blocking and locking
• Describe what concurrency problems are prevented by locking
• Detail SQL Server's lockable resources
• Describe the types of locks that are available
• Explain lock compatibility
Methods of Concurrency Control
Key Points
Concurrency control is a system of controls that coordinates many people accessing the same data at the same time, so that they do not adversely affect each other.
Concurrency Control
SQL Server supports a wide range of optimistic and pessimistic concurrency control mechanisms. Users specify the type of concurrency control by setting the transaction isolation level for a connection. Concurrency control ensures that modifications made by one person do not adversely affect modifications made by others. There are two types of concurrency control: pessimistic and optimistic.
Pessimistic Concurrency Control
Pessimistic concurrency control locks data when it is read in preparation for an update. Other users cannot then perform actions that would alter the underlying data until the user who applied the lock is done with the data.
Optimistic Concurrency Control
Optimistic concurrency control does not hold locks on data after the data is initially read. Instead, when an update is performed, SQL Server checks whether the underlying data has changed since it was initially read. If it has, the user receives an error, the transaction is rolled back, and the user must start over. Optimistic concurrency control works well where a low level of contention for data exists. Question: Can you think of an application in your organization that might work well with optimistic concurrency control?
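One common way applications implement this optimistic check is to compare a rowversion value captured at read time; the table and column names below are hypothetical:

```sql
-- Read phase: capture the row's version along with the data.
DECLARE @OriginalVersion rowversion;
SELECT @OriginalVersion = RowVer
FROM dbo.Product
WHERE ProductID = 42;

-- ... the user edits the data; no locks are held in the meantime ...

-- Write phase: only update if nobody changed the row since the read.
UPDATE dbo.Product
SET ListPrice = 19.99
WHERE ProductID = 42
  AND RowVer = @OriginalVersion;

IF @@ROWCOUNT = 0
    RAISERROR('The row was changed by another user. Retry the edit.', 16, 1);
```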
What are Locks?
Key Points
Locking is a mechanism used by the Database Engine to synchronize access by multiple users to the same piece of data at the same time.

Locking Behavior
Before a transaction acquires a dependency on the current state of a piece of data, such as by reading or modifying the data, it must protect itself from the effects of another transaction modifying the same data. The transaction does this by requesting a lock on the piece of data. Read locks allow others to read but not write data; write locks stop others from reading or writing. In SQL Server, these locks are implemented via different locking modes, such as shared or exclusive. The lock mode defines the level of dependency the transaction has on the data.

No transaction can be granted a lock that would conflict with the mode of a lock already granted on that data to another transaction. If a transaction requests a lock mode that conflicts with a lock that has already been granted on the same data, the instance of the Database Engine will pause the requesting transaction until the first lock is released.

When a transaction modifies a piece of data, it holds the lock protecting the modification until the end of the transaction. How long a transaction holds the locks acquired to protect read operations depends on the transaction isolation level setting. All locks held by a transaction are released when the transaction completes (either commits or rolls back).

Question: If a doctor's office uses a database application to manage patient records, how might locks play a role in that application?
Blocking vs. Locking
Key Points
Locking and blocking are not the same thing, although the two terms are often confused with each other.

Locking
Locking is the action of taking and holding locks, and is the mechanism used to implement concurrency control.

Blocking
Blocking is what happens to one process while it waits for a resource that another process has locked. Blocking is a normal occurrence in systems that use locking; only excessive blocking is a problem. Question: What symptoms do you imagine "excessive" blocking might produce?
What Concurrency Problems are Prevented by Locking?
Key Points
Users modifying data can affect other users who are reading or modifying the same data at the same time. These users are said to be accessing the data concurrently. If a data storage system has no concurrency control, users could see the side effects listed on the slide.

Common Concurrency-related Problems
Lost updates occur when two or more transactions select the same row and then update the row based on the value originally selected. Each transaction is unaware of the other transactions. The last update overwrites updates made by the other transactions, which results in lost data.

Uncommitted dependency (or dirty read) occurs when a second transaction selects a row that is being updated by another transaction. The second transaction is reading data that has not been committed yet and may be changed by the transaction updating the row.

Inconsistent analysis is also known as non-repeatable reads. It occurs when a second transaction accesses the same row several times and reads different data each time. Inconsistent analysis is similar to uncommitted dependency in that another transaction is changing the data that a second transaction is reading. However, in inconsistent analysis, the data read by the second transaction was committed by the transaction that made the change.

Phantom reads occur when an insert or delete action is performed against a row that belongs to a range of rows being read by a transaction. The transaction's first read of the range of rows shows a row that no longer exists in the second or succeeding read as a result of a deletion by a different transaction. Similarly, the transaction's second or succeeding read shows a row that did not exist in the original read as the result of an insertion by a different transaction. If another user changes the index key column of the row during your read, the row might appear again if the key change moved the row to a position ahead of your scan. Similarly, the row might not appear if the key change moved the row to a position in the index that you had already read.
Question: Has your organization experienced concurrency problems with database applications? If so, what behavior did you see?
Lockable Resources
Key Points
For optimal performance, the number of locks that SQL Server maintains must be balanced with the amount of data that each lock holds. To minimize the cost of locking, SQL Server automatically locks resources at a level that is appropriate to the task. Question: If a database needs to lock several rows of data at once, what resources might be locked?
Types of Locks
Key Points
SQL Server locks resources using different lock modes that determine how the resources can be accessed by concurrent transactions. SQL Server has two main types of locks: basic locks and locks for special situations.
Basic Locks
In general, read operations acquire shared locks, and write operations acquire exclusive locks.
• Shared locks. SQL Server typically uses shared (read) locks for operations that neither change nor update data. If SQL Server has applied a shared lock to a resource, a second transaction can also acquire a shared lock, even though the first transaction has not completed. Consider the following facts about shared locks:
  • They are used for read-only operations; data cannot be modified.
  • SQL Server releases shared locks on a record as soon as the next record is read.
  • A shared lock will exist until all rows that satisfy the query have been returned to the client.
• Exclusive locks. SQL Server uses exclusive (write) locks for the INSERT, UPDATE, and DELETE data modification statements. Consider the following facts about exclusive locks:
  • Only one transaction can acquire an exclusive lock on a resource.
  • A transaction cannot acquire a shared lock on a resource that has an exclusive lock.
  • A transaction cannot acquire an exclusive lock on a resource until all shared locks are released.
Special Situation Locks
Depending on the situation, SQL Server can use other types of locks:
• Intent locks. SQL Server uses intent locks internally to minimize locking conflicts. Intent locks establish a locking hierarchy so that other transactions cannot acquire locks at more inclusive levels. For example, if a transaction has an exclusive row-level lock on a specific customer record, the intent lock prevents another transaction from acquiring an exclusive lock at the table level. Intent locks include intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX).
• Update locks. SQL Server uses update locks when it will modify a page at a later point. Before it modifies the page, SQL Server promotes the update page lock to an exclusive page lock to prevent locking conflicts. Consider the following facts about update locks. Update locks are:
  • Acquired during the initial portion of an update operation, when the pages are first being read.
  • Compatible with shared locks.
• Schema locks. SQL Server uses these to ensure that a table or index is not dropped, or its schema modified, while it is referenced by another session. SQL Server provides two types of schema locks:
  • Schema stability (Sch-S), which ensures that a resource is not dropped.
  • Schema modification (Sch-M), which ensures that other sessions do not reference a resource that is under modification.
• Bulk update locks. SQL Server uses these to enable processes to bulk copy data concurrently into the same table while preventing other processes that are not bulk-copying data from accessing the table. SQL Server uses bulk update locks when either of the following is used: the TABLOCK hint or the table lock on bulk load option.
Question: What happens if a query tries to read data from a row that is currently locked by an exclusive (X) lock?
Lock Compatibility
Key Points
Some locks are compatible with other locks, and some locks are not. For example, two users can both hold shared locks on the same data at the same time, but only one update lock can be issued on a piece of data at any one time.
Lock Compatibility
Locks have a compatibility matrix that shows which locks are compatible with other locks established on the same resource. The locks shown in the slide are the most common forms. The locks in the following table are listed in order from the least restrictive (shared) to the most restrictive (exclusive). Each row shows a requested lock; each column shows an existing granted lock.

Requested lock                     | IS  | S   | U   | IX  | SIX | X
-----------------------------------|-----|-----|-----|-----|-----|-----
Intent shared (IS)                 | Yes | Yes | Yes | Yes | Yes | No
Shared (S)                         | Yes | Yes | Yes | No  | No  | No
Update (U)                         | Yes | Yes | No  | No  | No  | No
Intent exclusive (IX)              | Yes | No  | No  | Yes | No  | No
Shared with intent exclusive (SIX) | Yes | No  | No  | No  | No  | No
Exclusive (X)                      | No  | No  | No  | No  | No  | No
In addition, compatibility for schema locks is as follows:
• The schema modification lock (Sch-M) is incompatible with all locks.
• The schema stability lock (Sch-S) is compatible with all locks except the schema modification lock (Sch-M).
Question: Can you think of situations where lock compatibility is important?
Lesson 3
Management of Locking
Locking behavior in SQL Server mostly operates without any need for management or application intervention. However, it may be desirable in some situations to exert control over locking behavior.
Objectives
After completing this lesson, you will be able to:
• Explain locking timeout
• Describe lock escalation
• Explain what deadlocks are
• Describe locking-related table hints
• Describe methods to view locking information
Locking Timeout
Key Points
Applications might need to wait some time for locks held by other applications to be released. A key decision is how long an application should wait for a lock to be released.

Locking Timeout
The length of time that it is reasonable to wait for a lock to be released depends entirely on the design requirements of the application. By default, SQL Server will wait indefinitely for a lock. LOCK_TIMEOUT is a session-level setting that determines the number of milliseconds to wait for any lock to be released before rolling back the statement (note: not necessarily the transaction) and returning an error. The default value of -1 indicates that SQL Server should wait indefinitely. Setting a lock timeout at the session level is not common, as most applications implement query timeouts instead. The READPAST table hint tells SQL Server to skip any rows that it cannot read because they are locked. It is rarely used. Question: Can you think of any situations where READPAST might be useful?
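A minimal sketch of both options (the table name is hypothetical):

```sql
-- Wait at most 5 seconds for any lock; on timeout, error 1222
-- ("Lock request time out period exceeded") is raised.
SET LOCK_TIMEOUT 5000;

SELECT OrderID, Status
FROM dbo.SalesOrder
WHERE Status = 'Pending';

-- Restore the default behavior (wait indefinitely).
SET LOCK_TIMEOUT -1;

-- READPAST skips locked rows instead of waiting for them,
-- which can suit queue-style processing.
SELECT TOP (1) OrderID
FROM dbo.SalesOrder WITH (READPAST, UPDLOCK)
WHERE Status = 'Pending';
```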
Lock Escalation
Key Points
Lock escalation is the process of converting many fine-grain locks into fewer coarse-grain locks, reducing system overhead while increasing the probability of concurrency contention.
SQL Server Lock Escalation
As the SQL Server Database Engine acquires low-level locks, it also places intent locks on the objects that contain the lower-level objects. When locking rows or index key ranges, the database engine places an intent lock on the pages that contain the rows or keys. When locking pages, the database engine places an intent lock on the higher-level objects that contain the pages. In addition to the intent lock on the object, intent page locks are requested on the following objects:
• Leaf-level pages of nonclustered indexes
• Data pages of clustered indexes
• Heap data pages
The database engine might do both row and page locking for the same statement to minimize the number of locks and reduce the likelihood that lock escalation will be necessary. For example, the database engine could place page locks on a nonclustered index (if enough contiguous keys in the index node are selected to satisfy the query) and row locks on the data. To escalate locks, the database engine attempts to change the intent lock on the table to the corresponding full lock, for example, changing an intent exclusive (IX) lock to an exclusive (X) lock, or an intent shared (IS) lock to a shared (S) lock. If the lock escalation attempt succeeds and the full table lock is acquired, then all heap or B-tree, page (PAGE), or row-level (RID) locks held by the transaction on the heap or index are released. If the full lock cannot be acquired, no lock escalation happens at that time and the database engine will continue to acquire row, key, or page locks. Partitioned tables can have locks escalated to the partition level before escalating to the table level. Partition-level escalation can be set on a per-table basis, and allows data in other partitions to remain available so that the entire table is not locked.
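Since SQL Server 2008, escalation behavior can be influenced per table; a sketch, with a hypothetical table name:

```sql
-- LOCK_ESCALATION options:
--   TABLE (default): escalate to a full table lock.
--   AUTO: for partitioned tables, allow escalation to the partition level.
--   DISABLE: do not escalate in most cases.
ALTER TABLE dbo.SalesOrder SET (LOCK_ESCALATION = AUTO);
```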
What are Deadlocks?
Key Points
Deadlocks occur when demands for resources cannot be resolved by waiting for locks to be released, no matter how long the processes involved wait.
Deadlocks
The simplest example of a deadlock occurs when two transactions hold locks on separate objects and each transaction requests a lock on the other transaction's object. For example:
• Transaction A holds a shared lock on row 1.
• Transaction B holds a shared lock on row 2.
• Transaction A requests an exclusive lock on row 2, but it cannot be granted until Transaction B releases the shared lock.
• Transaction B requests an exclusive lock on row 1, but it cannot be granted until Transaction A releases the shared lock.
Each transaction must wait for the other to release the lock. A deadlock can also occur when several long-running transactions execute concurrently in the same database. A deadlock also can occur as a result of the order in which the optimizer processes a complex query, such as a join, in which you cannot necessarily control the order of processing.
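A deadlock of this kind can be sketched in T-SQL using two query windows; the table names are hypothetical:

```sql
-- Session 1
BEGIN TRANSACTION;
UPDATE dbo.TableA SET Col1 = 1 WHERE KeyCol = 1;   -- locks a row in TableA

-- Session 2 (run in a second window at this point)
BEGIN TRANSACTION;
UPDATE dbo.TableB SET Col1 = 2 WHERE KeyCol = 2;   -- locks a row in TableB

-- Session 1: blocks, waiting for Session 2's lock on TableB
UPDATE dbo.TableB SET Col1 = 1 WHERE KeyCol = 2;

-- Session 2: completes the cycle; SQL Server detects the deadlock and
-- terminates one session's transaction with error 1205
UPDATE dbo.TableA SET Col1 = 2 WHERE KeyCol = 1;
```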
How SQL Server Ends a Deadlock
SQL Server ends a deadlock by automatically terminating one of the transactions. The process SQL Server uses is as follows:
• Rolls back the transaction of the deadlock victim. In a deadlock, SQL Server gives priority to the transaction that has been processing the longest; that transaction prevails. SQL Server rolls back the transaction with the least amount of time invested.
• Notifies the deadlock victim's application (with message number 1205).
• Cancels the deadlock victim's current request.
• Allows other transactions to continue.
Error 1205 is one of the errors that applications should specifically check for. If error 1205 is found, the application should attempt the transaction again. Error 1205 is also a good example of why database engine errors should not be passed directly to end users. The message returned for error 1205 sounds highly emotive, and emotive words can cause emotive reactions: people do not like to see that they have been "chosen as a deadlock victim". Question: Have you experienced deadlocking problems in your current environment? If so, how did you determine that deadlocks were a problem, and how was it resolved?
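A sketch of such a retry pattern, using a hypothetical table:

```sql
-- Retry the transaction when chosen as a deadlock victim (error 1205).
DECLARE @Retries int = 3;
DECLARE @Msg nvarchar(2048);

WHILE @Retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountID = 1;
        UPDATE dbo.Account SET Balance = Balance + 100 WHERE AccountID = 2;
        COMMIT TRANSACTION;
        BREAK;  -- success: leave the retry loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

        IF ERROR_NUMBER() = 1205 AND @Retries > 1
            SET @Retries -= 1;       -- deadlock victim: try again
        ELSE
        BEGIN
            SET @Msg = ERROR_MESSAGE();
            RAISERROR(@Msg, 16, 1);  -- any other error: surface it
            BREAK;
        END
    END CATCH;
END;
```

The application (not the end user) decides how many retries are reasonable before reporting a friendlier error message.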
Locking-related Table Hints
Key Points
The available locking hints are listed here for completeness. While a wide range of locking hints is available, it is important to realize that they should rarely be used, and only with extreme caution. Question: Why would you ever take an exclusive table-lock?
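As an illustration only (not a recommendation), a hint can request an exclusive table lock directly; the table name is hypothetical:

```sql
-- Take an exclusive table lock, held to the end of the transaction,
-- for example before maintenance work that must not be interfered with.
BEGIN TRANSACTION;

SELECT COUNT(*) AS RowsBefore
FROM dbo.Staging WITH (TABLOCKX);  -- exclusive table lock

-- ... maintenance work on dbo.Staging ...

COMMIT TRANSACTION;
```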
Methods to View Locking Information
Key Points
Typically, you use SQL Server Management Studio to display a report of active locks. You can use SQL Server Profiler to obtain information on a specific set of transactions. You can also use Reliability and Performance Monitor to display SQL Server locking histories.
Dynamic Management Views
Many DMVs are available for viewing locking information. Amongst the more useful are:
• sys.dm_tran_locks
• sys.dm_tran_active_transactions
• sys.dm_tran_session_transactions
• sys.dm_tran_current_transaction
You can query the sys.dm_tran_locks dynamic management view to retrieve information about the locks currently held by an instance of the Database Engine. Each row returned describes a currently granted lock or a requested lock. The columns returned are divided into two main groups: the resource group, which describes the resource on which the request is made, and the request group, which describes the lock request.
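A minimal query against sys.dm_tran_locks might look like this:

```sql
-- List current lock requests (granted or waiting) in the current database.
SELECT request_session_id,
       resource_type,                  -- e.g. OBJECT, PAGE, KEY, RID
       resource_associated_entity_id,
       request_mode,                   -- e.g. S, X, IS, IX, U
       request_status                  -- GRANT, WAIT, or CONVERT
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
ORDER BY request_session_id;
```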
SQL Server Profiler
SQL Server Profiler is a tool that monitors server activities. You can collect information about a variety of events by creating traces, which provide a detailed profile of server events. You can use this profile to analyze and resolve server resource issues, monitor login attempts and connections, and correct deadlock problems. Use the Locks event category to capture locking information in a trace.
Reliability and Performance Monitor
You can view SQL Server locking information by using Reliability and Performance Monitor in Windows. Use the SQLServer:Locks objects to retrieve this information. Question: When would you want to choose one method of viewing locks over another?
Demonstration 3A: Viewing Locking Information
Key Points
In this demonstration you will see how to:
• View lock information using Activity Monitor
• Use dynamic management views to view lock information
Demonstration Steps
1. If Demonstration 1A was not performed:
   • Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
   • In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_11_PRJ\6232B_11_PRJ.ssmssln and click Open.
   • Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 31 – Demonstration 3A.sql script file.
3. Open the 32 – Demonstration 3A 2nd Window.sql script file.
4. Open the 33 – Demonstration 3A 3rd Window.sql script file.
5. Follow the instructions contained within the comments of the script files.
Lesson 4
Transaction Isolation Levels
The final important concept to understand when working with concurrency in SQL Server is transaction isolation levels. It was mentioned earlier that a role of any database engine is to give each user the best possible illusion of being the only user on the system. This is not totally possible, but appropriate use of transaction isolation levels can help in this regard.
Objectives
After completing this lesson, you will be able to:
• Describe SQL Server transaction isolation levels
• Explain the role of the read committed snapshot database option
• Detail isolation-related table hints
SQL Server Transaction Isolation Levels
Key Points
An isolation level protects a transaction from the effects of other concurrent transactions. Use the transaction isolation level to set the isolation level for all transactions during a session. When you set the isolation level, you specify the default locking behavior for all statements in your session.

Transaction Isolation Levels
Setting transaction isolation levels allows programmers to accept an increased risk of integrity problems in exchange for greater concurrent access to data. The higher the isolation level, the lower the risk of data integrity problems, but at the cost of locks being held for longer and the locks themselves being more restrictive with respect to concurrent transactions. You can override the session-level isolation level in individual statements by using locking hints. You can set the transaction isolation level for a session by using the SET statement:

SET TRANSACTION ISOLATION LEVEL {READ COMMITTED | READ UNCOMMITTED | REPEATABLE READ | SERIALIZABLE | SNAPSHOT}
READ UNCOMMITTED is the lowest isolation level and only ensures that corrupt data is not read. It is equivalent to a NOLOCK table hint. With READ UNCOMMITTED, you sacrifice consistency in favor of high concurrency. NOLOCK was commonly used in reporting applications before the SNAPSHOT isolation level became available. It is not safe to use NOLOCK on queries that access data currently being changed; the ability to read uncommitted data can lead to many inconsistencies.

READ COMMITTED acquires short-lived shared locks before reading data and releases them after processing is complete. This is the SQL Server default. Dirty reads cannot occur, but non-repeatable reads can occur.

REPEATABLE READ retains locks on every row it touches until the end of the transaction. Even rows that do not qualify for the query result remain locked. These locks ensure that the rows touched by the query cannot be updated or deleted by a concurrent session until the current transaction completes (whether it is committed or rolled back).

SNAPSHOT tries to avoid one process blocking another when only one is performing updates. As an example, if a transaction attempts to modify a row on which another transaction holds a shared lock, SQL Server creates a copy of the row in a row version store (actually held in tempdb) and allows the update to proceed. The transaction with the shared lock reads from the row in the version store instead of the table row. A problem can arise if that transaction then attempts to update its version of the row: if the modification to the original table row has been committed, this second modification will fail with a concurrency violation. Note that before the SNAPSHOT isolation level can be used, it needs to be enabled via a database option.

SERIALIZABLE ensures consistency by assuming that two transactions might try to update the same data and uses locks to ensure that they do not, but at a cost of reduced concurrency: one transaction must wait for the other to complete, and two transactions can deadlock. Transactions are completely isolated from one another.
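A sketch of requesting the SNAPSHOT level for a session, assuming the database option has been enabled and using a hypothetical table name:

```sql
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;
-- Reads see a consistent snapshot as of the start of the transaction
-- and do not block concurrent writers.
SELECT SUM(Quantity) AS TotalOnHand FROM dbo.Inventory;
COMMIT TRANSACTION;

-- Return the session to the default isolation level.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```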
Read Committed Snapshot
Key Points
Not every application can use snapshot isolation level, as it needs to be specified when beginning a transaction. Often this requires a change to the application code. Many existing reporting applications cause excessive blocking. Prior to SQL Server 2005, this was commonly dealt with via NOLOCK hints.
Read Committed Snapshot
Read Committed Snapshot is a database option that, when enabled, causes DML statements to start generating row versions, even when no transaction is using snapshot isolation. Transactions that specify read committed are automatically altered to use row versioning rather than locking. All statements see a snapshot of data as it existed at the start of the statement. The behavior of the READ COMMITTED option depends on the setting of the READ_COMMITTED_SNAPSHOT database option, which can be ON or OFF. The following list describes the locking isolation level options:
• READ COMMITTED with READ_COMMITTED_SNAPSHOT OFF: Directs SQL Server to use shared locks while reading. At this level, you cannot experience dirty reads.
• READ COMMITTED with READ_COMMITTED_SNAPSHOT ON: Directs SQL Server to use row versioning instead of locking. The data is not protected from updates made by other transactions, but the transaction will not be blocked while reading the data.
It is important to note, though, that the read committed snapshot database option does not achieve exactly the same outcome as the snapshot isolation level set at the session level. With the read committed snapshot database option, snapshots are only present for the duration of each statement, not for the duration of the transaction.
Database Configuration
Before either snapshot isolation level or the read committed snapshot database option can be used, the database needs to be configured to allow snapshot isolation.
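A sketch of the relevant database options, using a hypothetical database name:

```sql
-- Allow sessions in this database to request SNAPSHOT isolation.
ALTER DATABASE SalesDB SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Change the behavior of READ COMMITTED to statement-level row versioning.
-- This change typically requires that no other connections are active
-- in the database at the time.
ALTER DATABASE SalesDB SET READ_COMMITTED_SNAPSHOT ON;
```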
Isolation-related Table Hints
Key Points
Similar to the way table hints can be applied to control locking, isolation-level table hints can be applied to override the default transaction isolation level. They are listed here for completeness.
Isolation-related Table Hints
Similar to the locking hints, these should in general be avoided. However, they do provide a way to override the default behavior on an individual table basis within a statement. This could be useful in specialized situations.
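As an illustration only, a per-table override within a single statement; the table names are hypothetical:

```sql
-- Per-table override of the session's isolation level within one query.
-- Such hints should be used sparingly.
SELECT o.OrderID, a.ChangeDate
FROM dbo.SalesOrder AS o                          -- session's level applies
JOIN dbo.OrderAudit AS a WITH (READUNCOMMITTED)   -- dirty reads allowed here
    ON a.OrderID = o.OrderID;
```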
Lab 11: Creating Highly Concurrent SQL Server Applications
Lab Setup
For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:
1. On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2. Maximize the Hyper-V Manager window.
3. In the Virtual Machines list, if the virtual machine 623XB-MIA-DC is not started:
   • Right-click 623XB-MIA-DC and click Start.
   • Right-click 623XB-MIA-DC and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, and then close the Virtual Machine Connection window.
4. In the Virtual Machines list, if the virtual machine 623XB-MIA-SQL is not started:
   • Right-click 623XB-MIA-SQL and click Start.
   • Right-click 623XB-MIA-SQL and click Connect.
   • In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
5. In the Virtual Machine Connection window, click the Revert toolbar icon. If you are prompted to confirm that you want to revert, click Revert.
6. Wait for the revert action to complete.
7. In the Virtual Machine Connection window, if the user is not already logged on:
   • On the Action menu, click the Ctrl-Alt-Delete menu item.
   • Click Switch User, and then click Other User.
   • Log on using the following credentials:
     i. User name: AdventureWorks\Administrator
     ii. Password: Pa$$w0rd
8. From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
9. If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
10. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
11. In the Connect to Server window, type Proseware in the Server name text box.
12. In the Authentication drop-down list box, select Windows Authentication and click Connect.
13. In the File menu, click Open, and click Project/Solution.
14. In the Open Project window, open the project D:\6232B_Labs\6232B_11_PRJ\6232B_11_PRJ.ssmssln.
15. In Solution Explorer, double-click the query 00-Setup.sql. When the query window opens, click Execute on the toolbar.
Lab Scenario
In this lab, you will perform basic investigation of a deadlock situation. You are trying to determine an appropriate transaction isolation level for a new application. If you have time, you will investigate the trade-off between concurrency and consistency.
Exercise 1: Detecting Deadlocks
Scenario
In this exercise, you will explore typical causes of deadlocks and learn to view them in SQL Server Profiler traces. The main tasks for this exercise are as follows:
1. Start and configure SQL Server Profiler.
2. Load and execute the test scripts.
3. Stop the trace and review the deadlock graph.
Task 1: Start and configure SQL Server Profiler
• Start SQL Server Profiler and create a new trace called Deadlock Detection.
• Add Deadlock Graph to the events.
• Remove all other events.
• Start the trace in Profiler.
Task 2: Load and execute the test scripts
• Open the 51 – Lab Exercise 1.sql script.
• Review the script.
• Open the 52 – Lab Exercise 1 2nd Window.sql script.
• Review the script.
• Execute 51 – Lab Exercise 1.sql and then immediately execute 52 – Lab Exercise 1 2nd Window.sql. Wait for both to complete.
Task 3: Stop the trace and review the deadlock graph
• Stop the trace.
• Review the deadlock graph.
Results: After this exercise, you will have executed queries that create a deadlock situation and observed how this can be traced in SQL Server Profiler.
Creating Highly Concurrent SQL Server 2008 R2 Applications
Challenge Exercise 2: Investigating Transaction Isolation Levels (Only if time permits)
Scenario
In this exercise, you will execute a supplied set of T-SQL scripts that demonstrate how different transaction isolation levels work.
The main tasks for this exercise are as follows:
1. Load the scripts.
2. Execute the code.
Task 1: Load the scripts
• Open the 62 – Lab Exercise 2 2nd Window.sql script.
• Open the 61 – Lab Exercise 2.sql script.

Task 2: Execute the code
• Execute the code step by step, following the step-by-step instructions and making sure to highlight and execute just the required code blocks in each script window.

Results: After this exercise, you will have seen how transaction isolation levels work.
Module Review and Takeaways
Review Questions
1. Why is snapshot isolation level helpful?
2. What is the difference between a shared lock and an exclusive lock?
3. Why would you use read committed snapshot rather than snapshot isolation level?
Best Practices
1. Always use the lowest transaction isolation level possible to avoid blocking and to avoid the chance of deadlocks.
2. Many Microsoft-supplied components default to the Serializable transaction isolation level but do not need to be run at that level. Common examples are Component Services and BizTalk adapters.
3. Before spending too much time investigating blocking issues, make sure that all the queries that are involved are executing quickly. This usually involves making sure that appropriate indexes are in place. Often when query performance issues are resolved, blocking issues disappear.
Handling Errors in T-SQL Code
Module 12
Handling Errors in T-SQL Code
Contents:
Lesson 1: Understanding T-SQL Error Handling 12-3
Lesson 2: Implementing T-SQL Error Handling 12-13
Lesson 3: Implementing Structured Exception Handling 12-23
Lab 12: Handling Errors in T-SQL Code 12-31
Module Overview
When creating applications for SQL Server using the T-SQL language, appropriate handling of errors is critically important. A large number of myths surround how error handling in T-SQL works. In this module, you will explore T-SQL error handling, look at how it has traditionally been implemented, and see how structured exception handling can be used.
Objectives
After completing this module, you will be able to:
• Design T-SQL error handling
• Implement T-SQL error handling
• Implement structured exception handling
Lesson 1
Understanding T-SQL Error Handling
Before delving into the coding that deals with error handling in T-SQL, it is important to gain an understanding of the nature of errors: where they can occur when T-SQL is being executed, the data that errors return, and the severities that errors can exhibit.
Objectives
After completing this lesson, you will be able to:
• Explain where T-SQL errors occur
• Describe types of errors
• Explain what values are returned by an error
• Describe different levels of error severities
Where T-SQL Errors Occur
Key Points
T-SQL statements go through multiple phases during their execution. Errors can occur at each phase. Some errors could potentially be handled by the database engine. Other errors will need to be passed back to the calling application.
Syntax Check
In the first phase of execution, the syntax of a statement is checked. At this phase, errors occur if the statements do not conform to the rules of the language. Note that during the syntax checking phase, the objects referred to may not actually exist, yet no errors would be returned. For example, imagine the execution of a statement where the word "Customer" was misspelled:

SELECT * FROM Custommer;
There is nothing incorrect about this from a syntax point of view. The rules of the T-SQL language have been followed. During the syntax checking phase, no error would be returned.
Object Resolution
In the second phase of execution, the objects referenced by name in the T-SQL statements are resolved to underlying object IDs. Errors occur at this phase if the objects do not exist. Single-part names are resolved to specific objects at this point. To avoid ambiguity at this point, multi-part names should be used for objects except in rare circumstances. In the example above, SQL Server would first look for a table named "Custommer" in the default schema of the user executing the code. If no such table exists, SQL Server would then look for the table "dbo.Custommer"; that is, it would next look in the dbo schema. If no such table existed in the dbo schema, an error would then be returned.
Statement Execution In the third phase of execution, the statement is executed. At this phase, runtime errors can occur. For example, the user may not have permission to SELECT from the table specified or an INSERT statement might fail because a constraint was going to be violated. You could also have more basic errors occurring at this point such as an attempt to divide by a zero value. Some errors can be handled in the database engine but other errors will need to be handled by client applications. Client applications always need to be written with error handling in mind. Question: Can you suggest a reason why you might want to catch errors in a client application rather than allowing the errors to be seen by the end users?
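As a simple illustration of a runtime error, the following statement is syntactically valid and references no objects, yet still fails during the execution phase:

```sql
-- Passes the syntax check and object resolution phases,
-- but fails at execution time with a divide-by-zero error (Msg 8134).
SELECT 1 / 0;
```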
Types of Errors
Key Points
A number of different categories of error can occur. Mostly they differ by the scope of the termination that occurs when the error is not handled.
Syntax Errors
Syntax errors occur when the rules of the language are not followed. For example, consider the following statement:

SELECT TOP(10) FROM Production.Product;
If you try to execute the statement, you receive the following message: Msg 156, Level 15, State 1, Line 1 Incorrect syntax near the keyword 'FROM'.
Note that the syntax of the entire batch of statements being executed is checked before the execution of any statement within the batch is attempted. Syntax errors are batch terminating errors.
Object Resolution Errors
In the last topic, you saw an example of an object resolution error. These errors occur when the object name specified cannot be resolved to an object ID, such as in the following statement:

SELECT * FROM SomeNonexistentTable;
The same issue would occur if the schema for the object was not specified and the object did not exist in the user's default schema or in the dbo schema. Note that if a syntax error occurs, no attempt at object name resolution will be made.
Statement Terminating Errors
With a statement terminating error, execution resumes at the next statement following the statement that was in error. Consider the following batch executed against the AdventureWorks2008R2 database:

DELETE FROM Production.Product WHERE ProductID = 1;
PRINT 'Hello';
When this batch is executed, the following is returned: Msg 547, Level 16, State 0, Line 3 The DELETE statement conflicted with the REFERENCE constraint "FK_BillOfMaterials_Product_ComponentID". The conflict occurred in database "AdventureWorks2008R2", table "Production.BillOfMaterials", column 'ComponentID'. The statement has been terminated. Hello
Note that the PRINT statement was still executed even though the DELETE statement failed because of a constraint violation.
Batch, Scope, Session and Server Terminating Errors
More serious errors can cause the batch, the scope (e.g., the current stored procedure), or the session to be terminated. The most serious errors would terminate SQL Server itself. Errors of this nature usually indicate particularly serious hardware problems. Fortunately, such errors are rare!
What's in an Error?
Key Points
An error is itself an object and has properties as shown in the table.
What's in an Error
It might not be immediately obvious that a SQL Server error (sometimes called an exception) is itself an object. Errors return a number of useful properties. Error numbers are helpful when trying to locate information about the specific error, particularly when searching online for information about the error. You can view the list of system-supplied error messages by querying the sys.messages catalog view:

SELECT * FROM sys.messages ORDER BY message_id, language_id;
When executed, this command returns one row per message per language. Note that there are multiple messages with the same message_id: error messages are localizable and can be returned in a number of languages. A language_id of 1033 indicates the English version of a message. Severity indicates how serious the error is. It is described further in the next topic.
State is defined by the author of the code that raised the error. For example, if you were writing a stored procedure that could raise an error for a missing customer, and there were five places in the code where this message could occur, you could assign a different state to each of the places where the message was raised. This would help later when troubleshooting the error. Procedure name is the name of the stored procedure that the error occurred in, and Line Number is the location within that procedure. In practice, line numbers are not very helpful and not always applicable. Question: Why is it useful to be able to localize error messages?
Error Severity
Key Points
The severity of an error indicates the type of problem encountered by SQL Server. Low severity values are informational messages and do not indicate true errors. Error severities occur in ranges:
Values from 0 to 10
Values from 0 to 9 are purely informational messages. When queries that raise these are executed in SQL Server Management Studio, the information is returned but no error status information is provided. For example, consider the following code executed against the AdventureWorks2008R2 database:

SELECT COUNT(Color) FROM Production.Product;
When executed, it returns a count as expected. However, if you look on the Messages tab in SQL Server Management Studio, you will see the following: Warning: Null value is eliminated by an aggregate or other SET operation. (1 row(s) affected)
Note that no error really occurred but SQL Server is warning you that it ignored NULL values when counting the rows. Note that no status information is returned. Severity 10 is the top of the informational messages.
Values from 11 to 16
Values from 11 to 16 are considered errors that the user can correct. Typically they are used for errors where SQL Server assumes that the statement being executed was in error. Here are a few examples of these errors:
Error Severity Examples:
• 11 indicates that an object does not exist
• 13 indicates a transaction deadlock
• 14 indicates errors such as permission denied
• 15 indicates syntax errors
• 17 indicates that SQL Server has run out of resources (memory, disk space, locks, etc.)
Values from 17 to 19
Values from 17 to 19 are considered serious software errors that the user cannot correct.

Values above 19
Values above 19 tend to be very serious errors that normally involve problems with either the hardware or SQL Server itself. It is common to ensure that all errors above 19 are logged and alerts generated on them.
Demonstration 1A: Error Types and Severity
Key Points
In this demonstration you will:
• See how different types of errors are returned from T-SQL statements
• See the types of messages that are related to severe errors
• Query the sys.messages view and note which errors are logged automatically
Demonstration Setup
1. Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
2. In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect.
3. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_12_PRJ\6232B_12_PRJ.ssmssln and click Open.
4. Open and execute the 00 – Setup.sql script file from within Solution Explorer.
5. Open the 11 – Demonstration 1A.sql script file. Follow the instructions contained within the comments of the script file.
Question: What do you imagine the "is_event_logged" column relates to?
Lesson 2
Implementing T-SQL Error Handling
Now that you understand the nature of errors, it is time to consider how they can be handled or reported in T-SQL. The T-SQL language offers a variety of error handling capabilities. It is important to understand these and how they relate to transactions. This lesson covers basic T-SQL error handling, including how you can raise errors intentionally and how you can set up alerts to fire when errors occur. In the next lesson, you will see how to implement a more advanced form of error handling known as structured exception handling.
Objectives
After completing this lesson, you will be able to:
• Raise errors
• Use the @@ERROR system variable
• Explain the role of errors and transactions
• Explain transaction nesting errors
• Raise custom errors
• Create alerts that fire when errors occur
Raising Errors
Key Points
Both PRINT and RAISERROR can be used to return information or warning messages to applications. RAISERROR allows applications to raise an error that could then be caught by the calling process.
RAISERROR
The ability to raise errors in T-SQL makes error handling in the application easier as it is sent like any other system error. RAISERROR is used to:
• Help troubleshoot T-SQL code
• Check the values of data
• Return messages that contain variable text
Note that using a PRINT statement is similar to raising an error of severity 10, as shown in the sample on the slide.
Substitution Placeholders and Message Number
Note that in the message shown in the example on the slide, %d is a placeholder for a number and %s is a placeholder for a string. Note also that a message number was not mentioned. When errors with message strings are raised using this syntax, the errors raised always have error number 50000. Question: Why might you want to intentionally raise an error in your code?
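As a sketch of the substitution placeholders described above (the message text and values here are illustrative, not from the course scripts):

```sql
DECLARE @CustomerID int = 42;
DECLARE @CustomerName nvarchar(50) = N'Contoso';

-- %d is replaced by the integer argument, %s by the string argument.
-- Because a message string (not a message number) is supplied,
-- the raised error has error number 50000.
RAISERROR (N'Customer %d (%s) was not found.', 16, 1,
           @CustomerID, @CustomerName);
```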
Using @@Error
Key Points
Most traditional error handling code in SQL Server applications has been created using @@ERROR. Note that structured exception handling was introduced in SQL Server 2005 and provides a strong alternative to using @@ERROR; it will be discussed in the next lesson. A large amount of existing SQL Server error handling code is based on @@ERROR, so it is important to understand how to work with it.
@@ERROR
@@ERROR is a system variable that holds the error number of the last error that has occurred. One significant challenge with @@ERROR is that the value it holds is quickly reset as each additional statement is executed. For example, consider the following code:

RAISERROR(N'Message', 16, 1);
IF @@ERROR <> 0 PRINT 'Error=' + CAST(@@ERROR AS VARCHAR(8));
GO
You might expect that when it is executed, it would return the error number in a printed string. However, when the code is executed, it returns: Msg 50000, Level 16, State 1, Line 1 Message Error=0
Note that the error was raised but that the message printed was "Error=0". You can see in the first line of the output that the error was actually 50000 as expected with a message passed to RAISERROR. This is because the IF statement that follows the RAISERROR statement was executed successfully and caused the @@ERROR value to be reset.
Capturing @@ERROR into a Variable
For this reason, when working with @@ERROR, it is important to capture the error number into a variable as soon as it is raised and to then continue processing with the variable. Look at the following code that demonstrates this:

DECLARE @ErrorValue int;
RAISERROR(N'Message', 16, 1);
SET @ErrorValue = @@ERROR;
IF @ErrorValue <> 0 PRINT 'Error=' + CAST(@ErrorValue AS VARCHAR(8));
When this code is executed, it returns the following output: Msg 50000, Level 16, State 1, Line 2 Message Error=50000
Note that the error number is correctly reported now.
Centralizing Error Handling
One other significant issue with using @@ERROR for error handling is that it is difficult to centralize error handling within your T-SQL code; error handling tends to end up scattered throughout the code. It would be possible to somewhat centralize error handling using @@ERROR by using labels and GOTO statements, but most developers today would frown on this as a poor coding practice.
Errors and Transactions
Key Points
Many new developers are surprised to find that when a statement fails inside a transaction, the transaction is not automatically rolled back; only the statement itself is. The SET XACT_ABORT statement can be used to control this behavior.
Statement Terminating Errors vs. Batch/Scope Terminating Errors
Most common errors that occur when processing T-SQL are statement terminating errors, not batch or scope terminating errors. This means that the statement in error is rolled back and execution then continues with the next statement following the statement in error. Note that this happens even when working within a transaction. For example, consider the following code:

BEGIN TRAN;
DELETE Production.Product WHERE ProductID = 1;
PRINT 'Hello';
COMMIT;
PRINT 'Hello again';
Note that when it is executed, the following output is generated: Msg 547, Level 16, State 0, Line 3 The DELETE statement conflicted with the REFERENCE constraint "FK_BillOfMaterials_Product_ComponentID". The conflict occurred in database "AdventureWorks2008R2", table "Production.BillOfMaterials", column 'ComponentID'. The statement has been terminated. Hello Hello again
Note that both PRINT statements still execute even though the DELETE statement failed.
SET XACT_ABORT ON
The SET XACT_ABORT ON statement is used to tell SQL Server that statement terminating errors should become batch terminating errors. Now consider the same code with SET XACT_ABORT ON present:

SET XACT_ABORT ON;
BEGIN TRAN;
DELETE Production.Product WHERE ProductID = 1;
PRINT 'Hello';
COMMIT;
PRINT 'Hello again';
When executed, it returns: Msg 547, Level 16, State 0, Line 5 The DELETE statement conflicted with the REFERENCE constraint "FK_BillOfMaterials_Product_ComponentID". The conflict occurred in database "AdventureWorks2008R2", table "Production.BillOfMaterials", column 'ComponentID'.
Note that when the DELETE statement failed, the entire batch was terminated, including the transaction that had begun. The transaction would have been rolled back.
Transaction Nesting Errors
Key Points
Any ROLLBACK causes all levels of transactions to be rolled back, not just the current nesting level.

Transaction Nesting Errors
SQL Server does not support nested transactions. The syntax of the language might appear to support them, but they do not operate in a nested fashion. No matter how deeply you nest transactions, a ROLLBACK rolls back all levels of transaction. SQL Server also does not support autonomous transactions, which are nested transactions in a different transaction scope. Typically this limitation arises when trying to construct auditing or logging code: code that is written to log that a user attempted an action is rolled back along with the action itself.
Nesting Levels
You can determine the current transaction nesting level by querying the @@TRANCOUNT system variable. Another rule to be aware of is that SQL Server requires the transaction nesting level of a stored procedure to be the same on entry to the stored procedure as on exit from it. If the transaction nesting level differs, error 266 is raised. This is commonly seen when users attempt to roll back a nested transaction inside a stored procedure.
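A minimal sketch of this behavior, checking @@TRANCOUNT at each level:

```sql
BEGIN TRAN;                          -- @@TRANCOUNT is now 1
BEGIN TRAN;                          -- @@TRANCOUNT is now 2, but this is not a true nested transaction
SELECT @@TRANCOUNT AS NestingLevel;  -- returns 2
ROLLBACK;                            -- rolls back BOTH levels, not just the inner one
SELECT @@TRANCOUNT AS AfterRollback; -- returns 0
```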
Raising Custom Errors
Key Points
Rather than raising system errors, SQL Server allows users to define custom error messages that have meaning to their applications. The error numbers supplied must be 50000 or above, and the user adding them must be a member of the sysadmin or serveradmin fixed server role.

Raising Custom Errors
As well as being able to define custom error messages, members of the sysadmin server role can also use an additional parameter, @with_log. When set to TRUE, the error will also be recorded in the Windows Application log. Any message written to the Windows Application log is also written to the SQL Server error log. Be judicious with the use of the @with_log option, as network and system administrators tend to dislike applications that are "chatty" in the system logs. Note that raising system errors is not supported.
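As a sketch (the message number 50010 and the message text are illustrative), a custom message is typically added with sp_addmessage and then raised by number:

```sql
-- Add a custom message (error numbers must be 50000 or above).
EXEC sp_addmessage
    @msgnum   = 50010,
    @severity = 16,
    @msgtext  = N'Customer %d could not be found.',
    @with_log = 'FALSE';

-- Raise the custom message, supplying the placeholder value.
RAISERROR (50010, 16, 1, 12345);
```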
sys.messages System View
The messages that are added are visible within the sys.messages system view along with the system-supplied error messages. Messages can be replaced without the need to delete them first by using the @replace = 'replace' option. The messages are customizable, and different messages can be added for the same error number for multiple languages, based on a language_id value. (Note: English messages are language_id 1033.) Question: What do the DB_ID and DB_NAME functions return?
Creating Alerts When Errors Occur
Key Points
For certain categories of errors, administrators might wish to be notified as soon as the errors occur. This can even apply to user-defined error messages. For example, you may wish to raise an alert whenever a customer is deleted. More commonly, alerting is used to bring high-severity errors (such as severity 19 or above) to the attention of administrators.

Raising Alerts
Alerts can be created for specific error messages. The alerting service works by registering itself as a callback service with the event logging service. This means that alerts only work on errors that are logged. There are two ways to make a message raise alerts: you can use the WITH LOG option when raising the error, or the message can be altered to make it logged by executing sp_altermessage. Modifying system errors via sp_altermessage is only possible from SQL Server 2005 SP3 or SQL Server 2008 SP1 onwards. Question: Can you suggest an example of an error that would require immediate attention from an administrator?
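Assuming a custom message such as 50010 already exists (a hypothetical number), the following sketch makes it logged and attaches a SQL Server Agent alert to it:

```sql
-- Make the existing custom message write to the log so that alerting can see it.
EXEC sp_altermessage
    @message_id      = 50010,
    @parameter       = 'WITH_LOG',
    @parameter_value = 'TRUE';

-- Create a SQL Server Agent alert that fires when the message is logged.
EXEC msdb.dbo.sp_add_alert
    @name       = N'Missing customer alert',
    @message_id = 50010;
```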
Demonstration 2A: T-SQL Error Handling
Key Points
In this demonstration you will see:
• How to raise errors
• How severity affects errors
• How to add a custom error message
• How to raise a custom error message
• That custom error messages are instance-wide
• How to use @@ERROR
• That system error messages cannot be raised
Demonstration Steps
1. If Demonstration 1A was not performed:
• Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
• In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_12_PRJ\6232B_12_PRJ.ssmssln and click Open.
• Open and execute the 00 – Setup.sql script file from within Solution Explorer.
2. Open the 21 – Demonstration 2A.sql script file.
3. Open the 22 – Demonstration 2A 2nd Window.sql script file.
4. Follow the instructions contained within the comments of the script file.
Question: Why is the ability to substitute values in error messages useful?
Lesson 3
Implementing Structured Exception Handling
Now that you have an understanding of the nature of errors and of basic error handling in T-SQL, it is time to look at a more advanced form of error handling. Structured exception handling was introduced in SQL Server 2005. You will see how to use it and evaluate its benefits and limitations.
Objectives
After completing this lesson, you will be able to:
• Explain TRY CATCH block programming
• Describe the role of error handling functions
• Describe catchable vs. non-catchable errors
• Explain how TRY CATCH relates to transactions
• Explain how errors in managed code are surfaced
TRY CATCH Block Programming
Key Points
Structured exception handling has been part of high-level languages for some time. SQL Server 2005 introduced structured exception handling to the T-SQL language.
TRY CATCH Block Programming
Structured exception handling is more powerful than error handling based on the @@ERROR system variable. It allows you to prevent code from being littered with error handling code and to centralize that error handling code. Centralization of error handling code also allows you to focus more on the purpose of the code rather than on the error handling in the code.
TRY Block and CATCH Block
When using structured exception handling, code that might raise an error is placed within a TRY block. TRY blocks are enclosed by BEGIN TRY and END TRY statements. Should a catchable error occur (most errors can be caught), execution control moves to the CATCH block. The CATCH block is a series of T-SQL statements enclosed by BEGIN CATCH and END CATCH statements. Note that while END TRY and BEGIN CATCH are separate statements, the BEGIN CATCH statement must immediately follow the END TRY statement.
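A minimal sketch of the construct, reusing the AdventureWorks2008R2 DELETE statement seen earlier in the module:

```sql
BEGIN TRY
    DELETE FROM Production.Product WHERE ProductID = 1;
    PRINT 'The delete succeeded.';
END TRY
BEGIN CATCH
    -- Control arrives here if the DELETE raises a catchable error,
    -- such as the foreign key constraint violation seen earlier.
    PRINT 'The delete failed.';
END CATCH;
```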
Current Limitations
High-level languages often offer a try/catch/finally construct. There is no equivalent FINALLY block in T-SQL. There is currently no mechanism for rethrowing errors, and only errors with numbers of 50000 or above can be raised manually. This means that you cannot raise a system error within a CATCH block. Question: In what situation might it have been useful to be able to raise a system error?
Error Handling Functions
Key Points
CATCH blocks make the error-related information available throughout the duration of the CATCH block, including in sub-scopes such as stored procedures run from within the CATCH block.
Error Handling Functions
Recall that when programming with @@ERROR, the value held by the @@ERROR system variable was reset as soon as the next statement was executed. Another key advantage of structured exception handling in T-SQL is that a series of error handling functions have been provided, and these functions retain their values throughout the CATCH block. A separate function is provided for each of the properties of an error that has been raised. This means that you can write generic error handling stored procedures and they can still access the error-related information.
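A sketch of the error handling functions in use; each function returns one property of the error that moved control into the CATCH block:

```sql
BEGIN TRY
    SELECT 1 / 0;   -- raises a divide-by-zero error (Msg 8134)
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER()    AS ErrorNumber,    -- 8134
           ERROR_SEVERITY()  AS ErrorSeverity,  -- 16
           ERROR_STATE()     AS ErrorState,
           ERROR_PROCEDURE() AS ErrorProcedure, -- NULL when not in a procedure
           ERROR_LINE()      AS ErrorLine,
           ERROR_MESSAGE()   AS ErrorMessage;
END CATCH;
```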
Catchable vs. Non-catchable Errors
Key Points
It is important to realize that while TRY…CATCH blocks allow you to catch a much wider range of errors than you could catch with @@ERROR, you cannot catch all types of errors.

Catchable vs. Non-catchable Errors
Not all errors can be caught by TRY/CATCH blocks within the same scope that the TRY/CATCH block exists in. Often the errors that cannot be caught in the same scope can be caught in a surrounding scope. For example, you might not be able to catch an error within the stored procedure that contains the TRY/CATCH block; however, you are likely to be able to catch that error in a TRY/CATCH block in the code that called the stored procedure where the error occurred.
Common Non-catchable Errors
Common examples of non-catchable errors are:
• Compile errors, such as syntax errors that prevent a batch from compiling.
• Statement-level recompilation issues, which usually relate to deferred name resolution. For example, you could create a stored procedure that refers to an unknown table. An error is only thrown when the procedure actually tries to resolve the name of the table to an object ID.
Question: Given the earlier discussion on the phases of execution of T-SQL statements, how could a syntax error occur once a batch has already started executing?
TRY CATCH and Transactions
Key Points
If a transaction is current at the time an error occurs, the statement that caused the error is rolled back. If XACT_ABORT is ON, execution jumps to the CATCH block instead of terminating the batch as it usually would.
TRY/CATCH and Transactions
New SQL Server developers are often surprised that a statement terminating error that occurs within a transaction does not automatically roll that transaction back. You saw how SET XACT_ABORT ON was used to deal with that issue. When TRY/CATCH blocks are used in conjunction with transactions and SET XACT_ABORT is ON, a statement terminating error will cause the code in the CATCH block to be executed. However, the transaction is not automatically rolled back. Note that at this point, no further work that would need to be committed is permitted until a rollback has been performed; the transaction is considered to be "doomed". After the rollback, however, updates may be made to the database, such as logging the error.
XACT_STATE()
Look at the code in the slide example. It is important to consider that when the CATCH block is entered, the transaction may or may not have actually started. In this example, @@TRANCOUNT is being used to determine if there is a transaction in progress and to roll back if there is one. Another option is to use the XACT_STATE() function, which provides more detailed information in this situation. The XACT_STATE() function can be used to determine the state of the transaction: a value of 1 indicates that there is an active transaction; a value of 0 indicates that there is no active transaction.
A value of -1 indicates that there is a current transaction but that it is doomed; the only action permitted within the transaction is to roll it back.
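A sketch combining TRY/CATCH, a transaction, and XACT_STATE() along the lines of the slide example:

```sql
BEGIN TRY
    BEGIN TRAN;
    DELETE FROM Production.Product WHERE ProductID = 1;
    COMMIT;
END TRY
BEGIN CATCH
    -- XACT_STATE() is 1 (active) or -1 (doomed) if a transaction is open.
    IF XACT_STATE() <> 0
        ROLLBACK;
    -- After the rollback, work such as logging the error is permitted again.
    PRINT 'Error ' + CAST(ERROR_NUMBER() AS varchar(10)) + ' was handled.';
END CATCH;
```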
Errors in Managed Code
Key Points
SQL CLR Integration allows for the execution of managed code within SQL Server. High-level .NET languages such as C# and Visual Basic have detailed exception handling available to them. Errors can be caught using standard .NET try/catch/finally blocks.

Errors in Managed Code
In general, you may wish to catch errors as much as possible within managed code. (Managed code is discussed in Module 16.) It is important to realize, though, that any errors that are not handled in the managed code are passed back to the calling T-SQL code. Whenever an error that occurs in managed code is returned to SQL Server, it appears as error 6522. Errors can be nested, and the 6522 error will wrap the real cause of the error. Another rare but possible cause of errors in managed code would be that the code could execute a RAISERROR T-SQL statement via a SqlCommand object.
Demonstration 3A: Deadlock Retry
Key Points
In this demonstration you will see how to use structured exception handling to retry deadlock errors.
Demonstration Steps 1.
2. 3. 4.
If Demonstration 1A was not performed: •
Revert the 623XB-MIA-SQL virtual machine using Hyper-V Manager on the host system.
•
In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, click SQL Server Management Studio. In the Connect to Server window, type Proseware in the Server name text box and click Connect. From the File menu, click Open, click Project/Solution, navigate to D:\6232B_Labs\6232B_12_PRJ\6232B_12_PRJ.ssmssln and click Open.
•
Open and execute the 00 – Setup.sql script file from within Solution Explorer.
Open the 31 – Demonstration 3A.sql script file. Open the 32 – Demonstration 3A 2nd Window.sql script file. Follow the instructions contained within the comments of the script file.
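The demonstration scripts implement a pattern along these general lines (a sketch only; the body of the transaction is a placeholder). Error 1205 is the error number SQL Server raises when a session is chosen as a deadlock victim:

```sql
DECLARE @Retries int = 5;
DECLARE @Message nvarchar(2048);

WHILE @Retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- ... the work that may be chosen as a deadlock victim goes here ...

        COMMIT TRANSACTION;
        BREAK;  -- success: leave the retry loop
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;

        SET @Retries -= 1;

        IF ERROR_NUMBER() <> 1205 OR @Retries = 0
        BEGIN
            -- Not a deadlock, or retries exhausted: report the error and stop.
            SET @Message = ERROR_MESSAGE();
            RAISERROR (@Message, 16, 1);
            BREAK;
        END
        -- Deadlock victim with retries remaining: loop and try again.
    END CATCH;
END;
```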
Lab 12: Handling Errors in T-SQL Code
Lab Setup
For this lab, you will use the available virtual machine environment. Before you begin the lab, you must complete the following steps:

1.	On the host computer, click Start, point to Administrative Tools, and then click Hyper-V Manager.
2.	Maximize the Hyper-V Manager window.
3.	In the Virtual Machines list, if the virtual machine 623XB-MIA-DC is not started:
	•	Right-click 623XB-MIA-DC and click Start.
	•	Right-click 623XB-MIA-DC and click Connect.
	•	In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears, and then close the Virtual Machine Connection window.
4.	In the Virtual Machines list, if the virtual machine 623XB-MIA-SQL is not started:
	•	Right-click 623XB-MIA-SQL and click Start.
	•	Right-click 623XB-MIA-SQL and click Connect.
	•	In the Virtual Machine Connection window, wait until the Press CTRL+ALT+DELETE to log on message appears.
5.	In the Virtual Machine Connection window, click the Revert toolbar icon. If you are prompted to confirm that you want to revert, click Revert.
6.	Wait for the revert action to complete.
7.	In the Virtual Machine Connection window, if the user is not already logged on:
	•	On the Action menu, click the Ctrl-Alt-Delete menu item.
	•	Click Switch User, and then click Other User.
	•	Log on using the following credentials:
		i.	User name: AdventureWorks\Administrator
		ii.	Password: Pa$$w0rd
8.	From the View menu, in the Virtual Machine Connection window, click Full Screen Mode.
9.	If the Server Manager window appears, check the Do not show me this console at logon check box and close the Server Manager window.
10.	In the virtual machine, click Start, click All Programs, click Microsoft SQL Server 2008 R2, and click SQL Server Management Studio.
11.	In the Connect to Server window, type Proseware in the Server name text box.
12.	In the Authentication drop-down list box, select Windows Authentication and click Connect.
13.	From the File menu, click Open, and click Project/Solution.
14.	In the Open Project window, open the project D:\6232B_Labs\6232B_12_PRJ\6232B_12_PRJ.ssmssln.
15.	In Solution Explorer, double-click the query 00-Setup.sql. When the query window opens, click Execute on the toolbar.
Lab Scenario
In this lab, a company developer asks you for assistance with some code he is modifying. The code was written some time ago and uses simple T-SQL error handling. He has heard that structured exception handling is more powerful and wishes to use it instead. If time permits, you will also design and implement changes to a stored procedure to provide for automated retry on deadlock errors.
Exercise 1: Replace @@ERROR-based error handling with structured exception handling

Scenario
In this exercise, you need to modify the developer's code to use structured exception handling.

The main tasks for this exercise are as follows:
1.	Review the existing code.
2.	Rewrite the stored procedure to use structured exception handling.
3.	Test the stored procedure.
Task 1: Review the existing code
•	Review the existing code in the procedure Marketing.MoveCampaignBalance.

Task 2: Rewrite the stored procedure to use structured exception handling
•	Rewrite the stored procedure to use structured exception handling, calling the rewritten stored procedure Marketing.MoveCampaignBalance_Test.
Task 3: Test the stored procedure
•	Test that the stored procedure still works as expected.

Results: After this exercise, you should have created a stored procedure that uses structured exception handling.
Challenge Exercise 2: Add deadlock retry logic to the stored procedure (Only if time permits)

Scenario
In this exercise, the operations team has mentioned that the same stored procedure also seems to routinely fail with deadlock errors. To assist them, make further modifications to your new procedure to add automatic retry logic for deadlock errors.

The main tasks for this exercise are as follows:
1.	Modify the code to retry on deadlock.
2.	Test the stored procedure.
Task 1: Modify the code to retry on deadlock
•	Modify the code for the Marketing.MoveCampaignBalance_Test stored procedure to retry on deadlock up to five times.
Task 2: Test the stored procedure
•	Test that the procedure still works as expected.

Results: After this exercise, you should have modified a stored procedure to automatically retry on deadlock errors.
Module Review and Takeaways
Review Questions
1.	What is the purpose of the SET XACT_ABORT ON statement?
2.	Why should retry logic be applied to deadlock handling?
3.	Give an example of an error for which retries would not be useful.
Best Practices
When designing client-side database access code, do not assume that database operations will always occur without error.

Instead of a pattern like:
a)	Start a transaction.
b)	Do some work.
c)	Commit the transaction.

Consider instead a pattern like:
a)	Reset the retry count.
b)	While the transaction is not committed and the retry count is not exhausted, attempt to perform the work and commit the transaction.
c)	If an error occurs and it is an error to which retries could apply, retry step b). Otherwise, return the error to the calling code.
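That client-side pattern can be sketched in C# roughly as follows (the connection string, the work inside the transaction, and the retry count are placeholder assumptions; 1205 is the SQL Server deadlock-victim error number):

```csharp
using System;
using System.Data.SqlClient;

class RetryExample
{
    static void DoWorkWithRetry(string connectionString)
    {
        int retries = 5;                           // reset the retry count
        while (true)
        {
            try
            {
                using (var conn = new SqlConnection(connectionString))
                {
                    conn.Open();
                    using (var tran = conn.BeginTransaction())
                    {
                        // ... do some work on conn within tran ...
                        tran.Commit();             // attempt to commit
                    }
                }
                return;                            // committed: done
            }
            catch (SqlException ex)
            {
                retries--;
                if (ex.Number != 1205 || retries == 0)
                    throw;                         // not a deadlock, or retries
                                                   // exhausted: return the error
                                                   // to the calling code
                // Deadlock victim with retries remaining: loop and try again.
            }
        }
    }
}
```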