The Definitive Guide to Interwoven TeamSite
Harness the power of this enterprise web content management platform
Brian Hastings and Justin McNeal
Foreword by Russell Nakano, Cofounder and former Principal Consultant, Interwoven
ISBN 1-59059-611-0
Shelve in Web Development/Content Management Systems
User level: Beginner–Advanced
Source code online: www.apress.com
Join online discussions: forums.apress.com

Dear Reader,
Interwoven TeamSite is the premier solution for enterprise-level web content management. Utilizing award-winning, patented technology, TeamSite is capable of hosting high-level enterprise applications, from intranet services to mission-critical websites. It interfaces seamlessly with a host of portal platforms and enterprise applications, including IBM, BEA, SAP, Plumtree, Siebel, and PeopleSoft. This is just one of the reasons why General Electric, Siemens, Cisco, and other blue-chip companies use TeamSite.
The Definitive Guide to Interwoven TeamSite leads you through the process of defining your enterprise content management system (ECMS) requirements; designing the solution; implementing a robust, reliable, and efficient system; and delivering that system to your end users. We introduce you to ECMS concepts as well as the Interwoven TeamSite architecture, TeamSite’s key features, and the detailed implementation that you will need to understand in order to carry out each aspect of your ECMS project. We also provide guidelines for many different roles within an enterprise, including technical and nontechnical roles.
The book uses the Rational Unified Process (RUP) as a software development methodology and teaches how to incorporate key RUP documents and principles while building an ECMS, upgrading an existing system, or just enhancing your current setup. We have many years of experience in the trenches performing ECMS implementations, and we have built this book around a case study of a fictional but typical financial-services firm, thereby exposing all the features of TeamSite. We will guide you from the project inception phase to the transition phase and help you avoid common pitfalls in the process. Whether you are a newcomer to ECMSs or a seasoned professional, you can use this book to achieve your goals.
Brian Hastings and Justin McNeal
This book is dedicated to my mother, Shirley Ruth McNeal (March 11, 1946–July 3, 2004). Mom, you were the strongest woman I have ever known, you taught me about hard work, you gave me my strength, and you taught me what love and sacrifice truly mean. I can never thank you enough for everything you have given me, and I hope you are proud of the man I have become. —Justin McNeal This book is dedicated to my family, who has worked together to form who I am. To my parents, Ron and Vickie Hastings, who spent the time and effort to provide me with a sense of morality and good judgment. To my grandparents, Paul and Tracy Piper, who always made sure I had what I needed, and to my uncle, Cletus Piper—without him introducing me to computers at such a young age, I would not have had a chance to write this book. I would also like to dedicate this book to my wife, Muffy Hastings, for supporting me through the long hours that this project forced me to undertake. —Brian Hastings
Foreword
A digital flood is upon us. Content inundates us. It begins as bits of content swirling everywhere: a document, an image, a written corporate procedure, a web page, or an email. The binary mist mixes, combines, and rains down on us. It pools in laptops, on desktops, and in server farms. Creeks and streams meander to corporate reservoirs. Modern civilizations prosper and thrive as continents surrounded by oceans of content. But content is a resource that must be actively managed in order to safeguard its freshness. Left unmanaged, it seeps away, picks up funny odors, or simply becomes stale. It is no longer adequate to store content in odd-sized barrels and ladle out more to anyone who needs some.
Interwoven TeamSite is a modern content distillery. It assists content creators who brew new tonics. You can combine an essence from here with an ingredient from there with the help of customized content-entry interfaces. Workflow enables the sequencing, routing, notification, and coordination between the stages of creation, review, post-processing, and dissemination. Search adds cohesion to the repository by vastly increasing individual productivity and by fostering the organizational reuse of existing content. Underneath it all hums a controlled storage repository where standard components interact with custom modules to bring carefully refined business logic to life.
With this book, Brian Hastings and Justin McNeal have assembled a practical guide to help information technologists and business analysts grasp the essential concepts on which content management systems are built. Having constructed systems many times themselves, the authors know that practitioners learn best from case studies and true-to-life examples. For example, what is important to know, and what are common mistakes? What factors convince an organization to embark on a journey to bring its content under control?
What principles and insights from the Rational Unified Process can be applied to content management? The Definitive Guide to Interwoven TeamSite answers these questions and more. The authors bring an impressive breadth of experience to this book. On one hand, a system architect can find example code for presentation templates and workflow tasks to give them a concrete understanding of the technical skills required to complete the intensive Construction phase of the project and give them a glimpse of possible features. On the other hand, in Chapter 18 a project manager will find suggestions for reaching out to the internal community during the Transition phase. Activities could include sponsoring information sessions, identifying potential solution champions within the organization, and devising a detailed training strategy. Content management has come a long way over the past ten years, compared to earlier times when it seemed that only software writers cared about building repositories and versioning content. The flood of content has risen to new heights. With this book, we all become better navigators. Russell Nakano Former principal consultant and cofounder of Interwoven, author of Web Content Management: A Collaborative Approach (Addison-Wesley, 2001), and current president and cofounder of Nahava
Sunnyvale, California June 2006
6110_Ch00_FINAL
7/2/06
1:00 PM
About the Authors ■BRIAN HASTINGS is the vice president and cofounder of Rational Solutions, an Internet technologies and web services consulting company based in St. Louis, Missouri. Brian has been working with Interwoven TeamSite for seven years and has worked as a CMS implementation specialist, developer, and architect for clients such as AG Edwards, MasterCard International, and FedEx. He is currently working toward his Sun Certified Enterprise Architect certification because he knows that when working within an IT field, it is critical to continue sharpening your skills and developing new ones. This is why Brian has written this book—to help individuals as well as enterprises continue to grow. ■JUSTIN MCNEAL is the president and cofounder of Rational Solutions, an Internet technologies and web services consulting company based in St. Louis, Missouri. He has the MCP, CCNA, I-Net+, CIW, Network+, and e-Biz+ certifications and is IBM certified as an E-Business–Solution Technologist. Justin has been a full-time consultant and technical trainer for ten years, during which he has consulted for major Fortune 500 companies such as IBM, FedEx, and Elsevier. Justin has been deeply involved with ECMSs with an emphasis on Interwoven TeamSite for more than seven years. During that time, Justin has worked as a lead analyst on CMS implementations and has conducted and led numerous technical training workshops, including teaching ECMS and TeamSite internationally for MasterCard International. Justin was one of five people chosen as a subject matter expert for CompTIA’s e-Biz+ test analysis and helped develop and conduct the final analysis of the e-Biz+ 2004 test. Justin enjoys the enrichment that technology, when implemented properly, can have on businesses and on people’s lives, and he loves discovering new ways to improve business processes. In his spare time, Justin enjoys hiking, fishing, and traveling.
About the Technical Reviewer ■TOM SHELL is an enterprise services executive for Interwoven, the leading provider of ECM solutions, located in Sunnyvale, California. He has more than twelve years of experience developing, selling, and supporting customer business solutions. In his eighth year with Interwoven, Tom brings his content management expertise and his broad mix of analysis, design, and implementation experience to his work. Most of all, Tom is deeply committed to applying technology to everyday challenges. This technologist and proudly self-proclaimed geek lives in Ann Arbor, Michigan, with his wife, three sons, and dog.
Acknowledgments
First and foremost, we would like to thank our creator and heavenly father; without him, nothing is possible. We would also like to give special thanks to our friends and family for putting up with our long hours of work, for dealing with our endless excuses for skipping activities to work on the book, and for listening to no less than 10,000 iterations of “We’re almost done; only a couple more weeks.” We would also like to thank our current colleagues who supported us and listened to us talk about the book incessantly, as well as our past colleagues who supported us during our voyages in the IT world. Thanks to everyone who allowed us to use their name in the finished book. You know who you are!
Special thanks go out to Tom Shell from Interwoven for his considerable efforts and expertise as our technical reviewer. A thank you to Sunil Menin (senior product manager, web content management, at Interwoven) for his expertise, tireless conference calls, late-night emails, and can-do attitude. We would also like to thank Tom Brauch for his tremendous creative and visual design expertise in the fabrication of our fictitious company (FiCorp) for the case study. We would like to thank Mr. Russell Nakano for really making this all possible. If he had not founded Interwoven, then we certainly would not have written this book. Thank you again, Mr. Nakano. You are an inspiration. Finally, thank you to anyone we might have inadvertently forgotten (P.S. let us know for the second edition)!
Introduction
The Definitive Guide to Interwoven TeamSite is the first comprehensive book about this enterprise-level content management system (CMS). Divided into six parts, it guides you through the Interwoven TeamSite architecture, key features, and detailed implementation. The book presents material using the Rational Unified Process as a development framework and project methodology. Each part of the book introduces the concepts and TeamSite features that you will need to understand in order to carry out each aspect of your TeamSite project. This book provides a complete road map for creating a working implementation and uses numerous visual guides to cover the project process in detail. We also include a crucial case study of a fictitious financial services firm called FiCorp. Throughout the book, we share the key development strategies, deployment principles, best practices, and insider tips that we have gathered over the many years of working in enterprise CMS environments at various Fortune 500 companies. We also discuss future Interwoven product releases, including LiveSite, MetaTagger, and SalesSite. Finally, we share insight into the future product vision of Interwoven and TeamSite.
In the Beginning . . . (Our CMS Beginning)
When we were first exposed to TeamSite, we never thought what we were doing had a name. Content management was what they were calling it. The plan for how it should work seemed valid, but we quickly found out we had been using a tool called TeamSite that had been, interestingly enough, interwoven into the existing custom systems. We were given the task of upgrading the system and had no idea how hard it was going to be. You see, the team was fairly small when TeamSite was implemented, and they had created a demo to show management how the tool could benefit the company. When they showed off the demo, management decided they would start using the demo implementation for live content. From that day forward, they started building on top of the demo.
As with most teams, the implementation had been done with little technical knowledge about how TeamSite worked or what it could really do. So, the team took their years of experience in Perl development and web development and did what they knew best. They tied a huge amount of custom code to the base functions provided by TeamSite. This worked wonderfully, until the upgrade came along. We soon found out that a lot of the customizations could now be handled using TeamSite base functionality. The problem was that, with all of the current customizations, the users had become lazy. They wanted all the new functionality that TeamSite could give them while keeping the existing functionality they had been given through customizations. IT management was pushing hard for using functionality that came right out of the box, and marketing was pushing for customization. This struggle took almost a year to resolve while at the same time the new system was being built. After a lot of sleepless nights and hard work, both IT and marketing ended up with a great system.
The moral of this story is that if they had been able to access some good inside information about how to build the system, the team would not have built so much custom code around it. There is nothing wrong with custom code, but you should try to limit the amount built by fully utilizing functionality provided to you by TeamSite. If you do decide to build custom code, make sure you build it in the most intelligent and extensible way. Custom code should not be the deciding factor for not upgrading to the latest and greatest TeamSite version. Once you fall behind in versions, it can become difficult to catch up (and many times if you are building it, Interwoven is planning for it in a future release). So, when we started in content management, there really weren’t any good references available for someone to learn more about this technology. This means we made a lot of mistakes; however, you don’t have to make the same ones. After reading this book, you will be armed with the knowledge needed to build a successful CMS. This means you will be able to build a system that makes life easy for its users, and you will be able to upgrade to the next version of TeamSite with as little effort as possible.
Why You Should Buy This Book This book is the only book of its kind available on the market today. In fact, no other source exists for self-paced home training on TeamSite, LiveSite, and MetaTagger. This book will teach you what you need to know about Interwoven TeamSite including best-practice implementation steps.
Who Is This Book For? We, as TeamSite consultants, wrote this book specifically for enterprise CMS (ECMS) professionals. We designed this book to aid technical individuals as well as nontechnical individuals. This book contains a great deal of hands-on information that we gathered from years of implementing CMSs in Fortune 500 companies. The goal of this book is to educate all those involved in the product evaluation, business analysis, technical sales, project management, and ECMS architecture/development of a CMS implementation. For example, technical individuals such as software developers will gain detailed knowledge of Interwoven code modules, which will aid in creating custom code. For system analysts and architects, we have laid out the overall picture of what a CMS should look like, how it should be designed, and how it should be implemented. This will help when putting together the requirements and designing the overall enterprise solution. If you’re a technical salesperson, you will gain insight into how a CMS is implemented. Whether you are selling the Interwoven system or another CMS, you will understand the products available from an implementation viewpoint. For nontechnical individuals such as project managers, this book provides a great up-front look at creating a project plan. In addition, business analysts will learn some of the ways to solve business needs and will learn the questions that need asking. No matter what your role is in an organization, you will find this book useful. We also sincerely hope you enjoy this book as much as we enjoyed writing it for you.
PART 1
Introducing the Rational Unified Process

The Rational Unified Process (RUP) is a process framework for software engineering projects. The RUP comprises four phases: Inception, Elaboration, Construction, and Transition. Within each phase, disciplines organize activities by the nature of those activities. Disciplines group the activities with the roles that perform them and the artifacts that must be produced for each workflow. Disciplines are guidelines that instruct you on how to perform specific activities. Some disciplines include the Requirements, Analysis and Design, and Implementation disciplines. Roles are specific job functions with specific responsibilities in the RUP. Some examples include the requirements reviewer, the requirements specifier, and the system analyst. Artifacts are work products that are developed during the project and used to convey information about the project.

The RUP focuses on six specific best practices for software development. These are best practices not so much because you can precisely quantify their value but, rather, because they are observed to be commonly used in the industry by successful organizations. These best practices are as follows:

Develop software iteratively: This best practice recognizes that defining all the requirements before starting to develop the software is an impossible goal. A better approach is to define, develop, and deploy the software iteratively, with each iteration involving end user and project stakeholder feedback. Issues are then caught early in the process rather than later, and they can be corrected and the software refined along the way.

Manage requirements: The RUP describes how to elicit, define, and derive requirements that are agreed on by project stakeholders. Requirements are broken down into use case requirements and functional requirements that are not specific to use cases, and traceability is maintained between the gathered and derived requirements.

Use component-based architectures: The RUP focuses on early proof-of-concept development of a prototype. Development is empowered to use new and existing software components that have a clearly defined purpose. Using components in software helps ensure software reuse.

Visually model software: The process describes how to visually model software, thereby reducing ambiguity between analysis, design, and development. The industry-standard Unified Modeling Language (UML) is utilized as the foundation for successful modeling.

Verify software quality: Continually verify software quality during each iteration of the development. The quality of a software product should be reviewed in regard to reliability, functionality, and performance.

Control changes to software: Managing change means accepting that changes are inevitable and managing them across development efforts. The process describes how to track, control, and monitor changes to ensure successful iterative development.

As is often the case in organizations where the RUP is used, the full RUP is never realized or fully implemented; this is indeed the case in this book. The RUP does not have to be implemented fully to be effective and to add significant value. This book is not a book that will teach you everything about the RUP, and you will not be a RUP expert after reading this book. You will, however, understand more about the RUP, and you will understand how to apply that process to a content management system (CMS) implementation and to your future CMS projects.
CHAPTER 1
What Is Content Management?

Hello, and welcome to The Definitive Guide to Interwoven TeamSite. You have made an extremely important investment for yourself and your company that will pay large dividends for your content management system (CMS) implementation. We have written this book hoping to assist others in tackling the daunting task of implementing an enterprise CMS (ECMS). We have drawn upon our combined decade of experience implementing and managing the Interwoven CMS, gathered the tips and tricks learned along the way, and combined that knowledge into one source: this book. With the help you’ll find in this book, you will have fewer sleepless nights and a more successful CMS implementation.
What Is Content? Before we speak about content management, you should take a moment to better understand the concept of content itself. Only in this way can you more fully understand content management, the impact it has had in modern-day e-business, and the effectiveness, usefulness, and vital role of CMSs. We define content in an organization as any organizational informational asset that exists in an electronic medium. Although it can be argued that any physical information resource can be classified as content, in the context of this book we will not consider any source as content until it exists in an electronic form. Some typical examples of content include e-books, manuals, publications, web pages, video files, music, instructional material, promotional material, help text…the list goes on and on. You can classify anything as content for an organization if it fits into all of the following sections.
Content Has a Classification Type Typically the classification type is organizational; in other words, content can be categorized by an organizational unit such as marketing content, legal content, general content, privacy content, and so forth. A company that performs only legal services may use contracts, wills, deeds of trust, billing invoices, summons, and other legal type content, while a medical publishing company may depend upon drug data sheets, patient handouts, medical books, and drug news as content.
Content Has a File Type/MIME Type Any piece of content has an associated file type, that is, the file extension that is tied to a particular program or standard. On the Windows platforms, file extensions tell the operating system the program or application to launch to service the specific file type. For example, an image file will generally have an extension such as .gif, .jpeg, .jpg, .bmp, .png, or .tif. In a CMS, file types are important to drive storage placement and delivery requirements, as well as to determine how certain files will be viewed. For example, in CMSs, the file type may control whether certain files are displayed in an iframe, a separate window, or the current window.
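To make the file-type discussion concrete, here is a minimal sketch in Python of how a delivery layer might pick a display mode from a file's MIME type. The `DISPLAY_MODES` table, the mode names, and the `display_mode` function are illustrative assumptions and are not part of TeamSite; the only library call used is the standard `mimetypes.guess_type`.

```python
import mimetypes

# Hypothetical display policy. Keys are either a full MIME type or a
# MIME family (the part before the slash); values are made-up mode names.
DISPLAY_MODES = {
    "image": "iframe",
    "text": "current_window",
    "application/pdf": "new_window",
}
DEFAULT_MODE = "new_window"


def display_mode(filename: str) -> str:
    """Choose a display mode for a file based on its guessed MIME type."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        # Unknown extension: fall back to the safest assumed behavior.
        return DEFAULT_MODE
    if mime in DISPLAY_MODES:
        # An exact MIME type match takes priority over the family rule.
        return DISPLAY_MODES[mime]
    family = mime.split("/", 1)[0]
    return DISPLAY_MODES.get(family, DEFAULT_MODE)
```

Checking the exact type before the family lets one rule cover every image format while still allowing a specific type such as PDF to be handled on its own.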
Content Has Metadata
Content has data attributes that describe it. These data attributes are described collectively as metadata. Metadata is used for many functions within a CMS:

Indexing data for search-related capabilities: You can intelligently add keywords to each content type via a defined taxonomy. A taxonomy is an intelligent mapping of taxons (the highest-level grouping) and taxa (subgroups under each taxon) that are unambiguous and, when taken together, cover all the possible values. For example, have you ever played the guessing game 20 Questions? This game usually starts with someone thinking of something that you then have to guess based on the responses to the 20 questions you get to ask. Usually you start by asking something similar to “Is it an animal, vegetable, or mineral?” If the other person playing with you answers your question by saying “Animal,” then you may ask “Is it a dog, cat, or human?” You will then continue until you guess the subject or you run out of questions. Each of those high-level categories of Animal, Vegetable, or Mineral would be a taxon. The taxa would then stem from the higher-level taxon; for example, the taxa for the Animal taxon might be Dog, Cat, and Bird.

Clustering on metadata and exposing that clustering to search: For example, by clustering metadata, you can list all books or all documents from the marketing department or list all the content about a certain topic. With clustering metadata, you can capture important information such as the author name or publish date. When this metadata is exposed to a search engine, you can list all books published by a certain author or, for example, all books published from 1995 to 2001. You could also capture the subject for specific topic-based searching. By clustering with metadata, you can accomplish advanced searches such as retrieving all the books published by a certain author, on a certain subject, and published from 1995 to 2001. By having clustering data available, you can group search results and create dynamic drill-down capabilities by dynamically rebuilding the search query.

Automating system-created content using an intelligent method: You can create new and complete content dynamically, where pieces of content are combined or aggregated into an entirely new piece of content based on system requirements or at run time. This is possible, though still difficult to accomplish, when content is authored at the smallest possible level. Synthesizing content is especially useful for content such as marketing collateral and presentations that should be shared across an organization or several diverse sales teams.

Determining when data should be archived and when data should be removed: These topics are often overlooked in a CMS application. By setting metadata attributes and associating those attributes with content, you can manage the demotion or deletion of content. Additionally, if your corporation has retention standards, you can programmatically control when content is archived and even where the archived content is stored.

Determining when content is promoted or published: By associating metadata with your content, you can program that content’s promotion date and time. A business content review process will control when the content has reached an acceptable publishing state, but why not also control when that content is made live?
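The metadata functions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ContentItem` record, its field names, the `search`, `facet_counts`, and `is_live` helpers, and the sample catalog (including the author names and dates) are all made up for this example and do not reflect how TeamSite or any search engine stores or queries metadata.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ContentItem:
    """A piece of content plus a few hypothetical metadata attributes."""
    path: str
    author: str
    subject: str
    published: date                       # date the content goes live
    archive_after: Optional[date] = None  # retention cutoff, if any


def search(items, author=None, subject=None, years=None):
    """Return the items that match every metadata filter that was supplied."""
    results = []
    for item in items:
        if author is not None and item.author != author:
            continue
        if subject is not None and item.subject != subject:
            continue
        if years is not None and not (years[0] <= item.published.year <= years[1]):
            continue
        results.append(item)
    return results


def facet_counts(items, field):
    """Count items per value of one metadata field, for drill-down links."""
    return Counter(getattr(item, field) for item in items)


def is_live(item, today):
    """True once the publish date has arrived and the retention date has not passed."""
    if today < item.published:
        return False
    return item.archive_after is None or today <= item.archive_after


# Made-up sample data for the example.
CATALOG = [
    ContentItem("books/funds.xml", "A. Rivera", "investing", date(1996, 3, 1)),
    ContentItem("books/bonds.xml", "A. Rivera", "investing", date(2003, 5, 9)),
    ContentItem("books/loans.xml", "A. Rivera", "lending", date(1999, 8, 20)),
    ContentItem("promo/spring.html", "J. Chen", "marketing",
                date(2006, 4, 1), date(2006, 6, 30)),
]
```

Calling `search(CATALOG, author="A. Rivera", subject="investing", years=(1995, 2001))` narrows the catalog by author, subject, and publication range at once, while `facet_counts` on a result set gives the per-value totals a drill-down interface would display. The `is_live` check shows how a publish date and a retention date stored as metadata could drive promotion and archival without manual intervention.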
Content Requires Storage Simply put, content takes up space, which is typically database space or space on a file system. Some examples of storage space used to store content include network-attached storage devices such as those provided by EMC, relational databases such as Oracle or SQL Server, and file systems such as the Unix file system or NTFS. No matter what the content or the content’s type, you must have adequate space in which to store it. All content has a certain number of bits and bytes that compose it. This collection of bits and bytes takes up disk space and must be accounted for in a CMS.
Content Has a Purpose Not Related to the CMS Configuration files, templates, or any other files that are used by the CMS should not be considered actual content. Basically, any file that does not provide value to a consumer of the content is not considered content. People who use content are also known as content consumers.
Defining Content Management
Content management is the organizing, directing, controlling, manipulating, promoting, and demoting of content and/or digital informational assets within an organization. Promoting content means deploying content from the authoring environment to the content delivery environment, which is usually a web server. Demoting content means removing or rolling back content from the content delivery environment to the content authoring environment. A CMS manages those various pieces of content described earlier. This is extremely important for organizations because as a business grows, so do the complexities of its content. Businesses have hundreds, thousands, and sometimes millions of pages of content, and with the overwhelming flood of this content, they need help! A CMS is much more than an off-the-shelf piece of software that a company purchases and configures for its specific needs. To understand content management, you not only have to have a view of the proverbial forest but also need to know where each tree belongs. Implemented properly, content management and a CMS can do the following:
• Improve delivery time from content creation to content promotion
• Improve content quality
• Reduce the cost of managing an organization’s global or departmental content
• Reduce redundancies in content and reduce human error
• Eliminate orphaned files and broken links
• Automate content notifications and the review process
• Enforce legal and branding standards
• Improve content visibility and accountability throughout the delivery chain
Implemented poorly, a CMS can be a virtual money pit, where no efficiencies are gained and many hours are wasted on a project that may provide little to no value to end users or consumers. You must understand two important facts regarding a CMS:
• A CMS will not improve your business process. You will have to expend some analysis cycles and rethink existing processes and procedures to enhance your business flow. Only by improving your existing process will you realize the true benefit of content management.
• A CMS is not free. Setting aside the cost of the solution itself, you will spend many hours implementing a system. Once everyone starts to use the system and finds out what the system can do, they will want more. This will also lead to additional implementations and maintenance.
This is a point we cannot stress strongly enough: setting up a CMS is more expensive than just building an initial website. You must be willing to accept the initial cost of building and deploying a CMS before you experience a return on investment. But you will have to trust that all of the hard work, project hours, and budget allotments will pay off in the end.
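The promote and demote operations defined at the start of this section can be pictured as file movements between two directory trees. The following Python sketch assumes the authoring and delivery environments are plain directories on one machine and treats demotion as removal from delivery; the `promote` and `demote` function names and the directory layout are hypothetical, and a real CMS deployment would involve workflow, versioning, and remote transfer that are not shown here.

```python
import shutil
from pathlib import Path


def promote(relative_path: str, authoring_root: Path, delivery_root: Path) -> Path:
    """Copy one file from the authoring area to the delivery area."""
    source = authoring_root / relative_path
    target = delivery_root / relative_path
    # Create any missing parent directories on the delivery side.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target


def demote(relative_path: str, delivery_root: Path) -> bool:
    """Remove one file from the delivery area; the authoring copy is untouched.

    Returns True if a file was removed and False if nothing was live.
    """
    target = delivery_root / relative_path
    if target.exists():
        target.unlink()
        return True
    return False
```

Keeping the authoring copy in place on demotion means the content can be corrected and promoted again, which matches the idea of rolling content back to the authoring environment rather than destroying it.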
Recognizing the Business Need for Content Management In the early days of the Internet, the need for sophisticated content management mechanisms did not exist. At that time, all you had were a few engineers who were more interesting in transmitting technical and scientific data than they were in using the Internet as the incredible ________. Well, you fill in the blank. These days the Internet is commonly relied upon as a marketing vehicle, supply-chain management channel, and distribution channel, among many other uses. Indeed, the Internet as a medium or transportation vehicle for content has come a long way. The bulk, complexity, and necessity of content management were not a problem in the past. The techie renegades did not worry about how their content would look, about when it would reach the other end of the world, or about how their company image (or brand) may be injured or helped by how their data was arranged or how fresh or stale that content was; they didn’t even care about performing any specialized integration with unique application or business logic servers. When companies came around to using the power of their content, you still found the renegades who knew the technology inside and out and the business folks who would try to explain exactly what they wanted for their content. These content renegades were in high demand because they were the only ones in an organization who understood the technology,
and therefore they were the only ones who could update that content. This was a problem because the content was so tightly coupled with the technology. In today’s world, the needs of corporations are complex, and their content management needs are equally complex, if not more so. (Content here is basically the data that makes up their Internet, extranet, and intranet sites.)
GETTING TO KNOW SOME TERMS
An Internet site is any website that is publicly available over the World Wide Web. The Internet uses the public telecommunication network infrastructure, HTTP over TCP/IP, and any computer that is connected to it to connect billions of users around the world. A company’s Internet site usually contains marketing content and other nonconfidential information and may also serve as an entry point to secured site areas.
An extranet site is any website that is privately available and serves to connect its users to the company that manages it. An extranet site uses Internet technologies and requires security and privacy, which are usually managed via encryption, passwords, tokens, and virtual private networks.
An intranet site is any website that uses Internet technologies and is available only to internal resources. An intranet is usually available only to employees of an organization and is usually used to provide information to those employees or to facilitate their working environment.
As the complexity of content has grown, the complexity of the tools required to access, implement, and modify that content has been reduced, allowing the true content experts (the marketing, sales, administrative, and managerial staffs) to better manage that content. The business need for content management cannot be explained with a one-size-fits-all mentality; rather, it is driven by your own organization’s need. Generally, all companies need some content management. However, the level of content management for individual companies will vary greatly. Your company’s need for a CMS is determined by the following factors.
Amount of Dynamic and Static Content Does your organization have several thousand pieces of content? If your organization has a large amount of content where the storage and management of that content is complex, then a CMS will help you. Managing many pieces of content and making that content available for content consumers can be a tricky undertaking. We typically find that the more discrete pieces of content an organization has, the more it must increase its information technology (IT) staff or administrative staff to manage this content. Having a CMS means never having to worry about where documents and content are stored or how to retrieve that content.
Complexities of Your Content Complex content needs demand a CMS. Content built dynamically from many sources and content that depends on other content can lead to content management headaches. Without a CMS, content updates are forgotten, navigation is updated incorrectly, and content quality suffers.
Frequency of Updates or Additions to Your Content
When content changes frequently, typically more than once per day, a CMS is imperative. Humans simply cannot keep up with frequent content updates. A good rule of thumb is that if your content has more than 20 to 30 updates a month and one or more of the other factors apply, then you need a CMS. A CMS can contain the intelligence to know about dependent pieces of content and can alert the author to make those required changes. Without this system in place, controlling content updates can be confusing and time-consuming. A CMS will also streamline the authoring process with authoring tools such as what-you-see-is-what-you-get (WYSIWYG) editors.
■Note A WYSIWYG editor is a piece of software that allows content creation without the use of complicated markup characters and shows the content author exactly how the finished content will look while the authoring is taking place. For example, creating a Microsoft Word document in Print Layout mode gives you a WYSIWYG view. Some popular Hypertext Markup Language (HTML) WYSIWYGs include Microsoft FrontPage and Adobe GoLive.
Archival and Retention Requirements of Your Content
CMSs can facilitate archival requirements by housing all the content in one location. Interwoven TeamSite has a concept called an edition, which is a snapshot of all web content at a point in time. With editions, entire content stores can be moved to tape or other backup media or retained in the CMS itself. If editions are maintained in the CMS, browsing and retrieving the content is as simple as selecting the edition and browsing to the appropriate content. By providing snapshots of content, retention requirements are easily met. For example, we have worked with clients who had to meet a U.S. Securities and Exchange Commission (SEC) retention requirement of three years. This meant that at any time the SEC could mandate that this company’s content be reproduced exactly as it was originally displayed at any point during the past three years. With editions, solving this requirement was easy. (You’ll learn more about editions in Chapter 7.)
Statutory or Legal Requirements for the Use of Your Content Do you have requirements to always publish content in two formats, one for the Web and one for print or optical media? Why continue to produce content in two formats manually when a CMS can produce this content automatically? Addressing legal issues—such as the Health Insurance Portability and Accountability Act (HIPAA), which requires you to always include legal approval for each content deployment—is easy with a CMS and a workflow.
Requirements of External Sources to Supplement Your Content Content feeds from external entities, whether they are news feeds, stock feeds, or weather updates, can greatly enrich content and boost the usability of that content. A CMS can automatically detect these types of external data feeds, notify the appropriate content approvers, and begin the deployment process. Without a CMS, once again you rely on human interaction, which is error prone and lacks the efficiency of a CMS.
Delivery Requirements for Your Content, Including Different Presentation Types and Data Layouts
A CMS allows you to publish content in a variety of presentation types. By separating your data and presentation in a CMS, you can support clients that need print-ready documents, web-ready documents, and even documents for personal digital assistants (PDAs) or cell phones. By keeping the data separate, you can combine multiple data sources into an aggregate document. Think about this: if you have a single document or piece of content that requires multiple content authors, the CMS is the ideal solution. For example, company XYZ is producing a product catalog for a new gadget being produced in the company. Company XYZ has marketing authors for the marketing section of the product catalog, sales authors for the price and licensing section of the catalog, and product engineer authors for the technical specification sections of the product catalog. By maintaining each section as a discrete piece of content, all authors can work collaboratively at the same time on what will become the finished product catalog. When the content components are ready to be published, the CMS can facilitate the approval process and bind the components together into the final product catalog. The CMS can even generate the product catalog into any number of supported output formats such as Portable Document Format (PDF), HTML, or Wireless Markup Language (WML). Figure 1-1 depicts how content can be turned into the final format by combining a presentation with the data.
[Figure 1-1: A component bundle passes through content aggregation and generation within the CMS to produce output for a PDA, web content, and a document.]
Figure 1-1. Delivery requirements and componentizing of content in a CMS
Authoring Requirements for Your Content, As Well As the Audience of That Content
Content authoring without a CMS is limited to technical authors and authors with specialized authoring tools. Depending on the output format, an author will need HTML editors such as FrontPage or Macromedia Dreamweaver, PDF creation programs such as Adobe Acrobat, and any number of others depending on the publishing requirements for your content. Authors without a CMS will also need specialized knowledge of the content, knowledge of how to use the authoring tools, and technical knowledge of markup characters for some output formats. All this specialized knowledge means hiring people who have technical skills and who can be expensive. Maintaining and supporting several different authoring tools across your enterprise is also expensive. Licensing and deploying these authoring programs can equate to thousands of dollars per year. With a CMS, you eliminate these significant costs. CMSs have built-in editing and content authoring capabilities that eliminate the need for specialized editing and authoring tools. Additionally, the CMS editors have a user-friendly interface that anyone with basic word processing skills can use. Within a CMS you can also define an organizational glossary from which common terms are reused automatically. This means you will provide content consumers with a consistent corporate voice, if you will, thereby reducing confusion in interpreting your content.
Metadata and Searching Requirements for Your Content ECMSs such as Interwoven TeamSite have connectors to metadata creation programs. One such world-class solution is MetaTagger, also from Interwoven. Using these applications to manage your content, you can automatically associate the correct metadata with all your content. By having a consistent metadata schema, search engines are more productive, and indexing overhead is greatly reduced.
Syndication and Deployment Requirements for Your Content Syndication is the process of integrating external content into your own company’s content through a paid subscription. Without a CMS, this can be a manual process. The person responsible for content syndication will either be notified of content updates via an email or be notified by actually checking the source location constantly for content updates. If this person is sick or on vacation, content updates are missed, meaning that the subscription fees are not being utilized to their full potential. Once content updates are identified, the person responsible for adding the updated content usually has to copy the updated content and then use a predefined content authoring method to integrate that newly acquired content into their own content. Using this manual process, content updates are frequently missed, or worse, content is outdated before it even reaches the destination location. Often, a manual process such as this requires the newly acquired content be reformatted extensively to match the destination’s style requirements. A CMS eliminates these problems. Content updates can be performed automatically within the CMS. The CMS can be notified by the subscription service and can intercept the email to ingest the content update. The CMS, once notified, can integrate the content automatically and appropriately. The CMS can then route the content for approval by a predefined human user of the CMS. When the content updates are approved, the content can be deployed to the destination location. By working with a subscription service, you can define the ingestion format ahead of time, and the CMS can programmatically modify the content to match the destination format.
Separating Data and Presentation
Separating data and presentation is a core concept and practice in regard to a CMS. As we touched on earlier, it is essential to separate your data completely from the presentation. Doing this allows your business solutions to be flexible. By separating or decoupling the data and presentation, you greatly improve data reuse. Enterprises spend vast amounts of resources manipulating their data into usable assets. For this reason, a major consideration when creating your data is the importance of storing it properly from the beginning. If you succeed at this, the remaining steps in the process for creating a CMS should come much easier. Some industries require that data be stored in a reproducible form. This means that if a presentation was applied to the data before it was published, not only would the data have to be kept for historical reasons but there would also have to be a way to reconstruct the document, unless the final document is stored as well. Storing the same data after it has been manipulated by several different presentations could cost considerably more resources. Presentations can be defined as the final look of the data or content. This finalized and presented content may take various forms, such as a website, a document for printing, or perhaps an email to all of your current customers. It does not matter what the final product will be; each presentation can be completely different. The fact is the same data can be used many times, in many time spans, and in many data formats. Figure 1-2 shows the concept of separating data from presentation.
[Figure 1-2: Data entered through a data capture screen is stored as raw data. Presentation templates are merged with the raw data during content generation to produce web content and a document.]
Figure 1-2. Separation of data and presentation
In Figure 1-2, data is entered into a data capture screen. This data is then stored in eXtensible Markup Language (XML) format in the raw data store or repository. Predefined presentation templates are then used to merge with this raw data to create the final generated output formats, one for the Web and one for print.
To illustrate this concept further, refer to the sample XML data file in Figure 1-3.
Figure 1-3. Famous quotes XML data file
Even if you do not understand XML at this point, bear with us for a minute. We’ll discuss XML in detail in Chapter 8, but for now it is not important to understand XML but rather to examine the general context of this file. This XML data file contains groups of famous movie quotes; each stored quote contains the movie title, the text of the actual quote, and the character who spoke it, each annotated by its own tag. Important to note is that this file contains only data. A web browser does not know how to display a quote, but we will tell the browser how to display this data by applying “presentation” templates to it. By separating the data, in this example the movie quotes, from the presentation, we can then generate and present that data in multiple formats. Figure 1-4 shows an example of this data presented with an HTML table. In Figure 1-4 the XML data, containing famous movie quotes, has been transformed via a template into an HTML file in table format. But the same XML data can be presented in any number of formats, depending on the number of presentation templates applied to it. Figure 1-5 shows the same data, presented in an HTML list format. By keeping the data and presentation layers separate, data can easily be presented in multiple formats. If the data were coupled or intermingled with the presentation, the arduous task would then be to strip all the data from its current presentation and insert that data into another differently formatted presentation. This would be inefficient, error prone, and expensive.
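The idea of one data file feeding several presentations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag names (quotes, quote, movie, text, character), the sample quotes, and the function names are hypothetical and are not taken from the data file in Figure 1-3, and the two render functions stand in for the presentation templates a real CMS would supply.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data file: tag names and sample quotes are assumptions.
QUOTES_XML = """
<quotes>
  <quote>
    <movie>Casablanca</movie>
    <text>Here's looking at you, kid.</text>
    <character>Rick Blaine</character>
  </quote>
  <quote>
    <movie>Jaws</movie>
    <text>You're gonna need a bigger boat.</text>
    <character>Martin Brody</character>
  </quote>
</quotes>
"""

FIELDS = ("movie", "text", "character")


def load_quotes(xml_text):
    """Parse the data-only XML into a list of plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    return [
        {field: quote.findtext(field, default="") for field in FIELDS}
        for quote in root.findall("quote")
    ]


def render_table(quotes):
    """Presentation 1: the quotes as an HTML table."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            escape(q["movie"]), escape(q["text"]), escape(q["character"])
        )
        for q in quotes
    )
    header = "<tr><th>Movie</th><th>Quote</th><th>Character</th></tr>"
    return "<table>" + header + rows + "</table>"


def render_list(quotes):
    """Presentation 2: the same quotes as an HTML list."""
    items = "".join(
        "<li>{} ({}, {})</li>".format(
            escape(q["text"]), escape(q["character"]), escape(q["movie"])
        )
        for q in quotes
    )
    return "<ul>" + items + "</ul>"


quotes = load_quotes(QUOTES_XML)
table_html = render_table(quotes)
list_html = render_list(quotes)
```

Adding a third output format means writing one more render function; the data file and the parsing step stay untouched.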
Figure 1-4. XML data movie quotes file presented in an HTML table
Figure 1-5. XML data movie quotes file presented in HTML list format
Presentations can be layered on top of each other. A good example of this is an organization’s corporate website. In a corporate site, a style guide provides the overall look and feel, and it can specify how the various presentation templates fit together. Within an organization’s content, several types of pages usually exist, such as frequently asked questions (FAQ), site maps, and general content. Instead of creating several pages of content containing redundant information, one can create content sections. Defined content sections within the overall presentation allow one to build smaller pieces that fit within the overall structure. This is often referred to as synergy, where the whole is greater than the sum of its individual parts.
■Note Generating content is the process of taking the data record, usually in ASCII (XML or text) format and combining it with the presentation template to “generate” the desired end product. The presentation template’s responsibility is the overall look of the final product. The generated content is usually constructed through the combination of the two structures. The first would be the presentation template, and the second would be the XML data record. In a CMS, the generation process or generation engine may be a proprietary application, or it may be based on open standards.
Introducing Metadata Metadata is data that describes other data. Picture a book for a second. Who wrote the book? How many pages does the book have? How much does the book weigh? What color is the book? How many chapters does the book have? All of this information describes the book and is known as metadata. Metadata is collected so that content can be categorized and searched. To illustrate this point, imagine the following scenario: You have just started a job at a large hospital, and a doctor would like for you to retrieve Mr. Smith’s medical chart from the records department. Upon entering the records department, you are inundated with a plethora of shelves filled with file folders, any of which could be Mr. Smith’s. Now, you could wander aimlessly around each shelf flipping through charts until you get lucky enough to stumble upon the correct chart, or you could use the records department lookup terminal to point you to the exact location. When you type Mr. Smith’s name into the terminal, a search is executed on all the metadata collected and indexed for each patient. Any patient who has a match with the search terms you entered is returned to you as a search result, probably with directions on how to locate the patient’s physical chart. Mr. Smith is returned as a search result, and you are able to quickly locate his chart. Let’s dig a little deeper—using this example, you can see how the following types of information would be useful as metadata for each patient: • Social Security number (SSN) • Patient last name • Patient first name • Date of birth • Primary care physician • Phone number • Gender
For the example, the SSN, patient last name, patient first name, and so on, are effective pieces of metadata. However, you must use the most appropriate metadata for the type of content used in your organization. Although it may make sense to gather the SSN as metadata for a person, it definitely does not make any sense when your content is of type Book or any other content that does not have an SSN. Discovering and collecting appropriate and useful metadata is one of the difficult steps that must be performed when implementing a CMS. To assist you, and the team you are working with, on this grueling task, you must ask five questions regarding each type of content in the proposed system:
• What data describes this content type?
• If I were looking for this content and I wanted to understand this content type, what information would I want to know?
• What about this data makes it unique among other pieces of content?
• What about this data makes it similar to other pieces of content?
• Is there any CMS feature or downstream function that must be performed on the content and that relies on certain metadata for this purpose?
For example, you may have archival requirements where it is necessary to have metadata such as archival date, published date, and expiration date. These dates would be set at content creation and content publish time so that the CMS application could then archive, delete, and so on, this content at the appropriate date. Finally, always work with the content owners during this process, because they will know the best way to describe their data and will be instrumental in the metadata-uncovering process. To illustrate the concept of metadata further, Figure 1-6 shows the metadata stored for the file Ibm.html.
[Figure 1-6: The Company Research directory contains ibm.html, Microsoft.html, AMD.html, Intel.html, Cisco.html, and Sun.html. The metadata values shown for ibm.html are: IBM, technology, research; 10/12/2009; Eternal Content; 10/05/2009; Brian McNeel; Research.]
Figure 1-6. Metadata stored for the Ibm.html file
Imagine for a moment that within a website directory structure there is a directory named Company Research. This Company Research directory stores several pages of content: Ibm.html, Microsoft.html, AMD.html, and so on. Each respective page of content possesses its own associated metadata. From the IBM example, it is evident that keywords, expiration, content type, published date, author, and department are stored for each page of HTML content.
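As a rough sketch of how such metadata might be put to work, the following Python models one metadata record per page and uses it for keyword search and for finding expired content. The class and function names are hypothetical, the mapping of the Figure 1-6 values to fields follows the order in which the text lists them (an assumption), the dates are read as month/day/year (also an assumption), and the Intel.html record is invented purely for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PageMetadata:
    path: str
    keywords: list
    expiration: date
    content_type: str
    published: date
    author: str
    department: str


CATALOG = [
    # Values as they appear in Figure 1-6; the field assignment is assumed.
    PageMetadata(
        path="Company Research/ibm.html",
        keywords=["IBM", "technology", "research"],
        expiration=date(2009, 10, 12),
        content_type="Eternal Content",
        published=date(2009, 10, 5),
        author="Brian McNeel",
        department="Research",
    ),
    # Invented record, included only so the searches have something to skip.
    PageMetadata(
        path="Company Research/Intel.html",
        keywords=["Intel", "technology"],
        expiration=date(2010, 1, 31),
        content_type="Research Note",
        published=date(2009, 11, 2),
        author="Example Author",
        department="Research",
    ),
]


def search(records, keyword):
    """Return the paths of pages tagged with the keyword (case-insensitive)."""
    wanted = keyword.lower()
    return [r.path for r in records if wanted in (k.lower() for k in r.keywords)]


def expired(records, today):
    """Return the paths of pages whose expiration date has been reached."""
    return [r.path for r in records if r.expiration <= today]
```

A scheduled job could call expired() once a day and hand the resulting paths to an archive or delete workflow, which is the kind of downstream function the fifth question above is meant to uncover.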
No matter what type of metadata is captured for your content, remember that time spent analyzing and capturing the appropriate metadata for your content is time well spent. Keep in mind that discovering halfway through the implementation process that you should have been keeping another piece of metadata can be frustrating and costly.
Using Templates Templates within a CMS allow content to be created once and used multiple times in a variety of formats. You may need to display information for printing, information for a website, or information for wireless devices. You can accomplish all this via templates. Using one XML data file and three templates, you could generate PDF files for printing, Extensible HTML (XHTML) files for web publishing, and WML files for wireless devices such as cellular telephones or PDAs. You can create templates in many ways depending on the CMS implemented. Typically in a CMS two main types of templates exist. The first type is the presentation template, and the second type is the data template. The presentation templates generate (believe it or not) presentations, or the look and feel, of the content. The second type collects the data that will be stored in XML format and be used as the substance, or the meat and potatoes, so to speak, of the content itself. Figure 1-7 shows an example of what a typical presentation template configuration could look like. In this example, several sections are created separately and then assembled together as one page. The page includes header template and footer template sections so that the entire generated web page could have the same header and footer. The navigation section could be generated, and the promotions section could also be generated based on what content is being viewed on this page in the main content area of the page. This HTML page (comprised of different template sections) could be generated in advance to help reduce the strain of creating it each time it is requested. This is usually determined based on how often the content is updated, the processing power required to generate the pages, and what architecture is being used to serve the content.
[Figure 1-7: A page divided into Header, Promotion, Navigation, Main Content Area, and Footer sections. The completed and generated page is composed of multiple template sections, with each section potentially containing separate pieces of content.]
Figure 1-7. A typical presentation layout
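The section-based layout described above can be sketched with nothing more than Python’s standard string.Template class. Everything here is an assumption made for illustration: the section ids, the sample header, navigation, and footer text, and the rule that picks a promotion based on the topic shown in the main content area. A real CMS generation engine would supply its own templating mechanism.

```python
from html import escape
from string import Template

# Hypothetical page template with one placeholder per section.
PAGE_TEMPLATE = Template(
    "<html><body>\n"
    "<div id='header'>$header</div>\n"
    "<div id='navigation'>$navigation</div>\n"
    "<div id='promotion'>$promotion</div>\n"
    "<div id='main'>$main</div>\n"
    "<div id='footer'>$footer</div>\n"
    "</body></html>"
)


def pick_promotion(topic, promotions):
    """Choose the promotion that matches the main content's topic."""
    return promotions.get(topic, promotions["default"])


def assemble_page(topic, body, promotions):
    """Merge the shared sections and the page-specific content into one page."""
    return PAGE_TEMPLATE.substitute(
        header="Example Financial Services",
        navigation="Home | Products | Research",
        promotion=escape(pick_promotion(topic, promotions)),
        main=escape(body),
        footer="Copyright notice goes here",
    )


PROMOTIONS = {
    "research": "Try our research newsletter",
    "default": "Open an account today",
}

research_page = assemble_page("research", "IBM overview", PROMOTIONS)
other_page = assemble_page("careers", "Current openings", PROMOTIONS)
```

Because the header and footer live in one place, changing them regenerates every page consistently; whether pages are assembled in advance or on request is the trade-off discussed above.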
The data template ensures that the data that is collected is the same in each record or XML file and could be as simple as an HTML form. By using data templates, it is easier to maintain data integrity because the data-entry tasks can be rigidly controlled and validated at the time of data entry. Without this integrity, the template system does not work. In Figure 1-8 and Figure 1-9, we have taken the template concept and applied it toward an actual website. In Figure 1-9, the previously described template components are outlined to show how a generated page could be built based upon discrete template sections.
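To make the data template idea concrete, here is a minimal sketch of validating a record at data-entry time. The template format (field name mapped to a required flag and a regular expression), the field names, and the error messages are all hypothetical; they do not represent how any particular CMS defines its data capture templates.

```python
import re

# Hypothetical data template: field name -> (required, validation pattern).
QUOTE_TEMPLATE = {
    "movie": (True, r".{1,100}"),
    "text": (True, r".{1,500}"),
    "character": (True, r".{1,100}"),
    "year": (False, r"\d{4}"),
}


def validate(record, template):
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for field, (required, pattern) in template.items():
        value = record.get(field, "").strip()
        if not value:
            if required:
                errors.append(f"{field}: required")
            continue
        if not re.fullmatch(pattern, value):
            errors.append(f"{field}: invalid value")
    # Reject fields the template does not know about, so every stored
    # record has exactly the same shape.
    for field in record:
        if field not in template:
            errors.append(f"{field}: unexpected field")
    return errors


good = validate(
    {"movie": "Jaws", "text": "You're gonna need a bigger boat.", "character": "Martin Brody"},
    QUOTE_TEMPLATE,
)
bad = validate({"movie": "Jaws", "year": "19x5"}, QUOTE_TEMPLATE)
```

Rejecting a bad record before it is stored is what lets the presentation templates assume that every data record has the same fields.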
Figure 1-8. Generated and completed site
Figure 1-9. Generated and completed site with content sections outlined
Understanding Workflows
For almost anything you have to do in life that requires a predefined process, you must follow certain steps. In the CMS world, this process of steps is known as a workflow. More specifically, the process that a document goes through during development can be turned into a series of steps that is a workflow. The workflow should model your business process as closely as possible. However, do not neglect the need to examine your existing process and make the business process more efficient before implementing that process as a workflow. A workflow allows the CMS to enforce steps that could otherwise be skipped. You can perform steps such as mandatory validations as needed. Workflow ensures that each step is fulfilled before the content is released to the public. Figure 1-10 illustrates a simple workflow process. In this scenario, the content contributor submits content to the content approver. The approver can approve the content, sending it to the legal department, or they can reject it and return it to the contributor. If the content is rejected, then the content contributor will receive the content with instructions on what needs to be changed. After the problems are corrected, the content is resubmitted for approval. Once the content approver has approved the content, the content goes to the legal department for review. If the content is rejected at this level, it will be returned to the original contributor for modifications, and the process repeats until the legal department gives the final approval needed to publish the content. The content is generated and distributed in two ways. The first is generated in a web-ready format in the form of HTML, PDF, or some other web format. The second is generated in a format that is optimized for print. As this illustrates, the system, even at a simple level, can be effective in contributing to the document development process.
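The approval loop just described can be sketched as a small state machine. This is an illustration only: the state names, action names, and functions are hypothetical, and the sketch stops at the point where content is cleared for generation rather than modeling the two output formats.

```python
from enum import Enum


class State(Enum):
    DRAFT = "with contributor"
    CONTENT_REVIEW = "with content approver"
    LEGAL_REVIEW = "with legal department"
    APPROVED = "approved for generation"


# Allowed moves: (current state, action) -> next state.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.CONTENT_REVIEW,
    (State.CONTENT_REVIEW, "approve"): State.LEGAL_REVIEW,
    (State.CONTENT_REVIEW, "reject"): State.DRAFT,
    (State.LEGAL_REVIEW, "approve"): State.APPROVED,
    (State.LEGAL_REVIEW, "reject"): State.DRAFT,
}


def advance(state, action):
    """Apply one action, refusing any step the workflow does not allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"'{action}' is not allowed while {state.value}") from None


def run(actions, state=State.DRAFT):
    """Replay a sequence of actions and return the final state."""
    for action in actions:
        state = advance(state, action)
    return state


# Legal rejects the first attempt; the second attempt is approved.
final_state = run(["submit", "approve", "reject", "submit", "approve", "approve"])
```

Because every move has to appear in the transition table, a step such as legal review cannot be bypassed, which is the enforcement property the text attributes to workflows.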
[Figure 1-10: The Content Contributor submits content to the Content Approver. Approved content moves to Legal Review; a content rejection or a legal rejection returns it to the contributor. After legal approval, one generation step produces output for the Internet and another produces output for publication.]
Figure 1-10. Simple workflow process
The logical conclusion of a workflow is the deployment of content. The deployment of the content deals with the path that the content takes through the physical and networking infrastructure of an organization. In Figure 1-11, the workflow would execute in the content server, and upon workflow completion, the three types of presentation would have been generated and moved to the deployment server. The deployment server would have access through the corporate firewall to the web server. A process running on the deployment server would be responsible for placing the content into the appropriate location on the web server. The web server is then responsible for serving the content to its various requesting clients.
[Figure 1-11: Elements shown are Content, Editor, Legal Approval, Content Server, Deployment Server, Firewall, WWW Server, Internet, PDA, PC, and Publication.]
Figure 1-11. Deployment path of content
Introducing User Groups and Permissions
User groups and permissions are common to all technology-based systems, but if implemented properly in a CMS, they become one of the most powerful and important parts of the system. Any CMS implementation has many types of users; however, most user types can be assigned to one of the following roles or groups:
Initiator role: This role can start a workflow for content creation, content deletion, or one of any number of defined workflows in a CMS.
Content contributor role: This role is responsible for creating and modifying content.
Reviewers/approvers role: This role is responsible for reviewing, approving, and rejecting content.
Hybrid role: This role’s responsibilities can combine any of the previous three job roles. This role may also have distinct capabilities that are not present in other roles such as the ability to bypass workflows, approve processes, and submit multiple changes. Typically this role is reserved for administrators and super or power users.
Multiple interfaces into the CMS may be constructed and utilized based on the user type. There are many reasons to implement multiple user interfaces:
Training: Because you are using multiple interfaces, each interface can be kept clear of extra functions and features, which may confuse the trainee and may not be used by the specific role for which they are being trained.
Security: By eliminating additional functionality that is outside the security boundaries of this role, you can avoid many mishaps. Why allow access to an interface that, for example, allows the user to delete content when the user role should not even have that ability? Keep in mind that not all security breaches are deliberate; many are accidental. The interface should remove user temptation to find out what “this option does.”
The bottom line is that if a role should not perform certain actions in the system, then do not present those options to it.
Branding: By having separate interfaces (between your clients, for example), personalization options can present a corporate brand image in each interface.
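The principle that a role should only ever see the options it may use can be sketched as follows. The role names mirror the four roles described in this section, but the action names, menu labels, and functions are hypothetical and do not correspond to any real CMS permission model.

```python
# Hypothetical mapping of roles to the actions they may perform.
ROLE_ACTIONS = {
    "initiator": {"start_workflow"},
    "contributor": {"create_content", "modify_content"},
    "reviewer": {"review_content", "approve_content", "reject_content"},
}

# The hybrid role combines the other three and adds its own capabilities.
ROLE_ACTIONS["hybrid"] = set().union(*ROLE_ACTIONS.values()) | {
    "bypass_workflow",
    "submit_multiple_changes",
}

# Every option the interface could offer, in display order.
MENU = [
    ("start_workflow", "Start workflow"),
    ("create_content", "Create content"),
    ("modify_content", "Modify content"),
    ("review_content", "Review content"),
    ("approve_content", "Approve content"),
    ("reject_content", "Reject content"),
    ("bypass_workflow", "Bypass workflow"),
    ("submit_multiple_changes", "Submit multiple changes"),
]


def is_allowed(role, action):
    """Server-side check: may this role perform this action?"""
    return action in ROLE_ACTIONS.get(role, set())


def visible_menu(role):
    """Interface-side filter: list only the options the role may use."""
    return [label for action, label in MENU if is_allowed(role, action)]
```

Filtering the menu keeps the interface uncluttered for training purposes, while the is_allowed check still guards each action in case a request arrives by some other route.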
■Note If you decide to implement multiple interfaces, you will need to maintain all of them when adding new functionality. Also, individuals with multiple roles may be forced to log in to different interfaces depending upon what action they are performing.
Introducing Repositories
Repositories, or the data stores in a CMS, usually hold two types of data: binary data and ASCII data. ASCII data often comes in the form of XML. XML data, coupled with presentation files, will actually compose the generated content. XML data within the CMS is usually grouped into logical containers to facilitate the creation of the content and the generation of that content. Often this logical structure is ordered in much the same way as a typical directory structure. The binary data can take many forms and is typically proprietary in nature, such as PDF files, images, audio files, and video files. A quality CMS should not restrict you to one repository but should allow open and flexible integration with many different architectures and systems. Some possible integration points may include directory structures, whether Unix or Windows based, relational database management systems (RDBMSs), Lightweight Directory Access Protocol (LDAP), and web services. The following sections cover some questions to consider and answer when constructing your repository.
Do You Understand Your Data? Do you understand how your data will be used in the enterprise? How will data be packaged, and in what formats must that data be presented? Understanding the purpose of your data, which will become your content, is critical to a successful CMS implementation. We recommend meeting not only with your database administrators but also with your business owners. Do you understand what each field in your data repository (database or file system) represents? You must answer all these questions in order for you to make sound decisions while implementing a CMS.
How Much Data Will Be Stored? What is the current size of your data that will be used to form content for your company? By knowing the current size of your data, you can derive requirements for storing that data. You must become intimately familiar with the current size of your data because this will impact the retrieval, archival, and retention requirements of your CMS.
6110_Ch01_FINAL
7/2/06
1:01 PM
Page 21
CHAPTER 1 ■ WHAT IS CONTENT MANAGEMENT?
What Is the Anticipated Future Growth of Your Data? You do not want to have your company potentially spend millions of dollars on a system that is not scalable for the immediate and midterm future. The best way to evaluate your future growth needs is to first look at the amount of storage your current data requires. Next look at any anticipated new sites or new data stores that will be migrated to your CMS once the system is in place. Finally, you should develop a multiplier to apply to the size of your current data plus any anticipated new site or new data stores. Your multiplier should take into account many factors, including acquired data from internal sales and new business development. Your final data size should include the ability to scale for the next three to five years. Luckily, Interwoven has developed some metrics to assist you with your sizing requirements. Let’s look at an example to illustrate this point. Assume for a second that your current data storage is 500MB and the planned new site migration and new data store integration size adds up to 1GB in size. Interwoven states that the TeamSite CMS requires more storage space for TeamSite metadata, space to store multiple versions of the site content, and space to allow for future site growth. So taking all of this into consideration, you have 500MB + 1GB, which equals 1.5GB of total storage size. Remember the multiplier number—Interwoven recommends multiplying by 10 to 20 times the current storage size to allow for future growth. We recommend taking the higher number. Based on all of these factors in the example, the estimated storage requirements for the data in the example should be approximately 30GB of storage space.
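The sizing arithmetic in this example can be written out as a short calculation. The following Python sketch is ours, not an Interwoven tool; the function name and parameters are hypothetical, and it simply applies the 10 to 20 times multiplier cited above to the combined current and planned data size.

```python
def estimate_storage_gb(current_gb, planned_gb, multiplier=20):
    """Return estimated storage in GB: (current + planned) * multiplier.

    The 10-20 range is the multiplier quoted in the text; the default of 20
    follows the recommendation to take the higher number.
    """
    if not 10 <= multiplier <= 20:
        raise ValueError("multiplier is outside the 10-20 range cited in the text")
    return (current_gb + planned_gb) * multiplier


# Values from the example: 500MB of current data plus 1GB of planned data.
low = estimate_storage_gb(0.5, 1.0, multiplier=10)
high = estimate_storage_gb(0.5, 1.0, multiplier=20)
print(f"Plan for between {low:.0f}GB and {high:.0f}GB of storage")
```

With the example's figures, the lower bound is 15GB and the upper bound is the 30GB used above. Rerun the calculation whenever a new site or data store is added to the migration plan.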
How Complex Are the Data Relationships? Does your data cross multiple repositories? What is the average size of each content component? Do you have numerous many-to-many relationships? A many-to-many relationship is one where data entities have many possible relationships to each other. An example of this is if you were modeling an article of clothing such as different models of pants. Each model of pants could come in different sizes, and each available size could relate to multiple models or types of pants.
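To make the pants example concrete, here is a minimal Python sketch of a many-to-many relationship. The model names and sizes are invented for illustration. The list of (model, size) pairs plays the role that an association table would play in a relational database, and the two lookups show that the relationship can be traversed in either direction.

```python
from collections import defaultdict

# Hypothetical association rows: each row links one pants model to one size.
availability = [
    ("chino", "30x32"),
    ("chino", "32x32"),
    ("cargo", "32x32"),
    ("cargo", "34x30"),
]

sizes_by_model = defaultdict(set)   # one model -> many sizes
models_by_size = defaultdict(set)   # one size  -> many models
for model, size in availability:
    sizes_by_model[model].add(size)
    models_by_size[size].add(model)

print(sorted(sizes_by_model["chino"]))   # sizes offered for the chino model
print(sorted(models_by_size["32x32"]))   # models available in size 32x32
```

The more relationships of this kind your content has, the more lookups and joins your repository must support, which is why the question matters when you estimate complexity.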
What Are Your Data Retention Requirements? This is really a question of how much money you are prepared to spend to support your requirements. How much data storage are you prepared to purchase? Do you know how long that data must be maintained for historical purposes? Do you plan on purchasing any additional storage space to provide data redundancy for your system? You must answer all these questions to ensure a successful implementation.
What Are the Data Retrieval Requirements of Your System? How quickly must content be served up or presented from your repository to the calling program? Does the data have to be repackaged or transformed into a different format? Do style sheets have to be applied based on the system requesting the data? For example, one repository can serve multiple systems: one legacy system and a web-based system. The legacy system requires its data to be in a predefined EBCDIC 80-column format, while the web-based system requires its data to be presented in a base64-encoded XML format.
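The two-consumer example above can be sketched in a few lines of Python. This is an illustration under our own assumptions, not a TeamSite feature: the record fields and the column layout are hypothetical, and we use code page 037 as one common EBCDIC encoding because the Python standard library ships a codec for it.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical record held in the repository.
record = {"account": "12345", "name": "Jane Doe", "balance": "1050.25"}


def to_legacy(rec):
    """Render a fixed-width 80-column line and encode it as EBCDIC (cp037)."""
    line = f"{rec['account']:<10}{rec['name']:<30}{rec['balance']:>12}"
    return line.ljust(80).encode("cp037")


def to_web(rec):
    """Render the same record as XML, then base64-encode it for transport."""
    root = ET.Element("record")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return base64.b64encode(ET.tostring(root, encoding="utf-8"))


legacy_bytes = to_legacy(record)
web_bytes = to_web(record)
print(len(legacy_bytes), "bytes for the legacy system")
print(base64.b64decode(web_bytes).decode("utf-8"))
```

The point of the sketch is that one stored record feeds both outputs; only the final transformation step differs per consumer, and each such step is a retrieval requirement you should capture.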
Knowing What to Look for in a CMS The following information will allow you to draw from our considerable content management experience and provide you with CMS best-practice recommendations for the attributes that a quality CMS should possess. This information is extremely valuable when you or your company is evaluating a CMS for purchase.
Open and Flexible Architecture A quality CMS should not force you to use proprietary technologies. It should be extensible with other technologies and highly scalable within an organization. The actual system should be modular, allowing the purchase and installation of pieces on an as-needed basis. If you need only workflow, then you can purchase only workflow functionality; if you need templating, then you need only purchase templating functionality.
Support for Multiple Data Formats A quality CMS should allow integration with a variety of data sources (RDBMSs, XML database management systems [XDBMSs], and LDAP, for instance) and should allow storage of ASCII and binary data.
GETTING TO KNOW MORE TERMS In an RDBMS, a management program allows you to create, update, and administer a relational database. A relational database is a structure where data is organized in formally described tables that allow you to retrieve and use that data without having to reorganize the data. In a relational database, the data is arranged in a way that maintains the data’s relationship to the other data in the database. Some industry-known relational databases include Oracle, Microsoft SQL Server, and IBM DB2. In an XDBMS, the data is arranged in a hierarchical XML format. This database format is extremely document-centric and allows extremely fast data retrieval but for specific purposes. Arranging the data in an XML format allows you to easily prune that data for specific component retrieval. For example, in the publishing environment, data can be placed into the XML database in its raw XML format. Then when you want to retrieve a specific chapter, a specific section, or even a specific paragraph, you can prune the data quickly, and only the section of content requested will be served. Typically, XDBMSs use the XML-based XQuery language to retrieve their data; some industry-known XML databases include Mark Logic Content Interaction Server, Xyleme Zone Server, and Virtuoso. LDAP is based on the X.500 standard. In LDAP the data is represented as objects. Each object can be queried based on supplied attributes of that object. This allows objects to be queried without the requesting program knowing the location of that object.
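As a rough illustration of pruning, the following Python sketch pulls a single section out of a small XML document. It is only an approximation of what an XDBMS does: a real XML database would typically be queried with XQuery, whereas this sketch uses the limited XPath subset built into Python's standard library. The document structure and the prune function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical book content stored in its raw XML form.
BOOK = """
<book>
  <chapter id="1">
    <section id="1.1"><para>What content is.</para></section>
    <section id="1.2"><para>Why manage it.</para></section>
  </chapter>
  <chapter id="2">
    <section id="2.1"><para>Defining scope.</para></section>
  </chapter>
</book>
"""


def prune(xml_text, section_id):
    """Return only the requested section as XML text, or None if it is absent."""
    root = ET.fromstring(xml_text.strip())
    node = root.find(f".//section[@id='{section_id}']")
    if node is None:
        return None
    return ET.tostring(node, encoding="unicode").strip()


print(prune(BOOK, "1.2"))
```

Only the requested section is serialized and returned; the rest of the book never leaves the store, which is the behavior the sidebar describes.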
Integration Capabilities with Existing Systems A quality CMS should allow integration with many disparate systems, including repositories, legacy systems, and platforms.
Separation of Data and Presentation This is a fundamental quality of a CMS; to facilitate the reuse of content and deployment in multiple presentation formats, data and presentation must be kept separate.
Flexible and Configurable Workflow Engine A CMS should allow multiple workflows to be constructed, which will have the ability to replicate 90 percent of existing business processes. You will want a CMS that can house multiple workflows for multiple user groups, with multiple tasks or steps in each. Workflows must have the ability to move content through a delivery chain that is not part of the CMS; in other words, a quality workflow should have the ability to make changes in systems external to the CMS.
Effective Use of Standards Does your CMS support XML as defined by the World Wide Web Consortium? Does it support PDFs, XHTML, XSL Transformations (XSLT), and connections to standard RDBMSs? The effective use of standards by a CMS platform will ensure that your organization is not locked into a proprietary format.
Content Authoring Toolset This is a powerful factor in selecting a CMS. Does the CMS allow you to import non-template-created content? This would include Word files, PDF files, and Dreamweaver templates. The CMS should also inherently possess its own method of entering content, without the dependency on external or third-party tools, and may include such noteworthy features as spell check and autocomplete.
Multiple-User Authoring Environment The CMS you choose must allow team-based collaboration on the content. If you have enough content to warrant a CMS, then you must be aware of the need to have multiple content authors, content contributors, and reviewers. The system must allow multiple users from multiple locations to manage, review, approve, and create content. A CMS is an enterprise application.
Metadata Creation A CMS should allow for the seamless integration of metadata with the content it is defining. The metadata should not be limited in the data format that can be used. The system should be able to store binary metadata as well as ASCII metadata. Additionally, a quality CMS should be able to automatically extract some metadata from content files, such as when you need to extract PDF properties for the author, title, and date of creation from an imported PDF file.
Easy to Use This cannot be stressed enough: a quality CMS should not force you to use highly technical resources to manage your content; with minimal training (which is therefore much less expensive), the administrative staff should have few problems navigating and using the CMS.
Version Management/Archival Capabilities and Rollback Capabilities Version management can help eliminate risk from content authoring. The ability to roll back to any version gives the author the chance to easily correct any inadvertent mistakes such as forgetting to change an image on a certain web page. Also, the ability to archive those versions and easily restore each specific version may be essential for historical and legal requirements. Most good CMSs have versioning and rollback functionality built into the repository. We will go into more detail later about this when we address the repository in Chapter 7.
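The idea behind versioning and rollback can be shown with a very small Python sketch. This is a toy model under our own assumptions, not how TeamSite or any particular repository stores versions; the class and method names are hypothetical. Note the design choice: a rollback is recorded as a new version rather than by discarding later ones, so the full history stays available for archival and legal purposes.

```python
class VersionedFile:
    """Keeps every submitted version of one file's content."""

    def __init__(self):
        self._versions = []

    def submit(self, content):
        """Store new content and return its 1-based version number."""
        self._versions.append(content)
        return len(self._versions)

    def current(self):
        """Return the most recently submitted content."""
        return self._versions[-1]

    def history(self):
        """Return a copy of all versions, oldest first."""
        return list(self._versions)

    def rollback(self, version):
        """Restore an older version by submitting it again as the newest one."""
        if not 1 <= version <= len(self._versions):
            raise ValueError(f"no such version: {version}")
        return self.submit(self._versions[version - 1])


page = VersionedFile()
page.submit("<img src='spring.png'>")   # version 1
page.submit("<img src='wrong.png'>")    # version 2, an inadvertent mistake
page.rollback(1)                        # version 3 restores version 1's content
print(page.current())
```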
Should Meet Your Specific Content Needs This may be the most important consideration when choosing a CMS. If the system does not match your content, you will be forced to make customizations to the CMS to ensure it fits your organizational needs. Some level of customization may be (will most likely be) required; however, choosing an appropriate CMS from the beginning should always minimize this.
Should Have a Strong Customer Base and Stable Performance Track Record Does the CMS software vendor have several clients similar to your organization, with a proven history of successfully meeting their content needs? Will the vendor allow you to contact some of their current clients for feedback regarding their product? A quality CMS vendor should be able to provide you with a résumé of sorts of its accomplishments. The vendor should participate in trade shows, industry special-interest groups, and best-practice organizations. Investigate all the product offerings to find the best match for your organization’s needs and future needs. See whether other CMS vendors are attempting to play catch-up with this CMS vendor. Look for awards and achievements relating to the CMS product. Also look for quality support and a good history of deploying this CMS solution to various clients. You do not want to purchase a CMS from a fly-by-night company that will leave you with a solution that cannot be upgraded, that will not have service packs available, and that will not be able to meet a support contract.
Should Match Your Budget and Future Growth Does this CMS have the capability to add storage? Are patches and service packs released regularly? How scalable is this CMS architecture? Does this CMS allow usage by your European offices? A CMS is an expensive investment for an organization. You may be rolling out the system only to your United States offices now, but what about in five years when you expand into the European market? How much will your content grow in five years when all departments in your organization are using it? Balancing your current technology budget with your future forecasted growth is difficult, but it is a decision you must take seriously. The CMS you choose must allow for expansion.
Investigate the Features and Functionality That This CMS Cannot Provide for Your Organization If the vendor cannot identify some of its product's weaknesses when addressing your needs, then perhaps the vendor does not fully understand your content needs. A CMS is not a magic bullet; do not be so wooed by slick marketing and sales hype that you forget the CMS will not do some tasks. Specifically, will you need to customize the product, hire specialized technical support and development staff, or rely on third-party applications for your content? Is this CMS vendor aware of its product's weaknesses, and does the vendor have a plan, experience, and suggestions on how to overcome those weaknesses? Your organization's needs and budget will determine the type of CMS it needs; however, most companies will benefit from some level of content management. Implemented properly, a CMS is one of the smartest purchases that an organization can make. Using this chapter as a guide, you will be well informed to make recommendations on that purchase and assist with the implementation.
Summary In this chapter, we discussed what content is. We then described content management and the business need for it. You learned how CMSs separate the data and the presentation to ensure content reuse. Next we discussed metadata and how critical metadata is for content intelligence. We then discussed templates, which allow for content creation and generation, and workflows, which are system-based models of your business process for the content creation life cycle. We talked about user groups and security permissions in a CMS and about repositories, which store your content and the data used to generate it. Finally, you learned about the most critical attributes of an enterprise-class CMS. In the next chapter, we will introduce you to the CMS case study that the remainder of this book will be based on, and we will jump into the first task, namely, defining the scope of the case study for FiCorp.
CHAPTER 2 ■■■ Defining the CMS Project Scope
Chapter 1 introduced you to the world of content management. We told you what content management was and gave you a high-level understanding of the key requirements that content management addresses. This chapter will teach you how to define the project scope for a content management project. This chapter will delve into common issues you must consider when defining the scope of your content management project. To further illustrate the concepts in this chapter, we will be using FiCorp to show you how the activities in this chapter map to a real-world scenario.
Defining Scope The success of any project ultimately hinges upon a well-defined project scope, which can be defined as the expectations of what you (the CMS implementer or project team member) will deliver in accordance with the expectations of the project stakeholders and customers. Many people tend to misinterpret the project scope definition as being solely related to time and cost; however, by more closely examining project scope, you will find that it really involves developing a complete understanding of what is and is not included in the project. Once you have an understanding of the scope, you can calculate the time and cost. You should also continually revisit a content management solution (one that makes use of a CMS as the underlying infrastructure) to accommodate changes in business needs. Business needs change because of external market factors. Any particular project or phase of a project should be scoped properly, but only within the context of the solution's ever-changing life cycle. You must think of content management as an ongoing process. When thinking in terms of project scope, the most important advice we can give to you is to assume nothing! Many projects have failed because a project manager assumed all involved parties knew what a requirement meant, only to find out later that the assumptions were incorrect and that the customer actually had different expectations. These factors should all be combined into the evolving vision artifact for the project. We recommend using the Word template for the vision artifact that comes with the RUP product; you can then customize it to fit your specific needs. Within the context of a CMS project, you need to consider some important factors in regard to project scope.
Security Factors Security factors include authentication, encryption requirements, user security and permissions, and groups. Do you have sign-on requirements? Should the sign-on process to your CMS also authenticate your user to another system? Your CMS implementation will have security requirements that will cover the system login and the possible security within the CMS structure itself. Security within the CMS structure will prevent content authors from changing content that does not belong to them. Security is one of the most important factors of a CMS, so make certain it is defined appropriately.
User Interface Factors User interface factors include client types, portal issues, thick or thin client issues, and the actual customizations of the chosen user interface. How will content authors access the CMS? How much functionality will be exposed to content authors? What type of interface should be built? With an ECMS such as Interwoven TeamSite, you have many interface options available to you. You can use ContentCenter Standard, ContentCenter Professional, or the Casual Contributor interface, or you can build your own. Chapter 15 covers more about the available interfaces. For now, you should just document how the stakeholders envision their users working with the system.
Authoring or Content Creation Factors How will content be created in the system? Will content be imported? What formats will be accepted, and what will any output format be? Will you need to integrate the CMS with external or third-party content creation tools? Will you use only the included HTML editing tool, and will you customize this tool? We have been involved with projects where the WYSIWYG editing tool was customized to pull digital assets such as images into the content page or pages. You should also strive to get a mental picture of who will be contributing content in the CMS. This type of analysis will directly feed the use cases developed in Chapter 4.
Migration Factors Migration factors include application and content migration. Do any content-aware applications exist that must be migrated to the CMS, and if so, which components of those applications will be managed by the CMS? Interwoven's flexible architecture allows you to integrate with many applications. Applications that depend on content for their functionality are known as content-aware applications. With TeamSite, you can define output formats that match the source format for the application. This way, TeamSite can send data directly to the application. Using a TeamSite workflow, you can even trigger the external application to begin a read process for that data. This is the typical maturation of TeamSite implementations: from serving static content only, to delivering dynamic content, to supplying content to content-aware applications. Once the CMS is implemented, you will want to begin moving any non-CMS-hosted sites into the CMS. This will enable you to fully realize the return on investment of implementing the CMS. In fact, part of your implementation should be identifying one or two of these sites that will be implemented as pilots in the completed CMS.
Content Types and Content Intelligence Factors Content types and content intelligence include the types of content and the metadata that will be stored for each type of content. Do you need an enterprise-level content intelligence application? Our presumption is that you will indeed need this. One such application is the Interwoven MetaTagger product. (We'll cover MetaTagger in detail in later chapters, beginning with Chapter 3.) Getting the content out to content consumers is not enough. Consumers must easily be able to retrieve the content they want, when they need it. Content intelligence adds value to the content by enriching the content with a predefined vocabulary. We recommend you research this important factor. The Subject Matter Experts (SMEs) at your company should be able to assist you with this research. These experts could be team members from any team that supports any external application that will rely on your content or on the content from the CMS. As another option, you can hire a professional indexer or librarian to assist you. Discover what external systems will rely on this data and what format of data these systems will expect. Make sure you include domain experts in the discovery and definition of this factor.
Search Factors How will the content be indexed, and where will the visibility to this index live? What types of search will be supported…natural language, fuzzy, full-text, keyword? You must consider this information to get a good understanding of the scope of your CMS implementation.
Workflow/Business Process Factors These factors include the current content life cycle and business process. Translated into the CMS, this business process will be re-created as one or more workflows. What is the existing review and production process, who is notified, and at what point in the process are they notified? Who must approve work, and who is responsible for initiating a work request? Does this process ever change, and do exceptions exist? If so, what are these exceptions? The “Investment or Market Research Content Review Process” section will discuss review processes.
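When you document a business process for scoping, it can help to write it down as states, allowed actions, and notifications. The following Python sketch shows one way to do that. It is a generic illustration, not TeamSite's workflow format; the state names, actions, and notified roles are all hypothetical.

```python
# Hypothetical review process: each state maps allowed actions to the next state.
WORKFLOW = {
    "draft": {"submit": "review"},
    "review": {"approve": "publish", "reject": "draft"},
    "publish": {},
}

# Hypothetical answer to "who is notified, and at what point?"
NOTIFY = {"draft": "author", "review": "editor", "publish": "web team"}


def advance(state, action):
    """Return (next_state, role_to_notify), or raise if the action is not allowed."""
    try:
        next_state = WORKFLOW[state][action]
    except KeyError:
        raise ValueError(f"'{action}' is not allowed in state '{state}'") from None
    return next_state, NOTIFY[next_state]


state = "draft"
for action in ("submit", "reject", "submit", "approve"):
    state, notified = advance(state, action)
    print(f"{action:8} -> {state:8} (notify {notified})")
```

Exceptions to the normal process show up as extra actions or extra states in a table like this, which makes them easy to spot and discuss with stakeholders.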
Publishing/Delivery Factors Where will the content be delivered, and in what time frame will it be delivered? What type of generated output must be supported…WML, HTML, JavaServer Pages (JSP), Active Server Pages (ASP), and PDF, for instance? Once the content is approved, how quickly must that content be deployed from the CMS to the website? Many times content must be deployed across multiple network segments and to multiple web servers. Try to identify as many delivery or publishing requirements as possible, because this will save you a great deal of time later. The more information you gather during the scoping exercise, the better and more informed your decisions will be later in the project.
Repository Factors Repository factors address the amount of data to be stored. What is the average size of each content type, and how many types are present? Do you want to allow space for assets or only content? What is the projected content growth over the next three years? Repository factors will vary from installation to installation and depend entirely on the unique requirements for each company. Once you find all the content that must be implemented, calculate the size of that content and any deltas that may also be involved.
System Administration Factors System administration factors address the administration of the system. How are changes implemented in the system? Who is responsible for starting and stopping the system? Who does the backup and restore? Who installs new patches and performs other administrative work?
Reporting Factors These factors include the internal and external reports of the system. Is there a need for customized or ad hoc reporting? Where will the reports be available from? What data must be included in the reports, and what system events must be captured? We will talk more about reporting in Chapter 8. For most customers, the Interwoven TeamSite add-on product ReportCenter provides more than adequate reporting functionality. However, you should define or at least identify any reporting that is required once the system is in place.
Base CMS Factors Base factors include versioning, check-in/checkout, file locking, content preview (both static and dynamic), application server integration, and much more. The goal with this factor is to determine what the project stakeholders envision a CMS to provide. This factor relates to educating the stakeholders while they inform you of the functions they need.
Training Factors Training factors include the training requirements for users of the system once the system goes live. Questions to answer are as follows: Who will be trained (this could be technical and nontechnical people)? Will specialized training be needed depending on the job role? What type of training shall be provided? Will you provide classroom training or web-based training? How many users must be trained? Keep in mind that training will encompass both your end users and the people who will administer the system.
Delivery/Target Launch Date When will the product be live? What is the deadline for implementation, and is there any flexibility in this date? If the date is unrealistic, are the project stakeholders or IT management willing to hire additional resources?
■Note Keep in mind that you will not have a complete view or understanding of every one of these factors at this stage in the project; however, if you keep these at the level of features or high-level components, then you will be well on your way to project success.
Defining the Functionality and Deliverables You can determine project scope in many ways; however, we recommend defining project scope by defining functionality and deliverables. To define the scope of a CMS engagement, you must begin by defining and reaching consensus on the final outcome of the project. You must also document all project assumptions and then maintain that list for possible changes. This finished scope will then be documented in a vision artifact for historical purposes. Once the vision artifact is finalized, it is imperative that you receive project stakeholder sign-off on the document.
■Note A vision document, or vision artifact, is an artifact in the RUP where you recount the vision of the project. In the RUP, an artifact can be any deliverable that the project produces, such as formally defined artifacts that include the vision artifact, use case models, iteration plans, and so forth, and informally defined artifacts such as code modules. The vision artifact is a document template that contains the purpose and scope of the project as well as the business opportunities, risks, and constraints.
To define the project scope for a CMS project, you must hold a scoping workshop. Out of this workshop you should produce a project description, a list of potential project risks and assumptions, and the major features that the project is required to implement. All this information will feed your vision document. You can obtain the majority of it from meetings with stakeholders and from the scoping workshop itself, which ensures that the vision you capture is accurate and complete. The scoping workshop should be from a half day to a full day in length but can often stretch into a second day. The scoping workshop must include all project stakeholders, or a designee from each, and one or more SMEs in the corporation's content. These individuals will constitute the invitees from the business side. You should also make certain that a meeting facilitator is present; a business analyst or project manager usually fills this position. You should invite one or more scribes to the meeting and, if possible, record the audio from the meeting for later transcription. These individuals combined with the earlier business-side representatives will be the complete attendee list. The attendees and their responsibilities are as follows:
Project stakeholders: These individuals will be responsible for dictating the purpose of the project and for making any decisions that will drive the remainder of the project.
One or two SMEs: These individuals will be responsible for answering any content-centric questions or questions that require in-depth product knowledge.
One meeting facilitator: This individual will be responsible for keeping the meeting on track and ensuring that all meeting objectives are met.
One or two scribes: These people will be accountable for capturing all statements and taking the minutes of the meeting.
If everyone has a clear understanding of the project objective, it will make the job of creating the project description easy; however, this is usually not the case. We recommend that all stakeholders write down their project description independently and then submit those to the meeting facilitator. The meeting facilitator will then post each one in the workshop room. Once posted, all common points can be immediately included in the finalized project description. Next you can tackle the divergent points. For each divergent point, ask its author why the point is valid, and ask the other stakeholders whether they agree. When you reach a consensus, update the finalized project description. Repeat this process until you have added all agreed-upon points to the finalized project description. Finally, reorganize, if needed, the finalized project description so that it is written in a clear and logical fashion.
Project Description If someone asked you to summarize the entire project in just a few sentences, what would you write as the project description? Project descriptions can range from one paragraph to a few pages in length. The project description is essential for setting a common objective among all project members and should be the first topic discussed during your scoping workshop. The project description is the “what” that must be accomplished upon completing your project; keep away from the “how” at this point in the process. To achieve consensus on the project description, it is imperative that the decision makers are in the scoping workshop. We have seen many hours wasted on producing a project description only to have to start all over again because the right decision maker was not in the room.
Project Assumptions Assumptions are criteria and project decisions that you make with little or no supporting evidence. Assumptions are needed to ensure the success of the project. Incorrect assumptions can be very costly to a project. All project assumptions should be documented and later verified for correctness if possible. An example assumption could be something like “A CMS solution that matches our requirements for this project can be commercially purchased.”
Project Features Project features are high-level requirements of what the system can do. Features are not as detailed as project requirements, and the trend is for a feature to map to multiple detailed requirements. At the Inception phase of the project, you will typically gather (during the requirements workshops) a combination of features and detailed requirements. At this point in the project, it is important only to write all information down; you can categorize that information later.
■Note The RUP is an iterative and incremental development methodology comprised of four main phases. Each phase is arranged around a set of disciplines. These disciplines have defined activities and include certain topics. Some disciplines are the Business Modeling, Requirements, and Analysis disciplines. The phases are the Inception phase, the Elaboration phase, the Construction phase, and the Transition phase. Each phase has certain objectives and deliverables or artifacts.
Project Risks Project risks are factors that can cause harm to your project. Basically, risks are everything and anything that can go wrong on your project. The goal is to document as many risks as possible and then develop contingency plans to mitigate those risks. Consider several areas when assessing risks such as the availability of hardware, changes in technology or standards, the availability of resources, a lack of funding, and any corporate policy changes or considerations.
6110_Ch02_FINAL
7/2/06
1:05 PM
Page 33
CHAPTER 2 ■ DEFINING THE CMS PROJECT SCOPE
Delivery Date This date will be set by the stakeholders of the project and will depend on many factors, such as market drivers, key business initiatives, and corporate strategic initiatives. The date will most likely be loosely defined during the Inception phase but will quickly become an actual calendar date during later phases.
Introducing FiCorp For this book’s case study, our intention is to provide you with enough information and detail to successfully implement your own CMS but not overwhelm you with details that will make you a domain expert regarding the requirements of a fictitious corporation. That being said, we will show you examples of artifacts that must be created for your project, but we will not bore you with complete artifact examples. Remember, when working through the RUP—or any methodology, for that matter—it’s worth maintaining only those artifacts that will truly add value to your project. Unnecessary documentation and planning will only add unnecessary overhead to your project. For this case study, we will pretend that you have been asked to implement a CMS for a fictitious financial services firm named FiCorp. We will begin by introducing you to the company and telling you about its existing publishing process and infrastructure. The remainder of the book will take you through using Interwoven TeamSite to implement a robust CMS that meets this company’s needs.
Learning All About FiCorp FiCorp is a financial services firm that specializes in banking, investment, and wealth-building strategies for consumers and different-sized businesses. FiCorp has been in business for more than 15 years and has added to its product offerings over that time. Over the same time, the content for the main Internet website and customer extranet has grown to where the company now has more than 4,000 pages of content. As the business and content have grown, so has the administrative nightmare of managing that content. The company currently publishes both HTML and JSP pages to its corporate site. Some of the content pages are purely static HTML, and some are JSP pages that, once published, pull some content from an Oracle database. The crux of the problem lies in FiCorp’s workflow arrangement. All updates are performed either by way of a request to the IT staff for content modifications or through a simple publishing tool that has been made accessible to the respective content authors in each topic area. The updates done via FiCorp’s publishing tool do not follow a system-based approval process. The updates performed by the IT staff can take anywhere from two days to four weeks depending on IT resource availability, which is unacceptably slow in the financial services industry. The notifications of content updates that must be sent out are difficult to manage, and sometimes they’re forgotten completely. Content deletions from the production website must be done by IT staff, and the business owners would like to have the ability to perform their own content deletes. To further complicate matters, the SEC has mandated that all content in the FiCorp site must be reproducible for a period of three years from the time it was last available for viewing on the site. Should the SEC ask to see it, FiCorp has 48 hours to reproduce the site as it looked on the requested date and time.
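The retention rule is easy to state but worth pinning down as arithmetic before you design around it. The following is a minimal sketch, assuming the three years are counted as calendar years from the moment the content was last viewable; the function names are ours and purely illustrative.

```python
from datetime import datetime, timedelta

RETENTION_YEARS = 3                      # retention period stated in the case study
RESPONSE_WINDOW = timedelta(hours=48)    # time allowed to reproduce the site


def retention_expiry(last_available: datetime) -> datetime:
    """Last moment at which content must still be reproducible.

    Assumption: the period is counted in calendar years.
    """
    target_year = last_available.year + RETENTION_YEARS
    try:
        return last_available.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return last_available.replace(year=target_year, day=28)


def must_be_reproducible(last_available: datetime, on_date: datetime) -> bool:
    """True if content last viewable at `last_available` is still in scope."""
    return on_date <= retention_expiry(last_available)


def response_deadline(request_received: datetime) -> datetime:
    """Deadline for reproducing the site after a request arrives."""
    return request_received + RESPONSE_WINDOW
```

A rule like this is what later drives decisions about how long versions and snapshots have to be kept.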
Figure 2-1 shows the FiCorp site structure, which is the site that will be migrated into the CMS.
Figure 2-1. FiCorp site structure
The FiCorp site is broken into five main content sections: Personal, Small Business, Commercial, About Us, and My Account (an extranet section that allows the customers to log in to their own accounts). Each main content section contains its own subsection of content. Each section also includes its own left-column navigation. The left-column navigation is visible as the main navigation in the site map diagram (Figure 2-1). Each subsection of content contains several topics. Each topic then contains several pages of content. Different FiCorp SMEs are responsible for their own content sections and topics. FiCorp also has several application integration points within the site: • The Find a Location application, which is located in the header section of the website. This application allows a customer to find the nearest banking location by providing different input criteria, such as a ZIP code and a mileage radius.
• The Sign-On application, which is located in the header section of the website. This application allows customers to authenticate to FiCorp and retrieve the available services for that particular customer. • The Open an Account application, which is also located in the header section of the website. This application allows a customer or potential customer to open different types of accounts with FiCorp or create an account as a new customer. • Several other smaller applications, such as Contact Us and Search. FiCorp has content topics with several pages of content that need to be changed each day. Some examples of this content are FiCorp’s press release section and the home page for the site. IT staffers are unable to keep up with the demand to promote these content changes so they have built a simple publishing tool that the FiCorp content authors use to publish these changes. The publishing tool has several limitations. It has no built-in spell checker, so frequently typos appear in the content. Also, the publishing tool does not have the ability to accept many types of rich content; only a few formatting tags are allowed that control spacing, the bolding of text, the italicizing of text, line breaks, and image insertion. The publishing tool is maintained by the IT department and does not allow for or follow any type of review process. The review process is manually managed outside the publishing tool. All content authors are given access to the publishing tool but are individually responsible for managing their own content and ensuring that the correct approvals have been given for all content updates. FiCorp has several approval processes for different content sections of this site; these approval processes are based on the type of content being updated. Three of the approval processes are covered in the following sections. Some of these processes can be consolidated in order to make the review process less awkward to the user.
Investment or Market Research Content Review Process Investment/market research content can change rapidly and needs to be updated on a regular basis. Any content that follows this review process is for internal use only. In fact, “INTERNAL USE ONLY” is displayed on all content that has been through this process. If an analyst wants to give this information to a client, the content page must go through a legal review first. All market research and investment data must go through this process before it can be deployed to the corporate intranet site. Figure 2-2 shows the market research review process, which is initiated by a research analyst whose job it is to watch certain stocks or investments. The moment a trend is spotted in the market, a request for a content change has to occur. Once the research analyst prepares a recommendation—for example, a buy, sell, or hold—they must send it to a senior analyst for review. The senior analyst is typically responsible for reviewing recommendations and other content updates from several research analysts. After the senior analyst reviews the recommendation, they decide whether the content changes are approved. If the content modifications are not approved, then the content is returned to the research analyst for revision and must follow the approval process again. If the content is approved, the content goes to an editorial assistant. The editorial assistant is responsible for checking the grammar and spelling of the recommendation or content. After reviewing the content, the editorial assistant must then decide whether to approve or reject it. If the content modifications are not approved, then the content goes back to the research analyst for revision, and it must follow the approval process again. This process
continues until the content is finally approved; then IT is notified and deploys the content to the test system and finally to the production website.
[Figure 2-2 flowchart]
• A market change occurs.
• The research analyst compiles a recommendation note.
• The research analyst sends the recommendation note to the senior analyst.
• The senior analyst reviews the recommendation. Approve? No: The senior analyst sends the recommendation back to the research analyst with rework comments. Yes: The senior analyst sends the recommendation to the editorial assistant.
• The editorial assistant reviews the recommendation. Approve? No: The editorial assistant sends the recommendation back to the research analyst with rework comments. Yes: The editorial assistant sends the recommendation to IT for update to the site.
• The editorial assistant notifies the research analyst and senior analyst of the approval.
• IT deploys the update to the staging area/preview area.
• IT notifies the senior analyst for final review.
• The senior analyst performs a final review before the content is deployed to the site.
Figure 2-2. Market research review process
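The review loop described above is a small state machine: each approval moves the content to the next reviewer, and any rejection sends it back to the research analyst to start over. The following is a minimal sketch of that loop, not a TeamSite workflow definition; the function name and log wording are our own.

```python
# Reviewer order for the market research process, as described in the text.
REVIEWERS = ["senior analyst", "editorial assistant"]


def run_review(decisions):
    """Replay approve/reject decisions (True/False) and return the step log.

    Any rejection returns the content to the research analyst, and the
    approval chain starts again from the first reviewer.
    """
    log = ["research analyst submits"]
    stage = 0
    for approved in decisions:
        reviewer = REVIEWERS[stage]
        if approved:
            log.append(f"{reviewer} approves")
            stage += 1
            if stage == len(REVIEWERS):
                # All reviewers have approved; IT takes over deployment.
                log.append("IT deploys")
                return log
        else:
            log.append(f"{reviewer} rejects")
            log.append("research analyst revises")
            stage = 0
    log.append("awaiting review")
    return log
```

For example, `run_review([True, False, True, True])` models an editorial rejection followed by a clean second pass. Writing the process down this way makes it obvious which steps a system-enforced workflow has to guarantee.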
Legal Content Review Process This review process allows for content changes that might subject FiCorp to litigation such as payroll and tax content, the privacy policy, the terms and conditions page, and external research information to be displayed externally. All this content needs to have a legal review prior to the content being deployed to the Internet site. Figure 2-3 shows the legal content review process, which can be initiated by the corporate communications team, a topic SME, or a member of the legal research team. Once the process has been initiated, it is the responsibility of the content author to create or modify the content. When this is done, the content author sends the updated content to the topic review lead (TRL) who is in charge of that specific topic area. The TRL must then review the content. If the content
modifications are not approved, then the content goes back to the content author for revision and must follow the approval process again. Once the content is approved, the content must then be sent to the corporate communications team for review. The corporate communications team must now review the content to ensure compliance with corporate communications standards. If the content modifications are not approved, then the content is returned to the content author for revision and must follow the approval process again. If the content is approved, the content is sent to the legal research department for final review. If the legal research department does not approve the content, it is once again sent to the content author for rework. Finally, when the content has received legal approval, the content author places the content into the publishing tool and pushes the content to production.
[Figure 2-3 flowchart]
• A corporate communications (CC) team member, a subject matter expert, or a legal researcher initiates a request for a content change.
• The content author (CA) for the topic or content section creates/modifies the content.
• The CA sends the completed content via an email to the topic review lead (TRL).
• The TRL reviews the content. Approve? No: The TRL sends the requested changes back to the CA for rework. Yes: The TRL sends the content to the CC team for review.
• CC reviews the content. Approve? No: The CC team sends the content back to the CA for rework. Yes: CC sends the content to the legal department for review.
• The legal department reviews the content. Approve? No: The content returns to the CA for rework. Yes: The legal department notifies the CA that the content is approved via an email.
• The CA places the content into the publishing tool and deploys the content to the website.
Figure 2-3. Legal content review process
One exception to this process is for content that has been through the investment or market research content review process already: in that case, the corporate communications team is usually bypassed, and the content is sent straight to the legal review. This usually happens when a financial analyst requests permission to send this content to a client.
General Content Review Process This review process covers general content promotions such as marketing information (for example, promotional text boxes) and all other content in the FiCorp site. The process recommends a legal review before deployment, but the departmental SME can choose to include or exclude this step. The content must be reviewed by the corporate communications team prior to being published to the site. Figure 2-4 shows the general content review process, which is initiated by the departmental SME for the given topic or subsection of content. The remainder of the process is similar to the legal content review process described previously.
[Figure 2-4 flowchart]
• The departmental SME initiates a request for a content change via email to a CA.
• The content author (CA) creates/modifies the content.
• The CA sends the completed work to the departmental SME for review.
• The departmental SME reviews the content change. Approve? No: The departmental SME requests rework. Yes: The departmental SME determines if legal review is needed.
• Need legal? In both cases, the departmental SME sends the content to CC for review, and CC reviews the content.
• If legal review is needed and CC approves, CC sends the content to the legal department for review, and the legal department reviews the content.
• Once the final reviewer approves, the CA is notified that the content is approved.
• The CA places the content into the publishing tool and deploys the content to the website.
Figure 2-4. General content review process
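Because the three processes share most of their steps, it can help to express them as one routing rule when you consider consolidating them. The following is a minimal sketch under our own assumptions: the content type labels and the function name are hypothetical, and the reviewer order is taken from the descriptions above.

```python
def review_route(content_type, needs_legal=False):
    """Return the ordered list of reviewers for a piece of content.

    `content_type` is one of "market_research", "legal", or "general"
    (labels invented for this sketch). `needs_legal` only matters for
    general content, where the departmental SME decides whether a legal
    review is included.
    """
    if content_type == "market_research":
        return ["senior analyst", "editorial assistant"]
    if content_type == "legal":
        return ["topic review lead", "corporate communications", "legal"]
    if content_type == "general":
        route = ["departmental SME", "corporate communications"]
        if needs_legal:
            route.append("legal")
        return route
    raise ValueError(f"unknown content type: {content_type}")
```

Seen this way, the general process with legal review differs from the legal process only in its first reviewer, which is the kind of overlap a consolidated workflow can exploit.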
Getting a Look at the Home Page FiCorp has spent a great deal of time and money designing its site and thinks the site has an appealing look and flows quite nicely. The first page a web surfer experiences is the FiCorp home page, as shown in Figure 2-5. This is the entrance page for all customers of FiCorp.
Figure 2-5. The FiCorp home page
This page is broken down into several component sections and already lends itself to being put into a CMS. Breaking content pages down into components allows for better management of your content. We have compiled a list of each component in the FiCorp site and included a brief summary of each in the following sections. You will probably find that your content has similar content areas.
Page Header The page header is included in every content page for the Internet site. A significant amount of functionality is packed into the page header, which is a great area to include global type features of the website: • Existing customers can log in to a secured section of the website. This is visible on the header, designated by the “sign in” link. • The common Find a Location application is visible via the “locations” link on the header. • A new or existing customer can open an account. This is visible on the header and is designated by the “open account” link. • Search is exposed to the user. This is visible on the header and is designated by the “FIND IT HERE” input text box. • Global navigation to main site sections is visible on the header and is designated by the Personal, Small Business, Commercial, About Us, and My Account navigation tabs. • Although not displayed on the home page, the header also contains subsection navigation when displaying a main section landing page. (This was visible in the site map shown in Figure 2-1.)
Page Footer The footer also appears on every page throughout the site. The footer includes links to the site map, the terms of service, and the privacy policy for the site, all of which are required even if rarely used.
Left-Column Navigation The left-column navigation consists of a series of links. These links can be on the same page as other content within a subsection. The home page is a little different in that the links are more diverse and can be outside the current subsection.
Content Box Area The content box areas are designed to be changed as needed. This page is not a fully implemented portal but does mimic some of the basic functionality of a portal.
Main Content Area The main content area can contain any type of content. This is the meat of the page and in this case includes promotional boxes that change frequently.
Getting a Look at a Landing Page Each main section has a landing page. This landing page is a smaller version of the home page, and in fact, content is organized on this page to act as a home page for the subsection.
Figure 2-6 shows the Personal subsection of the site. We will introduce in the following sections only the components that differ from the home page.
Figure 2-6. FiCorp main subsection landing page
Let’s take a look at the components that appear on this page. We will define each and then briefly describe its function.
Breadcrumb The breadcrumb on the FiCorp site is identified by the “YOU ARE HERE” text. A breadcrumb navigational system helps identify where the web surfer is while allowing them to move to any level “above” the current page. Most often this navigation is constructed at run time based on previously defined navigation rules. For the FiCorp website, the breadcrumb is simple and is built by the tag that is included in each content page. Figure 2-6 shows the Account
Choices subsection of the Personal section. This is a common feature, and there are no real complicated aspects to it.
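To make the breadcrumb idea concrete, here is a minimal sketch of rendering one from an ordered trail of pages. It is not FiCorp's actual tag; the function name, the separator, and the URLs in the usage note are assumptions for illustration.

```python
from html import escape


def breadcrumb(trail, prefix="YOU ARE HERE"):
    """Render a breadcrumb from a list of (label, url) pairs.

    The trail runs from the site root to the current page. Every level
    above the current page becomes a link; the current page is plain text.
    """
    if not trail:
        raise ValueError("trail must contain at least the current page")
    parts = [
        f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'
        for label, url in trail[:-1]
    ]
    parts.append(escape(trail[-1][0]))
    return f"{prefix}: " + " &gt; ".join(parts)
```

Calling it with a trail such as `[("Home", "/"), ("Personal", "/personal/"), ("Account Choices", "/personal/account-choices/")]` yields links for Home and Personal and plain text for Account Choices.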
Left-Column Navigation The left-column navigation differs from the home page, as you can see. The left-column navigation that appears depends on the section of the site that the user is currently in and on that section’s associated content pages. In the case of the landing page, the links are specific to this page.
Page Header It’s worth noting that the second-level navigation now appears. The surfer has navigated to a subsection of the site. This is yet another clue to inform the user of their current location in the website. As you can see, the page that you are on, Account Choices, is highlighted.
Getting a Look at the News Releases Summary Page The News Releases summary page, as shown in Figure 2-7, is a watered-down version of the landing page. We call this a basic content page layout. This page is currently being generated as static content on different sites. The business owner wants to be able to update the news releases in one location and have the content pages be updated on all the sites.
Figure 2-7. FiCorp topic landing page
This page consists of the four most recently published news releases and must be updated each time manually.
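Selecting the releases for this page is exactly the sort of manual step a CMS can take over. The following is a minimal sketch of the selection rule, assuming each release carries a publication date; the record layout and the sample titles are made up for illustration.

```python
from datetime import date


def latest_releases(releases, count=4):
    """Return the `count` most recently published releases, newest first."""
    return sorted(releases, key=lambda r: r["published"], reverse=True)[:count]


# Made-up sample data; titles and dates are placeholders.
sample = [
    {"title": "Release A", "published": date(2006, 1, 10)},
    {"title": "Release B", "published": date(2006, 3, 2)},
    {"title": "Release C", "published": date(2006, 2, 14)},
    {"title": "Release D", "published": date(2006, 5, 30)},
    {"title": "Release E", "published": date(2006, 4, 21)},
]
summary = latest_releases(sample)
```

If the same rule feeds every site that shows the summary, the business owner's wish to update news releases in one place follows naturally.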
Understanding FiCorp’s Needs In the project scoping meeting(s), you have determined that FiCorp’s needs are as follows: • You must enforce corporate legal and branding standards. • You must create a content audit trail. • The system must version all managed content. • You will eliminate the dependency on the publishing tool. • You will remove the dependency on IT to make certain content updates. • You need a system that helps organize FiCorp’s content and manage that content effectively. • The system should be intuitive to use, with minimal training needed. • You need to create a mandatory and system-enforced review process. • The standard business processes must still apply in the new system. • You must have the ability to remove content from the website. • You will migrate the entire FiCorp site to the CMS structure. Now that you have some basic background about FiCorp and some of its high-level requirements, you can take a quick look at FiCorp’s current infrastructure. Figure 2-8 offers a basic network diagram of FiCorp’s pre-CMS environments. Because of security concerns, the network is segmented into an Internet environment and an internal environment. The internal environment is broken up into a development environment and a testing environment. All code changes performed by the IT staff are completed in development, moved to the testing environment, and then finally promoted to the production or Internet environment. The publishing tool is located in the Internet environment, which allows users to publish into this environment. To protect the site, FiCorp has attempted to secure this tool. The publishing tool is a web application that uses a username and password for access. The tool is also installed on a nonstandard port number, and users of the tool must have a certificate from the internal certificate server. Even with these measures in place, the FiCorp security team would like to have a more robust and secure application for content changes. 
Additionally, all users of the tool would like to be able to preview their content before the content is deployed to the site.
[Figure 2-8 network diagram: FiCorp Basic Network Diagram (Pre-CMS). Labels: Internet Environment, Development Environment, Test Environment, DMZ, External Mail Gateway, Gateway, Developer Workstation, Apache, WebLogic, Oracle]
Figure 2-8. FiCorp basic network diagram
Summary In this chapter, we taught you how to define the project scope for a CMS implementation and gave you a best-practice methodology to help you construct the vision artifact. We also introduced you to the RUP and the FiCorp case study. We then discussed the structure of the FiCorp website and its existing publishing processes. At this point in the book, you should have a good idea of the solution that is desired to implement the FiCorp site in a CMS. In the next chapter, you will look at Interwoven’s products and learn how you can use each of these products to build world-class enterprise content management solutions. We will describe these products based on our extensive experience with them. This will allow you to make an informed decision when choosing the Interwoven products, which will be extremely beneficial to you and your organization.
6110_Ch03_FINAL
7/2/06
1:06 PM
Page 45
PART 2
■■■
The Inception Phase In the Rational Unified Process (RUP), the Inception phase is really the beginning of all project-related work activities. This phase should start with a meeting with project stakeholders and conclude with consensus on the project’s goals among all project stakeholders and proverbial purse-string holders. This is also the phase where the business case is developed and justified. It seems simple enough, but the Inception phase is one of the most misunderstood and misused phases in the RUP. In this part of the book, we will describe the events that should take place in the Inception phase of the project, as well as explain their relevance within the entire project structure. We will not focus on all RUP activities, or artifacts, because this book is not about RUP but rather uses it as a road map for its methodology. We will explain how to accomplish goals in the shortest time possible; in addition, we will describe all of this within the context of content management systems (CMSs). The Inception phase of the project should be one to four weeks long for most projects and should accomplish the following goals:
Establish the project’s scope and boundaries in a vision artifact: This artifact should contain the general vision of the project’s objectives, key features, and constraints.
Document the project’s critical success factors and other acceptance criteria: This will produce a business case document, which will list the business justifications for the project. (Most likely, this document will have already been created for you by the project stakeholders and is the reason that you may have been engaged for the CMS project.)
Develop an initial draft of the known project risks: This will produce a risk assessment document, which is critical to helping you identify and document all the known project risks so that you can later create contingency plans to mitigate those risks.
Make the decision to move ahead with a prototype of the CMS technology: If a prototype will be included, you then design, develop, and demonstrate that prototype. If you create a prototype and display it to stakeholders, you can provide the “proof-is-in-the-pudding” view to cement further funding of the project. You can also use a prototype to gain further insight into the tool or technologies that will be used.
Estimate the overall project cost and create an initial schedule: You should generate a project plan showing the timeline, the phases, and the iterations. This will, of course, change as your knowledge of the project increases throughout the remaining phases.
Identify an initial use case model survey: This will contain all the actors and their associated use cases that can be identified at this point. This will serve to begin the requirement process and will facilitate discussions in future phases of the project. Use cases will constantly be evolving throughout the life of the project, but now is the time to begin the identification of the most glaring and critical ones.
Identify a project glossary: This will contain any important or unfamiliar terms specific to the problem or business domain. This is important for consistency because the most successful projects are those in which everyone speaks the same language. This artifact will help you achieve that goal. Remember, even if you think you are familiar with a term, it is always best to seek a definition for the term from the group and experts to facilitate a common understanding among project team members.
Well, that’s enough about the Inception phase; we’ll stop discussing it and start doing it.
CHAPTER 3
■■■
Introducing TeamSite and Friends
The bulk of this chapter describes the applications in the Interwoven product line. The better you understand each product, the more likely you are to purchase the right software for your needs. Every large enterprise will probably need most of Interwoven’s products, but it is not reasonable for every enterprise to start out with every application. Since we outlined and prioritized FiCorp’s project requirements in Chapter 2, you now need to determine which software will best meet your requirements. An ideal strategy for accomplishing this goal is to insert your project requirements into a spreadsheet. As you read this chapter, put the name of the product that is best suited to fulfilling a particular requirement next to the appropriate item. Some requirements will be solved by multiple products, so make sure you signify how well the product will solve that requirement. You should also categorize each product with a difficulty identifier:
Out of box: A feature that can be solved without having to change a single line of code or configuration. An example is Microsoft Word’s ability to save a document.
Configuration: A feature that will require a change in the default configuration. An example is modifying Word’s default behavior to save in a format other than the .doc format.
Customization: A feature that will require custom code to fulfill the requirement. Returning to the Word analogy, you might add a menu that allows you to import images from a custom database application.
You do not have to use these categories, but make sure you understand how each of your categories is defined. Making sure your categories have precise definitions will aid you later in identifying the effort involved with meeting each requirement. When you have finished this chapter, you should have a high-level understanding of the products required to fulfill your needs. This spreadsheet will reduce the amount of research needed by limiting the products you have to examine.
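If you prefer a script to a spreadsheet, the same requirements matrix can be kept in a few lines of code. The following is a minimal sketch under our own assumptions: the 1-to-5 fit scale, the tie-breaking rule (prefer the lower-effort category), and the product names in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Difficulty(Enum):
    OUT_OF_BOX = 1
    CONFIGURATION = 2
    CUSTOMIZATION = 3


@dataclass
class Candidate:
    requirement: str
    product: str
    fit: int                 # 1 (poor) to 5 (excellent); the scale is an assumption
    difficulty: Difficulty


def _rank(row):
    # Higher fit wins; on a tie, the lower-effort category wins.
    return (row.fit, -row.difficulty.value)


def best_per_requirement(rows):
    """Map each requirement to its best-ranked candidate row."""
    best = {}
    for row in rows:
        current = best.get(row.requirement)
        if current is None or _rank(row) > _rank(current):
            best[row.requirement] = row
    return best


def shortlist(rows):
    """Products that are the best match for at least one requirement."""
    return sorted({row.product for row in best_per_requirement(rows).values()})
```

Feeding it rows such as `Candidate("Version all managed content", "Product A", 5, Difficulty.OUT_OF_BOX)` gives you the short list of products worth researching further, which is the point of the exercise.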
■Note Most Interwoven products can be highly customized to fit your business, but the more detailed your product selection process, the less customization you will need to make. When it comes time to upgrade the system later, it is much better to have a system that is less customized. With that being said, if your customizations are carefully designed, they will not affect the upgrade to any great degree.
Introducing TeamSite Content Management Server 6.5 TeamSite Content Management Server (or TeamSite) is the foundation of Interwoven’s CMS product line. It will be the foundation for your applications and can handle virtually all content types. TeamSite handles many of your base content functions: • Versioning • Check-in/checkout • Data organization • Metadata • Content interfaces This chapter will cover each of these base features. How each one works will become clear as you learn how TeamSite allows user interaction. Understanding how you can organize content within the repository will give you the vocabulary you need to see how TeamSite can work for you. Interwoven calls this content area the content store. We will start building your knowledge of the content store by describing the functionality known as branching.
Branching TeamSite uses a technique Interwoven calls branching, which is the methodology employed by Interwoven to handle content divisions. You can also think of branching as separate paths for the same content, and with this technique the content may or may not be merged in the future. Three main division types accomplish this: Branch: You can think of a branch as a project in a source repository. The branch is the largest division type and contains all other types. You can have as many branches as you want, and a branch can contain sub-branches. Staging: The staging area in TeamSite is where you find all the approved versions of a piece of content. Once a version is created, it cannot be deleted from the repository; in fact, if you delete a file from the workarea, a version of the delete will be created, but all the previous versions will still appear. Since you can create versions for each branch in only one area, each branch has only one staging area. Workarea: The workarea is where an author makes changes to files that need to change. The content in this area includes the working version or pieces of content that are checked out. The workarea contains links to all the assets in the staging area until they have been modified. Once a piece of content has been modified, a copy of the file is created in the workarea. Once the author is satisfied with the content in the workarea, the author can submit the content to the staging area for versioning. A branch can have as many workareas as needed.
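The relationship between staging and workareas is easier to see in a toy model. The following is a minimal sketch of the behavior described above; it is not TeamSite's API or storage format, and every class and method name is invented for illustration.

```python
class Branch:
    """Toy model of a branch: one staging area, any number of workareas."""

    def __init__(self, name):
        self.name = name
        self.staging = {}     # path -> list of versions, oldest first
        self.workareas = {}   # workarea name -> {path: modified content}

    def add_workarea(self, workarea):
        self.workareas[workarea] = {}

    def read(self, workarea, path):
        # Until a file is modified, the workarea resolves to staging.
        local = self.workareas[workarea]
        if path in local:
            return local[path]
        return self.staging[path][-1]

    def edit(self, workarea, path, content):
        # Modifying a file creates a private copy in the workarea.
        self.workareas[workarea][path] = content

    def submit(self, workarea, path):
        # Submitting moves the copy to staging as a new version.
        content = self.workareas[workarea].pop(path)
        versions = self.staging.setdefault(path, [])
        versions.append(content)
        return len(versions)  # version number just created
```

In this model, an author in one workarea does not see another author's unsubmitted edits; both see a change only once it has been submitted to staging.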
6110_Ch03_FINAL
7/2/06
1:06 PM
Page 49
CHAPTER 3 ■ INTRODUCING TEAMSITE AND FRIENDS
Although these are the three divisions of your data, another important area appears inside the content store. This area contains what Interwoven calls editions. An edition is a snapshot of the staging area as it appears when the edition is created. An edition is good for rolling back large changes in your content. You may not want to create an edition every time you make a content change, but you can if you see a need for that many editions. Over the years, Interwoven has reduced the impact of creating editions, so the decision of when to create an edition is now based purely on your needs.
■Note Editions are helpful when you are deleting content from the staging area. When you delete a piece of content from the content store, you are not actually removing it. What happens is the next version of the file is actually a “delete.” What this means is if you browse the staging area, the file appears not to exist, but the older versions of the content still exist. You can restore a file in many ways, but the easiest way to get the previous version is to restore the file from the edition you created before deleting the content.
Figure 3-1 shows what a typical branch looks like. In this example, the Financial_Data branch consists of data from two groups. One group maintains the Stocks section, and another group maintains the Bonds section. The workarea pictured contains the content from both the Stocks and Bonds directories, but the directory permissions allow only the personnel from each group to modify their own content. Although multiple workareas are not necessary, it may be easier to keep track of who should be approving the content by using workareas instead of by maintaining a list of directories and mapping each directory to the appropriate approvers.
[Figure 3-1 diagram: the Financial_Data staging area and the Financial_Data workarea each contain Stocks (A-C, D-E, F-G), Bonds, and Research Reports. Stock AA appears in staging with Version 1 and Version 2, and in the workarea as v2.]
Figure 3-1. A branching structure that supports multiple groups through workareas that separate content within the same physical structure
You may also notice that the staging area is at the same level as the workarea; although two Financial_Data sections appear, they represent a single entity. The staging area is a special read-only workarea where versioning happens. The actual content for this branch is stored underneath the staging area, and the workareas actually point back to a particular version of a file in the staging area. Once the content changes in a workarea, a copy of the file is created in the workarea where the change was made and will reflect the modifications that were made. In other workareas under this branch, this file will not show that a modification has been made. This allows the workareas to provide parallel development. Since two groups are working with the content in this branch, we would create two workareas. Each workarea would be used by a separate group of employees from the finance department, so by creating workareas for each group, we could easily change the permissions in each of the workareas to isolate the Stocks directories from the Bonds directories. The benefit of using the same branch but different workareas is that both groups can easily see the other’s content but not be able to change it. This helps them preview content if there is any interaction between each group’s content. This is by no means the only way to handle this scenario. Depending upon how many contributors a particular group has, we could have created a workarea for each author. We could have also created additional branches to handle the separation of the Stocks and Bonds content. This, however, can add a lot of overhead and external processing to bring the entire site together. Chapter 7 will cover more about branching.
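The relationship between staging, workareas, and editions can be easier to follow in code. The following is a minimal sketch in Python, not TeamSite code: the Branch class, its method names, and the file path are all hypothetical, and the model ignores permissions and sub-branches. It shows only the behavior described above, where a workarea points at the staged version until a file is modified, a submit creates a new version, and a delete is recorded as one more version.

```python
from dataclasses import dataclass, field

DELETED = None  # marker stored as the "delete" version of a file


@dataclass
class Branch:
    """Toy model of one branch: a staging area plus any number of workareas."""
    staging: dict = field(default_factory=dict)    # path -> versions, oldest first
    workareas: dict = field(default_factory=dict)  # name -> {path: private copy}

    def add_workarea(self, name):
        self.workareas[name] = {}

    def read(self, workarea, path):
        """Return the workarea's private copy if it has one, else the latest staged version."""
        local = self.workareas[workarea]
        if path in local:
            return local[path]
        versions = self.staging.get(path, [])
        return versions[-1] if versions else DELETED

    def modify(self, workarea, path, content):
        """Editing creates a private copy; other workareas do not see it."""
        self.workareas[workarea][path] = content

    def delete(self, workarea, path):
        """A delete is just another pending change in the workarea."""
        self.workareas[workarea][path] = DELETED

    def submit(self, workarea, path):
        """Record the workarea's copy as a new version in staging."""
        content = self.workareas[workarea].pop(path)
        self.staging.setdefault(path, []).append(content)

    def edition(self):
        """Snapshot of the latest staged version of every file that still exists."""
        return {p: v[-1] for p, v in self.staging.items() if v[-1] is not DELETED}


PATH = "Stocks/A-C/AA.html"  # hypothetical file name
branch = Branch()
branch.add_workarea("stocks_team")
branch.add_workarea("bonds_team")
branch.modify("stocks_team", PATH, "AA v1")
branch.submit("stocks_team", PATH)
branch.modify("stocks_team", PATH, "AA v2 draft")  # visible only to stocks_team
```

At this point a read from the bonds_team workarea still returns the staged "AA v1", while stocks_team sees its own draft. An edition taken now keeps "AA v1" even if the file is later deleted, which is the restore path the earlier note describes.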
Metadata Chapter 1 discussed metadata and what some of its uses are. The content store allows metadata to be stored for every piece of your content. The content store does this with what Interwoven calls extended attributes. Interwoven uses extended attributes to define proprietary application data for your pieces of content, but they also allow you to define your own custom attributes and your own custom content categories. For example, when an author is looking for a piece of content for a new article, these categories can help narrow their search by indexing the custom attributes. You can also use extended attributes to help determine what kind of review a piece of content needs or whether it can be published on a public site. If any data needs to be kept regarding your content, you can use extended attributes to store it. These attributes are stored inside the content store but can also be stored inside a database by installing an add-on called DataDeploy. We’ll discuss DataDeploy in the “Introducing DataDeploy 6.0: Database Deployment for OpenDeploy” section.
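To make the idea of custom attributes concrete, here is a small Python sketch. It is only an illustration of key/value metadata attached to content: the attribute names, values, and file paths are invented for this example and are not TeamSite’s actual extended-attribute keys or API.

```python
# Hypothetical attribute names and paths, chosen only for illustration.
attributes = {
    "press/2006-q1.html": {"category": "press_release", "public": "yes", "review": "legal"},
    "reports/bonds.pdf": {"category": "research", "public": "no", "review": "compliance"},
    "press/2006-q2.html": {"category": "press_release", "public": "no", "review": "legal"},
}


def find(attrs, **criteria):
    """Return the paths whose attributes match every key/value pair given."""
    return sorted(
        path
        for path, meta in attrs.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )


publishable = find(attributes, public="yes")       # content cleared for a public site
press = find(attributes, category="press_release")  # content an author might reuse
```

The same lookup could drive a review decision, for example routing everything with review set to "legal" to one group of approvers.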
Application Type This is a stand-alone application.
Supported Platforms TeamSite supports both Solaris (versions 8 and 9) and Windows (2000 Service Pack 4 and 2003 Enterprise 32-bit).
Supported Web Servers On Solaris, TeamSite supports the following web servers: iPlanet 6.0 Service Pack 2; IBM HTTP Server 1.3.19 and 2.0.47; and Apache 1.3.17, 1.3.22, and 1.3.26. On Windows, it supports Internet Information Services (5.0 for Windows 2000 and 6.0 for Windows 2003); Apache 1.3.29; iPlanet 6.0; and IBM HTTP Server 1.3.19 and 2.0.47.
Application Servers Apache Tomcat is the main application server that ships with TeamSite; however, TeamSite supports two optional application servers: BEA WebLogic 8.1 Service Pack 2 and IBM WebSphere 5.1.
Databases TeamSite supports several databases, including IBM DB2 8.1, Oracle 8i, Oracle 9i, and Microsoft SQL Server 2000.
Introducing TeamSite Search for TeamSite 6.5 In most cases when you create content, you have a single purpose in mind for it. Whether it’s a press release or a marketing brochure, though, that content could ideally be reused. A huge problem for large enterprises is duplication of effort, which happens because the person assigned to create the content has no efficient means of finding previously created content on the desired topic. Based on the Verity K2 search engine, the TeamSite Search component allows content to be located, not on a website but inside your CMS. Two main components make up the TeamSite Search system: the index manager and the search manager. Both are introduced next.
Index Manager The index manager can create a file with all the index information for each branch you choose to index. You can configure the index manager to read a configuration file to determine which branches to index, or you can execute a command manually to index a specific branch. The index manager also contains a listener, which is a program that runs constantly in the background while waiting for a specific operation to take place. To activate the listener, you select the incremental updates option when installing TeamSite Search. When the listener detects that indexed content has been updated, it automatically updates the index for this content. This keeps user searches up-to-date. The index manager uses what is known as a document cracker to open and read different types of file-based content. Without the cracker, the index manager would not be able to recognize different content types and pull out the relevant data. The cracker enables the indexer to pull out data for full-text searches and metadata for keyword-type searches. (Recall from Chapter 1 that metadata is data about the data and helps define a specific piece of content.)
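The following Python sketch illustrates the two ideas in this section: an index that holds both full-text terms and metadata keywords, and an incremental update that re-indexes a single file when it changes. It is a toy model under stated assumptions, not the Verity K2 engine or the TeamSite index manager. The BranchIndex class and its methods are hypothetical, and the "cracker" here handles plain text only.

```python
import re
from collections import defaultdict


class BranchIndex:
    """Toy inverted index with full-text terms and metadata keywords."""

    def __init__(self):
        self.terms = defaultdict(set)     # word -> paths containing it
        self.keywords = defaultdict(set)  # (field, value) -> paths tagged with it
        self._seen = {}                   # path -> (words, pairs) from the last indexing

    def _crack(self, text):
        """Stand-in for a document cracker: lowercased words from plain text."""
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def remove(self, path):
        """Drop every entry that the previous version of this file contributed."""
        words, pairs = self._seen.pop(path, (set(), set()))
        for word in words:
            self.terms[word].discard(path)
        for pair in pairs:
            self.keywords[pair].discard(path)

    def update(self, path, text, metadata):
        """Incremental update: re-index one file without rebuilding the branch index."""
        self.remove(path)
        words = self._crack(text)
        pairs = set(metadata.items())
        for word in words:
            self.terms[word].add(path)
        for pair in pairs:
            self.keywords[pair].add(path)
        self._seen[path] = (words, pairs)

    def search_text(self, word):
        return sorted(self.terms.get(word.lower(), set()))

    def search_keyword(self, field, value):
        return sorted(self.keywords.get((field, value), set()))


index = BranchIndex()
index.update("press/q1.html", "Quarterly bond yields rose", {"category": "press"})
# A listener would call update() again when the file changes:
index.update("press/q1.html", "Quarterly stock prices rose", {"category": "press"})
```

After the second call, a full-text search for "bond" no longer finds the file, a search for "stock" does, and the keyword search on category still matches.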
■Note Although fast, the index manager is processor intensive. This is why Interwoven recommends installing the index manager on a machine separate from the TeamSite machine. Chapter 5 covers this in more detail.
Search Manager The search manager is responsible for performing the queries against the indices. When users need to perform a search, they can use the search manager in two ways. They can use the search client built into the TeamSite interface, which will be covered in detail in Chapter 9, or they can use a command-line client. The command-line client allows a user to create a query, in the form of XML, without having to use the TeamSite user interface. If you are tasked with integrating external systems with the TeamSite search functionality, this is the tool to use.
Introducing LiveSite 2.2 If your team is overwhelmed with the need to create numerous sites, you may want to delegate some of the control to someone other than your developers. The problem with creating sites is that every time a new site is created, the developers have to create new presentation templates. However, once the basic site layout has been determined, it is nice to have a set of template components that a user can mix and match to create new page layouts. You probably have created components in other systems, but unlike LiveSite, most systems don’t allow you to drag and drop your components around to generate a page layout.
Components A component is a portion of a web page. No single component can represent an entire page; it takes two or more components to create one web page. LiveSite allows you to create two types of components: internal and external. An internal component is self-contained and does not require any external data source. An external component uses an external data source, such as one reached through SOAP, Java Database Connectivity (JDBC), XML, Rich Site Summary (RSS), or a uniform resource locator (URL). You can also create custom data sources if required. Once you have created all the components that you need, you can start looking at the basic layout for your new sites.
Layout You can create as many layouts as needed. For instance, you may need to create a specific layout for the landing pages for your sites. Figure 3-2 shows a landing page with six customizable component locations. In this example, now that you have the basic site layout and the custom components, you can hand over the creation of a new site to a business owner. The business owner will be able to select a location and insert any given custom component. The business owner can also move the components around on the screen and position them in any of the six positions. Chapter 19 covers this in greater detail.
[Figure 3-2 diagram: a landing page divided into six numbered component locations.]
Figure 3-2. How you might divide sections on a landing page
This product allows the business user to create sites easily. If you are wondering why a business user would want many different sites, consider the desires of your end users. Do they want to look through hundreds of pages for what they want? Most enterprises have specific content requirements for each content customer. The more you can customize the content to that customer, the more likely they are going to use your site. For example, a health insurance company could provide one site to doctors and another site to patients. Most sites are designed to eliminate the need to talk to someone to get the information that is needed, so why not design a site with the exact information the customer needs?
Application Type This application must be installed on the same server as TeamSite.
Supported Platforms This application supports Windows 2000 Service Pack 4, Windows 2003 Enterprise Edition, and Solaris 2.8 and 2.9.
Supported Client Platforms This application supports Windows 2000 with Service Pack 4 and Windows XP with Service Pack 1 or newer.
Introducing TeamXML 5.5.2 Although not a new concept, XML is still in the early phases of adoption within many organizations. We have experience working with many XML-based content management server products, but most of these products are very limited. XML can really spread its wings when you are publishing large documents where many groups or people are working on the same document. We have seen people try to use Word for this, but ultimately Word is rather limited
in this capacity. Once your user base begins surpassing the limits of Word, you will require a more capable authoring tool. After obtaining such a solution, you must come to grips with where the XML will be stored. You could store it on a network share or put it in a database, but if you do this, it is hard to use the XML in the way that would be most profitable to your company. What you really need is a system that was designed to handle all these tasks. This is where TeamXML comes into play. TeamXML is a repository that makes it much easier to create XML-based components. By breaking your document down into components, TeamXML helps you spread the workflow across multiple responsible parties. Additionally, many documents can use the same component, allowing shared content to be managed from a single location. TeamSite understands what XML is and how to implement the reuse of XML components. Once you have installed TeamXML, you can start to define your XML documents as categories. Categories define how your documents should be broken down into their components. Once your documents have been broken down into their components, TeamSite can allow you to reuse these components. For example, an author who is altering a large document can modify an individual component or can load and modify the entire document. You can also import entire XML documents without breaking them into components. With the proper authoring techniques, componentization, and process, you can create synthesized content, which is the repackaging of separate documents into a new document with a new look and feel. When you have your components in place, you need to be able to modify them. TeamXML integrates with Arbortext and SoftQuad, which produce two of the leading XML-authoring tools. This integration provides easy access to your XML data right through the authoring tool interface.
Some of the features available through the authoring tool interface are as follows:
• You can render XML documents from TeamSite.
• You can modify specific components by selecting a component from the repository.
• You can search the components that have been created for specific content to help you reuse the content.
• You can view a list of dependent documents for a specific component. When a component is reused within a document, that document becomes dependent on the component.
• You can version files by checking them in and out of the repository.
TeamXML offers many other features, but this should give you a good idea of what you will get by using TeamXML. If you have a large team working on XML documents or just have many overlapping documents, then TeamXML is the tool that you need to add to your authoring environment.
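Component reuse and the list of dependent documents can be sketched in a few lines of Python. This is an illustration of the concept only: the ComponentStore class, its methods, and the sample component names are hypothetical and do not reflect how TeamXML stores or addresses components.

```python
from collections import defaultdict


class ComponentStore:
    """Toy repository of reusable fragments and the documents that include them."""

    def __init__(self):
        self.components = {}             # component id -> text
        self.documents = {}              # document name -> ordered component ids
        self.used_by = defaultdict(set)  # component id -> names of documents using it

    def put_component(self, comp_id, text):
        """Create or update a component in one place."""
        self.components[comp_id] = text

    def put_document(self, name, comp_ids):
        """Define a document as an ordered list of components."""
        for old_id in self.documents.get(name, []):
            self.used_by[old_id].discard(name)
        self.documents[name] = list(comp_ids)
        for comp_id in comp_ids:
            self.used_by[comp_id].add(name)

    def render(self, name):
        """Assemble a document from the current text of its components."""
        return "\n".join(self.components[c] for c in self.documents[name])

    def dependents(self, comp_id):
        """List the documents that would change if this component changed."""
        return sorted(self.used_by[comp_id])


store = ComponentStore()
store.put_component("stocks_intro", "Stocks overview.")
store.put_component("bonds_intro", "Bonds overview.")
store.put_component("disclaimer", "Past performance is no guarantee.")
store.put_document("stocks_report", ["stocks_intro", "disclaimer"])
store.put_document("bonds_report", ["bonds_intro", "disclaimer"])

# One edit to the shared component reaches both reports.
store.put_component("disclaimer", "Past performance does not guarantee future results.")
```

Because both reports reference the disclaimer rather than copying it, the single update appears in each rendered document, and dependents("disclaimer") tells a reviewer which documents are affected.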
Application Type This application requires TeamSite.
TeamXML-Compatible Authoring Tools This application supports Arbortext Epic Editor and SoftQuad XMetaL.
Introducing OpenDeploy 6.0.2 Once your content has been approved, you need to deploy it from the server where it was created to the server where it will be used. You could do this manually through scripting, but OpenDeploy does it for you automatically and efficiently. You can also use OpenDeploy in conjunction with Interwoven’s ControlHub, which is covered later in the “Introducing ControlHub 2.0” section; together they can be a powerful solution for managing your code. When OpenDeploy is combined with ControlHub, your code can be versioned, releases can be created, and these releases can then be deployed to testing servers and then to production. Defining a deployment strategy is not something you should take lightly, because deployment has many different aspects. Interwoven has addressed these with this product by offering n-tiered and transactional deployments.
N-tiered Deployments Many implementations require a more complex deployment strategy than simply moving files from one server to another server. A deployment strategy may require that a deployment cross multiple network segments before reaching its destination. If this is the case, OpenDeploy can solve this problem for you. Figure 3-3 shows what a tiered deployment could look like. Server A initiates the deployment. This server is located in network segment 1 and has access only to the OpenDeploy server in segment 2, server B. Network segment 2 may be acting only as a buffer between segment 1 and segment 3, as shown in Figure 3-3. Server B then transmits the deployment to server C, where the deployment could end at an OpenDeploy receiver or, as pictured here, could then be deployed to server D. Server D may be a web server from which content is served or an application server to which you deploy an application. In this scenario, server C is not really needed, but if you set up OpenDeploy in this way, you could perform reverse deployments where server C initiates a request for server A to start a deployment. This is useful in environments where your information security team does not allow content to be pushed into production, only pulled.
[Figure 3-3 diagram: servers A, B, and C are OpenDeploy base servers and server D is an OpenDeploy receiver, spread across network segments 1 through 3.]
Figure 3-3. An n-tier deployment using OpenDeploy
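The hop-by-hop nature of a tiered deployment can be modeled very simply. The Python sketch below is not OpenDeploy configuration; the relay function and the reachable map are hypothetical and stand in for the network rules described above, where each server can talk only to the next server along the route.

```python
def relay(route, files, reachable):
    """Pass a deployment hop by hop along `route`.

    `reachable` maps each server to the set of servers it may connect to.
    Returns a log of the hops, or raises ConnectionError on a blocked hop.
    """
    log = []
    for sender, receiver in zip(route, route[1:]):
        if receiver not in reachable.get(sender, set()):
            raise ConnectionError(f"{sender} cannot reach {receiver}")
        log.append(f"{sender} -> {receiver}: {len(files)} file(s)")
    return log


# Each server sees only its neighbor, as in the tiered layout described above.
reachable = {"A": {"B"}, "B": {"C"}, "C": {"D"}}
hops = relay(["A", "B", "C", "D"], ["index.html", "rates.xml"], reachable)
```

With these rules, a direct deployment from A to D fails because A has no route to D, while the tiered route through B and C succeeds in three hops.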
Transactional Deployments You’ve probably heard of, and in fact even extensively worked with, database transactions. This transactional state is similar to how OpenDeploy treats a transaction. We have been
talking about content in a generic sense, but we will cover some specific content in this section to show how transactions will quickly become a powerful resource in your new system. If you were deploying only one type of content, maintaining a transactional state would not be that difficult, but once you start deploying content of multiple types in the same deployment, such as files to a file system, data to a database, and an application, everything gets much more difficult. You can handle such situations in different ways, but OpenDeploy has been designed specifically to handle them. Figure 3-4 shows how OpenDeploy can handle a deployment consisting of these components. As you can see, the figure shows two separate network segments. The deployment would be initiated from server A to server B inside segment 2. Once the deployment has reached server B, it fans out to the appropriate servers. If the deployment to one of the servers fails, the entire transaction can be rolled back, and each server is returned to its previous state. Transactional deployments are slower to perform, so you should use them only when necessary.
[Figure 3-4 diagram: servers A through E across segments 1 and 2, with labels for two OpenDeploy base servers, an OpenDeploy receiver, an application server, an Oracle database, and a transactional rollback path.]
Figure 3-4. Transactional deployment using OpenDeploy
You can use several other configurations with OpenDeploy. Chapter 11 covers these in detail, but for now you should see that this product is powerful. It lends itself to solving most deployment needs.
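The all-or-nothing behavior of a transactional deployment can be sketched as follows. This Python example is a simplified illustration, not OpenDeploy’s mechanism: the Target class, the deploy_transactionally function, and the payload name are hypothetical, and a real rollback has to restore files, database rows, and application state rather than a single string.

```python
class Target:
    """Toy deployment target that can apply a change and undo it."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail          # simulate a target that rejects the deployment
        self.state = "previous"

    def apply(self, payload):
        if self.fail:
            raise RuntimeError(f"{self.name} rejected the deployment")
        self._backup = self.state  # remember what to restore on rollback
        self.state = payload

    def rollback(self):
        self.state = self._backup


def deploy_transactionally(targets, payload):
    """Apply to every target; if any one fails, undo the ones already changed."""
    done = []
    try:
        for target in targets:
            target.apply(payload)
            done.append(target)
    except RuntimeError:
        for target in reversed(done):
            target.rollback()
        return False
    return True


targets = [
    Target("file server"),
    Target("application server"),
    Target("database", fail=True),  # the last target fails
]
ok = deploy_transactionally(targets, "release-42")
```

Here the database failure causes the file server and application server to be rolled back, so every target ends in its previous state. Keeping a backup for each target is one reason this style of deployment is slower than a plain copy.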
Adapters OpenDeploy supports adapters, which enable it to interface with many different repositories. Some of the approved adapters include FTP, Polytron Version Control System (PVCS), WebLogic, Oracle 8i and 9i, Microsoft SQL Server 2000, Hypersonic, and Unix file systems.
Application Type This application does not require TeamSite to be installed.
Supported Platforms This application supports Windows 2000 Server (Service Pack 2 or later); Windows 2003; Solaris 2.8 (Solaris 8) and 2.9 (Solaris 9) (32-bit and 64-bit); AIX 5.1 and 5.2; Red Hat Linux 8.0 and 9.0; Red Hat Enterprise Linux 2.1 and 3.0; SuSE Linux 8.1; SuSE Enterprise Linux 8.1; and HP-UX 11i.
Introducing DataDeploy 6.0: Database Deployment for OpenDeploy One of the most common places to store content is in a relational database. Interwoven has recognized that this is a valuable part of any CMS. DataDeploy was designed to incorporate database repositories into Interwoven’s CMS. DataDeploy supports many different databases such as Oracle, Microsoft SQL, MySQL, and DB2. Interwoven has linked DataDeploy with OpenDeploy to help solve some deployment needs such as providing the ability to perform target-side database deployments, which allow your content to be deployed across network segments. Take a moment to reconsider Figure 3-4. The deployment taking place from server A to server E is an example of this type of deployment. The content that is being deployed to the database is being pulled from an XML file and is referred to as structured content. The database schema can be configured using XML so that DataDeploy can deploy content to databases that are also used by your applications. The configuration allows your schema to comply with any database model created within your organization.
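To show what moving structured content from an XML file into a database table involves, here is a minimal Python sketch that uses the standard library’s xml.etree.ElementTree and sqlite3 modules. It is not DataDeploy’s configuration format or behavior: the record layout, the COLUMN_MAP mapping, the table, and the deploy_record function are all assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical structured-content record.
RECORD_XML = """
<record type="bond">
  <item name="symbol">FIC-10Y</item>
  <item name="coupon">4.25</item>
  <item name="maturity">2016-06-30</item>
</record>
"""

# Hypothetical mapping from item names to the columns of an existing table,
# so the content can land in a schema that other applications already use.
COLUMN_MAP = {"symbol": "symbol", "coupon": "coupon_pct", "maturity": "matures_on"}


def deploy_record(conn, xml_text, table, column_map):
    """Insert one XML record into `table`, using only the mapped items."""
    root = ET.fromstring(xml_text)
    values = {
        column_map[item.get("name")]: item.text
        for item in root.findall("item")
        if item.get("name") in column_map
    }
    columns = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    # Table and column names come from trusted configuration; values are bound.
    conn.execute(f"INSERT INTO {table} ({columns}) VALUES ({marks})", list(values.values()))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bonds (symbol TEXT, coupon_pct REAL, matures_on TEXT)")
deploy_record(conn, RECORD_XML, "bonds", COLUMN_MAP)
row = conn.execute("SELECT symbol, coupon_pct, matures_on FROM bonds").fetchone()
```

Keeping the mapping separate from the code is the point of the sketch: the same record could be deployed to a differently named table by changing only the mapping.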
TeamSite Integration Integration with TeamSite allows TeamSite to trigger database updates based on certain events that occur. Some of the events that can be triggered are creating, modifying, and removing data records. Metadata for each record can be automatically stored or removed from the database. If you decide to configure the database to be updated automatically, you do have less control over when the update actually is performed.
Application Type This application must be installed with OpenDeploy.
Supported Databases This application supports Oracle 8i, Oracle 9i, Microsoft SQL Server 2000, and IBM DB2 8.1.
Introducing ContentServices 1.1.2 The ContentServices software development kit (SDK) allows you to connect to Interwoven’s underlying application programming interfaces (APIs). Figure 3-5 shows how you can use this functionality. As you can see, ContentServices allows your Java applications to be modified to interface with the TeamSite subsystems.
[Figure 3-5 diagram: existing applications and a new application connect through the ContentServices SDK to the Interwoven subsystems: TeamSite, OpenDeploy/DataDeploy, and Workflow.]
Figure 3-5. Interwoven’s API model
ContentServices allows you to start workflows, transition tasks, deploy content, and much more. You can update existing applications to integrate with TeamSite without having to rewrite the entire application. Your new applications should be designed to use one of the TeamSite interfaces such as ContentCenter Standard or ContentCenter Professional.
Application Type This is a web service API for integrating your applications into Interwoven products.
Supported Platforms This application supports Windows 2000 Server (Service Pack 2 or newer); Windows 2003 Server; Solaris 2.8 (Solaris 8) and 2.9 (Solaris 9); Red Hat Linux 7.2, 7.3, 8.0, and 9.0; Red Hat Enterprise Linux 2.1 and 3.0; SuSE Linux 8.1; SuSE Enterprise Linux 8.1; AIX 5.1 and 5.2; and HP-UX 11i.
■Note When using the ContentServices SDK, make sure you do not reinvent the wheel. Use it only if you have no way to do what you want within TeamSite itself. When it comes time to upgrade TeamSite, you do not want a lot of code depending on the older version. One way to limit this exposure is to create interface classes that access your CMS. Then you have to change only your interface classes when upgrading to the new version, and external systems stay decoupled from the Interwoven interface.
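The interface-class advice in the note can be sketched as follows. The example is written in Python for brevity, although ContentServices itself targets Java applications, and it makes no real SDK calls: ContentRepository, FakeCmsAdapter, publish_press_release, and the workflow name are all hypothetical names used to show the shape of the design.

```python
from abc import ABC, abstractmethod


class ContentRepository(ABC):
    """The only CMS-facing interface the rest of the application sees."""

    @abstractmethod
    def start_workflow(self, name: str, files: list) -> str:
        """Start a named workflow on the given files and return a job id."""

    @abstractmethod
    def deploy(self, files: list) -> bool:
        """Deploy the given files and report success."""


class FakeCmsAdapter(ContentRepository):
    """Stand-in adapter; a real one would wrap the vendor SDK calls here, and only here."""

    def __init__(self):
        self.calls = []

    def start_workflow(self, name, files):
        self.calls.append(("start_workflow", name, tuple(files)))
        return f"job-{len(self.calls)}"

    def deploy(self, files):
        self.calls.append(("deploy", tuple(files)))
        return True


def publish_press_release(repo: ContentRepository, path: str) -> str:
    """Application code depends on the interface, not on any SDK version."""
    job = repo.start_workflow("press_release_review", [path])
    repo.deploy([path])
    return job


repo = FakeCmsAdapter()
job = publish_press_release(repo, "press/q1.html")
```

When the CMS is upgraded, only the adapter class has to change; publish_press_release and every other caller keep working against ContentRepository. The fake adapter also makes the calling code easy to test without a CMS installed.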
Introducing ControlHub 2.0 ControlHub enables you to develop your applications and promote those applications through TeamSite. The traditional application promotion process is manual and may not be as easy to manage as it once was. ControlHub gives the power back to a smaller set of individuals, ensuring that the releases are more consistent across each of the development environments. ControlHub provides the following abilities:
• You can create a release for your application.
• You can compare source code versions from release to release.
• You can pull source from source repositories such as PVCS.
• You can build your application inside ControlHub.
• You can manage the promotion process with your own custom workflows.
You can achieve all this through the integration of ControlHub and OpenDeploy. ControlHub is built over TeamSite and does not require a separate installation of TeamSite.
Application Type This application is a stand-alone application installed in conjunction with OpenDeploy.
Supported Platforms This application supports Windows 2000 Server (Service Pack 2 or newer), Windows 2003, and Solaris 2.8 (Solaris 8) and 2.9 (Solaris 9) (32-bit and 64-bit).
Introducing ReportCenter for TeamSite 6.5 TeamSite employs an event subsystem that enables events within TeamSite, such as creating a content file or deleting a branch, to be tracked. ReportCenter further extends the event subsystem to provide you with a more detailed record of each event. ReportCenter enables you to use Crystal Reports to create reports that are based on events triggered within the CMS. ReportCenter achieves this by querying the workflow engine or the templating engine to gain additional information, such as the file list for a deployment or who the approvers are for a workflow. Once ReportCenter has captured all the data necessary for the event, it stores this information in a database. ReportCenter offers many features, some of which include the following:
• Keeps historical data, user activity, and workflow
• Integrates with Crystal Enterprise
• Tracks content changes, deletions, and additions of files
• Reports on published editions and branch/workarea creation and deletion
• Reports on the duration of workflows, tasks, and files attached to workflows
Application Type This application must be installed with TeamSite.
Supported Databases This application supports Oracle 8i, Oracle 9i, Microsoft SQL Server 2000, and IBM DB2 8.1.
Introducing FormsPublisher If you are planning on enforcing a style guide, FormsPublisher can help you effectively do so by enabling you to separate your raw data from its end presentation. Your data is divided into groups known as categories, and these categories are further subdivided into types. FormsPublisher uses these types to map different presentations to your data. Separating your data from the presentation allows you to reuse your data over several document types. Once you have aggregated the data, you can define separate channels that can be used to generate different content types. An enterprise can generate PDF, HTML, XML, and Microsoft Office documents from a single data type or record.
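The separation of data from presentation is easy to demonstrate with a single record rendered through several templates. The Python sketch below uses only the standard library’s string.Template and html.escape; it is not FormsPublisher’s templating mechanism, and the record fields, channel names, and render function are hypothetical.

```python
from html import escape
from string import Template

# One data record, captured once, with no presentation markup in it.
record = {"title": "Q1 Bond Outlook", "body": "Yields rose & spreads narrowed."}

# One presentation per output channel (hypothetical channel names).
TEMPLATES = {
    "html": Template("<h1>$title</h1>\n<p>$body</p>"),
    "xml": Template("<article><title>$title</title><body>$body</body></article>"),
    "text": Template("$title\n\n$body"),
}


def render(record, channel):
    """Render the same record for one channel, escaping values for markup channels."""
    markup = channel in ("html", "xml")
    data = {key: escape(value) if markup else value for key, value in record.items()}
    return TEMPLATES[channel].substitute(data)


html_page = render(record, "html")
plain_text = render(record, "text")
```

Changing the look of every HTML page means editing one template, not every record, which is how a style guide can be enforced without touching the content itself.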
■Note The term style guide is used when a company has created guidelines that are required to be followed when creating content for a specific medium. Many corporations maintain a style guide for web content and printed content.
Once content can be generated, it should be safe to start automating your content deployments to your target sites using OpenDeploy and Interwoven Workflow. Moving your content into place in this way eliminates the need for your content to be promoted along with your code promotions. Your content could be updated much more quickly and by people who have much less technical knowledge.
Application Type This application is installed with TeamSite.
Introducing MetaTagger 4.1.0 This application helps you attach metadata to your content. One problem with relying on metadata is that end users will decide upon different values for the various metadata fields. Because of this, you cannot rely on these values because everyone has a different idea of what the metadata values mean. MetaTagger can intelligently identify portions of your content and automatically identify the right category based on a predefined taxonomy, which is a predefined set of characteristics that can be used to categorize your data. You can train MetaTagger to extract fields, titles, and descriptions using an extensible pattern-matching engine. The system can then automatically create data summaries for the content. This entire process can be set to be completely automated or to give suggestions and then leave the categorization to a user.
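A very rough picture of taxonomy-driven tagging is a set of patterns per category plus a rule for extracting fields such as the title. The Python sketch below is far simpler than MetaTagger and is not based on its engine: the taxonomy, the patterns, and the two functions are invented for illustration, and the output is meant as a suggestion that a user could accept or override.

```python
import re

# Hypothetical taxonomy: category -> patterns that indicate it.
TAXONOMY = {
    "fixed_income": [r"\bbonds?\b", r"\byields?\b", r"\bcoupon\b"],
    "equities": [r"\bstocks?\b", r"\bshares?\b", r"\bdividends?\b"],
}


def suggest_categories(text, taxonomy=TAXONOMY):
    """Score each category by pattern hits; return matching categories, best first."""
    scores = {}
    for category, patterns in taxonomy.items():
        hits = sum(len(re.findall(p, text, flags=re.IGNORECASE)) for p in patterns)
        if hits:
            scores[category] = hits
    return sorted(scores, key=lambda c: (-scores[c], c))


def extract_title(text):
    """Treat the first non-empty line as the title."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


SAMPLE = "Bond Market Update\nBond yields fell while bank stocks rallied."
suggested = suggest_categories(SAMPLE)
title = extract_title(SAMPLE)
```

Because every author’s content is scored against the same taxonomy, the resulting values are consistent in a way that hand-entered metadata usually is not.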
This intelligent metadata creation can increase your return on investment and content reuse significantly because your content is more valuable to its users. One reason for this is that it is much easier to find content that can be reused in manuals or articles, thus reducing the cost associated with re-creating the same content repeatedly.
Application Type This application does not require TeamSite to be installed.
Introducing MediaBin 4.5.3 The need for a better system to manage an enterprise’s digital assets has never been greater. People are creating new digital assets at a faster pace than ever before. This has in turn generated the need for companies to use a digital asset management (DAM) tool. Some companies have built custom applications to handle the assets, but this could be a waste of resources. MediaBin can meet an enterprise’s need for a DAM system. MediaBin was built with the idea of simplifying basic functions such as ingesting your assets, versioning, collecting metadata, and searching your assets. Ingesting content is often not looked at as an important aspect of the asset management system until it is time to start migrating all your assets. Often your images are in many different formats; MediaBin can translate images to a standard format or create a duplicate image in the following formats: PSD and EPS, vector EPS, TGA, TIFF, JPEG, GIF, PNG, BMP, Photo CD, and STiNG. Another problem experienced by large corporations with offices around the world is that it takes a lot of resources to move large assets to remote offices. MediaBin allows you to set up asset servers local to your users around the world. When an asset is modified, the MediaBin Syndication Manager uses a business rule system to determine which servers need to be updated with the asset changes. Many other MediaBin features enable you to share your content; you can schedule content deliveries, cluster your media servers, and use MetaTagger integration to classify your assets. Also, an SDK allows you to create your own application extensions.
■Note The terms ingestion and ingesting content are used when a system takes content in and can readily use this content without much user interaction.
Application Type This application does not require TeamSite to be installed.
Supported Platforms This application supports Windows 2000 Server (Service Pack 2 or newer) and Windows 2003.
Supported Databases This application supports Oracle 9.2.0.5 or newer and Microsoft SQL Server 2000.
■Note DAM systems can prove to be expensive. If you do not need a robust DAM system, it is better to make do with what you are currently using. Remember to take into account your asset growth when making this decision.
Following Best Practices
While you are deciding which products to buy and how to integrate your systems, keep the following tips in mind:
• Match your requirements to the Interwoven product offerings to help narrow down your product selection and evaluation process.
• Perform additional research on products that seem to fulfill your requirements. Make sure the products you have selected fulfill your requirements in a way that is satisfactory to both you and the business owner.
• ContentServices will allow you to fill in the holes by adding functionality to your existing applications.
• Make sure you take into account your future growth. If you do not plan for your future growth, a year or two from now you will not have a warm, fuzzy feeling about your CMS.
• Try to make the implementations as generic as possible. This allows you to create a foundation for new applications and teams that become part of the CMS.
Summary
Now that we have covered some of the more important Interwoven applications, you should have a better understanding of your Interwoven needs. You may also have an idea of how much work is ahead of you. You should not take CMS implementations lightly. The decisions you make today will either be your saving grace or become nightmares in a year or two. When a CMS is implemented wisely, it can offer many benefits to your content customers such as a faster time to market, a more standardized look and feel, and a more reliable facility to make updates to your content. The keys to this are to make informed decisions about your product selection and to keep your customizations to a minimum.
In the next chapter, you will learn how you can take the information you have discovered here and create some useful artifact documentation from it. This documentation will carry you forward through the remainder of the FiCorp project. We will also show you how to gather requirements in a CMS project and what to do with those requirements once you collect them. We will talk about the types of requirements and how to catalog those requirements. We will also show how the FiCorp requirements map to the CMS requirements.
CHAPTER 4
Gathering Requirements
Before we talk about defining requirements, we’ll take a step back and define what a requirement is. Requirements are capabilities or functions that a system must perform to meet a user or external system need. In other words, requirements are the “what” for a system. For example, what types of reports are needed? What fields should be available on a form? What happens when the admin user clicks the Set Up button?
Everyone has, at one time or another, done their own requirements gathering. How often have you asked your spouse or roommate if they needed something from the grocery store or gas station? You might have followed this question up with others. What size? How many? What colors? What brand? All of these types of questions allowed you to gather requirements for your trip to the store.
Requirements fall into many different categories. The two main categories for all requirements are use cases and other functional requirements. Use cases are stored in a use case document, and other functional requirements are stored in a supplemental requirements specification. This chapter will discuss both of these types of artifacts. We will also teach you a structured methodology for deriving use cases and other functional requirements in the context of a CMS.
Defining the Initial Requirements
The requirements for a CMS are not unlike the requirements for any other system in that all the requirements fall into the use cases category or the other functional requirements category. The following sections discuss what use cases are and clear up some confusion that is prevalent in the industry as it relates to use cases. We’ll also discuss another RUP artifact, the project glossary. The project glossary is separate from the use case documents and helps provide a common vocabulary for the project.
Introducing Use Cases
The RUP defines a use case as a predefined sequence of events yielding value to an actor. An actor can be a user or another system that interacts with the system you will be documenting or designing. Use cases define the sequence of events that one or more actors will perform in order to use the proposed system. Use cases should be kept at the level of an elementary business process. This means any use cases you document should include only one set of functionality. A good example of this kind of use case is a system login. When an actor logs in to the system,
many other processes may take place for this event to occur, but those other processes should be kept separate from the base use case of a system login. We will discuss this in much more detail in the “Examining a Use Case Document” section. For the Inception phase, you should identify the “low-hanging-fruit” use cases; in other words, pick only those use cases that are easily identifiable at this early stage in the process. Additionally, you should try to identify any architecturally significant use cases, which are any use cases that will have a profound effect on the design of the proposed system. At this stage in the RUP, you will not be able to completely define the identified use cases, but you can identify the need for most use cases. You will then refine these use cases as you progress through the process. A major component of a use case is the actor, which is the system or user who initiates the use case or acts on the use case. By defining the actors, you define specific sets of functions that the system/user can perform when interacting with the CMS. Some examples of actors include people (or more accurately, roles), hardware, other systems, and reporting applications. For example, if you are designing a database to store customer sales data and you have another system (System A) that retrieves summary sales data from that database system, then you could say that System A is an actor of the database system. The concept of an actor will become clearer as you proceed in the case study. When gathering requirements and conducting requirements workshops with users, keep in mind that the users will have a desire to name specific people as the actors; your task will be to remove the individual and define the role that the individual fulfills. For example, when you ask users “Who starts the system?” they may respond by saying “Debra does that.” Your task is to dig deeper into that response. What else does Debra do? Does anyone else do what Debra does? 
Can anyone else do what Debra does? What is the process that Debra follows to do this? Of course, the next step is to interview Debra to further cement your understanding of the system to be developed. This chapter will discuss the artifacts that should be generated during the Inception phase and provide samples of those artifacts for the case study. We will start with the project glossary artifact.
Defining and Documenting a Project Glossary
Early in the process it is critical to establish and maintain a common vocabulary of terms and their definitions. This ensures that all the project team members are using the same vocabulary. Think of this as the driver for all members of the project team and stakeholders to speak the same language. This common vocabulary will help reduce misunderstandings during the project life cycle. The best way to begin to compile this glossary artifact is through your interactions with the project stakeholders and domain SMEs whom you will interview and work with during the project. A sample of the FiCorp glossary is as follows:
APR: Stands for Annual Percentage Rate. An APR is the cost of acquiring a loan. This is the interest rate paid on a loan and all applicable fees over the period of a year.
Affinity cards: These are cards that behave much like credit cards but are usually tied to a charity or organization. They may impose fees based on each transaction.
Cash cards: These are plastic cards issued by banks or other institutions used for withdrawing money from automated teller machines (ATMs).
Credit card: These are plastic cards issued by merchants to extend credit to the individual holding the card. These cards can usually be used to withdraw money or to purchase items on credit.
Debit card: These are also plastic cards issued by banks or merchants; instead of extending credit, these cards actually withdraw funds from the cardholder’s bank account, usually within a few days of the purchase.
The sample glossary section just provided is far from complete; however, it should give you an idea of the information that belongs in the glossary. This information will probably change based on each project or organization. Once your glossary begins to take shape, it is the responsibility of the systems analyst to maintain this document throughout the life of the project. Additionally, the systems analyst should verify the document with all project members to achieve sign-off and to guarantee that the artifact is accurate.
Completing the Catalogs
As you start to define the requirements for the proposed system, you must define the actors that will use the system. You will also identify what those actors do in the system or how they interact with the system. The project team, facilitated by the systems analyst, must also define a description for each actor. All of this information will be compiled into a use case catalog and an actor catalog. You can think of the use case and actor catalogs as the inventory documents that list all the use cases and actors. You will find that the information in each catalog may change as you progress in your project, but it is imperative that the information be maintained and updated with any changes. As new project team members are added, this information will help them come up to speed on the problem domain. This information will also help the project manager complete the iteration plans and assign resources to use cases. The associated resources will include analysts to research and write the use cases and developers who will actually implement the use cases in the system.
We recommend developing the actor catalog first and then refining it as you go. When you are defining the actors, you will notice that some of the previously identified actors perform the same function in the system and so should be combined into one actor. When combining actors, you should maintain the actor name that makes the most sense to all project team members. Actor names should be as descriptive as possible. The following section covers the process for defining actors.
Defining the Actor Catalog
Defining the actors in a system can be a tricky process. On one hand, you have to derive all actors in the system; on the other, you have to be careful not to create imaginary actors or actors that may not belong to your system. You must invite some of your users and project stakeholders to a workshop where you can define the actors. The main reason for this is that only they will know the non-IT component of the system you are developing. Also, you will need their direct input to give the actors descriptive names.
One of the values in this type of development process is that you earn the stakeholders’ respect and make them champions of the CMS. No longer will they feel like IT is a black hole where they send large amounts of money in the hope that they get something useful (software) from the bargain. Including these individuals in all areas of development means they will feel like their opinions are respected and valued. This relationship building is highly reciprocal and will pay off in large dividends to your project.
The first step in defining actors for your CMS is to identify the common roles used in the system. You should have one to several meetings with the business owners to determine the actor catalog. Questions to ask during these meetings/requirements workshop sessions include the following:
• Who will need to use the CMS?
• Who/what will monitor the system?
• Who will need to log in to the CMS?
• Who will be responsible for starting and stopping the CMS?
• Who will be responsible for performing maintenance on the system?
• Who will originate the content that is ingested into the CMS, or where will it come from?
• Who will be responsible for updating or creating content in the CMS?
• What external systems will the CMS need to communicate with?
• What external systems will use this CMS?
• Does anything happen automatically at a certain point in time? If so, who/what initiates this action?
• Where will the content reside?
• Where will the content be sent?
It is unreasonable to think you will identify every single actor for the CMS at this point. In fact, we can assure you that you will discover others during later project phases. Just do the best you can at this point, and remember to run any identified actors by the business owner, project stakeholder, and other company representatives for validation. Also important to keep in mind are systems such as content intelligence systems (MetaTagger), search systems, and repository systems. All of these systems could be potential actors that you will want to document and capture.
As part of the actor discovery or requirements workshop, you should decide on a specific description for each actor and keep this up-to-date in the actor catalog. You must also attempt to ascertain what functions each identified actor performs in the system. This way, duplicate roles can be combined into a single actor definition. See Table 4-1 for a sample actor catalog.
Table 4-1. Sample FiCorp Actor Catalog
| Actor Name | Actor Description | Characteristics | TeamSite Role |
|---|---|---|---|
| Research analyst | Creates and updates content specific to their market segment. | FiCorp has several research analysts. Each one is responsible for researching and analyzing market information and specific market segments. | Editor |
| Topic SME | Initiates a workflow request. | Reviews and approves content modifications. Performs topic-specific content review. Each topic SME represents one to many users. | Administrator |
| Legal researcher | Initiates a workflow request. | Reviews and approves content modifications. Performs legal content review. Each legal researcher represents one to many users. | Editor |
| Search index manager | Performs index creation used in query operations. | N/A | TBD |
| Search manager | Performs queries against indices. Responsible for returning results to the user. | N/A | TBD |
| Workflow engine | Manages various process flows in the CMS. | N/A | TBD |
| OD admin | Allows access to OpenDeploy’s administrative functions. | N/A | OD admin |
| OD user | Allows access into OpenDeploy. | N/A | OD user |
| Technical lead | Creates branches, workareas, and other content structures such as templates and workflows. | Each technical lead represents one to many users who are responsible for maintaining the CMS infrastructure components. | Master |
As you can see from Table 4-1, the actor catalog contains four columns: Actor Name, Actor Description, Characteristics, and TeamSite Role. These columns contain the following information: Actor Name: The actor name will be used throughout your CMS design. The name should be as descriptive as possible and should be a familiar title or name for your particular business domain. Actor Description: This column should contain a brief description of the actor, including the actor’s sphere of responsibility in the CMS.
Characteristics: This column contains important details regarding the actor such as the number of users the actor represents and any other important actor characteristics you need to capture. Examples include the actor’s level of system knowledge, domain knowledge, and demographic knowledge, as well as the other applications this user knows how to use. TeamSite Role: This column will contain the anticipated TeamSite role. Interwoven technical advisors can greatly assist with this analysis. Let’s look at this actor catalog a little more closely. You should notice some familiar names in Table 4-1. The research analyst, topic SME, and legal researcher actors were mentioned in Chapter 2 when we defined the business process flows. For the purposes of the FiCorp case study, you would have determined (during the requirements workshop meetings) the values for the remaining columns that are associated with these actor names. You may not have all this information at this point in time, and it is not critical for you to have it; however, remember that you will need to update most documentation periodically throughout the project. You should make updates each time you make new discoveries and requirements or modify design-level details. When you uncover any new information, review your documentation to ensure that changes are reflected in the documentation. The best way to capture this information is in a spreadsheet. By using a spreadsheet, you can set up a data filter to view only the data you want to see. By having the actors defined and documented in this way, your project will be much better organized. These actor definitions will also feed into test cases that must be developed to ensure that your implementation has been a success. Some of these roles will probably already be defined at your organization; however, it is still worthwhile to document them. 
With these definitions, you will have a complete picture of all the roles and responsibilities in the CMS that you are implementing. The next and most time-consuming part of the process is to begin to identify and document the use cases for the CMS.
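The spreadsheet filter described above can also be sketched in code. The following is a minimal illustration only, assuming a hypothetical `Actor` record whose fields mirror the four columns of Table 4-1; the class, the function names, and the shortened row text are our own and are not part of TeamSite or any Interwoven tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    """One row of the actor catalog (hypothetical structure)."""
    name: str
    description: str
    characteristics: str
    teamsite_role: str  # anticipated role; "TBD" until the analysis is done


# A few rows adapted from Table 4-1 (cell text shortened for the example).
ACTOR_CATALOG = [
    Actor("Research analyst",
          "Creates and updates content specific to their market segment.",
          "FiCorp has several research analysts.",
          "Editor"),
    Actor("Search manager",
          "Performs queries against indices.",
          "N/A",
          "TBD"),
    Actor("Workflow engine",
          "Manages various process flows in the CMS.",
          "N/A",
          "TBD"),
    Actor("Technical lead",
          "Creates branches, workareas, and other content structures.",
          "Each technical lead represents one to many users.",
          "Master"),
]


def filter_by_role(catalog, role):
    """Return the actors whose anticipated TeamSite role equals `role`."""
    return [actor for actor in catalog if actor.teamsite_role == role]


def unresolved(catalog):
    """Return the names of actors whose TeamSite role is still undecided."""
    return [actor.name for actor in filter_by_role(catalog, "TBD")]


if __name__ == "__main__":
    # Lists the actors that still need a role decision.
    print(unresolved(ACTOR_CATALOG))
```

A filter such as `unresolved` gives the systems analyst a quick list of actors to revisit at the next requirements workshop.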
Defining the Use Case Catalog
After gathering and defining actors, you must begin to derive potential use cases in which those actors will participate. The first use cases will be preliminary and will require multiple iterations to become stable. The best way to identify potential use cases is to ask what each actor requires of the system. Also, if the actor requires any additional functionality, then this may be a good candidate for a use case. Remember to keep your use cases modular so that they can be reused across multiple actors. For each actor, ask the following questions:
• Will the actor need to create, read, update, delete, store, or upload content through/into the system?
• What are the primary tasks that the actor will perform in the CMS?
• Will the actor need to inform the system about any external events?
• Will the actor need to be notified of any events occurring in the CMS?
• Will the actor need to approve or deploy content inside the CMS?
IDENTIFYING TEAMSITE ROLES
You should begin to identify the TeamSite roles that you will use for each actor you have documented. TeamSite up until version 6.7 contained four basic roles. Each role has specific access and optimized user interfaces. The four default roles are Author, Editor, Administrator, and Master. In version 6.7 of TeamSite, you can define user roles for your specific CMS implementation that have capabilities across the default TeamSite base roles. For example, you could define a corporate communications user with all the Author role default abilities that also owns a workarea. Workarea ownership is typically reserved for the Editor role, but in this example the corporate communications user could have responsibilities across those roles. Here are the roles broken down:
• Authors are primary content contributors, and their access is usually restricted to a specific workarea in the CMS. This user is primarily responsible for updating and creating content and does not have to be a technical user. This user usually accesses the TeamSite system via the ContentCenter Standard or Casual Contributor interface.
• Editors own workareas and typically assign content work to Authors. Editors can perform content updates, perform content creation tasks, and also create editions of content. Editors have all the abilities of Authors but also have access to advanced TeamSite functions.
• Administrators own workareas. Administrators control functions at a branch level and can create new workareas for Editors and Authors. Administrators manage content creation projects performed in their branches.
• Masters have the highest level of authority in TeamSite. Typically these users interact with the TeamSite system via the ContentCenter Professional interface. These users have permissions that supersede those of any other user. In fact, this user can perform functions across all branches and workareas. Masters have all the abilities that Administrators have and more.
Do not worry if you do not understand all of this information at this point in the book. Chapter 14 will cover TeamSite user roles in more detail.
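One way to reason about the roles in this sidebar is to treat each role as a set of capabilities. The sketch below is an analysis aid only: the capability labels are names we made up for illustration, they are not TeamSite identifiers or configuration values, and the sets encode only the relationships the sidebar states (Editors include Author abilities, Masters include Administrator abilities, and a custom role can combine abilities across base roles).

```python
# Capability labels are illustrative assumptions, not TeamSite identifiers.
AUTHOR = frozenset({"create_content", "update_content"})

# Editors have all the abilities of Authors, plus workarea ownership,
# work assignment, and edition creation.
EDITOR = AUTHOR | {"own_workarea", "assign_work", "create_edition"}

# Administrators own workareas and operate at the branch level.
ADMINISTRATOR = frozenset({"own_workarea", "create_workarea", "manage_branch"})

# Masters have all the abilities of Administrators and can act across
# all branches and workareas.
MASTER = ADMINISTRATOR | {"act_across_all_branches"}

# A custom role of the kind described for version 6.7: all Author
# abilities plus workarea ownership.
CORPORATE_COMMUNICATIONS = AUTHOR | {"own_workarea"}


def can(role, capability):
    """Return True if the role's capability set contains `capability`."""
    return capability in role


if __name__ == "__main__":
    print(can(CORPORATE_COMMUNICATIONS, "own_workarea"))
    print(can(CORPORATE_COMMUNICATIONS, "create_edition"))
```

Writing the roles down this way makes it easy to see that the corporate communications user gains workarea ownership without picking up the rest of the Editor role.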
You get the idea. For each actor, put on that actor’s hat, and working with the SMEs and project stakeholders, validate all functionality that the specific actor will need access to in the CMS. The answers to the previous questions will become candidate use cases. Document all use cases discovered. Only by documenting the main success scenario or the basic flow of each use case will you be able to determine its relevance to the CMS. For every discovered and documented use case, record a description of the use case goal. As you are documenting the use cases, refer to the glossary document to ensure that you are using terms appropriately and you are adding any new terms to the glossary. The next step in the process is to begin to associate the newly discovered use cases with your actor list. We recommend that you create a use case catalog to track this association. See Table 4-2 for a sample use case catalog.
Table 4-2. Sample FiCorp Use Case Catalog
| Use Case ID | Use Case Name | Associated Actor(s) | Description |
|---|---|---|---|
| UC1 | Log In to CMS | Research analyst, topic SME, legal researcher | This use case describes the sequence of events for a login to the CMS. This allows the actors to gain access to the CMS and perform all their functions within the boundaries of this system. |
| UC2 | Log Out of CMS | Research analyst, topic SME, legal researcher | This use case describes the sequence of events to log out of the CMS. This allows the actors to sign out from the CMS and close all open sessions and windows. |
| UC3 | Create Content | Research analyst | This use case describes the sequence of events used to create content. |
| UC4 | Spell Check Content | Research analyst | This use case describes the sequence of events used to check the spelling of content. |
| UC5 | Choose Content Type | Research analyst | This use case describes the sequence of events used to select a content type, such as a header, footer, promotional text box, or any other type of CMS template. |
| UC6 | Choose Presentation | Research analyst | This use case describes the sequence of events used to associate the selected content type to a presentation format. |
| UC7 | Generate Content | Research analyst | This use case describes the sequence of events used to merge the content type data with the selected presentation. |
As you can see from Table 4-2, the use case catalog contains four columns: Use Case ID: The Use Case ID column identifies the specific use case. This information is important if you are using some type of requirements management software such as IBM Rational RequisitePro. Generally, you should give the use cases an ID beginning with the two letters u and c for use case. Later, if you want to break your use cases into business-level use cases (use cases specific to the business user or role) or system-level use cases (use cases that are system specific and hidden from the business user), you can identify the use cases with a bluc or sluc identifier, respectively, based on the type. Use Case Name: This column should contain the name of the use case as identified in your requirements workshop. This name should be unique across your CMS development effort. Associated Actor(s): This column contains any actors that you think will need to use this specific use case. Remember, if any actor shares the same use cases as another actor, then you should combine those actors into a single actor. In this example, you should notice that all of the human actors have to log in and log out of the CMS. The system actors are used within the system and therefore do not need to log in. Also note that the method and procedure of each actor’s login may differ significantly even though they each have to log
in to the system. These different methods of logging in to the system could be documented in a single use case document but as variant use cases or alternative flows. Description: This column should contain the goal and brief description for the identified use cases. Let’s take a deeper look at one of the use cases in this use case catalog. In this example, UC1 is the Log In to CMS use case. Notice that three human actors are associated with this use case. The reason for this is that each human actor who uses the CMS will have to log in before using it. Make certain that each use case has been agreed to by all project members. Once again we recommend you use a spreadsheet to document your use case catalog. This way you can filter the data to display only the information that is relevant at a given time. Additionally, you could insert other columns into your use case catalog that could list information such as the developer or development team the use case is assigned to or the iteration in which the use case will be developed.
Following Use Case Best Practices
Here are some use case best practices:
• Use cases should be written and named in “verb + noun” fashion.
• Use cases should be as descriptive as possible.
• Use cases must provide measurable value to the actor that initiates the use case or uses the use case.
• Add all identified use case nouns to your project glossary, and validate their existence in the problem domain with project team members.
• Use cases must have unique names.
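Two of the catalog rules above, unique use case names and combining actors that share the same use cases, lend themselves to a quick mechanical check. The following is a minimal sketch under our own assumptions: the `UseCase` record and the function names are hypothetical, and the rows are taken from the first four entries of Table 4-2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """One row of the use case catalog (hypothetical structure)."""
    uc_id: str
    name: str
    actors: tuple


HUMAN_ACTORS = ("Research analyst", "Topic SME", "Legal researcher")

USE_CASE_CATALOG = [
    UseCase("UC1", "Log In to CMS", HUMAN_ACTORS),
    UseCase("UC2", "Log Out of CMS", HUMAN_ACTORS),
    UseCase("UC3", "Create Content", ("Research analyst",)),
    UseCase("UC4", "Spell Check Content", ("Research analyst",)),
]


def duplicate_names(catalog):
    """Return use case names that occur more than once (case-insensitive)."""
    seen, duplicates = set(), []
    for use_case in catalog:
        key = use_case.name.lower()
        if key in seen:
            duplicates.append(use_case.name)
        seen.add(key)
    return duplicates


def use_cases_by_actor(catalog):
    """Map each actor name to the set of use case IDs it participates in."""
    by_actor = defaultdict(set)
    for use_case in catalog:
        for actor in use_case.actors:
            by_actor[actor].add(use_case.uc_id)
    return dict(by_actor)


def merge_candidates(catalog):
    """Group actors that share exactly the same set of use cases."""
    groups = defaultdict(list)
    for actor, ids in use_cases_by_actor(catalog).items():
        groups[frozenset(ids)].append(actor)
    return [sorted(actors) for actors in groups.values() if len(actors) > 1]


if __name__ == "__main__":
    print(duplicate_names(USE_CASE_CATALOG))
    print(merge_candidates(USE_CASE_CATALOG))
```

With only these four rows, the topic SME and legal researcher are flagged as merge candidates because the sample catalog lists nothing but login and logout for them. A flag like this is a prompt for discussion, not a decision: once their review use cases are added to the catalog, the two actors would no longer share an identical set.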
Creating a Use Case Model
Use cases define the outwardly visible aspects of a system, and each one can be classified as a system use case or a business use case. Usually the business use cases can be further split into multiple system use cases. During the Inception phase, your goal is to identify the most architecturally significant use cases that you will later refine. You should define and create at least one use case model for each actor in the CMS. This use case model will visually depict all use cases associated with each actor. This will help you trace use cases that may be affected by changes in any other use case. In the use case document, you include a section to insert the use case model or models of the actor(s) involved in the specific use case document. The following is some industry-standard nomenclature for describing and documenting use cases. To allow you to become familiar with the terms, we have defined them for you.
Uses: In a use case, the “uses” relationship signifies that the base use case utilizes another use case to accomplish the sequence of events. You will use the “uses” relationship to signify functionality that must be included for the base use case to achieve its goal. You will include the “uses” relationship for any functionality that will be reused significantly. Not every use case will lend itself to being in a “uses” relationship, but if this use case is called
more than once, a good rule of thumb is to put it in its own use case and use the “uses” relationship. An example of this is the save functionality of Microsoft Word, which would lend itself to the “uses” relationship. When modifying a Word document, you then save the document to retain your changes. The save would then be included via a “uses” relationship. In a use case diagram, the “uses” relationship is modeled as an arrow pointing toward the base use case with «uses» displayed above it. Additionally, the actual modification of the Word document would also be a use case.
Extends: In a use case, the “extends” relationship signifies that this functionality will extend the base use case. We know this may sound confusing at first, but trust us, this will make sense as you work through the case study. The “extends” relationship is significant in that the extension point does not necessarily have to occur for the base use case to accomplish its goal. For example, consider the use case of creating a Word document. When the Word document is first created, you have to use the Save As function to actually finalize the creation of the Word document. The Save As functionality allows you to give the newly created file a name and choose a location for the new document to reside. Even if you have clicked the Save button, Word calls the Save As function because you have not yet given the Word document a name or a location. However, if you have given the Word document a name and then click the Save button, Microsoft Word saves your changes in the existing document. You could then classify the Save As function as an extension of the Save functionality. Of course, you can bypass this extension by initiating a Save As directly, if for example you wanted to create a new copy of the existing document under a new filename. Another example of the “extends” relationship would be if you were buying a snack from a vending machine. If you supply exact change for the price of the item, then the snack machine will dispense your chosen snack. However, if you supply a dollar bill for a 60-cent item, then the snack machine has to provide you with 40 cents of change. You could name this sequence the Provide Change use case. The Provide Change use case would be an extension to the base use case of Dispense Snack. In a use case diagram, the “extends” relationship is modeled as an arrow pointing toward the base use case with «extends» displayed above it.
Actor: An actor is anything that interacts with your system. An actor can be a person or an external system. When defining actors, it is important to define roles and abstract the actual users from the use case view. You will find when defining actors that one person may fill several different roles in the system and therefore may be shown as two or more actors in the use case view. Alternatively, different people may fill the same role in the system and therefore will be represented by the same actor. In a use case diagram, an actor is identified as a stick figure with the actor name displayed underneath the stick figure.
Use Case: A use case is a sequence of events that yields an observable value to an actor. The use case is an invaluable component of the total requirements for your system. One or more actors always initiate a use case. In a use case diagram, the use cases are identified as bubbles with the use case name displayed inside them.
Association lines: An association line is a straight line that is drawn from the actor to the use case bubbles that the actor uses. In a use case diagram, the association lines are identified as straight lines connecting the actor with the use cases. See Figures 4-1 and 4-2 for sample use case models.
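The difference between the two relationships can be sketched with the Word and vending machine examples above. This is a rough analogy in code, not a modeling notation: a “uses” relationship behaves like a step the base flow always calls, and an “extends” relationship behaves like a step that runs only when its condition is met. All function names are our own.

```python
def save_document(document):
    """Used use case: Save. Persists the pending changes."""
    document["saved"] = True
    return document


def modify_document(document, text):
    """Base use case: Modify Document. It always uses Save."""
    document["body"] += text
    document["saved"] = False
    # "uses": the base flow cannot reach its goal without this step.
    return save_document(document)


def provide_change(paid_cents, price_cents):
    """Extension use case: Provide Change."""
    return paid_cents - price_cents


def dispense_snack(paid_cents, price_cents):
    """Base use case: Dispense Snack."""
    if paid_cents < price_cents:
        raise ValueError("insufficient payment")
    events = ["dispense snack"]
    # "extends": this step runs only when the actor overpays; with exact
    # change the base use case still achieves its goal without it.
    if paid_cents > price_cents:
        change = provide_change(paid_cents, price_cents)
        events.append(f"return {change} cents")
    return events


if __name__ == "__main__":
    print(dispense_snack(60, 60))   # exact change, no extension
    print(dispense_snack(100, 60))  # dollar bill for a 60-cent item
```

With exact change the event list contains only the dispense step; with a dollar bill for a 60-cent item it also contains the 40 cents of change, matching the Provide Change example.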
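Figure 4-1. Research analyst use case diagram 1
[Diagram not reproduced. Labels recoverable from the figure: system boundary “FiCorp CMS system” and “ContentCenter”; actor Research Analyst; use cases Import Content, Request Content Approval, Log In to CMS, Create Favorite, Attach Content, Delete Content, Initiate Content Promotion, Delete Favorite, and Edit Content; relationship label «uses».]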
[Figure 4-2: use case diagram. Actor: Research Analyst. Labels: FiCorp CMS system, ContentCenter. Use cases shown: Create Content, Associate Assets, Select Asset(s), Choose Content Type, Choose Presentation, Generate Content, Spell Check Content, Save Content, Save As Content. The use cases are linked by «uses» and «extends» relationships.]
Figure 4-2. Research analyst use case diagram 2
In the sample use case model for the research analyst, notice that several, but not all, of the use cases are associated with the research analyst. In the real world, this model artifact would list all use cases associated with the research analyst actor. For the sake of brevity, we have left this model incomplete while providing you with enough information to understand the method. When you discover new use cases, add them to this model so that you have a complete view of the use cases for the research analyst. Every use case bubble on the model represents a use case artifact or document that should contain the sequence of events for that use case. The following section details exactly what belongs in a use case document and how to construct one. By following this methodology, you can have a complete understanding of the CMS to be built.
Stay with us here while we walk you through the model shown in Figure 4-2. On the left side of the figure you can see what looks like a stickman with the words Research Analyst under it. This stickman represents the actor for this model. In a use case model, every actor, human or otherwise, is represented by a stickman. The next thing you may notice is the large square that the actor sits outside. This square represents the boundaries of the CMS. Any use case that must be built into the system is represented as a use case bubble, and the actor must sit outside the system boundary because the actor will not be part of the CMS.
Take a look at the Create Content use case. Several use cases are linked to this use case via either an «extends» or a «uses» relationship. If you look at the use case named Spell Check Content, you will see that it extends the functionality of the Create Content use case. Why is this? Well, in FiCorp’s CMS, forms will be used to enter the data that becomes the actual content. There will probably be some type of structure on this form (a link or button) that initiates the spell check function for the recently entered content. However, the user does not have to initiate a spell check on their content, and therefore this use case extends the Create Content use case. The Spell Check Content use case would also extend the Edit Content use case or any use case that exposes this functionality.
Now take a look at the Choose Content Type use case. The Create Content use case «uses» this use case because the flow of the Choose Content Type use case is always involved when using Interwoven TeamSite to create content. This makes a great deal of sense: you have to know what you are making before you make it. Use case models give you a good view into all the ways each actor will use your system and are an invaluable analysis tool. This is especially true when designing a complex system such as a CMS.
Many good books and references are available to assist you in developing use cases and use case models. One of the very best is Applying Use Cases: A Practical Guide, Second Edition (Addison-Wesley, 2001), by Geri Schneider and Jason P. Winters. We suggest that you find some good references, read them, and put them into practice. By doing that, you will greatly reduce your missed requirements, improve customer satisfaction, reduce software defects, and reduce your testing effort.
Understanding the Elements of a Use Case Document
These are the elements of a use case document:
Description: The description should be one to three sentences about the purpose of the use case. In most instances one sentence is enough, but add a second or third if it makes the purpose clearer. If you can give the description to someone who is not on the project and they can understand what the use case accomplishes, then that is enough of a description.
Level: The level of the use case signifies the visibility or detail of the use case. For example, if the use case details technical events, the level would be system, but if the use case details only user-level events or a business-level sequence, then the use case would be a business-level use case. If you have broken the proposed system into architectural layers, you could specify a use case at each architectural layer. For example, you might have repository-level use cases, web service–level use cases, or search-level use cases.
Primary actor: This is the name of the actor that initiates the use case. Remember, this can be a human role or an external system. Other actors can be involved in the use case, but this element will list only the primary actor involved. Multiple actors may be listed, but only if each actor can initiate the use case.
Stakeholder: This element will list the associated stakeholder for the proposed system. Usually the stakeholder element will be replicated across several use cases. If you are detailing a system use case or architectural use case, then this element can contain the name of the architectural lead or development lead.
Precondition(s): What must be in place, or what must have occurred, for this use case to complete successfully? This element establishes the starting conditions for the use case. By thinking about this element and supplying the appropriate information, you can also find holes that were missed during previous system analysis.
Success end condition: What is the outcome of this use case, and what conditions will be fulfilled when this use case ends? This element will provide that information. Keep in mind when writing this element that it supplies the success criteria only for the basic flow of the use case. Alternative flows will not lead to the outcome specified by this element.
Failure end condition: What will be the outcome of this use case if the basic flow of events is not completed successfully? The failure end condition will usually be the opposite of the success end condition.
Trigger: The trigger element will be the action, performed by the primary actor, that causes the basic flow of the use case to begin. For example, a trigger could be when the content contributor selects the CMS login link.
Main success scenario: This is the basic flow of events for the use case. The main success scenario element will list (usually in a table) the flow of events for the use case. This flow of events is also referred to as the happy path or happy day scenario. This section of the use case document will contain all the steps needed in the sequence to achieve the success end condition.
This flow of events should be discussed and documented in the use case workshops that you will be having. All steps should be numbered sequentially and should appear on their own line. It is important to keep only one step per line in this element; in that way, the alternative scenarios can branch from each discrete step. You may also find as you document use cases that some set of steps or functionality is repeated in several use cases. In this case, you should remove these repetitive steps from those use cases and put those steps in their own use case. The original use cases will then call the new use case.
Alternative scenario(s): The alternative scenarios will list exception paths or alternative flows for the use case. For example, when documenting a login use case where the wrong password is supplied, an alternative flow would list the sequence of events in which the user is prompted to supply a valid password. Some alternative flows may be uncovered during the initial use case workshops, but uncovering all of them during initial analysis is not likely. Keep in mind that the use cases written for your project will evolve constantly as the project progresses.
Associations: Remember how we said that repeated sequences should be broken out into their own use case and then called from the original? Well, the Associations section will list any called use case, used use case, or extensions to this use case. In this way, you will be able to maintain the use case traceability, and when a use case changes, you will have an easy way to remember all other associated use cases and can check those use cases for changes.
Error condition(s): It is a good idea to maintain error conditions in your use case documents. By supplying anticipated error conditions in the use case, the use case implementer or engineer will know the types of errors that can be generated during the execution of the use case. You should document error conditions in a table with several columns. The recommended columns are the actual error condition, such as Incorrect Password; the technical description or meaning that will probably be written to a log file; and the displayed message (if any) that will be presented to the end user if the error occurs.
This may seem like a great amount of documentation, but you may be surprised to see the number of previously uncovered requirements that will surface simply because use cases will make all project stakeholders think through the flow of events. You will find a sample use case document in the “Examining a Use Case Document” section.
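The elements described above map naturally onto a simple record. What follows is a minimal sketch in Python, not a format used by TeamSite or prescribed by RUP. The class and field names (UseCaseDocument, ErrorCondition, dangling_branches) and the sample values are our own assumptions for illustration. It keeps one step per entry in the main success scenario so that each alternative scenario can be keyed to the step number it branches from.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorCondition:
    """One row of the error-condition table (hypothetical structure)."""
    condition: str                # e.g. "Incorrect Password"
    log_description: str          # technical meaning written to a log file
    displayed_message: str = ""   # message shown to the end user, if any


@dataclass
class UseCaseDocument:
    """Illustrative container for the use case elements listed above."""
    name: str
    description: str
    level: str
    primary_actors: list[str]
    stakeholder: str
    preconditions: list[str]
    success_end_condition: str
    failure_end_condition: str
    trigger: str
    main_success_scenario: list[str]   # one discrete step per entry
    # Maps a main-scenario step number (1-based) to the alternative flow
    # that branches from that step.
    alternative_scenarios: dict[int, list[str]] = field(default_factory=dict)
    associations: list[str] = field(default_factory=list)
    error_conditions: list[ErrorCondition] = field(default_factory=list)

    def dangling_branches(self) -> list[int]:
        """Return branch step numbers that do not exist in the main scenario."""
        last = len(self.main_success_scenario)
        return [n for n in self.alternative_scenarios if not 1 <= n <= last]


# Sample values are invented for illustration only.
login = UseCaseDocument(
    name="Log In to CMS",
    description="Describes the sequence of events for a login to the CMS.",
    level="Business",
    primary_actors=["Content contributor"],
    stakeholder="Project stakeholder",
    preconditions=["The actor has a valid username and password."],
    success_end_condition="The actor gains entry into the CMS.",
    failure_end_condition="The actor is unable to gain access to the CMS.",
    trigger="The content contributor selects the CMS login link.",
    main_success_scenario=[
        "The actor supplies a username and password.",
        "The actor initiates the login action.",
        "The system verifies the actor's credentials.",
    ],
    alternative_scenarios={
        3: ["The password is incorrect.",
            "The system prompts the actor to log in again."],
    },
    error_conditions=[
        ErrorCondition("Incorrect Password",
                       "Password mismatch for user",
                       "Your login is incorrect."),
    ],
)

print(login.dangling_branches())  # prints []
```

Keying alternative scenarios by step number is one possible design; it makes it easy to notice a branch that no longer matches any step after the main scenario is edited.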
VALIDATING ACTORS AND USE CASES
If you uncover an actor during the use case and requirements workshop that is not documented in the actor catalog, be suspicious of it. Check that the actor is really part of your CMS or interacts directly with it. Sometimes when conducting the workshop, actors are uncovered that should not be a part of the system that you are creating. If the actor passes these tests, then add it to the actor catalog. Keep in mind, though, that the actor must have use cases associated with it; if none are found, then most likely the actor is already accounted for by a different actor. Additionally, if a use case is uncovered with no associated actors, then you should be suspicious of it as well. Most likely this means you have missed something. At least one of your actors should be associated with this use case in a use case diagram. If either of these conditions occurs, validate your use cases and your actor catalog immediately and discuss the situation with all team members. The goal of this discussion is to reach a consensus on what to do in this particular situation.
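The checks in this sidebar can be run mechanically once the actor catalog and the diagram associations are written down. The following is a small sketch under our own assumptions: the function name validate_catalog and the data layout (a mapping from use case name to the set of associated actors) are hypothetical, and the Archive Content, Deploy Content, and Deployment Scheduler entries are invented to trigger each check.

```python
def validate_catalog(actor_catalog, associations):
    """Cross-check an actor catalog against use case associations.

    actor_catalog: iterable of documented actor names.
    associations:  dict mapping use case name -> set of actor names.
    """
    catalog = set(actor_catalog)
    used_actors = set().union(*associations.values())
    return {
        # Documented actors that no use case refers to.
        "actors_without_use_cases": sorted(catalog - used_actors),
        # Actors that appear on a diagram but not in the catalog.
        "undocumented_actors": sorted(used_actors - catalog),
        # Use cases that no actor is associated with.
        "use_cases_without_actors": sorted(
            name for name, actors in associations.items() if not actors
        ),
    }


actor_catalog = ["Research Analyst", "Topic SME", "Legal Researcher"]
associations = {
    "Log In to CMS": {"Research Analyst", "Topic SME"},
    "Create Content": {"Research Analyst"},
    "Archive Content": set(),                    # hypothetical use case
    "Deploy Content": {"Deployment Scheduler"},  # hypothetical actor
}

report = validate_catalog(actor_catalog, associations)
for check, names in report.items():
    print(check, names)
```

Any non-empty list in the report is a prompt for the team discussion described above, not an automatic verdict.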
Examining a Use Case Document
In the following sections, you’ll look at a sample use case document to further cement your understanding of use cases. Please refer to the “Understanding the Elements of a Use Case Document” section while reviewing the following use case. This will help you understand why we have constructed the use case document as we have.
Use Case Diagram
This section contains the use case model or models referenced by the use case document.
Log In to CMS Use Case
This section of the document describes the use case and identifies who uses the use case.
Description
This description will be inserted directly from the use case catalog. This use case describes the sequence of events for a login to the CMS. This allows the actors to gain access to the CMS and perform all their functions within the boundaries of this system.
Level
Business.
Primary Actor
Research analyst, topic SME, legal researcher.
Stakeholder
Kimberly McNeal, vice president of Corporate Communications, FiCorp.
Precondition
The actors are valid users within the CMS. The actors have valid usernames and passwords.
Success End Condition
Actors gain successful entry into the CMS and are allowed to perform all functions to which they have access.
Failure End Condition
Actors are unable to gain access to the CMS.
Trigger
Actors initiate the login action via a login screen.
Main Success Scenario
The main success scenario is as follows:
1. The actor accesses the URL for the CMS login screen.
2. The actor supplies the following information on the login screen:
• Username (system/company IT assigned)
• Password (eight-character minimum, uppercase/lowercase, and numeric)
3. The actor initiates the login action.
4. The system verifies the actor’s credentials.
5. The system loads the ContentCenter Standard or ContentCenter Professional interface.
Alternative Scenarios
The alternative scenarios are as follows:
1. The actor-supplied password is incorrect.
2. The CMS informs the actor that their login is incorrect and prompts the actor to log in again.
3. The actor-supplied username is incorrect.
4. The CMS informs the actor that the username is not found and prompts the actor to log in again.
Associations
List any associated use cases here. The FiCorp use case is not complete but would probably contain use cases for the load of the ContentCenter Standard and ContentCenter Professional interfaces, depending on which interface each actor (research analyst, topic SME, or legal researcher) required.
Error Conditions
This section contains any error conditions and specially defined error messages.
■Note Although the FiCorp use case document is not complete, it should show the structure and methodology to follow when developing use case documents.
Defining Non-Use-Case Functional Requirements
We spoke earlier about the two types of requirements that will come from your requirements workshops: use case requirements and non-use-case requirements. We have described in detail the use case requirements, including how to capture them and how to structure the use case analysis. In this section, we will cover any other requirements that are not visible to a use case and how to categorize them. These requirements collectively form the supplementary requirements specification. The recommended categorization scheme for non-use-case requirements is the FURPS+ model. FURPS stands for Functional, Usability, Reliability, Performance, and Supportability, which we group here with Scalability. The + stands for any other requirements not visible to a use case. The FURPS+ model was developed by Robert Grady and documented in his book Practical Software Metrics for Project Management and Process Improvement (Prentice Hall, 1992). This model is extremely useful as a guide when deriving software requirements. It also provides a way to categorize those non-use-case-centric requirements. The following describes the FURPS+ model:
Functional: This requirements type can contain feature sets, capabilities, and security-related requirements that do not fall in the boundaries of a use case. For most applications, this category will contain the largest percentage of requirements in the supplementary requirements specification and should be written in a natural language style.
Usability: This category takes into account aesthetics, consistency, human factors, documentation, user interface considerations, online context-sensitive help and self-service requirements, frequency/severity of failures, wizards and other selection/start-up requirements, recoverability, and last but certainly not least training materials.
Reliability: This category is composed of requirements that deal with the frequency and severity of failure, including the mean time to failure and predictability of failure. This category also contains accuracy, efficiency, recoverability, and allowable defects.
Performance: This category can sometimes share requirements with the reliability category, and in some circumstances a fine line exists between these two categories. The point is not to argue where a requirement belongs but rather to make sure that the requirement is documented somewhere in the supplementary requirements specification. Usually this category contains resource consumption requirements, throughput, average response times for user and system events, capacity (the number of concurrent users and events the system can or should support), recovery time, testability, adaptability, and availability.
Supportability and scalability: This category encompasses configurability, extensibility, adaptability, maintainability, compatibility, serviceability, stability, and localization requirements. This requirement category is key and should be the section of the document where you put any languages that the system must support, including types of character encodings.
Arrange your supplemental requirements under these categories and list them one by one. Review this requirements list with your stakeholders and other team members to receive sign-off. As you develop your system, make certain these requirements, and those captured in use cases, are included in your test plans. These requirements along with the use case requirements will lead to a complete understanding of all CMS requirements.
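Arranging requirements under the FURPS+ categories can be sketched in a few lines. The example below is only an illustration: the Furps enum, the group_by_category function, the SUP-nnn identifiers, and the requirement texts (including the figure of 50 concurrent users) are all invented for this sketch and are not FiCorp requirements.

```python
from collections import defaultdict
from enum import Enum


class Furps(Enum):
    """FURPS+ categories as described in this chapter."""
    FUNCTIONAL = "Functional"
    USABILITY = "Usability"
    RELIABILITY = "Reliability"
    PERFORMANCE = "Performance"
    SUPPORTABILITY = "Supportability and scalability"
    PLUS = "+ (other)"


def group_by_category(requirements):
    """Group (id, category, text) tuples under their FURPS+ category.

    Categories come back in FURPS+ order; empty categories are omitted.
    """
    grouped = defaultdict(list)
    for req_id, category, text in requirements:
        grouped[category].append((req_id, text))
    return {cat: grouped[cat] for cat in Furps if cat in grouped}


# Hypothetical supplementary requirements, listed in the order captured.
requirements = [
    ("SUP-001", Furps.PERFORMANCE, "The CMS supports 50 concurrent users."),
    ("SUP-002", Furps.SUPPORTABILITY, "Content is stored as UTF-8."),
    ("SUP-003", Furps.FUNCTIONAL, "Passwords expire after a set period."),
]

grouped = group_by_category(requirements)
for category, items in grouped.items():
    print(category.value)
    for req_id, text in items:
        print(f"  {req_id}: {text}")
```

The grouped output lists Functional first, then Performance, then Supportability and scalability, which mirrors how the supplementary requirements specification would be laid out for sign-off.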
Conducting a Requirements Workshop
We recommend that during the Inception phase of a CMS project you conduct at least one requirements workshop and one to two use case workshops. A requirements workshop will help you gather further requirements from project stakeholders if they haven’t supplied any, or clarify the requirements that project stakeholders have already supplied. Most supplied business requirements will be at a feature level, if they are understandable at all, and most if not all of them will require clarification. The requirements workshop is the opportunity for you to meet, greet, and gain commitment from project stakeholders.
Think about this: the easiest way to get someone’s buy-in on a project is to include them in that project. When people are part of something, they become emotionally tied to it and will fight to ensure its success. A qualified business or systems analyst who is experienced in meeting facilitation should run the requirements workshop. This person must be unbiased and willing to give everyone in the room a chance to speak. We recommend the following guidelines be followed in a requirements workshop:
• The workshop size should be limited to 8 to 12 people. This number should not include the one to two scribes whose function is to record only.
• The workshop should have representation from the business staff and the technical staff or IT.
• Everyone should be given the opportunity to speak. If grandstanding or filibusters occur, then speaking time for all participants should be limited to three minutes each.
• The workshop should be an all-day workshop with a one-hour break for lunch and two 15-minute breaks. Have one break before lunch and one break after lunch.
• If possible, after lunch, we recommend the facilitator decrease the temperature of the room by 5 degrees or more. This will help keep everyone from getting sleepy. Soda and coffee should also be provided to meeting participants.
• The facilitator must keep everyone and the meeting on track. Document all ideas from all meeting participants, and then later in the day rule out any impossibilities or out-of-scope requirements.
The objectives of your requirements gathering sessions (either the requirements workshop or a use case workshop) are as follows. We recommend that you complete these objectives in this order because this is the most logical and productive arrangement:
1. Define required features. This may be supplied by the stakeholders and should be documented in the vision document.
2. Define and elicit requirements via the requirements meetings.
3. Define and describe the actors during the use case workshops.
4. Define and describe the use cases.
5. Define the basic flow for all defined use cases.
6. Ensure that all features and business requirements trace to requirements defined in a supplementary requirements specification or a use case. If they do not, then something was probably missed and should be revisited.
7. Define alternative flows in the use cases if possible.
Some other goals for the facilitator are to get everyone excited about attending the requirements workshop and to supply an agenda and any preparatory documents such as the vision document to all attendees. At the conclusion of the workshop, all notes from the scribes and the facilitator should be compiled and distributed to all meeting participants.
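Objective 6 in the list above, tracing every feature to a use case or to a supplementary requirement, lends itself to a simple automated check. The sketch below assumes a hypothetical data layout in which each feature maps to a list of identifiers; the function name untraced_features, the feature names, and the UC-99 and SUP-004 identifiers are invented for illustration.

```python
def untraced_features(features, trace_links, use_cases, supplementary_ids):
    """Return the features that trace to no known use case or requirement.

    features:          list of feature names from the vision document.
    trace_links:       dict mapping feature name -> list of target ids.
    use_cases:         names of the defined use cases.
    supplementary_ids: ids in the supplementary requirements specification.
    """
    known_targets = set(use_cases) | set(supplementary_ids)
    return [
        feature
        for feature in features
        # A feature is traced only if at least one link points at a
        # target that actually exists.
        if not set(trace_links.get(feature, ())) & known_targets
    ]


features = ["CMS login", "Content approval", "PDF conversion"]
trace_links = {
    "CMS login": ["Log In to CMS"],
    "Content approval": ["Request Content Approval", "SUP-004"],
    "PDF conversion": ["UC-99"],   # points at a use case nobody defined
}
use_cases = ["Log In to CMS", "Request Content Approval"]
supplementary_ids = ["SUP-004"]

missed = untraced_features(features, trace_links, use_cases, supplementary_ids)
print(missed)  # prints ['PDF conversion']
```

A feature that shows up in this list is a candidate for revisiting in the next workshop, as the objective suggests.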
The requirements workshop should produce two types of requirements:
Supplemental requirements: These are any requirements that are not visible to a use case. These requirements should be added to the supplemental requirements specification.
Use case requirements: These are requirements that belong to a use case, and these requirements will be defined in the use case workshops.
Defining Workflow Requirements
We have already discussed what a workflow is, but to make sure that the concept is firmly cemented in your mind, we will go over it again. A workflow is a business process that is implemented in a CMS. This process comprises three subcomponents: roles, which are the actors involved in the workflow; tasks, which are the steps that the workflow performs; and subprocesses, which are the flows of tasks performed by each actor. Also, from the earlier discussion, remember that an actor involved in the workflow can be a human or a system. Workflow is an amazing thing, and a properly implemented workflow is a thing of beauty. With a workflow, you can control who gets notified and when, when content is deployed, what steps are followed, and what approvals are required; in short, the entire content creation life cycle from start to finish.
Now that we have reviewed what a workflow is, we can get to the exciting part: defining workflow requirements. Many methodologies exist for defining workflow requirements, but you can benefit from the past mistakes of others and do it right the first time! When defining workflow requirements, the first step is to document the existing content creation process. This does not mean you will duplicate it exactly with a CMS workflow; in fact, the opposite is true: you will be looking for ways to streamline and improve the process. However, you must understand what is done today in order to build the best solution for tomorrow. Remember in Chapter 2 how we outlined FiCorp’s existing publishing process? This will be your starting point. Using these processes as a guide, we recommend that you hold a workflow workshop. In the workflow workshop, you should have three separate whiteboards or other large drawing surfaces. On the first drawing surface, write the word create; on the second, approve; and on the third, deploy. Working with the SMEs and project stakeholders, identify which steps in the process go in each drawing pane.
For the create pane, ask the following questions:
• Who initiates the workflow? Is this the same person each time?
• Does a workflow ever get created automatically, and if so, how?
• Where does the content come from that gets attached to the workflow? Can content be imported?
• Who creates the content? What steps do they perform?
• How does content get attached to the workflow, and who does this?
• Are notifications sent when content is created or imported? Where and to whom are these sent?
• At which step in the content creation process does the workflow begin? What should this look like?
• Are there any steps that should be performed in parallel with each other? What are those steps or tasks?
For the approve pane, ask the following questions:
• Who approves content, and at which point in the workflow does this happen?
• What does this approval look like?
• When approving content, the actor should also have the ability to reject content. Does the rejection follow a reverse approval route? If not, how are the rejections routed?
• Are notifications sent out when content is approved? How about when content is rejected? To whom are these notifications sent?
• Are there specialized approvals required for different content types? What are these? What does this process flow look like?
For the deploy pane, ask the following questions:
• Once final approval occurs, where is content published?
• Is there any post-approval content processing that must occur (converting the text to PDF, for example)? What are these processing requirements?
• Should the deployment process initiate any logging functions? This would include when content was deployed as well as the type and identity of the content.
• To where should content be deployed?
• What notifications are sent when content is deployed? To whom are these sent?
• What external systems are notified when content is deployed? What should this notification look like? What format should the notification be in?
Now you should begin to create these panes as sequences in a flowchart. Label each task in the flowchart. Use special flowchart symbols to differentiate between actors, tasks, and notifications. Look for ways to streamline any process that you add to the flowchart. We suggest you use a spreadsheet to map out all the discrete flowchart steps. Using a flowchart, you can also identify what each step is. Some examples of this identification are an external task, a group assignment task, a user assignment task, or a timer task. Chapter 12 will include more information about identifying the task type. Repeat these steps for every workflow you will need in the CMS.
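The flowchart that comes out of the three panes can also be written down as data and checked for gaps before anyone builds it. The following is a rough sketch only: it is not TeamSite's workflow format, and the dictionary layout, the check_workflow function, and the task names are our own assumptions. It uses the four task types named above and flags transitions to undefined tasks as well as tasks that cannot be reached from the starting task.

```python
from collections import deque

# Task types named in this section; the short labels are our own.
TASK_TYPES = {"external", "group", "user", "timer"}

# Hypothetical workflow: each task records its type, its whiteboard pane,
# and the tasks it can hand off to. The rejection route from
# "Legal review" goes back to "Create content".
workflow = {
    "Create content":  {"type": "user",     "pane": "create",
                        "next": ["Legal review"]},
    "Legal review":    {"type": "group",    "pane": "approve",
                        "next": ["Convert to PDF", "Create content"]},
    "Convert to PDF":  {"type": "external", "pane": "deploy",
                        "next": ["Deploy content"]},
    "Deploy content":  {"type": "external", "pane": "deploy",
                        "next": []},
    "Escalate review": {"type": "timer",    "pane": "approve",
                        "next": ["Legal review"]},
}


def check_workflow(tasks, start):
    """Return a list of problems found in a workflow description."""
    problems = []
    for name, task in tasks.items():
        if task["type"] not in TASK_TYPES:
            problems.append(f"{name}: unknown task type {task['type']!r}")
        for target in task["next"]:
            if target not in tasks:
                problems.append(f"{name}: transition to undefined task {target!r}")

    # Breadth-first walk from the starting task to find unreachable tasks.
    seen, queue = {start}, deque([start])
    while queue:
        for target in tasks[queue.popleft()]["next"]:
            if target in tasks and target not in seen:
                seen.add(target)
                queue.append(target)
    problems += [f"{name}: unreachable from {start!r}"
                 for name in tasks if name not in seen]
    return problems


for problem in check_workflow(workflow, "Create content"):
    print(problem)
```

In this sample, the timer task is never handed off to by any other task, so the check reports it as unreachable, which is the kind of gap a workshop flowchart can hide.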
■Note The modus operandi for most teams when deciding notification details is to include too many! Keeping the communication channels open seems like a good idea; however, six months after the CMS is implemented, users will begin screaming that their inboxes are being flooded with email. What this means to those users is that they will stop responding to any CMS notifications. What that means to you is a redesign of workflows or that a request to disable workflow notifications will be coming in short order. Heed our advice; make certain you build only necessary notifications into the workflow. If a particular actor is listed too many times on the notify list, ask that actor about setting up a separate inbox just for their workflow notifications.
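One way to spot over-notification before go-live is to count how often each actor appears across all notify lists. This is a minimal sketch with invented data; the notification_load function, the event names, and the threshold of three are assumptions for illustration, not recommended values.

```python
from collections import Counter


def notification_load(notify_lists, threshold):
    """Return actors who appear on more than `threshold` notify lists.

    notify_lists: dict mapping workflow event -> list of notified actors.
    """
    counts = Counter(
        actor
        for recipients in notify_lists.values()
        for actor in recipients
    )
    return {actor: n for actor, n in counts.items() if n > threshold}


# Hypothetical notify lists gathered in the workflow workshop.
notify_lists = {
    "Content created":  ["Topic SME"],
    "Content approved": ["Research Analyst", "Topic SME"],
    "Content rejected": ["Research Analyst", "Topic SME"],
    "Content deployed": ["Topic SME", "Legal Researcher"],
}

overloaded = notification_load(notify_lists, threshold=3)
print(overloaded)  # prints {'Topic SME': 4}
```

An actor who shows up here is a good candidate for the conversation about trimming notifications or setting up a separate inbox.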
Defining Template Requirements
When defining templating requirements, it is helpful to have composites of the finalized output files. To explain this in another way, if you have the final product to look at, then you can more easily define how to perform the data collection for that piece of content. You can then use these composites by deconstructing them into their content components. Within a CMS, you really have two components to worry about: the form or data capture template and the presentation template. You can use these composites to develop requirements for the data capture templates or forms, and you can plug the HTML in the composites directly into the presentation templates. If you are moving a live site into the CMS, then you can look at the HTML used in the current live site. If you are building a completely new site, you should still ask that the business owner or project stakeholder deliver some composites or wireframes of what the finished pages will look like. Once you have the composites or wireframes, you should begin to create content components from them. For example, if an image is displayed in the composite, ask the following:
• Can any other type of nontext asset be displayed in this area of the page?
• Where is this asset retrieved from?
• How does this asset get inserted into the generated content?
If you determine that an image is the only asset that will be used in this area, then you should break the image into its component sections. These should include a Browse button to locate the image, whether it is from an external DAM system, from the local or network drive, or from within the CMS. You will also need alternate text for the image. This will require a field through which to accept this data. Is the image clickable? If so, you will need an input box on the data capture template for the target link.
You must do this type of analysis for every component on the composite, including images and other digital assets, paragraphs, navigation, rich media files, links, and so on. Basically, if you see it in the composite, you will need to account for its existence in the data capture template or presentation template. We recommend you organize this information in a Word table. This table should have the following columns:
Field Name: This column will contain the field label. If you have a corporate style guide, then this column should contain that element’s name. For example, you could use Title, Keyword, Header1, Header2, Header3, Ad Graphic, and so on.
Required: This column will tell the data capture template implementer whether this field is required. If it is required, the user will be unable to save the form without populating this field.
Replicant: This column will identify whether this field should be a replicant, which is a reusable container that is present on a TeamSite form. In object-oriented terms, a replicant is an object that can be instantiated repeatedly, with each instance containing different data values. Additionally, if the field is identified as a replicant, then you should also determine the maximum number of allowable fields.
Format: This column will identify whether the field will be text, a callout, inline, a drop-down list, or an image.
Field Constraints: This column will contain any constraints associated with the field on the data capture template. Examples could include the number of characters allowed in the field, any special character limitations, whether the field uses the visual format editor, and the number of replicants. You should also include tab order in this field.
Field Description: This column will contain a description of the field and any other information that should be documented pertaining to this field, such as cross-field logic implemented via FormAPI. FormAPI (which is part of FormsPublisher) allows you to create data capture forms that respond dynamically to user-initiated events.
In your supplemental requirements specification document, you should list the data capture template breakdown and its associated presentation. The presentation will come directly from the composites or mock-ups that were provided to you. On the presentations you should draw callouts that identify which field is being referenced by the field name. This will allow implementers to more easily build these components. If the data capture template has external dependencies, then you should also list them in the table.
Keep in mind when gathering templating requirements that you should ask whether specialized metadata fields must be captured on a template-by-template basis. Additionally, you may have requirements to capture global metadata, so look out for this as well. Some examples of global metadata are the keywords, the copyright, and the publish date. Another gotcha to watch out for is any field used by web reporting tools such as WebSideStory or Web Analytics. These tools usually require a specialized tag that records when and how the page was accessed. You want to build a construct for these in the template in case you switch web reporting vendors or the reporting applications require a modification in their tags.
By following the advice in this chapter, you will have a methodology for gathering templating requirements. Interwoven’s technical advisors are skilled in this area and can assist you greatly in gathering templating requirements.
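The table columns described in this section (Field Name, Required, Replicant, Format, Field Constraints, and Field Description) can be captured as structured data so that gaps are caught early. The sketch below is an assumption-laden illustration, not a TeamSite data capture template: the FieldSpec class, the problems method, and the sample constraint values are invented here.

```python
from dataclasses import dataclass
from typing import Optional

# Formats listed in the Format column description.
FORMATS = {"text", "callout", "inline", "drop-down list", "image"}


@dataclass
class FieldSpec:
    """One row of the template requirements table (hypothetical layout)."""
    field_name: str
    required: bool
    replicant: bool
    format: str
    constraints: str = ""
    description: str = ""
    max_replicants: Optional[int] = None

    def problems(self) -> list[str]:
        """Return a list of gaps in this row, empty if none are found."""
        issues = []
        if self.format not in FORMATS:
            issues.append(f"{self.field_name}: unknown format {self.format!r}")
        # The text asks for a maximum count whenever a field is a replicant.
        if self.replicant and self.max_replicants is None:
            issues.append(f"{self.field_name}: replicant needs a maximum count")
        return issues


# Sample rows; the constraint values are invented for illustration.
fields = [
    FieldSpec("Title", required=True, replicant=False, format="text",
              constraints="80 characters maximum; tab order 1"),
    FieldSpec("Ad Graphic", required=False, replicant=True, format="image",
              description="Browse button, alternate text, target link"),
]

for spec in fields:
    for issue in spec.problems():
        print(issue)
```

Here the Ad Graphic row is flagged because it is marked as a replicant without a maximum count, which is exactly the detail an implementer would otherwise have to come back and ask about.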
Defining External Program Requirements
External program requirements in the context of a CMS can take many forms. You must consider external callouts at the template level, external tasks at the workflow level, and external systems that the CMS must communicate with. Keeping all these external programs straight can be a daunting task. The best way to document these types of requirements is to identify which actors are external to the CMS. Once you have identified these actors, identify who is responsible for them. Is it your team or someone else’s team? Is an external vendor responsible for these actors? Once you decide who is responsible, the next step is to determine what information is needed from the CMS for this program to function properly.
In IT there is a time-tested concept for creating a program. That concept is input, process, and output. Although you shouldn’t necessarily be concerned with the process part of an external program (unless your team is responsible for it), you definitely must be concerned with the input and output components. Ask the person or party responsible for the external program what information the program requires. Find out what protocol is used for communication and in what formats the input can be accepted. We recommend always using XML and web services, if possible. Document all information relating to every external program including its interface, its protocol, any return codes, and the team or contacts that support this program. If your team is not responsible for this program, then you must develop an understanding or SLA with this external team. You should establish a communication procedure so that if the interface needs to change, your team is notified before the change is made. With the output, many of these same factors apply. You must communicate to these external parties
what type of output is needed by the CMS. We cannot give you specific instructions in this area, but if you are diligent in documenting these requirements, then your interface will be successful. Just remember to input, process, output, and test everything!
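The items we ask you to document for each external program (interface, protocol, formats, return codes, owning team, and contacts) can be captured in one record per program. The sketch below is an assumption about how such a record might look; the names ExternalProgram and missing_items, and the sample "rate feed" program, are invented for illustration and do not come from any Interwoven product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExternalProgram:
    """Documentation record for one external program (hypothetical)."""
    name: str
    owner_team: str                 # who is responsible for the program
    contacts: List[str]             # people who support it
    protocol: str                   # how the CMS talks to it
    input_format: str               # what the program accepts
    output_format: str              # what it hands back to the CMS
    return_codes: Dict[int, str] = field(default_factory=dict)
    sla_agreed: bool = False        # understanding or SLA with the owner


def missing_items(prog: ExternalProgram, our_team: str) -> List[str]:
    """List the documentation gaps for one external program."""
    gaps = []
    if not prog.contacts:
        gaps.append("no support contact recorded")
    if not prog.return_codes:
        gaps.append("return codes not documented")
    if prog.owner_team != our_team and not prog.sla_agreed:
        gaps.append("externally owned but no SLA or understanding in place")
    return gaps


# Invented example: a program owned by another team, partly documented.
rate_feed = ExternalProgram(
    name="rate feed",
    owner_team="treasury systems",
    contacts=[],
    protocol="web service",
    input_format="XML",
    output_format="XML",
    return_codes={0: "ok", 1: "rejected"},
)

for gap in missing_items(rate_feed, our_team="cms"):
    print(gap)
```

The point of the record is the input and output side; the process part stays out of it unless your team owns the program.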
Mapping the Selected Vendor’s Product Offerings to CMS Requirements
Once you have gathered the requirements, the next step is to map these to a vendor’s products. There is no better ECMS to use for your solution than Interwoven TeamSite and its enterprise content management capabilities. Interwoven technical services are invaluable during this process. Once your listing of requirements is complete, you should go through a mapping exercise to discover whether your chosen CMS vendor can meet all your requirements. You may have even received proposals from several CMS vendors and now must decide which to use. This is a difficult task, and your future with your company could depend on you making the right decision. See Table 4-3 for a way to create a mapping of your requirements to a given vendor’s features.
Table 4-3. Requirements Matrix to Vendor’s Offerings
| Feature/Requirement | Description | TeamSite 6.7 | Feature, Configuration, or Customization |
|---|---|---|---|
| Versions of the content must be maintained. | Supports versioning; the number of file versions to keep can be configured. The content store maintains a history of all modifications. | ✔ | Feature |
| All content on the site must be able to be reproduced for historical purposes. | TeamSite supports editions, which are snapshots of the current staging environment at any given point in time. | ✔ | Feature |
| Simple-to-use GUI interface. | It has a well-defined GUI. | ✔ | Feature |
| File-level access control. | Permissions on files created through the file system interface are determined by file system interface configuration. | ✔ | Feature |
| Maintain multiple content areas. | Branches and sub-branches (for different projects or initiatives) contain individual workareas, staging areas, and editions, which allows for massively parallel development on a single platform. | ✔ | Feature |
| Check-in and checkout | TeamSite supports three types of file locking: submit locking, optional write locking, and mandatory write locking. | ✔ | Feature |
| Customizable GUI | Menu items can be enabled or disabled as per user role. | ✔ | Feature |
This table contains several columns of information; the first column contains the text of your defined feature or requirement. You would get this information from your supplemental requirements specification. The Description column should contain the description of the requirement (if needed) and the way that problem is solved by the technology located in the next column. The third column is where you list the vendor technology. A check mark indicates that the feature or requirement is supported. However, it does not indicate the degree to which it is supported. The next column, Feature, Configuration, or Customization, should identify whether the feature or requirement is an out-of-the-box feature of the given vendor’s product or a configuration or customization of that product. The final column allows you to set a level of difficulty for the customization or configuration of that product to match the documented requirement or feature. You should create a matching sheet for each product evaluation and choose the one that most closely fits your requirements and matches your budgetary constraints.
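When you compare several matching sheets, it can help to reduce each one to a few numbers. The sketch below is one hedged way to do that; it is not an Interwoven tool. The effort weights in EFFORT, the names MatrixRow and summarize, and the second vendor ("Vendor B") are all assumptions made up for the example, and the three TeamSite rows are taken from Table 4-3.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed weights: an out-of-the-box feature adds no effort, configuration
# adds a little, customization adds more. Tune these for your own project.
EFFORT = {"feature": 0, "configuration": 1, "customization": 3}


@dataclass
class MatrixRow:
    requirement: str
    supported: bool   # the check mark column
    kind: str         # "feature", "configuration", or "customization"


def summarize(rows: List[MatrixRow]) -> Dict[str, int]:
    """Count supported requirements and total the assumed effort."""
    supported = [r for r in rows if r.supported]
    return {
        "supported": len(supported),
        "unsupported": len(rows) - len(supported),
        "effort": sum(EFFORT[r.kind] for r in supported),
    }


teamsite = [
    MatrixRow("Versions of the content must be maintained.", True, "feature"),
    MatrixRow("File-level access control.", True, "feature"),
    MatrixRow("Customizable GUI", True, "feature"),
]

# Purely hypothetical competitor sheet for comparison.
vendor_b = [
    MatrixRow("Versions of the content must be maintained.", True, "configuration"),
    MatrixRow("File-level access control.", False, "customization"),
    MatrixRow("Customizable GUI", True, "customization"),
]

for name, sheet in (("TeamSite", teamsite), ("Vendor B", vendor_b)):
    print(name, summarize(sheet))
```

As the text notes, a check mark does not say how well a requirement is supported, so treat these totals as a starting point for discussion and weigh them against your budget.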
Summary
When documenting CMS requirements, it is important to document everything. This seems like a lot of documentation, doesn’t it? Well, you are correct: it is a lot of documentation, but the added value for doing this documentation is immeasurable. You can use this documentation later if you go through a prioritization effort with the stakeholders. You will do this if you do not have enough implementation time, resources, or budget left to get the complete listing of requirements in the system. You can then group these postponed requirements into a second- or third-phase rollout for later implementation. Make sure you read and understand everything in this chapter, and when you understand it, read it again! Gathering/documenting accurate requirements is the most important part of any development effort. We are not saying that you won’t have any snags along the way, but this chapter will help you navigate these turbulent requirements waters.
6110_Ch05_FINAL
7/2/06
1:07 PM
PART 3
■■■
The Elaboration Phase
In the Rational Unified Process (RUP), the Elaboration phase is where the majority of difficult engineering work takes place. At the end of this phase, you must decide whether the company should invest more money and effort to continue to the Construction phase. During this phase, you should develop a prototype of the architecture and demo it for project stakeholders. This prototype allows the project team to really lock down the remaining project phases in terms of schedule and duration. This phase involves a deep and exhaustive analysis into the core use cases of the system. Documentation evolves according to lessons learned from the prototype, and you’ll further refine the functional requirements. The Elaboration phase may encompass one to many iteration cycles; with each completed iteration, you derive more detail, and you develop further understanding. The chapters in this part will not give you a complete view of the Elaboration phase, but they will show you many of the activities related to this phase.
The Elaboration phase of the project should be one month to two months long for most projects and should accomplish the following goals:
Create a validated architecture that is feasible for the organization: This will prove that the proposed architecture is possible to implement. This goal will also provide a means to acquire additional funding and project resources, if needed.
Create a detailed work plan and project schedule for the Construction phase: At the least, you should create a detailed plan for the first iteration within the Construction phase; ideally, you will create a plan with all the remaining iterations defined. This will be derived from the amount of work that has been done already. By considering the time involved during prototype development, you should more easily be able to calculate the time remaining to finish the development. You will not only use any prototypes you have as a calculation factor, but you will also learn from the previous development efforts and project team member experience.
Elaborate on the architecture, software, infrastructure, and use case requirements: Have a target goal of completing 80 percent of the use case requirements and the functional requirements (maintained in the supplementary requirements specification). This will produce high-fidelity use cases for the remaining iterations and a more complete supplementary requirements specification. This collection of artifacts should also receive sign-off from all impacted stakeholders.
Create a revised risk list and risk assessment document: You will need to document the new risks that you have uncovered up to this point in the process, as well as the anticipated risks.
At the conclusion of this phase, you and the project stakeholders will determine whether to move forward to the other phases or to scrap the existing work and look for an alternative method to solve the business problem.
CHAPTER 5
■■■
Building the Hardware Infrastructure
Once you have decided on the products you will be using, it is time to start looking at how your infrastructure is going to change. This is a time-consuming task and one that is difficult to get through without getting behind schedule. Most corporations have a lengthy process for buying hardware, and hardware is expensive. Therefore, you should account for this process in your business plan. We won’t tell you there is only one way to build your hardware. What we want to accomplish in this chapter is to make you aware of some of the steps you can take to make your implementation much more effective. Upon completing this chapter, you should be able to strike up a two-way conversation with the people who are responsible for making hardware decisions. If you are that person, then some of the topics in this chapter may be apparent to you, but you should get some insight into some of the questions the development team faces.
Understanding How Release Cycles Affect Hardware Layout
Content management is an ongoing process that never stops. It must be planned well, budgeted annually, and updated often to be successful. When you introduce a CMS into a company, you must follow many steps to ensure that the rollout is successful. Many times people put a great deal of planning into how the system will work but little time into how the application will be updated. You should create some plan as to how software updates are released. Certain parts of the application are vital to the operation of the system, and you do not want developers “futzing” with them while your CMS users are working. Certain changes affect major components of the system that will need to be tested before going live. You should handle these changes in the same way you handle any other application with a release cycle. This means you will need more than one TeamSite environment to enable software releases to be promoted in a progressive process, such as from development to test to production.
CHAPTER 5 ■ BUILDING THE HARDWARE INFRASTRUCTURE
■Note You will handle the content changes and the code changes differently. The changes made to content should be made, approved, and deployed from the authoring environment. Content changes do not have to go through the normal code promotion process.
Setting Up the Content Authoring Environment
In this section, we’ll show how an authoring environment that is set up properly will make your life so much easier. Determining your hardware configuration is the first step in setting up your software environment (see Figure 5-1).
Figure 5-1. The interaction between the CMS authoring environment and the different application environments
[Diagram labels: CMS Authoring; App/WWW Development; App/WWW Preview; App/WWW Testing; App/WWW Production; Preview]
As you can see, Figure 5-1 demonstrates that an authoring environment must support four application environments: production, preview, development, and testing. The following sections discuss these environments, as well as other benefits of building your environments in this manner.
Production
We always hear chuckles from application team members when we tell them the CMS team will be updating the content directly in production. You cannot directly update “production” servers. That is why TeamSite’s production environment is called the authoring environment. You need to be able to update content in a production-like environment to keep system teams from restarting their servers anytime they think it is necessary. When you call them on it, they will say your box is not a production machine. The authoring environment is your TeamSite working environment; content will be updated directly on this server. Your content authors are depending upon this server to be available during business hours, so you should not allow developers to change your CMS application directly on this server. It seems that every team that implements TeamSite wants to change application components directly in the content authoring environment. It is important that your application development follows the same procedure for updating code
that other applications do. This is why it is also important to distinguish between what you consider CMS code and what you consider content. Yes, TeamSite is a tool, but you will have a custom application that grows up around it, and those changes need to have their own development environment. The correct mandate to the system team is that content changes will be made directly in the authoring environment, but code will need to be promoted.
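The mandate above boils down to a routing rule: content goes straight to authoring, and code travels the release path first. The following sketch states that rule in a few lines. It is only an illustration under our own assumptions: the names ChangeKind and environments_for are invented, the environment names are plain labels rather than TeamSite identifiers, and what counts as code versus content is a decision your team has to make.

```python
from enum import Enum
from typing import List


class ChangeKind(Enum):
    CONTENT = "content"   # what your team classifies as content
    CODE = "code"         # what your team classifies as CMS code


# Illustrative labels: code is promoted from development to testing and
# only then reaches the authoring (production) TeamSite environment.
CODE_PROMOTION_PATH = ["development", "testing", "authoring"]


def environments_for(kind: ChangeKind) -> List[str]:
    """Return, in order, the environments a change passes through."""
    if kind is ChangeKind.CONTENT:
        # Content is made, approved, and deployed from authoring.
        return ["authoring"]
    # Code follows the normal release cycle before it reaches authoring.
    return list(CODE_PROMOTION_PATH)


print(environments_for(ChangeKind.CONTENT))
print(environments_for(ChangeKind.CODE))
```

Writing the rule down this plainly is useful mostly as a talking point with the system team, because it makes clear that only content bypasses the promotion process.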
Preview
Now that we are done preaching, take a closer look at Figure 5-1. You’ll see an environment named preview. The preview environment should match the production TeamSite server as closely as possible to allow your changes to be viewed in a production-like environment before they are deployed to production. The content that needs to be previewed here is content that has not been deployed to the development or testing environment. You could be updating content for a system that cannot be mimicked within these other environments. This is a good idea: you will find ways to use this environment and will be grateful that you took the time to set it up in advance.
■Note Something to keep in your mind while you are reading about your environments is that not every environment needs to be running a top-of-the-line system. Some of these servers can be older machines, which will allow you to see your content in a live state instead of just a virtual state. You will discover many more errors if you can look at the content before going live.
Application Development Environment
Application teams are constantly put in the situation of making adjustments to their code to accommodate something that has been discovered at the last moment. This last moment usually takes place when all the work that different teams have developed is integrated. Integration can require much less effort if the content team can make immediate updates as the business owners change their minds or discover content that they overlooked in the original design. You can’t ensure that everything will be correct after the first draft of the requirements. These last-minute changes are common, but unfortunately the changes in the content often do not reach the application team until the eleventh hour. The application should have all the latest changes and be tested with content that is as close to production content as possible. This should be the case in each application environment. A lot of companies fail to achieve integration between their content and the application because they cut off the connection between the authoring environment and the applications’ development environments. Once your corporation requires its application teams to use the CMS to update their application content, the content teams will not have to zip the content manually and move it out to each application environment.
Testing/User Acceptance
User acceptance is often one of the largest hurdles for the CMS. Building your authoring environment in this way will speed along user acceptance. A CMS is often thought of as a system that isn’t connected properly. Instead of reducing the workload and reducing your delivery time from concept to production, the content can become even slower to market, with the
CMS team as the new bottleneck. The reason for this is that once the first version of the system is rolled out, everyone thinks they can now leverage the system like any other system. The problem with this approach is that most corporations decide to start small with their initial release. This means the system has been set up for a simple scenario, and not much thought has been put into what is going to make this system successful over time. With a good percentage of applications, a corporation can be successful by building out as it needs extra bandwidth because it can just add a server when necessary. If this is your philosophy, you need to also think of the best possible state to be in. You should ask questions such as, what are my more sophisticated needs, and what are problematic areas that could be fixed by the CMS? It pays to know some of these answers in advance because, just like any other initiative, you need to know where you are going, especially if you build the additional connections that Figure 5-1 shows.
Customizing TeamSite Application Development and Configuration
The other goal of your new CMS is enhancing the user experience. You will achieve this by customizing TeamSite to your industry and automating tasks that can be completed without human intervention. Even though you are working with a system that handles content, TeamSite itself is an application; therefore, you should handle it in the same way as other applications. The deployment should have at least three basic tiers, as shown in Figure 5-2.
Figure 5-2. A three-tier deployment environment
[Diagram labels: Testing CMS with App/WWW Integration; Development CMS with App/WWW Integration; Lab CMS. Callout on the testing and development servers: “These servers are for the TeamSite development team’s use only. Both of these instances could reside on the same machine if necessary.” Callout on the lab server: “The lab is a good place to play around in. You can install new patches, and try out crazy ideas without stopping development.”]
Figure 5-2 demonstrates how you should set up the testing and development environments to ensure proper testing and development. The figure does not depict the authoring environment or production environment, but we have added a “lab box” because this is helpful for experimentation. The lab box is normally isolated from your other systems, which allows you to install patches and perform tests that may bring down the server. The lab box should not cause anyone a second thought if it needs to be rebuilt.
Typically, the testing environment mimics the production servers’ size and strength, but if you do not plan on performing stress tests, you don’t need to spend all of that money. We do not want to beat you over the head with information that most development teams already know, but it is important to treat this as a true application development environment for your CMS infrastructure code changes. We have often found implementations of TeamSite where newly developed workflows can be tested only in production. When you are adding new workflows, this is not a problem because no one is using them; however, when updates need to be tested, it is not a good idea to wait until production to test them. Large corporations with international teams have content authors using the system 24 hours a day, so any outage is not acceptable.
Coordinating Small and Large Development Efforts
The three-tier layout will allow you to test one release, develop another, and have a third ready to be deployed to production. Realistically, you would maintain only two releases at any given time using the three-tier layout. We are not saying that three can’t be handled effectively, but it requires a well-organized team that communicates effectively. If your corporation decides to set up your environment this way, make sure you employ an effective code-versioning system. Since most implementations have only one development server for their TeamSite initiatives, a common problem is overwriting one another’s changes. A great way to solve this problem is to use TeamSite to help manage your development efforts in conjunction with your current source control tool, as shown in Figure 5-3.
Figure 5-3. Versioning system and TeamSite
[Diagram labels: CMS Deployments; CMS Authoring; CMS Testing; CMS Development; Source Control]
In Figure 5-3 you can see that the source control services are augmented by using the TeamSite server. This connection between TeamSite and the source control server does not have to be an automated process, but Interwoven has integration points if you are interested in automating this process. Using TeamSite to perform your deployments enables you to utilize the full power of the workflow engine and to add intelligence to your deployments.
An additional benefit to this system is you can time slice the testing and development environments by deploying a version of the entire site. Time slicing your environment means breaking your testing or development server time into blocks. For instance, you could deploy the entire site for a defect in the morning and redeploy it for a different team for user acceptance testing in the afternoon. This system as described could easily handle an emergency fix; you could test it simply by putting your current testing on hold and deploying the production release to the testing environment. You could then move the newly fixed code to the testing environment and perform any necessary testing before promoting this change to production. This process does not eliminate the need to merge the new fix into any existing development efforts but does allow for testing the fix with code that is in production before it goes live.
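Time slicing only works if two teams are never handed the same block. The sketch below shows one simple way to keep such a booking list; it is an assumption of ours and not a TeamSite feature. The names Slot and EnvironmentSchedule, the team names, and the times are all hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass(frozen=True)
class Slot:
    """One block of time on a shared testing or development server."""
    team: str
    purpose: str
    start: time
    end: time


def overlaps(a: Slot, b: Slot) -> bool:
    """Two blocks clash when each starts before the other ends."""
    return a.start < b.end and b.start < a.end


class EnvironmentSchedule:
    def __init__(self, name: str) -> None:
        self.name = name
        self.slots: List[Slot] = []

    def book(self, slot: Slot) -> bool:
        """Add the block if it is free; return False if it clashes."""
        if slot.start >= slot.end:
            raise ValueError("a slot must end after it starts")
        if any(overlaps(slot, existing) for existing in self.slots):
            return False
        self.slots.append(slot)
        return True


testing = EnvironmentSchedule("testing")
morning = Slot("defect team", "verify a defect fix", time(8, 0), time(12, 0))
afternoon = Slot("business team", "user acceptance testing", time(13, 0), time(17, 0))
clash = Slot("another team", "regression run", time(11, 0), time(14, 0))

print(testing.book(morning))
print(testing.book(afternoon))
print(testing.book(clash))
```

An emergency fix fits the same model: the team holding the current block gives it up, the production release is deployed to the testing environment, and the block is rebooked once the fix has been tested.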
Choosing a Platform
Your choice of platform will have a direct effect on the hardware you purchase. You have three platform options for your TeamSite implementation: Windows, Unix, and Linux. The hardware infrastructure team will probably choose the platform, but if it has not been decided, then we recommend discussing these three platforms with Interwoven Technical Services to help you identify possible roadblocks for each platform given your current environment. It may be easy to say that Linux will be much cheaper, but you must take into account such issues as having the proper internal resources to handle such a platform. You may not have the expertise to manage a Linux environment in-house, or you may have to buy all new servers to support Linux. In any case, you are not limited to only one platform.
Selecting FiCorp’s Products
In this section, you’ll look at the software FiCorp has decided to purchase. Search comes with TeamSite, but it has its own requirements. The following list of products will be the focus of the rest of the book, but we will still talk about others where necessary:
• TeamSite 6.5
• TeamSite Search
• OpenDeploy
• DataDeploy
• MetaTagger
The following sections will review these products and cover how they will look after they are installed. FiCorp will need some new servers, but some of the software will find its way to existing servers. We will cover some of the hardware requirements and ports required for each of the servers. Basically, no single right way works for every enterprise. Ten companies in the same industry could choose the same software and have the same requirements but choose to build different systems. Each system will function well for its users, but with that being said, there are some ideal ways to handle things.
■Note When setting up your hardware, take into account your growth rate. At first the user acceptance will be low, but as you gain momentum, people will be flooding in to use the system. Make sure you do not make a bad name for yourself by not being able to keep up with the demand. Also, remember to take into account backing up the content store and separating backing stores into multiple stores in order to allow content to be frozen region by region while being backed up.
TeamSite 6.5 and Interwoven Search
In contrast with some software packages we have used, the performance of TeamSite over the years has greatly improved. Also, as Java has matured, the TeamSite application has benefited greatly. When determining the sizing of your TeamSite server, you must take into account that TeamSite sits at the center of your CMS services. The server that houses the TeamSite application will need plenty of room for growth. This section will show three tables to determine how to set up FiCorp’s hardware. Table 5-1 shows some of the server configuration recommendations from Interwoven (source: Interwoven TeamSite Installation Guide Unix, Release 6.5). This table has been designed to meet the needs of the TeamSite application only; if you want to install search on the same server, you need to refer to Table 5-3.
Table 5-1. System Usage for TeamSite 6.5 Stand-Alone Server
| User and content load | Heavy* | Moderate** | Light*** |
|---|---|---|---|
| 15 concurrent users, 100 total licensed users, 2GB total content | 2 CPUs, 4GB RAM | 2 CPUs, 2GB RAM | 1 CPU, 2GB RAM |
| 50 concurrent users, 300 total licensed users, 5GB total content | 4 CPUs, 4GB RAM | 2 CPUs, 4GB RAM | 2 CPUs, 2GB RAM |
| 150 concurrent users, 1000 total licensed users, 15GB total content | 4 CPUs, 8GB RAM | 4 CPUs, 8GB RAM | 2 CPUs, 4GB RAM |
| 300 concurrent users, 3000 total licensed users, 50GB total content | 8 CPUs, 16GB RAM | 8 CPUs, 12GB RAM | 4 CPUs, 8GB RAM |
* Numerous Get Latest, Submit, Publish, or Compare operations; ongoing website or directory navigation; frequent data record generation; frequent development of workflows, presentation templates, and other file editing
** Occasional Get Latest, Submit, Publish, or Compare operations; ongoing website or directory navigation; occasional data record generation or file editing; few development activities
*** Infrequent Get Latest, Submit, Publish, or Compare operations; occasional website or directory navigation; occasional data record generation or file editing; no development activities
■Note The tables in this chapter should not be used for your final sizing of your servers. These tables, although taken from Interwoven’s documentation, may not be up-to-date. Please refer to the Interwoven documentation for this purpose.
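For a first conversation with the hardware team, the stand-alone figures in Table 5-1 can be turned into a simple lookup. The sketch below does that under two assumptions of ours: the function name suggest_stand_alone is invented, and rounding a user count up to the next row of the table is our own reading, not a rule from Interwoven. As the note above says, the numbers may be out-of-date, so treat the result as a rough starting point only.

```python
from typing import Dict, List, Tuple

# (maximum concurrent users, {usage level: (CPUs, RAM in GB)}),
# transcribed from Table 5-1 for a stand-alone TeamSite 6.5 server.
TABLE_5_1: List[Tuple[int, Dict[str, Tuple[int, int]]]] = [
    (15,  {"heavy": (2, 4),  "moderate": (2, 2),  "light": (1, 2)}),
    (50,  {"heavy": (4, 4),  "moderate": (2, 4),  "light": (2, 2)}),
    (150, {"heavy": (4, 8),  "moderate": (4, 8),  "light": (2, 4)}),
    (300, {"heavy": (8, 16), "moderate": (8, 12), "light": (4, 8)}),
]


def suggest_stand_alone(concurrent_users: int, usage: str) -> Tuple[int, int]:
    """Return (CPUs, RAM in GB) from the first row that covers the load.

    Rounding up to the next row is an assumption made for this sketch.
    """
    if usage not in ("heavy", "moderate", "light"):
        raise ValueError(f"unknown usage level: {usage!r}")
    for limit, sizes in TABLE_5_1:
        if concurrent_users <= limit:
            return sizes[usage]
    raise ValueError("load is beyond the table; ask the vendor for sizing")


cpus, ram_gb = suggest_stand_alone(40, "moderate")
print(f"{cpus} CPUs, {ram_gb}GB RAM")
```

The lookup ignores licensed users and total content size, both of which appear in the table rows, so a real sizing exercise needs to consider those as well.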
It is easy to see that TeamSite will require a good-sized machine to fulfill a corporation’s needs. When looking into which server chassis to purchase, we recommend buying the largest you can afford. You may not need to fill up the processors and memory, but expansion is inevitable. Now take a look at Table 5-2, which shows you how search performs when installed on a server by itself. This is the base performance given the data type that is being queried (source: Interwoven TeamSite Installation Guide Unix, Release 6.5).
Table 5-2. Interwoven’s Search Performance on a Server by Itself Based on Content Type
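| Query load | Documents, response time less than 5 sec | Documents, response time less than 3 sec | Data Records, response time less than 5 sec | Data Records, response time less than 3 sec | Extended Attributes, response time less than 5 sec | Extended Attributes, response time less than 3 sec |
|---|---|---|---|---|---|---|
| 100 queries per hour | N/A | 1 CPU, 1GB RAM | N/A | 1 CPU, 1GB RAM | N/A | 1 CPU, 1GB RAM |
| 1000 queries per hour | N/A | 1 CPU, 1GB RAM | 1 CPU, 1GB RAM | 2 CPUs, 4GB RAM | 1 CPU, 2GB RAM | 2 CPUs, 4GB RAM |
| 2000 queries per hour | 1 CPU, 2GB RAM | 2 CPUs, 4GB RAM | 2 CPUs, 4GB RAM | N/A | 2 CPUs, 4GB RAM | N/A |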
As the number of queries grows, search can become rather CPU intensive. Therefore, FiCorp will be using a separate search server; besides, Interwoven suggests that separating these servers is the best solution. We will, however, still show server recommendations for installing search with TeamSite, because you may not be in an environment that allows the sharing of file systems across servers. Table 5-3 shows what search and TeamSite on the same box can do (source: Interwoven TeamSite Installation Guide Unix, Release 6.5).
Table 5-3. System Usage for TeamSite 6.5 and Combined Search
| User and content load | Heavy* | Moderate** | Light*** |
|---|---|---|---|
| 15 concurrent users, 100 total licensed users, 2GB total content | 4 CPUs, 4GB RAM | 2 CPUs, 4GB RAM | 2 CPUs, 2GB RAM |
| 50 concurrent users, 300 total licensed users, 5GB total content | 4 CPUs, 8GB RAM | 4 CPUs, 4GB RAM | 2 CPUs, 4GB RAM |
| 150 concurrent users, 1000 total licensed users, 15GB total content | Contact Interwoven | 8 CPUs, 8GB RAM | 4 CPUs, 4GB RAM |
| 300 concurrent users, 3000 total licensed users, 50GB total content | Contact Interwoven | Contact Interwoven | 8 CPUs, 8GB RAM |
* Numerous Get Latest, Submit, Publish, or Compare operations; ongoing website or directory navigation; frequent data record generation; frequent development of workflows, presentation templates, and other file editing
** Occasional Get Latest, Submit, Publish, or Compare operations; ongoing website or directory navigation; occasional data record generation or file editing; few development activities
*** Infrequent Get Latest, Submit, Publish, or Compare operations; occasional website or directory navigation; occasional data record generation or file editing; no development activities
The problem we have seen with putting both search and TeamSite on the same box is that this does not leave much room for growth on your server. FiCorp expects that if it can implement a usable CMS, it will have a high user acceptance rate; therefore, it does not want to limit its growth potential from the start.
OpenDeploy and DataDeploy
Before really getting down to what hardware FiCorp will need, we’ll present a high-level diagram of how the network will be laid out. As you can see in Figure 5-4, FiCorp has an additional hop between the TeamSite server and the target server.
Figure 5-4. OpenDeploy will need a gateway machine to get through the firewall between Zone A and Zone B.
[Diagram labels: TeamSite in Zone A; Gateway in Zone B; Web Server in Zone C; one connection is marked Not Allowed]
The gateway server allows Zone A and Zone B to communicate with each other. FiCorp will be installing an OpenDeploy base server on the TeamSite box, but you will also need a base server on the gateway server. The gateway server will act as a relay server. The Zone C servers will be sharing an OpenDeploy receiver server. This means one server in Zone C will utilize shared file systems to move files between servers. Zone B already contains several servers, so FiCorp will be using an existing server as the OpenDeploy gateway. Through Zone B you can reach much more of FiCorp’s networks. Note that for each server you put into production, you have to put in one at your disaster recovery center. Sizing an OpenDeploy server is made easier by using Table 5-4, which is the minimal sizing chart from the OpenDeploy release notes (source: Interwoven OpenDeploy Release Guide, Release 6.0.2).
Table 5-4. OpenDeploy Sizing Chart
Operating System | CPU Power
Windows 2000 Server (Service Pack 2 or later) | 600MHz Intel or compatible (32-bit)
Windows 2003 Server | 600MHz Intel or compatible (32-bit)
Solaris 2.8 (Solaris 8), 2.9 (Solaris 9) (32-bit and 64-bit) | 400MHz SPARC (32-bit and 64-bit) and 375MHz PowerPC
AIX 5.1, 5.2 | IBM RS/6000
Red Hat Linux 8.0, 9.0 | 400MHz Intel or compatible (32-bit)
Red Hat Enterprise Linux 2.1, 3.0 | 400MHz Intel or compatible (32-bit)
SuSE Linux 8.1 | 400MHz Intel or compatible (32-bit)
SuSE Enterprise Linux 8.1 | 400MHz Intel or compatible (32-bit)
HP-UX 11i (32-bit and 64-bit) | 300MHz PA-RISC
Remember, these figures are for the bare minimum. If you have other services running on the machine or have a high number of deployments, you will not want to limit yourself to the bare minimum. Do not forget to perform some projections for the next five years to help you determine your true needs. As you can see from Table 5-4, you have several operating system choices.
■Note You should discuss your sizing requirements with Interwoven Technical Services. You should also discuss some content-processing topics, such as the number of transactions expected per peak hour and the size of your deployments.
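The five-year projection recommended above can be roughed out with simple compound growth before you talk to Interwoven Technical Services. The following is a minimal sketch only; the starting load (200 deployments per day) and the 20 percent annual growth rate are invented for illustration and are not FiCorp or Interwoven figures.

```python
def project(current, annual_growth, years=5):
    """Return projected values for year 0 through `years` under compound growth."""
    return [current * (1 + annual_growth) ** year for year in range(years + 1)]


# Hypothetical inputs: 200 deployments per day today, growing 20% per year.
for year, value in enumerate(project(200, 0.20)):
    print(f"year {year}: about {value:.0f} deployments per day")
```

Running the same function against peak-hour transactions or content store size gives you comparable numbers to bring to the sizing discussion.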
MetaTagger
MetaTagger has several features, although you may not be able to use all the features immediately. FiCorp will be installing MetaTagger on a separate server from TeamSite inside the authoring environment. This may change as FiCorp better understands all that MetaTagger can provide. MetaTagger will not increase the size of the box needed but can be broken up onto separate servers as the need arises.
Looking at the New FiCorp Network
A new picture should be starting to form of what the new FiCorp network will look like. FiCorp has a new TeamSite server, a new OpenDeploy server, a new search server, and a new MetaTagger server. Figure 5-5 shows the new authoring environment.
[Figure 5-5 diagram: the authoring environment, containing the mail server, TeamSite, OpenDeploy/DataDeploy, the search server, content/application integration, MetaTagger, and Oracle, connected to the gateway segment.]
Figure 5-5. The network communication that is necessary
You will now take a closer look at each of these servers and how they integrate with each other at this point. Once FiCorp has these servers integrated, it will have a top-notch authoring environment.
TeamSite
The TeamSite server will be busy, as you can see by looking at Figure 5-6. Eleven separate services are listening on the TeamSite server. This is one of the reasons some of the other services are on separate servers. When the entire TeamSite server is up and running with the addition of the OpenDeploy services, the server has its hands full. Because we are concerned with the ability to scale the application, we kept scalability in mind when determining this configuration, and this hardware configuration allows for it.
[Figure 5-6 diagram: the TeamSite server's listening ports (80, 81, 443, 1080, 1099, 3030, 3035, 8080, 8081, 9173, and 20014) and its connections to the browser client, gateway server, application server, Oracle server, and search server.]
Figure 5-6. TeamSite environment
Network
You should take a look at which ports need to be opened for this FiCorp environment. This will allow the information security group to know which ports to leave open when they harden the box. This section lists all the required ports with a brief description of where they will be used. In Chapter 6, we will describe more about what is going on behind these ports when we cover how to perform the actual installation of each piece of software. Use Table 5-3 as a guide for this discussion.
Here are the ports:
HTTP 80: Port 80 will be used as the front door to the Interwoven CMS. This allows the users to use a nice, pretty URL to access the system. The system performs all the necessary URL rewrites to help hide all its inner workings. This port will not be used as the primary interface. Port 443 will be used instead.
HTTP 81: Port 81 will be used as the content preview server. This port’s service is hidden behind port 80. This content server is referred to as the web server and can be simply an HTTP server or a servlet container. You can use Tomcat or a full-blown application server if you prefer.
HTTPS 443: Port 443 is the default HTTPS port. For secure access to the CMS, you can use port 443. This port can be changed if it does not work in your environment. This service performs the same functions as port 80. You will be using port 443 as your front door to accessing the TeamSite server from external locations.
Proxy 1080: The iwproxy or proxy port 1080 is the engine for remapping URLs for the server running on port 80 or 443. The hiding that takes place is made possible by this process. This also is used for mapping content virtualizations based on the iw.cfg file.
OpenAPI 1099: The OpenAPI 1099 port is a Remote Method Invocation (RMI) registry. TeamSite can use an existing registry server, but Interwoven recommends against other applications using the same registry. You cannot guarantee that other applications will not corrupt the registry.
OpenJMS 3030: TeamSite has an event subsystem that allows key events to be monitored. Some of Interwoven’s products use triggers from the event subsystem to communicate. Notifications of this sort are handled through the Java Message Service (JMS) on port 3030.
JNDI 3035: The Java directory is located at port 3035. This is how TeamSite keeps track of where everything is located. Since you can change the ports and server locations for a lot of the Interwoven products, TeamSite needs to know where you have put them, and this service is what keeps everyone on the same page.
Servletd 8080: Port 8080 is the servlet container where TeamSite is running. The default servlet container is Tomcat. Tomcat can be replaced with either WebSphere or WebLogic. The reason for this is that some enterprises will not allow an application group to run Tomcat because it is not hardened. Tomcat is an open source servlet container, and open source software is not widely accepted at some organizations. We are sure your infrastructure team will be quick to point this out if that is the case at your company.
OD admin 8081: The OpenDeploy admin server allows a user to configure and monitor OpenDeploy from one web interface. The admin server also runs in a Tomcat container. If you do not have access to the actual OpenDeploy server to change configurations or look at logs, then this will soon be your new OpenDeploy friend.
OD base 9173: The OpenDeploy base server allows content to be deployed. The base server port 9173 is actually another RMI registry. The OpenDeploy base servers and receivers use this port to communicate with each other.
OD receiver 20014: The OpenDeploy receiver uses port 20014 to receive deployments from the base servers. A base server needs a receiver on the other side to complete a transaction.
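Once the information security group has hardened the box, it helps to confirm that each of the ports just listed is actually reachable from where it needs to be. The following is a minimal sketch using only Python's standard library; it is not an Interwoven tool. The port numbers are the defaults named in this section, the service labels are our own shorthand, and the host name in the demo is a placeholder you would replace with your TeamSite server.

```python
import socket

# Default ports from this section; adjust if your installation changed them.
TEAMSITE_PORTS = {
    80: "HTTP front door",
    81: "content preview server",
    443: "HTTPS front door",
    1080: "iwproxy",
    1099: "OpenAPI RMI registry",
    3030: "OpenJMS event subsystem",
    3035: "JNDI",
    8080: "servlet container",
    8081: "OpenDeploy admin",
    9173: "OpenDeploy base",
    20014: "OpenDeploy receiver",
}


def is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def report(host, ports):
    """Map each port number to True (reachable) or False (not reachable)."""
    return {port: is_open(host, port) for port in sorted(ports)}


if __name__ == "__main__":
    # "localhost" is a placeholder; point this at your TeamSite server.
    for port, reachable in report("localhost", TEAMSITE_PORTS).items():
        state = "open" if reachable else "CLOSED"
        print(f"{port:>5}  {TEAMSITE_PORTS[port]:<26} {state}")
```

A successful TCP connection shows only that something is listening and that the firewall allows the traffic; it does not prove that the expected Interwoven service is the one answering.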
Hardware
The hardware for TeamSite will be one of the determining factors for how scalable the application will be. The configuration not only has to be scalable but also has to be able to handle the disaster-recovery requirement. Take a look at Figure 5-7, and then we can discuss the hardware configuration choices made for TeamSite.
[Figure 5-7 diagram: the live production site (live TeamSite server with NAS) connected over the WAN to the disaster recovery site (backup TeamSite server with NAS).]
Figure 5-7. TeamSite hardware configuration using NAS as common storage
As you can see in Figure 5-7, FiCorp has elected to use network-attached storage (NAS). You will be using an internal small computer system interface (SCSI) array for the software installation and NAS for the content store. You will delegate the backups to the disaster recovery site; this will help reduce network traffic in the production environment. In some cases, enterprises are forced to run lengthy batch processing; if this is something you face, then the NAS solution can help free up some time that would normally be used for backing up the file system.
This is a good time to mention how important it is to follow Interwoven’s backup procedure. The content store must be frozen during the nightly backup. Testing your backups is also a requirement, not just a suggestion. You should plan a disaster recovery test at least once a year. Don’t be surprised when your plan doesn’t work completely; these tests will spotlight the weaknesses.
■Note We are not recommending any specific brand of NAS. If you are planning on using a similar configuration for your installations, you should consult Interwoven for the proper sizing and brand configurations. There are models that allow you to make copies of the main slice, and you can back up from the copy.
Content/Application Integration
Many enterprises have some sort of testing, staging, and preview system. This system allows content and applications to come together. Many of these systems are inadequate because the content and the applications are not in sync with production. When a large promotion is to take place, the application is tested against the old content. The new content is tested against old applications or not at all. What you are trying to do is put a system together that allows you to test both the new content and the new applications together. Once you have completed the testing within this environment, you can be assured that the testing is as good as it can be.
Network
The application server will be home to several services, as shown in Figure 5-8.
[Figure 5-8 diagram: the application server with HTTP 80, HTTPS 443, servlet container 8080, and OD receiver ports 9173 and 20014, connected to the browser client, the Oracle server (SQL), and the TeamSite server (deployments and NFS).]
Figure 5-8. Application server environment network communication
These are the ports:
HTTP 80: Port 80 will be used to serve static content to internal corporate users. Corporations do not always use this port, but it helps eliminate some of the extra bandwidth of encrypting your content if it is not needed.
HTTPS 443: Port 443 will serve static content to external users and will employ Secure Sockets Layer (SSL) for encryption. FiCorp will be using static HTML pages over this connection, and this is the standard port on which HTTPS is normally served.
Servlet container 8080: The servlet container is where you will install your Java applications. This container will allow you to perform integration testing with the application groups and the content teams. This level of testing is often not available, and problems with content/application integration are unfortunately usually discovered in production for the first time.
OD receiver 9173: This port is the RMI registry that the OpenDeploy base and receiver installations use to communicate. This is the port that enables OpenDeploy’s initial communications.
OD receiver 20014: This port is where the actual work is performed during deployments using OpenDeploy. This port can actually be a set of ports to allow multiple deployments at once.
NFS 2049: This port is used for the Network File System (NFS). This is how Unix shares its file systems across a network. The file system is mounted just like a local file system and can be used as a normal mount point. The NFS on the application server will allow the applications to pull content from the TeamSite content store. This means the content does not have to be physically moved to the application server, unless you have a special need to do so.
Oracle
Oracle will be utilized via Interwoven’s DataDeploy to store metadata, structured content, and event data. The metadata will be managed in Oracle to allow external applications better access to metadata, such as a customer-facing search engine. Structured content updates allow users to enter data into standard Interwoven forms, and DataDeploy will then store this content in the database. Once these content files have been set up for updating through TeamSite, changes to static content can become something that happens on a regular basis. This allows application teams to update static content that their applications utilize without having to change or redeploy their code. The TeamSite event data can be stored in a database and track such tasks as creating workareas, deleting files, or updating files. This data is good for gaining metrics and maintaining accountability. Also, this Oracle instance will store miscellaneous data for other customizations performed to aid the user of TeamSite. Some implementation teams make great efforts to completely automate their users’ experiences, and this usually takes a great deal of database interaction.
Network
The Oracle server is home to a limited number of services for the implementation, as shown in Figure 5-9. Here are the ports:
SQL 2483: This port is used for unsecured connections to the Oracle database. The TeamSite server will utilize this port. TeamSite utilizes DataDeploy to update the Oracle database with extended attributes and utilizes event system logging. The TeamSite server also utilizes the ReportCenter module to log both system- and user-driven events such as modifying and deleting workareas and branches. After this event information is added to the database, queries can be executed against it. This gives the IT staff and your business owners more insight into how the content and the CMS are used.
SQL 2484: This port is used for secure connections to the Oracle database. Applications that are being hosted outside the firewall may need to use a secure connection to the database.
[Figure 5-9 diagram: the Oracle listener on ports 2483 (SQL) and 2484 (SQL over SSL), serving the TeamSite server's DAS/event subsystem and other applications.]
Figure 5-9. The Oracle server’s network communication diagram
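Once event information is in the database, metrics and accountability reports come down to ordinary SQL. The following sketch is illustrative only: it uses Python's built-in sqlite3 module as a stand-in for the Oracle instance, and the ts_events table, its columns, the event names, and the sample rows are all hypothetical. The real schema written by DataDeploy or ReportCenter will differ, so check it before writing queries.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle instance.
conn = sqlite3.connect(":memory:")

# Hypothetical event table; the real schema will differ.
conn.execute(
    "CREATE TABLE ts_events (event_type TEXT, username TEXT, target TEXT)"
)
conn.executemany(
    "INSERT INTO ts_events VALUES (?, ?, ?)",
    [
        ("CreateWorkarea", "alice", "alice_workarea"),
        ("ModifyFile", "alice", "index.html"),
        ("ModifyFile", "bob", "rates.html"),
        ("DeleteFile", "bob", "old.html"),
        ("ModifyFile", "alice", "about.html"),
    ],
)


def events_per_user(connection):
    """Count logged events per user, busiest user first."""
    cursor = connection.execute(
        "SELECT username, COUNT(*) FROM ts_events "
        "GROUP BY username ORDER BY COUNT(*) DESC, username"
    )
    return cursor.fetchall()


print(events_per_user(conn))
```

The same GROUP BY pattern works per event type or per target file, which is usually enough to answer the first questions business owners ask about how the CMS is used.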
Search Server
The search server will consist of two software pieces: the index service and the search service. These services have been developed separately so that it is easier to scale the system. This allows you to separate these functions over different machines if necessary. This search is designed to help your users find content inside TeamSite without having to go and search for it through the directory structure. This will eliminate some of the need to know exactly where every piece of content is stored.
Network
Figure 5-10 shows how the services will be installed.
[Figure 5-10 diagram: the Interwoven search server, showing the index server (ports 6715, 6716, and 6717) with its index agents and callback, and the search server (ports 6720, 6721, and 6722) with its search agents and callback, connected to the TeamSite server through the CLT/TS interface, search, and NFS.]
Figure 5-10. Search server network communication
These are the ports:
Index server 6715: The index server port is used to start the index process for specific content. The index server will be run against the entire site initially, and when individual files get modified, the index server will need to be run against that file to update the indexes for that file. This is the port that will be called to initiate this process.
Index agent main 6716: The index service uses this port to talk to the indexing agents. Each time an index request is initiated, the index service directs an index agent to perform the indexing. Each index agent can handle only one document at a time, so multiple agents will be running. The number of agents that can run is configurable, so you can change the number of agents as capacity increases.
Index agent callback 6717: The indexing agents use this port to notify the index service that they have completed indexing the last-requested file.
Search server 6721: This port is used to listen for search requests. Search requests, as with index requests, can come from the TeamSite interface or from an Interwoven command-line tool. The search service also has its own agents. A different agent is used for each request, and the number of agents that are allowed to be active is also configurable.
Search agent main 6722: The search service uses this port to talk to the search agents. Each time a search request is initiated, the search service directs a search agent to perform the actual search. Each search agent can handle only one document at a time, so you can configure how many search agents you want to run at any given time.
Search agent callback 6723: A search agent uses this port to notify the search service that it has completed its request.
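The service, agent, and callback arrangement described for both indexing and search is a familiar worker pool pattern. The sketch below illustrates that pattern with Python's standard library; it is an analogy under our own assumptions, not Interwoven's implementation. The index_document function is an invented placeholder for an agent's work, and agent_count plays the role of the configurable number of agents.

```python
from concurrent.futures import ThreadPoolExecutor


def index_document(path):
    """Stand-in for one agent indexing one document at a time."""
    return path.lower()  # pretend the index entry is the lowercased path


def run_index_service(paths, agent_count=2):
    """Hand documents to a fixed pool of agents and collect their callbacks."""
    completed = []

    def on_done(future):
        # Plays the role of the callback port: an agent reports completion.
        completed.append(future.result())

    with ThreadPoolExecutor(max_workers=agent_count) as agents:
        for path in paths:
            agents.submit(index_document, path).add_done_callback(on_done)
    # Leaving the with block waits until every agent has finished.
    return sorted(completed)


if __name__ == "__main__":
    print(run_index_service(["/Home.html", "/Rates.html", "/About.html"]))
```

Raising agent_count lets more documents be processed at once, which mirrors the advice to increase the number of agents as capacity grows.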
Mail Server
We have included the mail server here so that it is not left out of the physical architecture plan. Multiple servers will need access to the mail server. We have included three that commonly need access to the mail server for sending notifications.
Network
Figure 5-11 shows how the mail server will look.
[Figure 5-11 diagram: the OpenDeploy server, application server, and TeamSite server each sending mail to the mail server over SMTP port 25.]
Figure 5-11. Mail server network connectivity
SMTP port 25 is used for connecting to the mail daemon running on your mail server. This is the commonly used port, but your organization may be using a different one. This assumes you use Simple Mail Transfer Protocol (SMTP) and not Internet Message Access Protocol (IMAP) as your mail protocol.
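To confirm that a server can reach the mail daemon on port 25, a short notification test is usually enough. The following is a minimal sketch using Python's standard smtplib and email modules. The host name mail.example.com and the addresses are placeholders, and the sketch assumes a relay that accepts unauthenticated mail from your internal servers, which may not be true in your organization.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender, recipient, subject, body):
    """Build a plain-text notification message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_notification(msg, host="mail.example.com", port=25):
    """Send the message over SMTP; the host and port are placeholders."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)


# Build (but do not send) a sample notification.
sample = build_notification(
    "teamsite@example.com",
    "cms-team@example.com",
    "Deployment finished",
    "The nightly deployment completed.",
)
print(sample["Subject"])
```

Calling send_notification(sample) from each of the three servers shown in Figure 5-11 verifies both the firewall rule and the mail relay configuration in one step.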
MetaTagger Server
You will need to open connectivity to allow both TeamSite and the users to connect to the MetaTagger server. Multiple services are running on this server to achieve FiCorp’s tagging needs. You will be using a custom taxonomy to help reduce the amount of work the authors will have to perform when capturing metadata for each page. This will allow MetaTagger to grab important pieces of data and store them automatically.
Network
Four pieces will make up the MetaTagger services, as shown in Figure 5-12.
[Figure 5-12 diagram: the MetaTagger server with the configuration manager (9080), admin (9090), MetaTagger (9095), and Studio (9096), reached over HTTPS by web browsers and command-line tools (CLTs) and by the MT Studio Java client.]
Figure 5-12. MetaTagger server network connectivity
Here are the ports:
HTTPS 9080: This port allows you to configure MetaTagger functions such as logging through a web browser.
HTTPS 9090: The administration server starts and monitors all the MetaTagger instances and the Studio instances. The admin web interface is served through this port.
HTTPS 9095: The MetaTagger’s first instance monitors this port for communication, indicating there has been a request from a web browser or an Interwoven command-line tool.
Java 9096: The Studio server is responsible for helping create custom taxonomies and runs on this port.
Summary
One of the important ideas to take away from this chapter is to remember to consider your software life cycle when building your hardware. It is also important to not isolate your TeamSite server from your application environments completely because you will want to create content-aware applications at some point. In addition, nothing is wrong with talking to Interwoven Technical Services to get some help with your initial system design; in fact, they could save you some headaches. The most important idea in this chapter is to make sure you do not speed through your physical design because you need to consider a lot of port and connectivity issues when installing the CMS in your environment. This chapter should be helpful when working with your hardware support groups.
CHAPTER 6
Installing the CMS
Now that you have identified the requirements, stakeholders, and software to use for your CMS installation, you can start installing the software. You should not take the installation of your CMS lightly, however. This installation will involve numerous teams across your organization and should be properly planned before any software is installed. Proper planning can resolve many potential issues before they come to fruition. With CMSs, you should always strive to be proactive rather than reactive. This is not always possible, but it is something to strive for.
We recommend sending two people from your CMS development team to a TeamSite system administration class for Unix, Windows, or Linux, depending on the version you want to install. You can find these classes listed on the Interwoven website, and they will be invaluable during this critical process. By taking the training offered by Interwoven, you will be prepared to install the Interwoven software. The administration guides for each Interwoven product cover the installation process; however, the class will help you avoid being surprised by any small issues that undoubtedly will arise.
This chapter will cover many topics that no training class can provide. These are real-world lessons based on our experience with CMS implementations. This chapter covers creating an installation plan and assembling and organizing the proper stakeholders for a CMS implementation, as well as many other subjects that will greatly assist you in a CMS implementation.
Creating an Installation Plan
A critical piece of the installation involves communicating with all impacted systems support or infrastructure support teams. In other words, you should have a clearly defined installation plan before starting to install. The plan should contain the following sections:
Dependent support teams: This section of the document should identify who will be involved in installing the selected software. We recommend detailing a “30,000-foot” view of the installation procedure and discussing this with the relevant impacted parties in an installation kick-off meeting. Try to identify as many impacted parties as possible for the meeting invitation list. You may invite teams that do not end up being involved in the process, but in this case the more teams invited to the meeting, the better. Even if the correct teams are not involved in this meeting, someone in the meeting will know who should be involved. The following questions will help you determine who you should include in the meeting, and you should also refer to the “Assembling and Organizing the Installation Team” section to further refine your list. If necessary, reschedule this meeting to ensure that all involved dependent support teams are represented. Questions to consider when compiling the list of invitees include the following:
• Who is responsible for purchasing hardware at my company?
• Who is responsible for setting up the hardware after it is purchased?
• Who is responsible for assigning IP addresses, and who is responsible for setting up network connections at my company?
• Which operating system will be installed?
• Who is responsible for installing the operating system for new hardware?
• Who performs administration activities on the chosen operating system?
Contact points, including secondary and tertiary escalation contacts: Once the impacted teams are on board, establish the primary contact points within each team. This may or may not be the person who will be responsible for performing one or more of the installation steps. Additionally, you should identify secondary and tertiary escalation points or contacts. If contact A is unavailable, then you can use contact B. If contact B is unavailable, then you can call upon contact C. With each identified contact, you will also need to identify their associated pager, cell phone, work phone, and home phone numbers. The home phone number is necessary because some teams may not have an on-call person. Additionally, you may need to escalate support issues to a member of management if the identified installation person is not responsive. We have also experienced numerous instances of pagers becoming inoperable because of the pager carrier being out of range or the batteries being dead. You may also need instant messaging IDs as well as email addresses.
Role descriptions: This section should identify exactly what each impacted team’s sphere of responsibility is. Be as descriptive as possible. Later when receiving sign-off for the installation plan, you can have a legally binding contract with these groups. The sign-off document itself will serve as the contract.
Installation steps defined by role: Arrange the installation steps in a table, with the step number in the first column and the role name in the second column. The installation steps should be as detailed as possible. If you have completed a dry run installation on a sandbox or lab environment, you should have each step completely documented down to the minutiae.
Timing of installation steps: The timing of each individual step is critical in the installation plan; this column should be the third column in your table. Identify the timing of each step, as well as any dependencies between those steps. Keep time zones in mind when completing this section of the installation plan. Usually, tracking time in 15-minute increments is sufficient.
Quality approval or testing milestones: How will you verify that the individual installation steps are successful? How will you verify that the entire installation is a success? You will do this by identifying test steps based on each installation step. Many times this is not possible, but when it is, you should include testing milestones in your installation steps.
Overall installation timeline: You need to plan the overall installation timeline at a high level. The plan needs to map the entire installation process from the hardware acquisition to the final startup of the Interwoven software that was installed. Since a different team could end up handling each portion of the installation, this section helps determine when you will need resources from each group. This section enables you to schedule your resources in advance, thus eliminating any delays because of a lack of proper resource planning. This section of the installation plan should also include any specific dates that each piece of the installation must be completed by to keep the entire installation process on schedule.
Sign-off section: The sign-off section includes actual signatures from each team that will participate in the installation. We recommend the signatures carry some authoritative weight. In other words, a manager or higher should be the one signing the installation plan. The manager may receive recommendations from subordinates as to whether they should sign the document, so make certain their subordinates are fully informed and excited about the installation. By having someone with authority sign the document, you can be certain that people will be cooperative when needed during the installation.
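The installation step table described in this plan (step number, role, timing, and dependencies) lends itself to a quick calculation of when each step can start. The following is a minimal sketch with an invented four-step plan; the roles, descriptions, and durations are hypothetical, and the only rules taken from the text are the column order and the 15-minute increment.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    number: int          # first column: step number
    role: str            # second column: role that performs the step
    description: str
    minutes: int         # third column: duration, in 15-minute increments
    depends_on: list = field(default_factory=list)


def schedule(steps):
    """Return {step number: (start, end)} in minutes after kickoff.

    Steps must be listed so that every dependency appears before the
    step that depends on it.
    """
    times = {}
    for step in steps:
        if step.minutes % 15:
            raise ValueError(f"step {step.number} is not a multiple of 15 minutes")
        # A step starts when its slowest dependency has finished.
        start = max((times[dep][1] for dep in step.depends_on), default=0)
        times[step.number] = (start, start + step.minutes)
    return times


# Hypothetical plan for illustration only.
PLAN = [
    Step(1, "Hardware team", "Rack and cable the server", 60),
    Step(2, "Network team", "Assign the IP address", 30),
    Step(3, "Unix administrators", "Install the operating system", 90, [1, 2]),
    Step(4, "CMS application team", "Install TeamSite", 45, [3]),
]

for number, (start, end) in schedule(PLAN).items():
    print(f"step {number}: starts at +{start} min, ends at +{end} min")
```

The end time of the last step gives the minimum length of the installation window. Times here are offsets from kickoff, so you still need to convert them to local times for teams in other time zones.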
Assembling and Organizing the Installation Team
The following team names may be different in your organization, but the teams that match the following descriptions should be involved in the installation. Read these descriptions carefully, and try to match them to the appropriate teams in your organization.
Network infrastructure representation: The network infrastructure team is typically responsible for all network-related setup activities in your organization. These activities usually include stringing network cabling, setting up routers and switches, allocating network access points, and assigning TCP/IP addresses. This team will not particularly care about the application being installed, but they should be aware of the number of IP addresses needed.
Unix administrators/Windows administrators: The administration group will be responsible for setting up users, creating home directories, and doing many of the installation steps of the CMS. These people may be responsible for adding users to groups and performing system functions requiring root or administrator access.
Production support (monitoring): This group should be involved so they understand how the CMS is installed and what services need to be monitored. At the least, monitor the CMS on port 80 and port 81. Ports 80 and 81 are used for the Hypertext Transfer Protocol (HTTP) traffic, and by monitoring these ports, you can ensure that the user interfaces are available. Make certain to give them a test plan for the manual verification of the CMS. This test plan should be executed by a monitoring script or manually depending on the monitoring used in your organization. This team should also know about the TeamSite Service Monitor, which can monitor only iwserver; it runs a service called the Watchdog that uses a set of associated tools and scripts to monitor the TeamSite server, detect process and power failures, log failure events, and optionally take corrective action. This monitoring should be turned on in your installation. You can configure the TeamSite Service Monitor in several ways; for example, you can configure the Service Monitor to automatically shut down the TeamSite server in the event of a process or power failure, perform different actions depending on whether the failure was power or process related, and execute a specified Perl script after detecting the failure. You can find more information about the TeamSite Service Monitor in the administration guide for your respective version of TeamSite. This team will also be responsible for routing trouble tickets to the appropriate person.
Hardware acquisition team/capacity planning team: This team must understand the hardware requirements for the CMS. This team will be responsible for setting up the purchased hardware in the allocated rack space. You can find hardware requirements in the installation program of your specific Interwoven version.
CMS application team: We will presume that this is your team, which will be responsible for designing and developing the CMS components such as new templates and workflows. This team will also be periodically engaged in consulting activities for the other impacted teams. These team members should be known as the SMEs for all things CMS. Although the team members may not have all the answers all of the time, they should know the CMS’s capabilities. They will also be engaged each time a new site or site region is added to the CMS. Additionally, this team should be the third-tier support for any CMS-related trouble tickets.
TeamSite administration team: This team may perform administration activities for the TeamSite system. This team may also be responsible for performing administration of many other applications, so it is important these team members understand how the TeamSite system works. This includes knowing all the important configuration files that TeamSite uses, understanding how to use the command-line tools (CLTs) provided with TeamSite, and knowing what rights are needed to perform certain administration activities in the TeamSite system. If your organization has a TeamSite administration team, then you will probably have to route requests to this team any time you have to perform global activities in the TeamSite system. These activities could include stopping, starting, or restarting the TeamSite server and changing systemwide configuration files.
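For the production support team, the minimum check described above (the CMS answering HTTP on ports 80 and 81) can be scripted in a few lines. The following is a minimal sketch using Python's standard library, not the TeamSite Service Monitor or any Interwoven tool. The host name in the demo is a placeholder, and treating any status below 400 as healthy is our assumption; your test plan may require checking specific pages.

```python
import http.client
import urllib.error
import urllib.request

# Bypass any configured proxy so the check reaches the server directly.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_ok(url, timeout=5.0):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with _DIRECT.open(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def check_cms(host, ports=(80, 81)):
    """Check the CMS user interface ports and return {port: healthy?}."""
    return {port: http_ok(f"http://{host}:{port}/") for port in ports}


if __name__ == "__main__":
    # "localhost" is a placeholder; use your TeamSite host name.
    for port, healthy in check_cms("localhost").items():
        print(f"port {port}: {'OK' if healthy else 'FAILED'}")
```

A scheduler or your existing monitoring product can run this at intervals and open a trouble ticket on failure, complementing rather than replacing the Service Monitor's process-level checks.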
Training the Installation Team
For a CMS implementation, training is critical. We have seen many companies skip this incredibly important step, with unfortunate consequences. The old adage of pay now (for training, in this case) or pay later (with poor understanding and bad implementation decisions) holds true. Several types of world-class Interwoven training classes are available, including instructor-led, web-based, and onsite training classes. Prior to installation, we recommend you enroll several members of the installation team in a TeamSite system administration class. You should enroll at least two members from the CMS application team and two members from the TeamSite administration team. Choose the appropriate class depending on your operating system and on the version of TeamSite you will be installing.
CHAPTER 6 ■ INSTALLING THE CMS
TEAMSITE ADMINISTRATION CLASSES
The TeamSite administration class is a platform-dependent five-day course for TeamSite system administrators and covers the basic features, installation, configuration, customization, and routine maintenance of the TeamSite server. This class also discusses OpenDeploy’s architecture and capabilities as well as the installation and initial configuration for OpenDeploy. During the course, students will learn how to use the OpenDeploy user interface, including deployment configuration creation, security and encryption, complex deployment situations, scripting deployments, and OpenDeploy integration with the TeamSite server.
Recommended prerequisites for attending the class include previous system administration experience, with some basic networking skills, and application and web server installation experience. Knowledge of user account maintenance, file system administration, and permission management, along with basic XML editing skills, is suggested.
You can find more information at http://inter.viewcentral.com/events/cust/catalog.aspx?cid=interwoven&pid=1. Alternatively, simply go to www.interwoven.com, and select Education for more information.
Considering TeamSite Installation Issues
Before you begin the TeamSite installation, you should already have completed certain tasks. The following are some of those tasks and the items you will need to help speed your TeamSite installation:
• You need to create the partition for your content store; follow the sizing recommendations in the Interwoven documentation for this task.
• You need to obtain the Interwoven packages you will be installing. Make sure you also obtain any relevant patches. (You can download the files you need from the Interwoven support site.)
• Make sure you have a list of port numbers you want to use if you are not using the default port numbers.
• You need the customer-installed preview server that you will be using to view your content changes.
• You need to have root or administrator access to the TeamSite host machine; this level is necessary for the installation steps.
• If you are not going to be using the default setup for the Interwoven event subsystem, you need to have a database set up and the user information for the database at hand.
• You need the TeamSite installation guide for the appropriate operating system where TeamSite will be installed.
• You need a valid TeamSite license key.
■Caution If you are using Solaris, when you perform an ifconfig, it drops leading zeros. When you enter your Media Access Control (MAC) address, make sure you add the leading zeros. If you do not do this, the key will not be valid. When you get the key, you need to make sure you include the email address part as well in the iw.cfg file. If you are having problems with TeamSite starting, make sure you check the tslicinfo file inside the install directory under the TeamSite home directory.
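The leading-zero problem described in the caution is easy to correct by hand, but a small helper avoids transcription mistakes. The following is a minimal sketch in Python. The function name pad_mac_address and the sample address are hypothetical, and it assumes the address is written as six colon-separated hexadecimal octets, as Solaris ifconfig prints it. Check the exact format your license request expects before relying on it.

```python
import string


def pad_mac_address(mac: str) -> str:
    """Restore the leading zero that may be missing from each octet."""
    octets = mac.strip().split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 colon-separated octets, got {len(octets)}")
    padded = []
    for octet in octets:
        # Each octet must be one or two hexadecimal digits.
        if not 1 <= len(octet) <= 2 or any(c not in string.hexdigits for c in octet):
            raise ValueError(f"invalid octet: {octet!r}")
        padded.append(octet.zfill(2))
    return ":".join(padded)


# Hypothetical address with the leading zeros dropped.
print(pad_mac_address("8:0:20:a1:b:c"))  # prints 08:00:20:a1:0b:0c
```

An address that already has two digits in every octet passes through unchanged, so the helper is safe to apply to output from any platform.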
The following sections will help you understand the installation process. We recommend that you have someone take detailed notes during the installation process, documenting any unanswered questions that you may have and documenting the answers after consulting with Interwoven technical advisors. If you are performing the installation through the command line, make sure you set your terminal to record the session. In most installations we have participated in, this information is invaluable when you install other TeamSite servers.
Considering Additional Server Installation Issues
TeamSite installs two server types: an application server and a web server. By default TeamSite uses an Apache Tomcat server as its application server, which is installed automatically during the TeamSite installation. The web server that is installed by TeamSite is Apache. This Apache web server helps display the TeamSite graphical user interface (GUI). It is also used to virtualize pages during page preview at content creation or content approval time.
Application server: Tomcat will be home to the TeamSite web applications such as the user interface ContentCenter and the event subsystem. The server is installed by default on port 8080. If you do not want to use Tomcat for your application server, then you have two additional options: IBM WebSphere 5.1 and BEA WebLogic 8.1 Service Pack 2. If you decide to use one of these servers, you must completely install TeamSite with Tomcat and then reconfigure TeamSite to use one of the other application servers.
■Note If you decide to use one of these other application servers, you will need to purchase it separately because TeamSite ships only with Tomcat at this time.
Web server: The Apache web server is installed by default on port 80 for HTTP traffic and port 443 for HTTP Secure (HTTPS) traffic. Your users will be able to browse to the TeamSite server without having to remember any special ports. The server is set up so that you have a choice as to whether you want to use Secure Sockets Layer (SSL) encryption or plain HTTP. The obvious benefits of this are that content authors can contribute content and approvers can approve content from anywhere in the world, provided they have an Internet connection.
Determining the Content Preview Server
You will need to determine the platform of the preview server and what software will be installed on it. This server should be installed before you start installing TeamSite. The preview server is where you preview the content changes made within TeamSite before they are deployed to the web server. For static content, the preview can take place on a standard web server such as Apache, but you may choose to use an application server such as Tomcat if you will be previewing JSP pages; alternatively, you could use Internet Information Services (IIS) if you want to preview ASP pages. You may also choose to have additional servers for previewing other file types as necessary. You can configure the TeamSite proxy to handle this advanced-level preview routing.
SOLVING NFS LIMITATIONS
TeamSite groups are marvelous. If you have had to perform user maintenance on TeamSite, the benefits of this new security technique are apparent. These groups allow the application to authorize users based on its own security without depending on the operating system. The operating system still authenticates the user, so a user will still have an account on the server, but TeamSite groups can handle the security within the content store. These groups function much like Unix groups.
When you set up a new user for TeamSite, you do not need to request that Unix user groups be set up; all you need is a user account with the primary Unix group. Every user will use this same Unix group. Then you can assign the user privileges based on the Interwoven group to which the content area has been shared. This spares the team that is responsible for setting up users a painful series of requests to get their users set up properly.
Since the application team can update the Interwoven groups, the application team can rely on Interwoven groups to assign workflow tasks. Although you can assign workflow tasks to Unix groups, the application team cannot easily manage Unix groups. Many implementations end up using a series of single-user assignments to make sure each user who should be a part of the task is assigned. If a user is in an Interwoven group, the application team can easily remove that one user without affecting the workflow. This should make more sense to you after you have learned more about workflows; Chapter 12 and Chapter 17 cover workflows in more detail.
You can work around the 16-group limitation by using Interwoven groups because there is no limit to the number of TeamSite groups to which a user can be assigned.
Setting Up Repositories
The Interwoven repository (content store) is where the TeamSite server stores its content. The content store will contain many small files, so it is recommended that the partitions that make up your backing store be formatted with smaller disk sectors. If you leave the sector size at the default setting, your partition may not be used efficiently. If you are not housing the content store on some sort of local area network (LAN)/storage area network (SAN), then it should be on a partition that is not on the same hard drive as the operating system. This will help improve performance. With the Unix and Linux operating systems, the content store is built on NFS drivers and therefore shares the benefits associated with NFS. On the other hand, the content store is also bound by some of the restrictions of NFS. Since NFS uses Unix groups for security, TeamSite has experienced problems with users who belong to more than 16 groups. When a user has more than 16
groups, the NFS security model will not recognize that the user is a member of any groups past the first 16. Interwoven has instituted a system that allows users to swap their secondary groups for their primary groups to help alleviate this problem. This feature is not on by default, but you can activate it within the iw.cfg file. The iw.cfg file is located within the /etc directory on Unix-type systems. Interwoven has also devised a way to create internal Interwoven groups for security, and you can use these groups to alleviate the 16 Unix group limitation. See the “Solving NFS Limitations” sidebar.
Determining TeamSite Command Responsibilities
Sometimes you will need to bring TeamSite down for various reasons. It may be to apply patches to the operating system, or it may be because some other customization to your CMS requires a restart. Other times you will need to perform maintenance such as cleaning up the content store or installing TeamSite service packs. Therefore, you need to determine who is responsible for performing these functions. The ownership of these services may vary depending upon which environment you are using. The application team may have the permissions to restart TeamSite in the development environment, but a production support team may be responsible for performing these functions in production. It is important to know who owns the responsibility so that you have consistent procedures for these functions. The following are a few key commands for which you should assign responsibility during this exercise:
Stopping/starting TeamSite: The user who stops and starts TeamSite must have root access. When restarting TeamSite, the mount points are created as TeamSite is started, and this action must be run as your superuser.
Rebuilding the TeamSite interface: When you change the ContentCenter interface, you will need to rebuild the interface. The changes may include functionality such as adding custom menu items, changing the layout of the interface, or adding custom workflow variables to be displayed in the interface. The make_toolkit.ipl command is located under the /bin directory in your home TeamSite directory and must also be run as your superuser.
Shrinking the backing store: The backing store periodically needs to be compacted to eliminate any duplicate files within the backing store. The iwfsshrink command is located under the /bin directory in your home TeamSite directory and does not require superuser permission, but it should be run at least once a month.
Installing TeamSite
The TeamSite installation is very involved, and we recommend you read the installation guides for each product. The installation varies depending on the upgrade path you have chosen, on whether you are starting a fresh install, and on the platform on which you will be installing TeamSite; too many factors vary for us to cover every installation step for the product. That is why the following sections are meant to supplement the Interwoven installation manual. Although the following sections will not completely describe every installation step, they should give you a good understanding of the scope of the
installation. In the following sections, we will discuss one of the most important configuration files in the TeamSite system, the iw.cfg file. When modifying the iw.cfg file, most changes do not take effect until after restarting the TeamSite content server. After restarting TeamSite, make sure you perform some testing or check the logs to verify that the server started properly. In the next sections, we’ll discuss the main configuration sections in the iw.cfg file. The configuration sections that are not discussed can be left as the default settings or are specific to a product installation.
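Before changing iw.cfg, it can help to capture the current settings so you can compare them after the restart. The following Python sketch is illustrative only: it assumes iw.cfg follows an INI-like layout of bracketed section names and key=value lines, which is what the examples in this chapter suggest, and the sample text and the load_settings helper are hypothetical. A real iw.cfg may contain syntax that Python's configparser does not accept, so treat this as a starting point.

```python
import configparser

# Hypothetical excerpt that reuses settings named in this chapter.
SAMPLE_IW_CFG = """\
[iwcgi]
domain_list=Prod1,Prod2,Corpnet

[iwwebd]
default_protocol=http
http_port=80
https_port=443

[iwproxy_remap]
test_intranet_site_branch=/main/testing
"""


def load_settings(text: str) -> dict[str, dict[str, str]]:
    """Parse INI-style text into a {section: {key: value}} dictionary."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


settings = load_settings(SAMPLE_IW_CFG)
print(settings["iwcgi"]["domain_list"].split(","))
print(settings["iwwebd"]["http_port"])
```

Saving the dictionary before and after an edit gives you a simple record of what changed, which is useful for the installation notes recommended earlier in this chapter.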
main
This section affects the main content store and the rights pertaining to that store. This section contains the following configuration setting:
editor_publish: The editor_publish flag determines whether a user logged in as an editor is allowed to publish an edition. The default value is yes, but if you change this option to no, then the editor is not allowed to create editions. If this option is set to no, then the New Edition link will not appear in ContentCenter Professional for editors.
iwcgi
In Windows-based installations, the iwcgi section of iw.cfg lists the domains available from the Interwoven login screen’s domain selection drop-down list. For example, a setting of domain_list=Prod1,Prod2,Corpnet will display the values of Prod1, Prod2, and Corpnet in the domain selection drop-down list on the login screen.
■Tip If you choose to not specify any domains with this setting, the Interwoven server will attempt to automatically detect them for you.
This section contains the following configuration setting: launchpad_hostname: The launchpad_hostname value determines the host name where a user would download LaunchPad if they do not already have LaunchPad installed on their client machine. LaunchPad is a client-side Java application that allows uploading and downloading from TeamSite. By default, the host name is set to the server on which you installed TeamSite.
event_subsystem
The event_subsystem section is used for capturing events within TeamSite such as creating branches, editing content, or starting a workflow. Using ReportCenter, you can create ad hoc reports from this data. This section contains the following configuration setting:
ew_enable: The ew_enable flag activates the event subsystem if the value is true. If you do not want the event subsystem enabled, you should leave the value as false.
teamsite_servlet_ui
The teamsite_servlet_ui section configures the Java application that gives a user access to the TeamSite content server. This section contains the following configuration settings:
servlet_host: The servlet_host value sets the host where the TeamSite UI will be running. The default value for servlet_host is localhost.
servlet_port: The servlet_port value is the port on which the TeamSite UI will be listening. The default port for listening is 8080.
teamsite_templating
TeamSite templating is another name for TeamSite forms. The templating encompasses the presentation, the data record, and the capture form for Interwoven. The teamsite_templating section contains the following configuration setting:
data_root: This specifies where the template data is stored for TeamSite forms. The default value for this is templatedata.
iwwebd
The iwwebd section configures the web server instance that runs as part of the TeamSite UI. Its job is serving the front end for all HTML content. This section contains the following configuration settings:
default_protocol: The default_protocol value determines whether TeamSite will generate HTTP or HTTPS URLs. The default value is http.
http_port: The http_port value specifies the port where the web server will be listening for connections. The default value is 80.
https_port: The https_port value specifies the port where the web server will be listening for encrypted requests. The default value is 443.
host: The host value specifies the address of your host machine. This value must be an IP address if you do not have DNS enabled on your host server. The default value is the primary IP address of the machine on which you installed TeamSite.
iwproxy
The iwproxy service provides the internal HTTP and HTTPS routing for TeamSite. This section contains the following configuration settings:
iwproxy_host: The iwproxy_host value specifies the host name on which the proxy server will be running. The default value is localhost.
customer_webserver_host: The customer_webserver_host value specifies the host name or address of the machine that will be serving your preview content. The default value is the primary IP address of the machine on which you installed TeamSite.
iwproxy_port: The iwproxy_port value specifies the port where the proxy server will be listening. The default value is 1080.
customer_webserver_port: The customer_webserver_port value specifies the port where the customer preview server will be listening. The default value is 81.
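Because several TeamSite services listen on their own ports, it is worth checking your planned port list for collisions before you install. The following Python sketch uses the default values given in this chapter. The find_port_conflicts helper is hypothetical, and it assumes every service runs on the same host. If your preview server is on a separate machine, a shared port number between the two machines is not a conflict.

```python
from collections import defaultdict

# Default ports as described in this chapter.
DEFAULT_PORTS = {
    "servlet_port": 8080,
    "http_port": 80,
    "https_port": 443,
    "iwproxy_port": 1080,
    "customer_webserver_port": 81,
}


def find_port_conflicts(ports: dict[str, int]) -> dict[int, list[str]]:
    """Return each port claimed by more than one setting, with the setting names."""
    by_port = defaultdict(list)
    for name, port in ports.items():
        if not 1 <= port <= 65535:
            raise ValueError(f"{name}: {port} is not a valid TCP port")
        by_port[port].append(name)
    return {port: sorted(names) for port, names in by_port.items() if len(names) > 1}


print(find_port_conflicts(DEFAULT_PORTS))  # prints {}

# A hypothetical plan that moves the preview server onto the Tomcat port.
planned = dict(DEFAULT_PORTS, customer_webserver_port=8080)
print(find_port_conflicts(planned))
# prints {8080: ['customer_webserver_port', 'servlet_port']}
```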
[iwproxy_fullproxy_redirect]
The iwproxy_fullproxy_redirect section identifies fully qualified URLs that need to be mapped back to workareas inside the TeamSite content store.
■Caution This option can create a security hole back into the TeamSite content store, so you should be careful to implement the proper security so you do not open your system to a security risk.
[iwproxy_remap]
The iwproxy_remap section is where you can map content branches to a configuration section. For example, take a look at the following entry:
test_intranet_site_branch=/main/testing
The name test_intranet_site_branch matches a section of the same name within the iw.cfg file, which would look like this:
[test_intranet_site_branch]
This is where your proxy remaps for the specified branch would be placed in the configuration file.
■Note The iwproxy_remap section works only for the default content store. If you are planning to use this section extensively, make sure first that this is required. If it is absolutely necessary to use the remap section, then you will not want to use additional content stores for the sections that require this mapping.
This section contains the following configuration setting: global_default_map: The global_default_map value sets the default document root within your workarea. The default value is the root of each workarea.
global_default_map
The global_default_map section specifies where the rules are defined for the global_default_map category that is defined in the iwproxy_remap section.
iwproxy_preconnect_remap
The iwproxy_preconnect_remap section allows TeamSite to intercept requests for a portion of a workarea and then display a portion of another workarea in its place. Let’s look at an example to see how this works. Let’s say you are in the /default/main/testing/sports branch testing some sports-related content. The sports branch consists only of the sports content. When you browse to a directory that is not part of the sports branch, you may want the content from a staging area on a parent branch to be displayed. You could do this by using the iwproxy_preconnect_remap section.
iwproxy_preconnect_redirect
The iwproxy_preconnect_redirect section allows TeamSite to intercept requests for content under your branch and actually change the branch you are in to the appropriate branch where the content you want to edit resides.
[iwproxy_failover_remap] This section allows TeamSite to redirect the user to another area if the page that is requested is not found. You can determine these new locations based on the area the page was requested from.
iwproxy_hostheader_remap
The iwproxy_hostheader_remap section allows TeamSite to remap content in a certain area within TeamSite to a different domain. Usually the URL would reflect the TeamSite host machine that the content was requested from, but with this remap TeamSite will return the request with the remapped host.
iwproxy_smartcontextedit_allowed
Interwoven has a smart content menu that appears over a page that is being previewed, and this section allows it to be turned on based on the path of the file within the content store.
iwproxy_access_control_enabled
By default, iwproxy allows anyone to go to any location without checking to see whether the user has access. This section allows you to specify areas within TeamSite that you want iwproxy to refuse access to, determined by the user’s access rights.
iwproxy_external_remap
The iwproxy_external_remap section allows TeamSite to actually go out to an external server via a URL and grab content such as includes from other companies.
iwserver
The iwserver section holds the configuration values for the TeamSite content server instance itself. It contains the following configuration settings:
server_local: The server_local value sets the TeamSite server to local. The default is determined by your system settings upon setup.
cachesize: The cachesize value indirectly determines how much memory the TeamSite server uses and should be based on the number of files within your largest branch. The value is the number of objects to store in memory, and each object is estimated at 1KB of memory. The cachesize value should be the number of files and directories in your largest branch times three. For example, if the largest branch contains 20,000 files and directories, then your cachesize setting should be 60,000.
rpc_threadcount: The rpc_threadcount value determines the number of remote procedure call (RPC) connections TeamSite can accept before it begins serializing the requests. The default value is 64 and should not be changed.
fs_threadcount: The fs_threadcount value determines how many threads will be accessing the Interwoven file system at once and should be set to twice the number of central processing units (CPUs) you have installed on your system. This number should not be set to greater than 8; by default, this is set to twice the number of CPUs in your system.
Thruputmonitoring: The Thruputmonitoring value determines whether the throughput monitors are activated. (Thruput in the setting names simply means throughput.) You can use throughput monitors in conjunction with the iwstat CLT to monitor system status and performance. When activated, the throughput monitor starts recording statistics according to each of the monitor values. To view these statistics, you must run the iwstat command-line tool. After restarting TeamSite, the iwstat command will display an additional table.
The new table column headings are as follows:
• The Minutes column indicates the length of time that the monitor went back in time to calculate the values.
• The Thruput column displays how many operations the TeamSite server handled within the amount of time specified in the Minutes column.
• The Avg Op column provides the average amount of time each operation took within the specified time.
• The Load column provides the average load of the TeamSite server. The load is measured by CPU usage, and its maximum is 100 percent multiplied by the number of CPUs in the system. If you have three installed CPUs, then the usage would max out at 300 percent.
The default value for Thruputmonitoring is off. When the Thruputmonitoring value is set to on, then the following monitors are activated:
• thruputmonitor1 is set to 1 minute.
• thruputmonitor2 is set to 15 minutes.
• thruputmonitor3 is set to 1 hour.
• thruputmonitor4 is set to 8 hours.
• thruputmonitor5 is set to 24 hours.
• thruputmonitor6 is set to 48 hours.
• thruputmonitor7 is set to 96 hours.
• thruputmonitor8 is set to forever.
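The sizing rules of thumb for the iwserver section reduce to simple arithmetic. The Python sketch below restates them so you can plug in your own numbers. The function names are hypothetical, and the memory estimate assumes 1,024KB per megabyte along with the 1KB-per-object estimate given for cachesize. Treat the results as starting points to confirm against the Interwoven documentation.

```python
def recommended_cachesize(objects_in_largest_branch: int) -> int:
    """Three times the number of files and directories in the largest branch."""
    return objects_in_largest_branch * 3


def estimated_cache_memory_mb(cachesize: int) -> float:
    """Approximate cache memory, at an estimated 1KB per cached object."""
    return cachesize / 1024


def recommended_fs_threadcount(cpu_count: int) -> int:
    """Twice the number of CPUs, but never greater than 8."""
    return min(cpu_count * 2, 8)


def max_load_percent(cpu_count: int) -> int:
    """Upper bound of the iwstat Load column: 100 percent per CPU."""
    return 100 * cpu_count


cachesize = recommended_cachesize(20_000)
print(cachesize)                              # prints 60000
print(estimated_cache_memory_mb(cachesize))   # prints 58.59375
print(recommended_fs_threadcount(2))          # prints 4
print(recommended_fs_threadcount(8))          # prints 8
print(max_load_percent(3))                    # prints 300
```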
disklow_mbytes: The disklow_mbytes value instructs TeamSite to freeze the content store if the amount of free space falls to less than this value, specified in megabytes. The default value is 50.
disklowpercent: The disklowpercent value instructs TeamSite to freeze the content store if the free space on the content store partition falls to less than this value, specified as a percentage. The default value is 10.
main_lock_model: The main_lock_model value determines the locking model for the main branch within a new content store. This branch is automatically created with no user intervention. This is the only way to set the default locking for the main branch. The default value is submit_lock, although three types of locking are available in TeamSite. The three supported locking models are Submit locking, Optional Write locking, and Mandatory Write locking. See the “Supported Locking Models” sidebar for details of each.
SUPPORTED LOCKING MODELS
Three locking models are supported:
• Submit locking: This is a type of locking in which users can choose to lock a file to ensure their changes are submitted to the staging area. While a file is locked, other users can edit their own version of the locked file within their workarea, but they cannot submit it to the staging area. Once the lock holder has released the file lock, other users can merge their modifications with the new file version.
• Optional Write locking: This is a type of locking in which users can choose to lock a file to ensure no other users edit the file even within their own workareas. When a user locks a file, it becomes read-only to all other users. Under the Optional Write locking model, locking files ensures serial development of those files and reduces the risk of conflicting edits.
• Mandatory Write locking: This is a type of locking where users are required to lock a file in order to edit it. Until a user locks a file, all files in the workarea are read-only. Under this model, the write lock allows only a single person to modify the file at a given time, ensuring serial development and eliminating conflicting edits.
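The differences among the three locking models are easier to compare when written as rules. The following Python sketch is a conceptual model of the sidebar only; it is not TeamSite code. The LockModel names and both functions are hypothetical, and lock_holder is None when nobody holds the lock.

```python
from enum import Enum
from typing import Optional


class LockModel(Enum):
    # Hypothetical labels for the three models, not iw.cfg values.
    SUBMIT = "submit"
    OPTIONAL_WRITE = "optional write"
    MANDATORY_WRITE = "mandatory write"


def can_edit(model: LockModel, user: str, lock_holder: Optional[str]) -> bool:
    """Can this user edit the file in their own workarea?"""
    if model is LockModel.SUBMIT:
        # A lock never blocks editing; it only blocks submitting.
        return True
    if model is LockModel.OPTIONAL_WRITE:
        # Editable unless someone else has chosen to lock the file.
        return lock_holder is None or lock_holder == user
    # Mandatory Write: read-only until the user takes the lock.
    return lock_holder == user


def can_submit_under_submit_locking(user: str, lock_holder: Optional[str]) -> bool:
    """Under Submit locking, only the lock holder can submit a locked file."""
    return lock_holder is None or lock_holder == user


print(can_edit(LockModel.SUBMIT, "bob", "alice"))           # prints True
print(can_submit_under_submit_locking("bob", "alice"))      # prints False
print(can_edit(LockModel.OPTIONAL_WRITE, "bob", "alice"))   # prints False
print(can_edit(LockModel.MANDATORY_WRITE, "bob", None))     # prints False
```

The last line shows the key difference between the two write-locking models: with no lock in place, a file is editable under Optional Write locking but read-only under Mandatory Write locking.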
main_owner: The main_owner value determines which user will own the main branch upon creating a new content store. The default value is root for a Unix installation and Administrator for a Windows installation.
main_group: The main_group value determines which group will own the main branch upon creating a new content store. The default value is root for a Unix installation and Administrator for a Windows installation.
event_log_size: The event_log_size value determines how many Submit or Get Latest operations to keep in the logs. The default value is 64.
branch_security: The branch_security value determines whether branches that a user does not have access to are shown to that user in the TeamSite UI. A value of off signifies that the user can see them. The default value is off.
workarea_security: The workarea_security value determines whether workareas that a user does not have access to are shown to that user in the TeamSite UI. A value of off signifies that the user can see them. The default value is off.
debug_event_handler: The debug_event_handler value turns on debugging for the submit.cfg file pattern matching to the iwtrace log. The default value is no.
old_mod_times: The old_mod_times value determines whether a Get Latest operation updates the time stamp of a file in the workarea. When this value is set to true, then the file’s time stamp is not updated. The default value is true.
author_editor_rename_workarea: The author_editor_rename_workarea value signifies whether an author or editor role can rename a workarea. The default value is no.
force_EA_mod_times: The force_EA_mod_times value determines whether the modified date of the content files is changed if an extended attribute is changed. The default value is false.
webserver_group: The webserver_group value is the group that the web server will be run as.
authentication
This section specifies alternative user authentication other than the default of the operating system. The following are the four available types:
• You can use an external LDAP server for authentication and authorization of the users’ roles.
• You can use an external file formatted like the /etc/passwd file for authentication.
• You can pass users to the local operating system for authentication.
• You can use a pluggable authentication module (PAM) for authentication.
iwsend_mail
This section handles email-specific configurations such as the email server to use and the email address of TeamSite users. It contains the following configuration settings:
maildomain: The maildomain value should be set to the appropriate mail server domain for TeamSite-generated emails.
mailserver: The mailserver value should be set to the mail server that TeamSite will be using to send emails.
use_mapping_file: The use_mapping_file value determines whether emails generated from TeamSite should use the email mapping file specified with email_mapping_file. The default value is set to false.
email_mapping_file: The email_mapping_file value is where TeamSite will find the default email mapping configuration file. The default for a Unix installation is iw-home/local/config/wft/email_map.cfg, and for a Microsoft Windows installation it is \local\config\wft\email_map.cfg under your default installation directory.
workflow
The workflow section contains the following configuration setting:
delete_jobs_on_completion: The delete_jobs_on_completion value determines whether workflows are deleted when they are finished. The default value is false.
iw_workflow_ui
This section defines the workflow UI custom configurations. It contains the following configuration setting:
use_available_templates_script: The use_available_templates_script value determines whether the workflow UI uses a script to list the available workflows or just reads the available_templates.cfg file. If you need some custom capability when displaying the list of workflows available to a user, then you will need to set this to true. The default value is false.
visualannotate
This section holds the visual annotation configurations. It contains the following configuration settings:
va_enabled: The va_enabled value determines whether visual annotation is enabled. The default value is true.
harvest_images: When using Visual Annotate, reviewers have the option of saving snapshots of their annotations. Each snapshot captures a single point in time. These snapshots can contain references to images that are in TeamSite, and those images can be modified at any time in the system. By default, the person viewing the snapshot will see the latest versions of all images referenced by that snapshot and not necessarily the version of the image that was “live” at the time the snapshot was created. Setting this value to true forces those referenced images to be saved at the time of snapshot creation. The default value is false.
■Tip You must restart the TeamSite ContentCenter after changing the harvest_images setting. Also keep in mind that if you set this value to true, performance of the TeamSite server and the amount of disk space of your content store will be impacted.
va_support_email: The va_support_email value sends any support questions to the Visual Annotate support person. The default is the TeamSite administration email.
Considering Additional Component Installation Issues
In the following sections, we will briefly discuss add-on component installations. Some of these components are straightforward in their installation, but some, such as TeamPortal, are quite involved and have many different variables based on the hardware and operating systems you have selected.
Installing ReportCenter
When you install the TeamSite server, you are prompted to install and configure TeamSite ReportCenter. This add-on module is optional but adds significant value to the TeamSite installation. This reporting module requires an enterprise license of Crystal Reports. ReportCenter uses TeamSite’s event subsystem to track and record TeamSite events. These events are then fed into a database for retrieval and formatting by Crystal Reports, which has a separate installation program and requires a license key. You can install ReportCenter during the TeamSite installation or add it to an existing TeamSite installation. ReportCenter can pull data from several different databases including Oracle, SQL Server, and DB2.
ReportCenter does the following:
• Sets up the database schema
• Sets up the UI configuration file
• Configures the TeamSite reporting server
TeamSite ReportCenter allows you to perform ad hoc reporting on the following types of events:
• TeamSite submits, deletions, and additions of files
• Published editions and branch and workarea creations and deletions
• Durations of completed workflows and tasks and the number of files attached to workflows
• Extended attributes associated to files contained in the TeamSite system
Figure 6-1 shows how the event subsystem works in conjunction with the TeamSite ReportCenter server.
Figure 6-1. ReportCenter architecture (diagram elements: TeamSite, event subsystem, ReportCenter, RDBMS database, Crystal Reports, reports)
Installing Content Transformation Services
Content Transformation Services is an optional add-on module to the TeamSite server. Content Transformation Services, once installed, is integrated into the Interwoven publishing process and enables you to transform documents into reusable XML. Content Transformation Services will support many document formats including HTML; Microsoft Word, Microsoft Excel, and Microsoft PowerPoint documents; and PDFs. The Content Transformation Services installation includes two components: the transformation engine and a new UI toolkit that integrates ContentCenter with the transformation engine. The transformation engine is XDoc from Cambridge Technologies, and the installation for this component is handled separately from the TeamSite installation. The UI toolkit comes bundled within the TeamSite server; however, Content Transformation Services is not turned on until the toolkit is configured and XDoc is installed.
After installing XDoc, you need to perform several post-installation steps to make it function as you would expect:
1. Open the transform.cfg file located in iw-home/local/config for modification.
2. Specify all acceptable source extensions by adding one line for each.
3. Configure your output formats.
4. Specify your debug options and the server name where XDoc is installed.
5. Specify the local server directory path where temporary files created by the transformation process will be stored.
■Note If you have installed and configured Content Transformation Services properly, you should notice a new menu item labeled Convert underneath the Actions menu in ContentCenter Professional and under the Actions drop-down list in ContentCenter Standard.
Installing TeamXML Today companies need intelligent ways to break up content into discrete components and deliver these mini-docs in an on-demand fashion. Interwoven solves this need with its TeamXML product. TeamXML is an optional, separately licensed TeamSite product. TeamXML allows you to expose the inner workings of the CMS to an external XML editor such as SoftQuad XMetaL or Arbortext Epic Editor. TeamXML is composed of two main components: TeamXML (server-side component): The add-on component for TeamSite, TeamXML allows exposure of the CMS to external editors. TeamXML integrated editors: Interwoven has partnered with several best-of-breed XML editing vendors to provide an end-to-end XML publishing environment. TeamXML allows you to create componentized XML documents, which are then stored in the TeamSite content store.
Installing TeamPortal
TeamPortal’s installation is application server dependent. TeamPortal is an optional, separately licensed TeamSite product. The TeamPortal-supported application servers are Plumtree, BEA WebLogic, IBM WebSphere, and SAP. We will provide an overview of a TeamPortal installation, but you will have to read through the installation guide, provided by Interwoven, carefully for the application server you are going to be using.
Before you get started, you should follow these steps regarding the installation of TeamPortal:
1. Make sure TeamSite is installed and functioning properly on the server you will be using with TeamPortal.
2. Make sure the OpenDeploy base server is installed on the TeamSite server and that it is functional.
3. Make sure an OpenDeploy receiver is installed on the TeamPortal application server location and is functional.
During the installation process, you will be required to install packages on at least two servers. The first server will be the TeamSite server, and the second server will be the application server running your portal application. Multiple packages can be installed based on your hardware and operating system specifications. There is also a Jetspeed reference installation if you would like your developers to have an example installation to reference while they are setting up your production installation. Jetspeed is an open source portal and is installed on Tomcat, so there are no costs involved in setting up the reference server. We recommend you install Jetspeed so your developers have a source to find the answers to questions they may have.
When you are looking at your hardware considerations for TeamPortal, it is valuable to get professional advice from Interwoven on the sizing of your servers. The installation guides provide a formula for figuring out the server size, but we recommend you discuss this with Interwoven before you proceed with purchasing your new equipment. Also, do not forget to consider your test environment. You do not have to set up your test environment exactly as you have set up your production environment, but it is always nice to take a good look at changes in a production-like environment before you place them in production.
Installing OpenDeploy and DataDeploy OpenDeploy and DataDeploy in the past were two separate installations. These products are optional to the TeamSite installation, but you will not have deployment capabilities without them, so in practice they are required. Interwoven has brought the functionality of each of these products together to allow you to take advantage of features that aren’t offered when they are separate products. They still have two separate licenses, though, if you want to use the full functionality of both pieces of software. They do, however, share some functionality with each other. We will cover more about how these products work in later chapters.
■Note OpenDeploy does not have to run as root, but if it does not run as root, it cannot impersonate other users. The security on the web server will have to be performed based on the group permissions.
Configuring Database Deployments You need to configure the database(s) you will be using for your data deployments within the database.xml file. This file is located in the /etc directory underneath the OpenDeploy install directory. Once you open the file, you will see sections like the following one (which in this case is an Oracle configuration):
The following list explains what the parts mean:
name: Specifies the reference name that will be used inside other data configuration files
db: Specifies the database host, port, and data source name that will be used with the specified reference name
user: Specifies the username to be used when logging into your database
password: Specifies the password that should be used when logging into the database
vendor: Specifies the database vendor you are using
All the entries within this configuration file are examples and are not required. You may set up as many databases as you need. Each database should have an entry in this configuration file. This allows you to change your configuration for a database without having to track down many different configuration files.
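The benefit of keeping every database definition in one place, keyed by a reference name, can be illustrated with a short Python sketch. It models only the five parts listed above; it does not reproduce the database.xml syntax, and the class name, the helper function, and every host, user, and password value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseEntry:
    name: str      # reference name used by other configuration files
    db: str        # host, port, and data source name
    user: str      # login username
    password: str  # login password
    vendor: str    # database vendor


def build_registry(entries):
    """Index entries by reference name, rejecting duplicate names."""
    registry = {}
    for entry in entries:
        if entry.name in registry:
            raise ValueError(f"duplicate database name: {entry.name}")
        registry[entry.name] = entry
    return registry


# Hypothetical entries, for illustration only.
registry = build_registry([
    DatabaseEntry("oracle_dev", "dbhost.example.com:1521:devsid", "scott", "changeme", "oracle"),
    DatabaseEntry("mssql_qa", "qahost.example.com:1433:qadb", "deploy", "changeme", "microsoft"),
])

# Other configuration refers to the database by name only, so changing the
# host or credentials means editing a single entry.
print(registry["oracle_dev"].db)
```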
Configuring OpenDeploy
When the OpenDeploy server starts, it reads the deploy.xml file to determine how OpenDeploy should be configured. The following items discuss the configuration files that are used by OpenDeploy:
Server configuration file: The first thing OpenDeploy loads is the server configuration, which is stored in an external file. The external filename is either odbase.xml or odrcvr.xml, depending upon whether it is the base server or receiver. You are not allowed to specify a different path, so the file must reside within the OpenDeploy /etc directory.
    Deploy.serverConfig: odbase.xml
Nodes file: The server nodes configuration determines the name of the nodes file. The nodes file is where you specify all the hosts that the OpenDeploy server will need to contact. You can change the name of the file, but as with the odbase.xml file, you cannot move this file.
    Deploy.serverNodesConfig: odnodes.xml
Bootstrap username: The bootstrap username is the username that gets created when OpenDeploy starts initially. This user is set up within the administrative GUI to have all administration rights. You should log in as this user and create your additional users within the GUI interface.
    Deploy.bootStrapUsername: root
Bootstrap option: If you do not want OpenDeploy to create a bootstrap user upon startup, you can uncomment the following line and change the value to yes or no. This will, however, delete the bootstrap users the next time OpenDeploy starts, if they exist.
    Deploy.allowDefaultBootStrapUser: yes
OpenDeploy ports: OpenDeploy uses two sets of ports for server administration and event reporting. In the default installation, these ports are set to two separate ports. You can change these ports to another specific port if they are in conflict with another service. You can also set these ports to 0, and the system will pick a port when it starts. Here are the default ports used by OpenDeploy:
    #Server Administration
    Deploy.rmiConnectionPort1: 24071
    Deploy.rmiConnectionPort2: 24071
    Deploy.rmiConnectionPort3: 24071
    Deploy.rmiConnectionPort4: 24071
    Deploy.rmiConnectionPort5: 24071
    Deploy.rmiConnectionPort6: 24071
    Deploy.rmiConnectionPort7: 24071
    # Event Reporting
    Deploy.rmiConnectionPort8: 24078
    Deploy.rmiConnectionPort9: 24078
    Deploy.rmiConnectionPort10: 24078
    Deploy.rmiConnectionPort11: 24078
    Deploy.rmiConnectionPort12: 24078
    Deploy.rmiConnectionPort13: 24078
    Deploy.rmiConnectionPort14: 24078
    Deploy.rmiConnectionPort15: 24078
Host binding: If you want OpenDeploy to bind to a specific host other than localhost upon starting, you should specify this host name in the configuration file. Uncomment the following line, and change the host if necessary:
    Deploy.rmiServerbind: your.host.name
RMI port: OpenDeploy uses an RMI registry to communicate between nodes, and if you need to specify a new port, then you would change the following line. This may be required if you will have multiple versions of OpenDeploy running on the same host.
    Deploy.rmiServerPort: 9173
Proxy settings: The OpenDeploy proxy allows you to trigger deployments via a command-line tool. The proxy can be configured to listen to different hosts other than localhost, as is the default. The next line can be changed from the value of y to n to disable the CLT proxy from listening:
    Deploy.cltProxyEnable: y
CLT proxy: The CLT proxy listens to port 3434 by default, and you can change the port by altering the following line. This proxy setting is used by the command-line tools.
    Deploy.cltProxyPort: 3434
CLT proxy host: By default the CLT proxy listens on localhost, but this can be changed if you have multiple host names on your server.
    Deploy.cltProxyHost: localhost
CLT proxy additional hosts: The CLT proxy will accept commands only from specific hosts. If you need the proxy to listen to additional hosts, you can add them to the next line:
    Deploy.cltProxyAllowedHost: your.host.name, localhost
Access key file: If you want to restrict access to your OpenDeploy servers to OpenDeploy administrative tools on specific hosts, you can activate the access key file by changing n to y in the following line:
    Deploy.useAccessKeyFileForAdmin: n
Access key file location: If you activate the access key file, you need to specify the file that you will store your keys in. The filename can be changed, but the file must reside in the OpenDeploy /etc directory.
    Deploy.accessKeyFile: passphrase
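Because several of these settings are ports that may collide with other services, it can help to check them before starting the server. The following Python sketch is one way to do that, assuming the settings are available as the "Key: value" lines shown above. The function names and the set of ports already in use are hypothetical, and the sketch is not part of OpenDeploy.

```python
def parse_deploy_settings(text):
    """Parse 'Key: value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def find_port_conflicts(settings, ports_in_use):
    """Return the names of port settings that collide with ports already in use.

    A value of 0 is skipped because the system then picks a port at startup.
    """
    conflicts = []
    for key, value in settings.items():
        if "Port" in key and value.isdigit() and int(value) != 0:
            if int(value) in ports_in_use:
                conflicts.append(key)
    return sorted(conflicts)


SAMPLE = """
# Server Administration
Deploy.rmiConnectionPort1: 24071
# Event Reporting
Deploy.rmiConnectionPort8: 24078
Deploy.rmiServerPort: 9173
Deploy.cltProxyPort: 3434
Deploy.cltProxyHost: localhost
"""

settings = parse_deploy_settings(SAMPLE)
# Hypothetical list of ports another service already occupies.
print(find_port_conflicts(settings, {3434, 8080}))  # prints ['Deploy.cltProxyPort']
```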
Configuring Neighbor Hosts
The odnodes.xml file defines the hosts with which you will be communicating. This configuration file provides you with a single place to configure your servers. These node definitions will be used in other OpenDeploy configuration files. The default odnodes.xml file is as follows:

Caption Box
The second section you are interested in is the body section (see Listing 8-2). The body section, unlike the header section, consists of an “or” combination. This means the contributor will be prompted to select one component from the available components. In this case, you have two components; the first is an item that contains a text area for entering a paragraph of text, and the second component is an item that contains two fields. The first field is the link text, and the second is the actual URL. The combination of these two fields makes up one link. This component allows the contributor to add up to three links in a section.
Listing 8-2. The Body Section
CHAPTER 8 ■ USING FORMSPUBLISHER
With the way that the data capture template is set up, contributors can create a caption box with both a paragraph of text and links. This is a bonus to contributors, because they have not asked for these features yet! If your style guide does not support this type of caption box, you can limit the number of body sections that contributors are able to add to just one. Figure 8-3 shows what the form will look like when you select the File ➤ New Form menu option in the interface.
Figure 8-3. This is what the caption box form looks like when you create a new record.
Creating the Global Caption Box Data Content Record Now that you have seen the data capture configuration file, contributors can start entering their caption box content. As shown in Figure 8-2, the global caption box consists of the header and a paragraph of text. In the previous sections, you entered the data for this caption box and saved the record. In Listing 8-3, you can get an idea of what the data record looks like when it is written out to the repository. You will use this data content record later in this chapter after we have shown how to create the presentation template for the caption boxes. Listing 8-3. Global Caption Box Data Content Record
Global Info #cdcdcd This information is global, and will show up on several pages.
Creating the Local Caption Box Data Content Record You can also use the data capture template to capture the data used for a local caption box (see Listing 8-4). The record written out is similar to the global caption box, but Listing 8-4 contains a body_link section instead of a body_text section. By putting some extra thought into designing your data capture template, you can reduce the number of capture templates by one or two. Listing 8-4. Local Caption Box Data Content Record
Section Specific #cdcdcd New Services /newservices.html Promotional Info /promotionalinfo.html Create an Account /createanaccount.html
Data Type Definitions: templating.cfg The templating.cfg file is where you will set up the new data category and type. Once you have inserted the definition within the templating file, your new data type will become available. Open the configuration file, and insert the configuration shown in Listing 8-5 into this
file. You can define multiple categories in this file, so we will show how to create one matching the directory structure created in the example store. The category name will be C8_Example_Category, and this is defined within a element. We have used two other subelements that belong to this category; one is the element that is used to define a default location if one is not specified inside the data type definition. The other is the actual element that will define the data type. The data type name matches the directory structure created in this example, and inside the data type is where you will define the valid presentation types. We have also included a tag and have included the preview-dir attribute to instruct the templating system to generate the temporary outputs to the directory specified. The dir-regex attribute allows you to specify for which directories you should have this presentation as an option. This will allow you to reduce the number of options the contributor can choose from. Listing 8-5. C8_Example_Category Markup from the Templating.cfg File
Using FormAPI
Another way to make your data capture templates reusable is to use FormAPI, which uses JavaScript to manipulate the capture form in several ways:
• Populating multiple fields based on data entered by the contributor
• Performing custom validation after the user clicks the Save button
• Hiding or showing fields based on data entered into the data capture form
• Adding components or replicants to the form
• Automatically setting the name of a capture record based on data entered in a field on the capture form
Promoting Reuse of Data Types When performing any task, you have two basic ways to do it: the easy way and the hard way. The easy way is to take a problem and solve it, and this is the way we usually go about our work. Although it sounds like this should be the right way to handle it every time, in this case it is not. We suggest you create your data types by grouping your content into like data. Let’s take two different websites for this example: an intranet site and an Internet site. In many companies, two different groups would govern these two sites. These sites may even be under different vice presidents. This makes bringing the sites together much harder, but you still have ways to increase the reuse of your data types. You will start by defining some basic page layouts for each of these sites. Table 8-1 lists the pages we think are appropriate. Table 8-1. Layouts That Could Be Used on the Internet or Intranet Site
Internet              | Intranet
Home page             | Home page
Landing page          | Landing page
Content page          | Content page
Search Results page   | Search Results page
Legal disclaimer      | Login page
FAQ                   | Site map
Customizable forms    |
As shown in Table 8-1, several pages appear on both websites. Although the sites have a lot of the same pages, the sites will have a different look and feel. The content audiences will also be different. Some of the content could be duplicated across these two sites, but the presentations of these two sites will most likely be noticeably different in appearance. However, you are not yet concerned with how you will be handling the presentation of these pages. You are more concerned with how you will store the data that will be used to generate these sites. Note that we are talking only about how the data will be stored, not about the actual data. What you need to do now is list different types of data that will need to be displayed on these pages. You can use the same principle you used earlier when defining the data type for the single page. When you looked at defining the data type earlier, you were taking into account only a single page. You now need to broaden your focus to entire websites. Table 8-2 shows how this might look.
Table 8-2. Website Broken Down by Component Type
Page layouts (columns): Home Page, Landing Page, Content Page, Search Results, Legal Disclaimer, FAQ Page, Forms, Site Map
Component types (rows): Header, Footer, Left navigation, Top navigation, Main content section, Language drop-down list, Caption boxes, Breadcrumb, Multicontent sections
As you can see by the table, no component types will be used by only one page layout type. Since three layouts have multiple content sections, they can all use the same data structure; therefore, you have identified the possible reuse of your data types. You can use several techniques to make sure you reduce the number of different data types, and by doing this you will reduce the maintenance efforts over the life of these websites. You need to look at each of these components to see which technique will be best utilized in each case.
Header The header is a special component because we usually do not recommend making it a component that is part of a page’s main data type. We like to make the header its own data type because the header is usually the same on every page. Sometimes the header might be more complex such as when maintaining navigation within your header. You can still handle this in this same way, though; you would just create a global navigation data type that would allow the inclusion of navigation by some sort of server-side technology. The header is most often handled in the same manner by being included into the page by the server at render time.
Footer The footer is like the header, but the footer usually is much simpler. The footer usually houses some sort of copyright or legal disclaimer for the site, so the best way to handle the footer is also by making it its own data type.
■Caution We have seen different websites trying to set the copyright date as a static value. Some developers try to fix this by using JavaScript to set this date, but the user’s computer date may not be correct. The best way to handle this is via some server-side technology. This way, you have better control over how the date displays to the website users.
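As a minimal sketch of the server-side approach, the following Python function builds the copyright line from the server's clock rather than the visitor's. The function name, the wording of the notice, and the company name are assumptions for illustration; any server-side technology could do the same job.

```python
from datetime import date


def copyright_notice(holder, first_year, today=None):
    """Build a copyright line using the server's date, not the browser's."""
    today = today or date.today()
    if today.year <= first_year:
        years = str(first_year)
    else:
        years = f"{first_year}-{today.year}"
    return f"Copyright {years} {holder}. All rights reserved."


# The optional 'today' argument exists only to make the function easy to test.
print(copyright_notice("Example Corp", 2004, date(2006, 7, 2)))
```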
Navigation
We touched on the navigation in the “Header” section; if you refer to Table 8-2, you will notice two different types of navigation on these sites. You could treat these two navigations as separate schemes, and you should consider whether there is a potential to have both top navigation and left navigation on the same page. Before deciding on how to maintain your navigation, ask yourself some basic questions about your navigation:
• Can I maintain the navigation as a global navigation data type?
• Do I have navigation that will always apply only to a single page?
• Could one group within my organization govern all my navigation?
The idea is to make your navigation manageable. We do not recommend that you just throw your navigation into one big data type if it does not make sense. We do recommend that you do not try to maintain your navigation on a page-by-page basis. Create a hierarchy for your navigation that makes sense to the people who maintain your site. Remember, though, that just because you are breaking down the navigation, you do not have to create separate data types. Use the same data type across your entire navigation.
Main Content Section The content section will be part of each page. If you have pages with multiple content sections, you could create a data type to handle multiple section types but could limit the number of content sections on those pages that allow only one content area.
Breadcrumb
You can build the breadcrumb section in a couple of ways. You can actually maintain this information within your page, or you can build a script that would use your navigation data to generate the appropriate breadcrumb based on the page URL. A script is often used; this allows the burden of maintaining page navigation to be placed on the script when pages move around the site instead of on the content contributors.
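The script-based approach can be sketched as follows. The example assumes the navigation data is available as a simple mapping from URL path to label; that structure, the function name, and the sample paths are hypothetical and stand in for whatever navigation data type you define.

```python
# Hypothetical navigation data: URL path -> label shown to the reader.
NAV = {
    "/": "Home",
    "/services": "Services",
    "/services/newservices.html": "New Services",
}


def breadcrumb(url, nav):
    """Return (label, path) pairs from the site root down to the page."""
    trail = []
    if "/" in nav:
        trail.append((nav["/"], "/"))
    path = ""
    for segment in url.strip("/").split("/"):
        if not segment:
            continue
        path += "/" + segment
        # Only levels that exist in the navigation data appear in the trail.
        if path in nav:
            trail.append((nav[path], path))
    return trail


print(" > ".join(label for label, _ in breadcrumb("/services/newservices.html", NAV)))
# prints Home > Services > New Services
```

When a page moves, only the navigation data changes; the pages themselves need no edits because the trail is derived from the URL at render time.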
Language Drop-Down List The language drop-down list is commonly found on international companies’ websites. A drop-down list that allows the reader to select a language needs to be readily available since the reader may not be able to read the initial site language. This can be trickier than it seems, because if the page that is being displayed does not exist in the new language selected, it can be difficult to determine which page should be displayed after the new language selection.
A lot of companies do not have a one-to-one mapping of pages between languages. What happens when a user selects a language that is not available? The answer to this question will determine whether you will be creating a simple component for selecting a language or creating a more robust solution such as an application or a LiveSite component. Chapter 20 discusses LiveSite in more detail.
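One possible answer to that question is a fallback policy: show the translated page if it exists, otherwise the home page in the chosen language, otherwise the page in the default language. The Python sketch below illustrates that policy only. The data structure, the function name, the page identifiers, and the policy itself are assumptions, not something TeamSite prescribes.

```python
# Hypothetical translation table: page id -> {language code: URL}.
TRANSLATIONS = {
    "home": {"en": "/en/index.html", "de": "/de/index.html"},
    "faq": {"en": "/en/faq.html"},
}


def resolve_language_page(page, target_lang, translations, default_lang="en"):
    """Choose the URL to display after the reader switches language."""
    available = translations.get(page, {})
    # 1. The same page exists in the selected language.
    if target_lang in available:
        return available[target_lang]
    # 2. Otherwise fall back to the home page in the selected language.
    home = translations.get("home", {})
    if target_lang in home:
        return home[target_lang]
    # 3. Otherwise stay on the page (or home page) in the default language.
    return available.get(default_lang) or home.get(default_lang)


print(resolve_language_page("faq", "de", TRANSLATIONS))  # prints /de/index.html
```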
Caption Boxes Caption boxes add a great deal of flexibility to your site. They allow for content to be added to a page or several pages with little effort. The caption boxes, if implemented correctly, can allow you to put information that would not normally go on a page, such as advertisements or information about other areas of your site that are new or infrequently visited. You can handle caption boxes with a server-side link or include. This allows your content boxes to be updated, and all the changes will be reflected across the site. When building your caption boxes, you need to create a data type that can hold the information for all your different types of caption boxes. You would then build your caption boxes and include the link in your page. Now, whether your page type contains caption boxes or not, you would put this in your data type, but you would not allow the user to add caption boxes.
Building Presentation Templates By creating multiple presentations, you will be able to use the same data type for multiple page layouts. The presentation determines the end appearance of the data. Once you have created your data type, you will usually need a new presentation template to go along with it. In fact, you may need many presentations. Each site you are going to be putting into your CMS will need to have its own set of presentation templates because the sites will all have a different look and feel. If you have created your data types carefully, you should be able to use them for many sites. You can build presentations using a mix of XML, Perl code, HTML, JavaScript, and the Interwoven markup language. This mixture can help you produce powerful presentation templates. Appendix B provides a comprehensive listing of elements and coding APIs for building presentation templates; you can use these elements to build a presentation template for the caption box example.
Creating a Caption Box Presentation
This presentation template will be able to render body text, links, or body text and links together in the same caption box. The first item defined in the presentation template is the opening tag that identifies it as a presentation template (see Listing 8-6). After the opening tag, you define the style sheet that most likely will be part of the parent page that the caption box will be included within. This is just a standard style sheet definition.
Listing 8-6. Caption Box Presentation Style Definition Section
    .caption_box_table {
        width: 132px;
        height: 100px;
        border-style: solid;
        border-width: 1px;
        border-color: black;
    }
    .caption_box_header {
        background-color: #cecece;
        border-bottom-style: solid;
        border-bottom-width: 1px;
        text-align: center;
    }
The next section, shown in Listing 8-7, is the beginning of the table and the definition of the header section. Remember, you decided to enable the background style color to be overridden on the data capture template. You are iterating over the caption_box.header container and checking to see whether the background color has been overridden (by checking to see whether it was left blank). If it was filled in, you render both the text and the background color. The iw_iterate tag also allows the header to have more than one line of text if you have defined the second line of text in the data capture template.
Listing 8-7. Caption Box Presentation Header Section
The body section uses an iw_ifcase tag, which allows for more than two body types. You are creating two types in Listing 8-8. The first is a plain-text display. If the length of the body_text field is greater than 0, then you will display the text body. The other section is the default case because there is no condition. The default must be last, and in this instance you will be iterating over the body links and displaying them in a bulleted list. Also note that since you are iterating over the element, you can add more than one body type, therefore allowing you to have body text, a list of links, or both. Once you add the closing iw_pt tag, your template is finished.
Listing 8-8. Caption Box Presentation Body Section
■Tip If you do not care whether your presentation template is well formed, you do not have to include the closing iw_pt tag. This keeps you from having to add a bunch of CDATA tags if you are including a lot of code in your presentation template.
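The decision logic described for the header and body sections can be mirrored in ordinary Python to make the flow easier to follow. This is not the Interwoven presentation template language and does not reproduce Listings 8-7 or 8-8. The record layout, the key names other than body_text, and the generated HTML are assumptions; the sample values come from the data content records shown earlier.

```python
from html import escape


def render_caption_box(record):
    """Render a caption box: header first, then each body as text or links."""
    header = escape(record["header"])
    color = record.get("color", "")
    # Apply the override only when a color was filled in on the form.
    style = f' style="background-color:{escape(color)}"' if color else ""
    parts = ['<table class="caption_box_table">']
    parts.append(f'<tr><td class="caption_box_header"{style}>{header}</td></tr>')
    for body in record.get("bodies", []):
        text = body.get("body_text", "")
        if len(text) > 0:
            # First case: a paragraph of body text.
            parts.append(f"<tr><td>{escape(text)}</td></tr>")
        else:
            # Default case (must come last): a bulleted list of links.
            items = "".join(
                f'<li><a href="{escape(url)}">{escape(label)}</a></li>'
                for label, url in body.get("body_links", [])
            )
            parts.append(f"<tr><td><ul>{items}</ul></td></tr>")
    parts.append("</table>")
    return "".join(parts)


global_box = {
    "header": "Global Info",
    "color": "#cdcdcd",
    "bodies": [{"body_text": "This information is global, and will show up on several pages."}],
}
local_box = {
    "header": "Section Specific",
    "color": "#cdcdcd",
    "bodies": [{"body_links": [("New Services", "/newservices.html"),
                               ("Promotional Info", "/promotionalinfo.html")]}],
}
print(render_caption_box(global_box))
print(render_caption_box(local_box))
```

Because the loop visits every body entry, a record that contains both a text body and a links body produces both sections in one caption box.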
Seeing the Final Output
Now that you have the data capture template configured along with the presentation templates, you can virtualize the three pages. The first page generation will be for the Global Info caption box. Remember, you are generating only the caption box, not the entire page. This allows you to insert the caption box into any page. Figure 8-4 shows how the Global Info caption box looks.
Figure 8-4. Global Info caption box
The second page is the Section Specific caption box. As you can see in Figure 8-5, the caption box contains only the links.
Figure 8-5. Section Specific caption box
The third page is the combination of the two pages, as shown in Figure 8-6. We have inserted a section of text and then a couple of links. This is an excellent example of how, by using simple techniques, you can eliminate the need to create a new template for every type of page or component page.
Figure 8-6. Combination body text and links
VIRTUALIZING CONTENT IN TEAMSITE
The caption box example is easy to virtualize inside TeamSite, but you can also use virtualization on entire sites. Websites usually consist of a complex set of web content and applications. TeamSite allows you to create a complex set of proxy rules that map each request to the proper testing servers. The virtualization system of TeamSite can mimic your production environment to help ensure that the content being approved meets the required standards.
Summary
When capturing data, it is important to define your categories correctly, but no hard rules exist for performing this function. The decisions you make will be based on the data and its environment. This means you will not necessarily categorize your data in the same way that other companies categorize theirs. Once you have determined your categories, FormsPublisher will help speed the creation of your data by allowing the data definition to also define the rules for capturing your data. This will save you from having to build applications to capture your data. Once you have captured your data, FormsPublisher will then allow you to build a presentation, or skin, that you can apply to the data records you have captured. This is a good design method for capturing data and generating pages.
6110_Ch09_FINAL
7/2/06
1:59 PM
CHAPTER 9
■■■
Working with Content in TeamSite
In this chapter, we will introduce you to the key concepts for repurposing and reusing your content. We’ll also show you how to reuse that content in your organization. First, you’ll learn how to repurpose, reuse, and use metadata effectively. Next, you’ll discover how to manipulate content within the ContentCenter Standard (CCS) and ContentCenter Professional (CCPro) interfaces. These are the two main user interfaces for ContentCenter.
Repurposing Your Content
Using Interwoven TeamSite, and thereby separating your data and presentation, makes repurposing content easier than it has been in the past. Repurposing content is different from reusing content because you are taking content that you used for one function and using it for something completely different. For example, if you normally create HTML from your articles, you might decide to publish a newspaper with some pieces of the information that you obtained for the HTML pages; this is repurposing.
Companies are limited only by the expertise of their development teams. If you can develop the appropriate presentation templates, then you can generate your data in any number of output formats. These formats can include, but are not limited to, .html, .wml, .js, .php, .asp, .aspx, and .pdf files.
Interwoven’s new add-on module for TeamSite, Content Transformation Services, enables you to take documents from CCPro, CCS, WorkSite MP, Microsoft Word, or imported HTML documents and transform them into a variety of output formats. You can create straight Extensible Markup Language (XML) or any number of XML-based output formats. You can then easily convert this information to PDF or proprietary output formats based on the transforms you use.
Reusing Your Content
Identifying potential reuse is not as difficult as it sounds. Potential reuse is driven directly by your requirements. Do you have sales customers who use PDAs or cell phones? Then WML might be a valid option for you. You can choose from different types of reuse models, and implementing each may require a different solution. Manual reuse is where content authors are responsible for generating the same data in the required formats. Automated reuse is where the system automatically generates
and possibly deploys the data in a variety of output formats for you. This can also take the form of content that the system reuses automatically in certain circumstances. For example, search results could include teaser text that points to the source document. Clicking the search result headline or link could take the user to the original source document. This is one example of automated reuse. Another example of automated reuse is presenting news articles dynamically on a summary page of news article links. When the content consumer clicks a link, the news article page appears. The summary page would be automatically re-created each time news articles are added to the repository.
The last type of reuse is a hybrid reuse model, where some parts are automated by the CMS, but some parts require manual intervention. For example, say every day at 1 p.m. news feeds from a subscription service are sent to a publicly accessible FTP location. The CMS checks for new feeds every day at 2 p.m. When the CMS finds new articles, it directly imports (or ingests) them into the CMS’s repository. During the ingestion process, transformation scripts control the formatting and markup of the news feed documents. This type of transformation ensures that a document’s format and layout match the site on which the document will appear. You can also add styles and metadata to further enhance the content. Once ingestion has completed, the CMS adds the newly transformed documents to a workflow that generates a notification to the editorial group of the site.
So far, this process has been fully automated; now for the manual part of the process. A member of the editorial department takes the task (a group task) to begin a review process. This editor reviews each new article attached to the workflow and ensures that the ingestion process didn’t encounter any problems. If the editor finds a mistake, they manually correct it. Once this review, including any necessary corrections, is finished, the editor can transition the workflow to the deployment step, where the CMS can publish this information directly to the site.
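The hybrid model can be sketched as a small pipeline in which ingestion and transformation are automated and review is manual. This is a minimal illustration in Python rather than TeamSite workflow code; the Article class, the function names, and the "news-feed" metadata value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    body: str
    metadata: dict = field(default_factory=dict)
    reviewed: bool = False


def transform(raw):
    """Automated step: normalize formatting and attach metadata during ingestion."""
    return Article(
        title=raw["title"].strip(),
        body=" ".join(raw["body"].split()),  # collapse stray whitespace
        metadata={"source": "news-feed"},    # hypothetical metadata value
    )


def ingest(feed):
    """Automated step: import every feed item and queue it for editorial review."""
    return [transform(raw) for raw in feed]


def review(queue, corrections=None):
    """Manual step: an editor corrects mistakes and marks each article reviewed."""
    corrections = corrections or {}
    for article in queue:
        if article.title in corrections:
            article.body = corrections[article.title]
        article.reviewed = True
    return queue


def deployable(queue):
    """Only reviewed articles move on to the deployment step."""
    return [a for a in queue if a.reviewed]
```

Nothing reaches deployable() until review() has run, which mirrors the point in the process where the editorial group takes over from the automated steps.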
■Note Teaser text is a snippet of content that is used to entice a user to view more of the article, book, or document from which the text is extracted. Teaser text is usually limited to a paragraph or a few sentences from the original source document. Teaser text is extremely useful in up-sell and marketing applications. It can appear along with teaser text from several other documents that are in the same category or that have the same subject. When the content consumer clicks this information, the system can display the up-sell content for that document. The up-sell content will contain information such as the title, subject, extracts or abstracts, and price for the entire document, thereby enticing the content consumer to purchase the full document.
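As a rough illustration of the idea, teaser text could be produced by taking the first few sentences of a source document. The function below is a hypothetical Python sketch that assumes sentences end with a period, exclamation point, or question mark followed by whitespace; it is not a TeamSite feature.

```python
import re


def teaser(text, max_sentences=2):
    """Return the first few sentences of a document as teaser text."""
    # Split after sentence-ending punctuation that is followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])
```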
Evaluating the ROI of Reuse
Most content management professionals will tell you that calculating the return on investment (ROI) for CMSs is difficult and problematic at best. However, although you cannot calculate the ROI for a CMS exactly, we hope these guidelines will help you get as close as possible. To illustrate this point, we’ll walk you through a hypothetical example (which isn’t actually too far removed from the real world). With a best-of-breed CMS such as Interwoven TeamSite in place, companies do not have to rely on technical development staff to update content. Instead, companies can use less technical staff (and therefore less expensive staff) to
maintain their content. Companies can even use purely administrative staff to update content. Technical development staff has an associated monetary cost per hour for all the work they do, and the same is true for nontechnical staff. For this example, assume that the mean cost per hour for work performed by the technical staff is much higher than the mean cost per hour for work performed by nontechnical staff. These numbers will vary depending on the salary guidelines in your organization as well as the number of years of experience and the technical skills that each person has.
With this in mind, consider that one of the huge benefits of a CMS is its usage of templatized content. Templatized content allows for easy content creation and should not require detailed technical knowledge. What this means is that without a CMS in place, companies must use technical staff to update content, but with a CMS in place (and templatized content), companies can use nontechnical staff. In other words, with a CMS, the nontechnical staff is on the same playing field as the technical staff.
Take a look at the following process and some estimated timelines for creating a new page of content without a CMS:
1. The user initiates a new page of content. In this scenario, this is a content owner requesting that IT create a new content page, which can be done via email or a phone call. During this timeframe, the content owner delivers the actual content to the technical resource. Usually the content is not ready for publishing and may have to be converted from its source state. This may take the technical resource 15 minutes to complete.
2. The technical staff member works with the visual design team (other technical staff) to design the content page, including all fonts, layouts, and page attributes. This process takes an average of one hour.
3. At this point, no code is written to actually develop the page. The format of the page is prepared in this stage, and all previous design elements are added to a shell page. Typically this is performed with a web development tool such as Microsoft FrontPage or Adobe Dreamweaver UltraDev. This process usually takes one hour to complete.
4. In this step, the technical resource creates any needed HTML code and other design elements for the content page using the previously created shell. They also convert the delivered content to a publish-ready format. This process takes an average of one hour to complete.
5. The technical resource writes any needed scripts for the content page, using JavaScript or another scripting language. The time to complete this process is one hour.
6. The technical resource sends the completed pages to the content owner, and possibly an internal quality-assurance group, to review. The technical resource makes any final content updates. This can be performed within 45 minutes.
The page is now ready to go through any needed approvals and be published to the site. This entire process from start to finish takes an estimated five hours to complete. If you look only at the technical resources involved in this process and then factor in the resource cost per hour, you can safely assume that for each new content page created, the cost is substantial.
Now look at this process using a CMS and a nontechnical user:
1. The nontechnical user initiates a task using CMS forms by selecting File ➤ New File in the CMS interface. This process usually takes less than 30 seconds, but for ease of calculation assume it takes one minute. One minute, as you know, is 1/60 of an hour, or roughly .02 of an hour.
2. The nontechnical user selects the appropriate form type for the content they want to create. We will assume this process takes 20 minutes to complete; 20 minutes is 1/3 of an hour, or roughly .33 of one hour.
3. The nontechnical user populates the selected form with the content they have prepared or that was supplied to them. This process usually takes about 15 minutes to complete; 15 minutes is 1/4 of an hour, or .25 of one hour.
4. The nontechnical user selects the appropriate presentation template. The presentation template controls all aspects of the look and feel for the page; it also contains any required scripts for the page. This process usually takes one minute to complete. As stated earlier, one minute is roughly .02 of one hour.
5. The CMS generates the completed and publish-ready page. The last step is to send the page through the appropriate approvals, which will be covered by the CMS workflow.
This entire process from start to finish takes an estimated 37 minutes to complete. If you look at the nontechnical resources involved in this process and then factor in the resource cost per hour, you can safely assume that for each new content page created in a CMS, the total cost is your cost per hour times 37/60 of one hour. Measured in time alone, 37 minutes instead of five hours is a reduction of roughly 88 percent per created page, and the cost savings are larger still when nontechnical staff cost less per hour than technical staff. Table 9-1 further illustrates this concept; fill in the blank cells based on your company’s salaries.
Table 9-1. Cost Savings Breakdown
Required Staff          New Pages Created   Total Time to Create      Cost Per Hour   Total Cost
Technical without CMS   5                   25 hours
Nontechnical with CMS   5                   3 hours and 5 minutes
Technical without CMS   15                  75 hours
Nontechnical with CMS   15                  9 hours and 15 minutes
As you can see from the table, it does not take long to realize significant cost savings from a CMS. Additionally, you can calculate the cost savings for updating an existing content page using this same methodology. These types of ROI calculations are easy because they are clearly measurable. You can calculate the ROI for CMSs in many other ways, although the return may not be as clear. Companies employ technical staff for their considerable expertise, such as for maintaining daily operations and developing sophisticated applications. Nontechnical resources simply cannot do these tasks, because they do not have the required technical skills to perform them. How much time and money are lost by requiring technical staff to create and update content pages when, via a CMS, the same content can be created and maintained by nontechnical staff? The answer depends entirely on your organization, but remember that this is something to think about and factor into your ROI calculations.
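The arithmetic behind the two walkthroughs and Table 9-1 can be checked with a short calculation. The sketch below is in Python; the step durations come from the text, while the function names and any hourly rates you pass in are assumptions, because the text leaves the cost cells for you to fill in.

```python
from fractions import Fraction

# Step durations in minutes, taken from the two walkthroughs above
WITHOUT_CMS = [15, 60, 60, 60, 60, 45]   # six steps, technical staff
WITH_CMS = [1, 20, 15, 1]                # four timed steps, nontechnical staff


def total_hours(step_minutes, pages=1):
    """Total effort in hours for a number of pages."""
    return Fraction(sum(step_minutes) * pages, 60)


def total_cost(step_minutes, pages, rate_per_hour):
    """Effort multiplied by an hourly rate (supply your own rate)."""
    return float(total_hours(step_minutes, pages) * rate_per_hour)


def time_saved_percent():
    """Percentage of time saved per page by using the CMS."""
    return float(100 * (1 - Fraction(sum(WITH_CMS), sum(WITHOUT_CMS))))
```

For example, total_hours(WITH_CMS, 15) gives 37/4 hours, which is the 9 hours and 15 minutes in the table. To fill in the Total Cost column, call total_cost with your organization's own hourly rates.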
Evaluating the Time to Web
The next ROI factor to look at is a reduction in the time it takes for your content to reach the Web. In the previous example, we showed you how, without a CMS, a single page of new content could take up to five hours before it is available to be deployed to the Web. A shorter time to Web means that time-critical information is available when your employees, partners, and customers need it. This potentially means more business for the company.
Additionally, a CMS allows you to publish more content to your web space. Why is that, you may ask? Reduced time spent on each piece of content means more time to manage other content. When content is easier to manage, content authors tend to make more updates to their content. By reducing many of the problems inherent in managing content manually, you free up more bandwidth to provide other information online. For example, you may not have the time or resources to manage internal content as well as the content for external resources. With a CMS, you can place company newsletters, human resources information, and other employee information online, which can significantly reduce your printing costs. These are just some of the ways that implementing a CMS can provide a significant ROI for your company.
Using Metadata
Using metadata in your organization is critical. You can use it in a variety of applications, and it serves a multitude of purposes. Consider the following examples of its usage:
Content authoring and content retrieval: With the proper usage of metadata, content is easier for your content authors to find. Easier-to-find content equates to higher tool satisfaction and fewer support calls to your content management team. Content authors gain confidence in their abilities to find and modify content in the CMS tool. With this gain in confidence, they are likely to use the tool more, and this means they will come to rely on the tool and its capabilities. In turn, they are more likely to recommend the CMS tool to other users (word-of-mouth advertising), they are more likely to keep their content up-to-date (better content is available to content consumers), and they will look for ways to use the CMS tool in other applications.
Portal applications: By having strictly defined metadata, portal applications can more readily deliver dynamic content based on any number of predefined metadata values. For example, these values could include persona types (for specific users), audiences (for a wider content consumer market), products, expiration dates, and data or document types. This easily allows the correct content to be delivered to the correct consumer at the correct time.
Knowledge management systems: Once again, you can use metadata to categorize specific help files or data according to the product, level of detail, and typical consumer. This means you may have some product information that is available to content consumers of your site, some that should be available only to your internal support staff (for example, the help desk), and some that is available for delivery only to your marketing and sales reps.
Intelligent content routing: With metadata, you can route content dynamically and intelligently to the correct destination. You can even use metadata to determine the appropriate output format for the specific piece of content (HTML, PDF, or WML, for example).
Auditing: By dynamically capturing the appropriate metadata, you can ensure that the appropriate information is added to your content. This may include values such as author, publish date, publish time, copyright, and author contact information. By using metadata, you can control when content is published, how long the content is available for consumption, and who published the content. Additionally, external systems can then act on this available metadata.
Third-party search engines: You can syndicate metadata to search engines to enhance search index creation. This encompasses the search that exists in your web space, such as the search engine that your company owns and that people use while viewing your site. Additionally, this information is available for search engines and spiders that are external to your site, such as http://www.yahoo.com, http://www.hotbot.com, http://www.alltheweb.com, and http://www.google.com.
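Several of these uses come down to making a decision from metadata values. The following Python sketch illustrates audience and expiration checks along with format selection; the metadata keys (audiences, expiration_date, output_format) and the function names are hypothetical and do not represent a TeamSite metadata schema.

```python
from datetime import date


def is_deliverable(metadata, audience, today):
    """Deliver content only to its intended audience and before it expires."""
    expires = metadata.get("expiration_date")
    if expires is not None and today > expires:
        return False
    return audience in metadata.get("audiences", [])


def output_format(metadata, default="HTML"):
    """Pick the output format named in the metadata, falling back to HTML."""
    return metadata.get("output_format", default)
```

A portal or routing layer built along these lines would call is_deliverable before showing a document and output_format when deciding how to generate it.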
Working with TeamSite Interfaces
The CCS interface offers a flexible deployment, so the TeamSite administration team can simplify the look and feel by adding or removing portlets and customizing menus. This interface is usually made available to the casual user because it has everything the user needs on a single screen. The CCPro interface can also be customized, for example by adding new menu items and tabs, but it is not really for an everyday user. This interface is powerful but requires a user who understands TeamSite better than a casual user does. For the remainder of this chapter, we’ll discuss some of the main features of these interfaces.
CONTENTCENTER PORTLETS
When using CCS, you will see different sections on the screen called portlets. The term portlet may bring to mind a portal. This is the concept that Interwoven is drawing on, but in this case portlets do not follow any portlet standards. A portlet inside CCS is a proprietary piece of code that can be snapped into place and moved around like a standard portlet could be, and this is why they’re called portlets. Technically, though, they are just portal-like pieces of code.
Searching Content
When people think about search functionality, they typically think only about the user-facing search available on numerous websites. However, with TeamSite 6.7, Verity’s K2 search engine is bundled with the TeamSite software. This doesn’t mean much to the consumers of your content; however, for your content authors, this functionality is a godsend. Oftentimes the biggest complaint that content authors have is that they cannot find the content they have to update. TeamSite for some time now has included the ability to save links to content using
favorites, which can help authors quickly navigate to frequently updated content. However, most authors do not use this feature. To solve this problem, Interwoven has incorporated a search capability that will eliminate the problem of difficult-to-find content. In our opinion, this problem is typically created by a nonintuitive branching and workarea structure, but it is still a very real problem. CMSs should make updating content easier, and CMS teams should strive to make updates via the CMS as easy as possible for their users.
You must configure the search to index each branch. If a branch has not been configured for indexing, then the search will not be available on that branch. TeamSite search is configured to search on specific fields out of the box. You can extend TeamSite search somewhat, if needed, by using extended attributes. Interwoven TeamSite search is primarily used for the following:
• Finding files for viewing, editing, copying, and tagging
• Finding outdated files for deletion
• Locating files for reuse
• Finding files for reporting purposes
• Locating files for recovery purposes
The advanced search form shown in Figure 9-1 has many different search criteria available out of the box. The Search Text section offers the following options:
The scope list: This drop-down list contains two selections; one is the workarea in which you’re currently located, and the other is the staging area that belongs to the current branch.
The Any Word box: This text box, located directly to the right of the scope list, is where you put the words you are looking for; the search will match any content pages that contain at least one of the words listed.
The All Words box: This text box instructs the search to look for pages that contain every word listed.
The Without box: This text box instructs the search to look for pages that do not contain the text typed into this area.
The Phrase box: This text box allows you to enter a phrase that must be matched exactly for a page to be returned in the results.
Figure 9-1. The different ways to search for text within content
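The way the four text boxes combine can be illustrated with a small matching function. This Python sketch only demonstrates the semantics described above; it assumes case-insensitive matching on whitespace-separated words, which may differ from how the bundled search engine actually tokenizes and matches content, and the function and parameter names are hypothetical.

```python
def matches(text, any_words=(), all_words=(), without=(), phrase=""):
    """Return True if text satisfies every search box that was filled in."""
    lowered = text.lower()
    words = set(lowered.split())
    # Any Word box: at least one listed word must appear
    if any_words and not any(w.lower() in words for w in any_words):
        return False
    # All Words box: every listed word must appear
    if not all(w.lower() in words for w in all_words):
        return False
    # Without box: none of the listed words may appear
    if any(w.lower() in words for w in without):
        return False
    # Phrase box: the exact phrase must appear
    if phrase and phrase.lower() not in lowered:
        return False
    return True
```

Empty boxes place no restriction on the result, so calling matches with only a phrase behaves like a pure phrase search.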
The Content Type section focuses on whether the search will be run against all content (both generated content and data captured via forms) or just against the forms. Figure 9-2 shows this section of the search form.
Figure 9-2. The ability to search through different pieces of metadata
The Content Type section consists of the following search options:
Content Type drop-down list: This drop-down list allows the content type to be selected. There are two values; the first is All Content. This option instructs the search to be performed on generated pages, images, and nontemplate content. The second value is Forms, which instructs the search to be performed against data content records only. If this option is selected, a secondary field appears and lists each data type.
Metadata input section: This section consists of a series of input fields for selecting different metadata types, an operator, and a text box for entering criteria. By changing the metadata type and the logical operator, you can focus your search to a specific type of file. You can also get additional metadata lines by clicking the plus sign below the metadata lines, as shown in Figure 9-2.
The next section is the File Attributes section. This section is specific to the normal file attributes you would find on any file. Figure 9-3 shows this section of the search.
Figure 9-3. The ability to search through different file attributes
The File Attributes section contains the following options:
Filename: This text box restricts the search to files with a specific filename.
Created By: This text box restricts the search to files that were created by a specific username.
Size: This text box restricts the search to files whose size falls within the range entered.
Modified By: This text box restricts the search to files that have been modified by a certain person.
Created: This text box restricts the search to files that were created on a certain date or within a certain date range.
Modified: This text box restricts the search to files that were modified on a certain date or within a certain date range.
Editing Content
Content that is stored in a CMS is not very useful if you cannot do anything with it. Interwoven TeamSite allows you to edit all files that are contained in the CMS. Two types of editable content exist within the CMS:
• Editable templatized files
• Editable nontemplatized files
■Note Templatized files are those files created via the TeamSite templating subsystem. When we refer to nontemplatized files, we are referring to .gif, .doc, .pdf, and .xls files, as well as any other file not generated in TeamSite.
When you edit a nontemplatized file, the TeamSite Local File Manager (LFM) appears and opens the file in the default editor configured in the TeamSite LFM. For example, if you have chosen to edit a Microsoft Word document with a .doc extension, then the LFM will automatically open the file in Microsoft Word. If you have chosen to edit an HTML file, then the LFM could open it in Dreamweaver or FrontPage, whichever you have configured in the LFM for that file extension. When you edit a templatized file, TeamSite will open the form for that file and populate that form with the information contained in the associated data record for that file. This allows you to easily modify that information, save it, and regenerate the output file, which will reflect any new information that was entered.
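The two edit paths can be summarized as a simple dispatch on whether a file has an associated data record. The Python below is a conceptual sketch only; the EDITORS mapping, the function name, and the returned strings are hypothetical and do not correspond to LFM configuration syntax.

```python
import os

# Hypothetical extension-to-editor associations, similar in spirit to LFM settings
EDITORS = {".doc": "Microsoft Word", ".html": "Dreamweaver"}


def edit_action(filename, dcr_path=None):
    """Describe what an edit request would do for a given file."""
    if dcr_path is not None:
        # Templatized file: reopen the form populated from its data record
        return "open form with " + dcr_path
    extension = os.path.splitext(filename)[1].lower()
    editor = EDITORS.get(extension, "system default editor")
    return "open in " + editor
```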
UNLOCKING FILES
Remember that when files are edited in Interwoven TeamSite, they are locked to the person who initiated the edit. This allows the CMS to maintain file integrity. When a templatized file is locked for an edit, both the generated page and the data content record (DCR) are locked. If another person wants to edit this file, then they have to unlock the generated page and its associated DCR before the system will allow them to modify the file. The DCR for a generated page is usually stored under the data category of that file. For example, if a home page was edited, then its DCR will usually be stored under something similar to templatedata/home_page/data/ in the workarea for the generated page. You can easily discover the location of the DCR by viewing the file properties of the associated generated page.
You will see many types of file icons in CCS and CCPro; these exist as visual indicators of a file’s current state in the workarea. These also significantly improve the usability of the TeamSite CMS. However, you must understand what each one means. Use Table 9-2 as a reference for the many different file icons.
Table 9-2. File State Icons
Short Name: Unmodified and unlocked
Description: This icon means the file is not locked or restricted to any user who has access to the workarea. Additionally, this means that the file has not been modified since its last submission to the staging area. Any user who has access to the workarea can modify this file.
Short Name: Modified and unlocked
Description: This icon means this file is not locked or restricted to any user who has access to the workarea. Additionally, this means that the file has been modified and now is different from the version in the staging area. Any user who has access to the workarea can modify this file.
Short Name: Unmodified and locked to current user
Description: This icon means this file is locked to the user currently viewing the files contained in the workarea. This file has not been modified since its last submission to the staging area. Only the user who currently owns the lock on this file can modify this file.
Short Name: Modified and locked to current user
Description: This icon means this file is locked to the user currently viewing the files contained in the workarea. This file has been modified since its last submission to the staging area and consequently is different from that version. Only the user who currently owns the lock on this file can continue to modify this file.
Short Name: Unmodified and locked to another user
Description: This icon means this file is locked to a different user than the one currently viewing the files contained in this workarea. This file has not been modified since its last submission to the staging area. Only the user who currently owns the lock on this file can modify this file.
Short Name: Modified and locked to another user
Description: This icon means this file is locked to a different user than the one currently viewing the files contained in this workarea. This file has been modified since its last submission to the staging area and is different from the version in staging. Only the user who currently owns the lock on this file can continue to modify this file.
Short Name: Private
Description: This icon means this file is marked as private. This means this file cannot be submitted until the private status is removed. Additionally, this file will not be visible or available to other users, and only the user who marked the file as private will be able to see this file. This is a good way to keep other users who have access to your workarea from being able to modify your files.
Short Name: Deleted but not submitted
Description: This icon means the file has been deleted but the deletion has not been submitted and approved to staging. In TeamSite you must delete files from the workarea and submit the delete to the staging area. By doing this, all remnants of the file are removed.
Description: This icon means this file has been deleted, but references to the file still exist in the CMS. This condition occurs when you create a new file in your workarea and then add a favorite to this file or attach this file to a task. You then delete the file from your content area but do not delete the favorite or detach the file from the task.
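The six lock-related states in Table 9-2 follow from two facts about a file: whether it has been modified since its last submission and who, if anyone, holds the lock. The Python sketch below derives the short name from those two facts; the function names and parameters are hypothetical, and the private and deleted states are left out for brevity.

```python
def file_state(modified, locked_by=None, current_user=None):
    """Return the short name for a file's lock and modification state."""
    change = "Modified" if modified else "Unmodified"
    if locked_by is None:
        return change + " and unlocked"
    if locked_by == current_user:
        return change + " and locked to current user"
    return change + " and locked to another user"


def can_modify(locked_by, current_user):
    """Only the lock owner may modify a locked file; unlocked files are open."""
    return locked_by is None or locked_by == current_user
```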
Editing Files in ContentCenter Standard
In CCS, you can perform two types of edits. The first is an edit of a nontemplatized file. This will download the file to your local machine and open the file in the default editor for that file extension. The LFM controls this. The second is an edit to a templatized file. This will open the file’s data in the form configured for that file. The method to edit each type of file in CCS is the same. Simply click the Edit link listed beside each file; you can see this in Figure 9-4.
Figure 9-4. Clicking Edit will allow you to edit a file.
Editing Files in ContentCenter Professional
In CCPro, you can perform the same two types of edits, and the same actions occur when you elect to edit a file as occur when you edit a file in CCS. However, you have many ways to edit files in CCPro.
The first way to edit a file in CCPro is to click Edit from the All Files listing within a workarea. The file will then open or start to download, whichever is appropriate for the file type, as shown in Figure 9-5.
Figure 9-5. Clicking Edit next to a file within a workarea when using the CCPro interface will allow you to edit a file.
The second way to initiate an edit is by selecting one or more files in the workarea and clicking the Edit link from the quick-launch toolbar, as shown in Figure 9-6. There are no real differences when performing an edit this way except that you can select multiple files.
Figure 9-6. You can edit one or more files by selecting them and clicking the Edit link in the quick-launch toolbar.
The third way you can initiate an edit is by first selecting one or more files in the workarea and then selecting the Edit ➤ Edit menu, as shown in Figure 9-7. There are no real advantages of performing an edit in this manner. The edit functionality works in the same way as the other edits described.
Figure 9-7. You can edit one or more files by selecting them and then selecting Edit ➤ Edit.
Using the Local File Manager
The LFM is the component of Interwoven TeamSite that manages communications between files stored locally and the corresponding files contained within the TeamSite CMS. The LFM starts each time you perform an edit of a nontemplatized file or an import operation. Although the LFM satisfies most content authors’ needs directly out of the box, you can configure it for specific needs:
• To edit files directly on the TeamSite server. This option must be configured; otherwise, files are edited in the default local directory set up on each local computer during the LFM installation. This location is usually C:\IWTemp.
• To edit files on the local system. This is the default option. Files are downloaded, edited, and then uploaded to the CMS.
• To configure file type associations that are different from those found in the local Windows operating system.
• To establish file type associations for first-time usage on Unix.
After you elect to edit a nontemplatized file, the My Local Files window will open, as shown in Figure 9-8. Click the Settings link to open the configuration for the LFM. The window that opens is My Settings.
Figure 9-8. The My Local Files window allows you to manage files that have been downloaded to your computer.
When the My Settings window appears, you’ll see three tabs:
• The General tab allows you to set and configure remote file editing, as shown in Figure 9-9.
• The File Types tab allows you to configure file associations that are stored separately from the local computer.
• The Remote Setup tab allows you to configure direct edit preferences.
Figure 9-9. The General tab for the local file settings
Working with Local Files
In geek speak, local files are files that are accessible by a computer or file system without requiring any sophisticated network transport mechanisms. Local files appear “local” to the computer that is accessing them; in other words, local files are accessible in the same ways as other files that are stored on the actual hard drive or storage areas of the computer. Files stored on the hard drive of a computer are also local files.
In Interwoven TeamSite, you must bring files into the TeamSite repository in order for the CMS to manage them. You can bring local files into the CMS in several ways, but the two ways you will use most are as follows:
• Via the LFM application. The first time you use the LFM, you will be prompted to download and install it. The TeamSite LFM is a component that must be installed inside your browser. You need to download and install this component only one time for each different browser version you will be using. The LFM is invoked when you perform an import operation inside a workarea, when you perform an import operation inside a task, or when editing or viewing nontemplatized files within the CMS. The LFM performs several important functions including the following:
  • Keeps track of the changes you make to nontemplatized files and then synchronizes these changes in the TeamSite system.
  • Allows you to work on files in the content management server or on your local machine.
  • Retains your preferences for downloading and uploading files from or into the CMS.
  • Remembers the file associations you have configured so that the editor you want is launched for the appropriate file type you have selected. This can be different from the default file association specified on your local computer.
• Via the Briefcase component of the TeamSite Front Office subsystem. We’ll cover more information about this product in the next chapter.
Using the General Tab
Remote file editing means that files are downloaded to the local computer so you can edit them locally and then, when finished, save the files, close the editing program, and upload the modified files to the TeamSite server workarea. This is the default configuration, and it is also useful for modifying files locally while disconnected from the TeamSite server. Here are the options:
• The Directory for Local File Copies input box allows you to specify the local directory where downloaded files are stored temporarily.
• In the What to Do with the Local Files After Uploading Them section, select the radio button that best describes the behavior you want:
  • Always Ask Whether or Not to Remove Them: Select this option if you want ContentCenter to prompt you each time you upload files, so that you can decide whether to remove each file on a file-by-file basis.
  • Always Remove: This setting automatically removes local files that you upload, without prompting you. This option affects only those files selected for upload.
  • Always Keep: This setting keeps all files that are downloaded to your local computer. This is the best option if you want to modify files locally after you have uploaded them.
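The three choices in the What to Do with the Local Files After Uploading Them section amount to a small policy decision that is made after every upload. The following Python sketch only illustrates that decision; it is not LFM code, and the names AfterUpload and local_copies_to_remove, as well as the confirm callback standing in for the prompt, are our own hypothetical names.

```python
from enum import Enum
from typing import Callable, Iterable, List


class AfterUpload(Enum):
    """Hypothetical model of the three radio buttons on the General tab."""
    ASK = "Always Ask Whether or Not to Remove Them"
    REMOVE = "Always Remove"
    KEEP = "Always Keep"


def local_copies_to_remove(
    uploaded: Iterable[str],
    policy: AfterUpload,
    confirm: Callable[[str], bool] = lambda path: False,
) -> List[str]:
    """Return the uploaded local copies that should be deleted.

    Only files that were part of the upload are considered, which mirrors
    the statement that Always Remove affects only the files selected for
    upload. `confirm` stands in for the file-by-file prompt.
    """
    if policy is AfterUpload.KEEP:
        return []
    if policy is AfterUpload.REMOVE:
        return list(uploaded)
    # ASK: keep a file unless the user confirms its removal.
    return [path for path in uploaded if confirm(path)]
```

With the ASK policy, a file is kept unless the user explicitly confirms its removal, which is the conservative reading of the prompt behavior described above.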
Using the File Types Tab
This setting is applicable only for Windows-based clients. Use this setting if you want ContentCenter to open files in an editing application that is different from the default file association configured on your local machine. Imagine you are an HTML developer who would like to open HTML files in your web browser; however, when editing HTML files from ContentCenter, you would like those pages to open in FrontPage. You can set up this association from this tab. Because these settings are kept separate from the operating system association, both behaviors can coexist. Figure 9-10 shows the File Types tab for the local file settings.
Figure 9-10. File Types tab for the local file settings
Here are the buttons you can click on this tab:
• Clicking Add allows you to add file associations. You can configure only one extension per entry. Configuring .htm and .html files requires two entries.
• Clicking Change after selecting an already configured file type allows you to modify the settings for that file type.
• Clicking Remove after selecting an already configured file type allows the file type to be deleted.
To set up the association discussed in this example, follow these steps:
1. Click Add to open the File Association dialog box shown in Figure 9-11.
Figure 9-11. Adding a file association to the local file’s handler
2. Click Copy From. This opens the locally configured file extensions, allowing you to select the appropriate one, as shown in Figure 9-12.
Figure 9-12. Extensions and their commands for opening the appropriate editor
3. Scroll down through the list of file associations until you find the appropriate one (in this example, .html). Click the extension, and then click OK.
4. The extension information is populated in the File Association dialog box, as shown in Figure 9-13.
Figure 9-13. File Association dialog box populated with the information from the “copy from” procedure
5. Modify the Command line setting with the appropriate location of the editor that you want to configure.
6. Modify the other settings if required. Then click OK.
7. This will add the file extension, and you will be returned to the File Types tab. Click Done to finish adding file associations, as shown in Figure 9-14.
Figure 9-14. Your file type now appears on the File Types tab for the local file settings.
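To make the idea of a separate association table concrete, the following Python sketch shows one way such a lookup could behave: entries configured on the File Types tab win, and the operating system association is used otherwise. This is an illustration under our own assumptions, not the LFM’s implementation; the function name resolve_editor and the command strings are hypothetical.

```python
from pathlib import PurePath
from typing import Dict, Optional


def resolve_editor(
    filename: str,
    lfm_associations: Dict[str, str],
    os_associations: Dict[str, str],
) -> Optional[str]:
    """Pick the command used to open `filename`.

    Associations configured for ContentCenter take precedence over the
    operating system defaults. Each table is keyed by a single extension,
    so .htm and .html need separate entries.
    """
    extension = PurePath(filename).suffix.lower()
    if extension in lfm_associations:
        return lfm_associations[extension]
    return os_associations.get(extension)


# Hypothetical tables: HTML opens in the browser by default, but files
# edited through ContentCenter open in FrontPage.
os_table = {".html": "browser.exe", ".htm": "browser.exe"}
lfm_table = {".html": "frontpage.exe"}
```

Because only .html was added to the hypothetical LFM table, a .htm file would still open with the operating system default, which is why the two extensions need two entries.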
Using the Remote Setup Tab
The Remote Setup tab allows you to set up direct editing, which means you can edit files directly in ContentCenter without first downloading them to the local machine. This works with any Windows or Unix client computer that can access ContentCenter. Figure 9-15 shows the Remote Setup tab.
Figure 9-15. The Remote Setup tab for the local file settings
Only two fields are available on this screen. The Map This Local Mount input box allows you to set the locally mapped drive where edits should occur:
• If your TeamSite system is running on the Windows platform, your drive mapping should follow this format: \\SERVERNAME\IWServer\default\main.
• If your TeamSite system is running on the Unix platform, your drive mapping should follow this format: \\SERVERNAME\IWDEFAULT\MAIN or \\SERVERNAME\IWMAIN.
After mapping this drive, you should set the drive mapping in this first field. The second field on this tab is the To This Area on the TeamSite Server setting. Enter the ContentCenter path to the workarea where your files are maintained. After these fields are populated, click Done to save your changes.
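The two fields on this tab define a simple path translation: a file under the configured area on the TeamSite server corresponds to the same relative path under the local mount. The Python sketch below illustrates that translation only; the function name to_local_path, the drive letter, and the example paths are hypothetical, and the real LFM may resolve paths differently.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_local_path(server_path: str, server_area: str, local_mount: str) -> PureWindowsPath:
    """Translate a path inside `server_area` to the mapped local drive.

    `server_area` plays the role of the To This Area on the TeamSite Server
    field and `local_mount` the Map This Local Mount field. A ValueError is
    raised if the file does not live under the configured area.
    """
    relative = PurePosixPath(server_path).relative_to(PurePosixPath(server_area))
    mount_root = PureWindowsPath(local_mount.rstrip("\\") + "\\")
    return mount_root.joinpath(*relative.parts)


# Hypothetical example values.
local = to_local_path(
    "/default/main/WORKAREA/wa1/docs/index.html",
    "/default/main/WORKAREA/wa1",
    "Y:",
)
```

In this example, `str(local)` is `Y:\docs\index.html`. Raising an error for files outside the configured area is a design choice of the sketch, made so that a misconfigured mapping fails visibly rather than silently opening the wrong file.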
Importing Images/Content into ContentCenter
Importing documents or other content is easy with Interwoven TeamSite. Importing allows you to bring a local file, or a file accessible from the local computer (via a network-mapped drive, for instance), into a TeamSite workarea.
Importing Files into ContentCenter Standard
To import files into CCS, perform the following steps:
1. Click Import in the Work in Progress portlet, as shown in Figure 9-16.
Figure 9-16. Click the Import button in the Work in Progress portlet to begin the file import process.
2. This will open the Select Local Files to Import screen. From this screen, you can import local files into a TeamSite workarea, as shown in Figure 9-17. This screen displays the following fields:
• The first field identifies the current folder on the client system.
• The Name field identifies folders and files that are in the current folder.
• The File Path field identifies the files to import and their associated file paths. This field becomes active only when files are selected from the Name field.
Figure 9-17. This screen allows local files to be imported into TeamSite via the CCS interface.
3. Select one or more files to import by navigating to the files and selecting them. You can select multiple files by holding down the Ctrl key and selecting each file. To select a contiguous group of files, you can use the Shift key and click the first and last files in the group. Then click the Add button to add these files to the File Path field, as shown in Figure 9-18.
Figure 9-18. Select the files you want to import.
4. Then click the Import button to import any added files.
5. The Import File window will appear, as shown in Figure 9-19. In this window, you select the location in a workarea into which your file or files are imported and what happens next to the files. The following fields appear on this screen:
• The Content Folder field allows you to browse to the workarea or directory location where the file(s) will be imported. The slash displayed in this example specifies the current or top-level workarea.
• The Next Action field allows you to specify the next action the system should take with the imported files. The following options are available:
  • Submit submits the file for approval immediately after importing it.
  • Submit Work in Progress submits not only the imported file(s) for approval but also all the files contained in your Work in Progress portlet.
  • Keep As Work in Progress saves the imported file(s) as a work in progress but does not submit them for approval.
• The Attach to Existing Task or New Job field allows you to attach the newly imported file(s) to an existing task or to create a new job to handle the submission process. This field is not important if you are keeping the newly imported file(s) as a work in progress.
Figure 9-19. Determine where you want to import the files and what to do next.
6. After making your selections, click the Next button. If you chose to submit the file(s), either the Tag File window or the Select a Workflow window will appear. The Tag File window allows you to tag the imported file(s) with metadata. The Select a Workflow window allows you to select from the available workflows that will execute the submit process for the imported file(s). If you chose to submit a work in progress, you are taken to the Submit Work in Progress window, which allows you to select other work-in-progress files to submit. Finally, if you chose to keep the file(s) as a work in progress, then the imported file(s) will be added to the Work in Progress portlet, and you are returned to the main CCS screen. Once you are back at the CCS home page, the newly imported file(s) will be displayed for your use, as shown in Figure 9-20.
Figure 9-20. The newly imported files appear in the Work in Progress portlet.
■Note An error message will appear if a file exists in the import destination that has the same name as the imported file. You can cancel the import process to retain the original version of the file in the import location, or you can overwrite the destination file by clicking the check box of the file to import and then clicking the Next button. Another important note is that the Import File window reports the number of files you are importing but does not report any selected folders. Folders that you selected to import will be imported but will not be reflected in the file count.
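The two behaviors described in this note, the name collision check and the file count that ignores folders, can be pictured with a short Python sketch. It is only an illustration of the described behavior, not TeamSite code; ImportPlan and plan_import are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass
class ImportPlan:
    file_count: int = 0      # what the Import File window would report
    folder_count: int = 0    # imported, but not part of the reported count
    conflicts: List[str] = field(default_factory=list)


def plan_import(
    selected: Iterable[Tuple[str, bool]],
    destination_names: Set[str],
) -> ImportPlan:
    """Summarize an import before it runs.

    `selected` holds (name, is_folder) pairs chosen by the user, and
    `destination_names` holds the names already present in the target
    folder. Files whose names already exist are listed as conflicts so
    that the user can cancel or choose to overwrite them.
    """
    plan = ImportPlan()
    for name, is_folder in selected:
        if is_folder:
            plan.folder_count += 1
            continue
        plan.file_count += 1
        if name in destination_names:
            plan.conflicts.append(name)
    return plan


plan = plan_import(
    [("logo.gif", False), ("images", True), ("index.html", False)],
    destination_names={"index.html"},
)
```

For this hypothetical selection, the plan counts two files and one folder and flags index.html as a conflict.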
Importing Files into ContentCenter Professional
The process of importing files into CCPro is similar to the process of importing files into CCS. The main exceptions are that you can import from many different locations in CCPro and that the Import To window within the CCPro interface does not take over the entire window as it does in CCS, as shown in Figure 9-21.
Figure 9-21. Allowing local files to be imported into TeamSite via the CCPro interface
Using the Visual Format Editor
Visual Format is an editing tool for rich text that you can enable on your data capture forms. This tool is a licensed version of Ektron’s eWebEdit Pro software that is included with FormsPublisher. Within Interwoven TeamSite, the Ektron editor is always referred to as the Visual Format Editor. This software allows you to easily perform WYSIWYG content creation without having to know any HTML. Behind the scenes, this editor creates HTML or XML for you based on the formatting of the content. You can enable the Visual Format Editor in three ways:
• The first way is via an inline container at the form level. This means that when the form is opened, the Visual Format toolbar appears inline with the rest of the form fields. Doing this improves the user experience (it saves one click), but it can significantly increase the form’s load time and introduce glitches. For forms that contain five or more Visual Format fields, this is not a good option.
• The second way to include this functionality is via a callout from the form field. The callout opens the Visual Format Editor in a new window, allowing the content author to modify the data in that new window. When this information is saved, it appears in the original form, and the new window is closed. This option has the disadvantage of requiring the user to click to edit the form field via Visual Format, but it significantly reduces the form load time. Additionally, a tech-savvy user can elect not to use the Visual Format Editor and can edit the HTML directly in the form field.
• The third option is to use the Visual Format Editor from an interface constructed separately from Interwoven TeamSite. This will work; however, Interwoven does not warrant its usage in this manner.
The Visual Format Editor is a client-side application that installs inside the client browser. The first time you open a Visual Format–enabled form (option 1) or click a Visual Format button callout (option 2), the system prompts you to download and install the Visual Format Editor. Figure 9-22 shows the Visual Format Editor, which has been included in a form using the second method.
Figure 9-22. Visual Format Editor
The Visual Format Editor buttons and options are as follows:
Cut: This button allows you to cut data from the Visual Format Editor and place it in the local computer’s clipboard. Cutting removes the data from view.
Copy: This button allows you to copy data from the Visual Format Editor and place it in the local computer’s clipboard.
Paste: This button allows you to paste data from the local computer’s clipboard.
Find: This button allows you to locate specific data in the selected Visual Format Editor field, as shown in Figure 9-23.
Figure 9-23. Searching for text
Undo: This button undoes the last action performed in the Visual Format Editor.
Redo: This button reapplies the last action that was undone in the Visual Format Editor.
Check Spelling: This option checks all the content contained in the Visual Format Editor for spelling mistakes. When mistakes are found, the Spelling window appears, as shown in Figure 9-24.
Figure 9-24. Fixing spelling errors
Check Spelling As You Type: This button automatically checks for spelling errors as you type. When errors are found, that text is highlighted, and you can correct it as you go.
Bookmark: This allows you to place bookmarks in the text, allowing you to name each bookmark. Later you can use the defined bookmarks to navigate directly to these entries, as shown in Figure 9-25.
Figure 9-25. Setting bookmarks
New Hyperlink: This button allows you to add a new hyperlink to the content of the Visual Format Editor, as shown in Figure 9-26.
Figure 9-26. Adding a hyperlink
Hyperlink: This button allows you to edit a hyperlink of many different types, such as FTP, HTTP, HTTPS, NNTP, Gopher, File, Telnet, Wide Area Information Server (WAIS), JavaScript, and news. This button also allows you to set additional hyperlink attributes, as shown in Figure 9-27.
Figure 9-27. Setting hyperlink attributes
Remove Hyperlink: This button removes the selected hyperlink.
Horizontal Rule: This button allows you to create a horizontal rule in the text at the position of the cursor.
Picture Properties: This button allows you to insert an image in the text and set any associated image attributes, as shown in Figure 9-28.
Figure 9-28. Setting picture properties
Table: This button launches a menu that allows you to insert a table in the text and format the table, as shown in Figure 9-29.
Figure 9-29. Accessing the table menu
Edit in Word: This button allows you to edit the entered content in Microsoft Word.
nbsp: This button inserts a nonbreaking space character in the text at the position of the cursor.
(c): This button inserts a copyright symbol in the text.
(r): This button inserts a registered symbol in the text.
TM: This button inserts a trademark symbol in the text.
Apply Style: This button allows you to specify a style to apply to the selected text, as shown in Figure 9-30.
Figure 9-30. Accessing the style menu
Font size or heading level: This option allows you to set the font size or heading level.
Font: This option allows you to specify the font used for the content entered in the Visual Format Editor.
Font Size: This option allows you to set the font size to an HTML size or a point size.
Font Color: This option allows you to format the color for the Visual Format Editor content, as shown in Figure 9-31.
Figure 9-31. Accessing the color picker
B: This button applies bold to the highlighted text.
I: This button italicizes the highlighted text.
U: This button underlines the highlighted text.
Numbering: This button inserts a numbered list.
Bullets: This button inserts a bulleted list in the text.
Decrease Indent: This button shifts the text to the left.
Increase Indent: This button shifts the text to the right.
Align Left: This button left-aligns the entered text.
Center: This button centers the entered text.
Align Right: This button right-aligns the entered text.
Collecting Data
Within FormsPublisher, you can use data collection forms to capture data that is entered by the end user and that will be generated as content. The data collection form closely resembles a standard web form. The data collection mechanism is tied directly to the data type. You can think of a data type as a content type. You could have a data type for home pages, footers, headers, press releases, site maps, FAQ pages, and any other content pages that you need to generate. The definition of the data type is read, and TeamSite builds a data capture template (DCT) based on the data type definition. The DCT, when generated, will display as an HTML form that allows for the appropriate data to be entered. This process reduces the amount of work involved in creating an entry system for your data every time you create a new data type.
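The general idea of generating an entry form from a definition, instead of hand-building one for every content type, can be sketched in a few lines of Python. This is a conceptual illustration only: the field tuples and the render_form function are hypothetical and do not reflect the actual DCT file format or how FormsPublisher renders forms.

```python
from html import escape
from typing import List, Tuple

# Hypothetical field definition: (name, label, kind), where kind is
# "text" or "textarea".
Field = Tuple[str, str, str]


def render_form(data_type: str, fields: List[Field]) -> str:
    """Build a simple HTML entry form from a data type definition."""
    rows = []
    for name, label, kind in fields:
        if kind == "textarea":
            control = f'<textarea name="{escape(name)}"></textarea>'
        else:
            control = f'<input type="text" name="{escape(name)}">'
        rows.append(f"<label>{escape(label)} {control}</label>")
    body = "\n".join(rows)
    return f'<form id="{escape(data_type)}">\n{body}\n</form>'


# A made-up press release data type with two fields.
form_html = render_form(
    "press_release",
    [("headline", "Headline", "text"), ("body", "Body", "textarea")],
)
```

Adding a new data type in this sketch means writing a new list of fields, not a new form, which is the labor saving the paragraph above describes.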
Collecting Digital Assets
Trying to keep assets such as images, Flash, and audio clips up-to-date can be difficult. Images are the content type that is most often reused. Images can be leased to companies and may have a limited usage for the life of the image. If an image is shared across workareas and used in multiple locations within the content, then you must have a way to track its usage in case it changes. Companies reuse digital assets all the time, and keeping these assets and their changes up-to-date can be a complicated process.
Say a large promotion is contracted out to a marketing firm to develop the advertising images. The firm delivers the images, and the images are used in multiple content pages. However, the images must be removed from content pages when the promotional period expires. A digital asset management (DAM) system can greatly reduce the complexity of that effort.
Most website implementations try to minimize the duplication of larger assets because there are obvious benefits in doing this, but in many cases it is hard to find these images even across small groups. When you start trying to share images across an enterprise, you will require a system that can help you manage changes to these assets. Once an image has been modified, each author who has used this image should receive some sort of notification. This notification should also list the content pages that have references to that image.
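The notification requirement described here boils down to a reverse index from each asset to the pages, and therefore the authors, that reference it. The following Python sketch shows that idea under our own assumptions; it is not how MediaBin or TeamSite tracks usage, and the function names and example paths are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (page path, author, assets referenced by the page)
Page = Tuple[str, str, List[str]]


def build_usage_index(pages: Iterable[Page]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each asset to the (page, author) pairs that reference it."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for page, author, assets in pages:
        for asset in assets:
            index[asset].append((page, author))
    return index


def notifications_for(asset: str, index: Dict[str, List[Tuple[str, str]]]) -> Dict[str, List[str]]:
    """Return, per author, the pages to mention when `asset` changes."""
    per_author: Dict[str, List[str]] = defaultdict(list)
    for page, author in index.get(asset, []):
        per_author[author].append(page)
    return {author: sorted(pages) for author, pages in per_author.items()}


# Made-up pages that share a promotional image.
usage = build_usage_index([
    ("/promo/a.html", "ann", ["/img/promo.gif", "/img/logo.gif"]),
    ("/promo/b.html", "bob", ["/img/promo.gif"]),
    ("/home.html", "ann", ["/img/promo.gif"]),
])
to_notify = notifications_for("/img/promo.gif", usage)
```

When the promotional image expires or changes, `to_notify` tells you which authors to contact and which of their pages to list in the message.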
Managing Assets
The best possible tool for managing your assets is Interwoven’s MediaBin DAM software. MediaBin allows teams to manage intellectual property to catalog, transform, and deliver digital assets. MediaBin can manage many types of binary data files, including images, PDF files, video, and audio files. You can find more information about MediaBin in Chapter 3 or from Interwoven’s website at http://www.interwoven.com/products/dam.
Managing assets using a DAM system and the best practices that go along with this are beyond the scope of this book. However, we can give you a simple best practice to follow for managing assets once they are in the CMS. The simplest way we have found to do this is to separate your files into two categories: global assets and local assets.
Global assets should be things that are shared across workareas or sub-branches. Global assets are usually assets that are not placed into the CMS by a content contributor. In other words, global assets are usually referenced in the presentation templates and not in the form of templates or data content records. Some examples of global assets include spacer GIFs, company logo images, navigation images, pagination images, and search-related images. Global assets can also include things such as legal contract documents and product marketing brochures.
Global assets would be maintained and administered from your administrative branch (you can find more information about the administrative branch in Chapter 17) and then propagated where needed into any local branches. The content contributors of the local branches never have to know or worry that the global files are there; they simply modify their content, and the global assets are automatically updated. Most global assets are not going to be changed frequently, but if they are, you can import them to the administrative branch and then deploy them to each required branch from there.
You may even want to set up a workflow that automates this task for you. By doing that, any changes to the global assets can first be approved before being deployed and then altering every site or subsite in the CMS. Enough about global assets—let’s talk a little bit about local assets. Local assets are assets that are managed at the site or subsite level. Local assets include any assets that are referenced from the forms that are used within each site to manage their content. For example, in the
mortgage loan subsection of FiCorp’s site, you may see images of people standing in front of houses with “sold” signs in the yard or images of moving trucks, and you may be able to download a mortgage application or fill in one online. The owners of the subsite or site where they are used should always control these assets.
Summary
In this chapter, we discussed how to repurpose and reuse your content across your CMS implementation. We explained how to calculate the ROI for a CMS. Next we discussed the usefulness of metadata and how external systems can use that data. Then we introduced you to the interfaces used by ContentCenter, and we told you what each interface is used for. We told you how to import files into ContentCenter, including templatized and nontemplatized files. We showed you the Visual Format Editing tool, which is used to add rich text into your content files. Finally, we explained data collection and digital asset management. In the next chapter, we will be discussing some of the tools, such as MetaTagger and Front Office, you can use to increase the flexibility of your CMS.
6110_Ch10_FINAL
7/2/06
1:11 PM
CHAPTER 10
■■■
Using Tools and TeamSite
TeamSite, although a powerful tool in itself, must integrate with other tools such as metadata collections and external editors to enhance its already impressive features. In this chapter, we will discuss how you can use TeamSite Front Office to help integrate external editors with TeamSite and how you can integrate MetaTagger with TeamSite. We’ll also explain how to build a taxonomy using MetaTagger.
Working with TeamSite Front Office
Interwoven has developed some tools to give content contributors access to the underlying content management functionality from within their favorite editors. More specifically, TeamSite Front Office (TFO) enables content authors to perform specific content management functions from within local editors such as Microsoft Word.
■Note At the time of this writing, the currently supported applications for the TFO plug-ins (providing menu-based access to TeamSite functionality) are Microsoft Office 2000 and XP, Adobe Dreamweaver 3 and 4, and Adobe UltraDev.
TFO is comprised of two components: a server-side component and a client-side component. The server-side application must be installed on the TeamSite server system. The client-side application must be installed on each machine that will be using the TFO functionality. The TFO client includes all the application plug-ins, the TeamSite Briefcase, and the TeamSite Front Office Configuration Wizard. (We’ll cover these components of the TFO client throughout this chapter.) When you install the TFO client application, the installation adds plug-ins to the client-side editing applications.
CHAPTER 10 ■ USING TOOLS AND TEAMSITE
Installing the TeamSite Front Office Server
To install the TFO server for Solaris systems, follow these steps:
1. Create a temporary directory to untar the installation package.
2. Copy the TFO installation package to the temporary directory.
3. Unzip the installation package using the following command, filling in the appropriate version numbers:
% gunzip -c IWOVtfo-sol.x.x.x.Buildxxxx.tar.gz | tar xf -
4. Run the installation program as root:
% ./Installplugin.sh
5. Follow the prompts to install the TFO server. The TFO installer must stop and restart the TeamSite iwwebd service to complete the installation.
6. Configure the available workareas for users who will use the TFO by modifying the iwwa.cfg file, which is located in the iw-home/conf directory. The iwwa.cfg file is an XML file that you configure to map users to workareas or map workareas to groups of users.
7. In the iwwa.cfg file, to map users to workareas individually, you must add a element. In the following example, DOMAIN/user is the username of the user who will use this workarea, UniqueAlias is a unique name for the workarea that will appear in the TeamSite Front Office Configuration Wizard, and //store/branchpath/workareaname specifies the vpath to this workarea:
//store/branchpath/workareaname
■Note A virtual path, or vpath, is the TeamSite-specific terminology that refers to a directory within TeamSite. Since these directories do not necessarily map to a directory path on the physical drive but rather are maintained in the content store, they are considered virtual.
8. In the iwwa.cfg file, to map workareas to groups of users, you must add a workgroup element. In the following example, UniqueAlias is a unique name for the workarea that will appear in the TeamSite Front Office Configuration Wizard, //store/branchpath/workareaname specifies the vpath to this workarea, and DOMAIN/user is the username of the user who will use this workarea. Each user in the group must have an entry specified in the element.
//store/branchpath/workareaname
To install the TFO server for Windows systems, follow these steps:
1. Browse to the directory location of the TeamSite Front Office installation package.
2. Double-click the package to start the InstallShield Wizard. You must have administration rights.
3. Click Next to begin the installation.
4. Choose the location where you want to install the TFO server. The default location is C:\Program Files\TeamSite Front-Office\Server.
5. Click Next. The TFO installer must stop and restart the TeamSite iwwebd service to complete the installation.
6. Click OK.
7. Configure the available workareas for users who will use the TFO by modifying the iwwa.cfg file, which is located in the iw-home/conf directory. The iwwa.cfg file is an XML file that you configure to map users to workareas or map workareas to groups of users. See the examples in the previous instruction steps for the Solaris installation.
After performing the previous installation steps for either Solaris or Windows, you must also make the following configuration changes in the iw.cfg file and ensure that each TFO user is added to both the TeamSite author and editor role files:
[FrontOffice]
auto_index=no
index_template_name=index_template.htx
wf_submit=no
metadatacapture=yes
The iw.cfg file contains these relevant settings:
auto_index: This setting specifies whether autoindexing is enabled. Valid options are yes and no.
index_template_name: This setting specifies the name of the index template that is a configurable template.
wf_submit: This setting specifies whether actions performed by the TFO users can initiate a TeamSite workflow. Valid options are yes and no.
metadatacapture: This setting specifies whether the default metadata capture functions will be used for files manipulated via TFO. Valid options are yes and no. If the TeamSite server has been set up to use MetaTagger, then MetaTagger will be used for these operations; otherwise, the default metadata capture functions of the TeamSite server will be used. You can find more information regarding the installation of the TFO server in the TFO administration guide for your respective version.
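The [FrontOffice] settings described above can be sanity-checked before you restart any services. The following is a minimal sketch, assuming the relevant portion of iw.cfg can be read as a standard INI-style section; the load_front_office function and its validation rules are illustrative and are not part of TeamSite.

```python
import configparser

# Settings that the text describes as accepting only "yes" or "no".
YES_NO_KEYS = ("auto_index", "wf_submit", "metadatacapture")


def load_front_office(cfg_text):
    """Parse the [FrontOffice] section and validate its yes/no settings.

    Hypothetical helper: assumes the supplied text is INI-compatible.
    """
    # Interpolation is disabled so a literal "%" in a value is left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("FrontOffice"):
        raise ValueError("missing [FrontOffice] section")
    settings = dict(parser["FrontOffice"])
    for key in YES_NO_KEYS:
        value = settings.get(key)
        if value not in ("yes", "no"):
            raise ValueError(f"{key} must be yes or no, got {value!r}")
    return settings


# The sample values are the ones shown in the text above.
SAMPLE = """
[FrontOffice]
auto_index=no
index_template_name=index_template.htx
wf_submit=no
metadatacapture=yes
"""

if __name__ == "__main__":
    print(load_front_office(SAMPLE))
```

A check like this catches a mistyped value such as auto_index=maybe before the setting ever reaches the server; whether the rest of a real iw.cfg file parses cleanly with configparser is something you would need to confirm for your version.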
Installing the TeamSite Front Office Client
To install the TFO client, you must follow these steps:
1. Open a web browser, and point the browser to http://servername/iw/plugins/ where servername is the name of your TeamSite server.
2. Click Install to download the client.
3. Click Next.
4. Choose the location to install the TFO client to, and then select Custom or Typical. The default location is C:\Program Files\Interwoven\TeamSite Front-Office.
5. If you choose to customize the installation, you have to choose the application plug-ins you want to install. Then click Next to continue the installation.
6. If you have chosen to install the TeamSite Briefcase, then you have to tell the installation program the location for the TeamSite Front Office cache; the default location is C:\TFOCache. This directory should have at least 20MB of space available on it.
7. Click Next.
8. The Microsoft XML Parser Setup Wizard will appear. Click Next.
9. Agree to the license terms, and click Next.
10. Enter your username and organizational information, and click Next.
11. Click Install.
12. Click Finish to exit the Microsoft XML Parser Setup Wizard and continue with the TeamSite Front Office client installation.
13. The TeamSite Front Office Configuration Wizard will open. Specify the TeamSite server and default workarea.
14. To complete the installation, log out of the Windows system, and then log back in.
OPERATING SYSTEM VERSIONS SUPPORTED FOR THE TFO CLIENT TeamSite Front Office supports numerous operating systems. These include Windows 98, Windows NT 4 with Service Pack 5 or newer, and Windows 2000. In our experience, TeamSite Front Office is not widely used in the industry; however, this product is a viable option for many companies. Interwoven has not been as active in promoting this software product as it has been in promoting other products in its software suite. Indeed, this product may not be supported in the future because Interwoven’s business partnership with Microsoft continues to grow, and the Microsoft Office products add support for CMS integration. But only time will tell.
Understanding the Components of TeamSite Front Office Client
As we stated earlier, TFO consists of two components. These components work in conjunction with the CMS to provide access to content management functionality from the editors configured on the client computer and from the TeamSite Briefcase component. The application plug-ins add specific menu items to editors configured on the client machine. Files manipulated through the TFO client must be added to the TeamSite Briefcase first. In other words, the menu items (listed next) will be disabled if you are editing a standard file that is not in the TeamSite workarea that you configured for the user in the TFO server configuration steps. After files are added and available in the TeamSite Briefcase, those files are available for editing on the TeamSite system in the specified workarea. The menu items that are added to editors include the following:
Connected or Disconnected: This menu item displays the current status of your connection to the TeamSite TFO server.
Open: This menu item allows the user to open and check out a file from the TeamSite Briefcase. If you were to look in the TeamSite workarea after the file was opened, you would see a lock beside the opened file.
Save As: This allows the user to save the file in another location within the TeamSite Briefcase.
Save and Check In: This action saves the changes made to the file, closes the file, and then checks in the file.
Save As HTML and Check In: This action saves the changes made to the file in HTML format, closes the file, and then checks in the file.
History/Revert: This action displays a list of the file’s versions and then allows the user to select a version to revert to.
File Properties: This menu item lets the user view properties of the file, including several TeamSite-specific properties.
Check Out: This option applies only to users of mapped network drives; for TFO Briefcase users, the checkout function is performed automatically.
Cancel Edit: This option discards the edits made to the file and then reverts the file to the version currently stored in the TeamSite staging area.
The other components of the TFO client are the TeamSite Front Office Configuration Wizard and the TeamSite Briefcase. The TeamSite Front Office Configuration Wizard allows you to connect to the TeamSite system and allows you to configure your connection parameters such as selecting the workarea from the available workareas configured for the user. The TeamSite Briefcase is a locally configured interface to the TeamSite workarea. This component will also give you information about a file’s status in the TeamSite system. After the TeamSite Briefcase is configured, you will see a desktop icon on the client system, as shown in Figure 10-1.
Figure 10-1. The TeamSite Briefcase desktop icon
■Note TeamSite Front Office uses SOAP over HTTP or HTTPS to communicate between the TeamSite server and TFO client. This communication protocol simplifies administration and provides security over the data transmitted.
Implementing Logging
To enable TFO server-side logging on the TeamSite system for Solaris, follow these steps:
1. Add the following lines to the iw.cfg file’s FrontOffice section:
[FrontOffice]
server_log_file=/var/adm/tfoserver.log
2. Make sure that the file /var/adm/tfoserver.log is writable by the iwui user.
3. Restart iwwebd using the iwreset -ui command.
To enable TFO client-side logging, follow these steps:
1. Modify the client registry settings under HKEY_CURRENT_USER\Software\Interwoven\TeamSite Front-Office: String Value: ProxyLogFile, Value Data: full path of the log file.
2. Kill iwserverproxysvc.exe (IWSERV~1.EXE) from Task Manager.
Using External Editors You can also use external editors with TeamSite; Interwoven has several plug-ins that are available for third-party editors, including Dreamweaver UltraDev, Microsoft FrontPage, IDM UltraEdit, Macromedia HomeSite, Macromedia Contribute, Microsoft Word, and EvrSoft First Page. You can use any third-party editor of your choice, but if you do, extended menu functionality will not be available through your editor.
Working with MetaTagger
MetaTagger is Interwoven’s enterprise-class content intelligence software. MetaTagger allows you to intelligently enhance your content via predefined taxonomies or vocabularies. Using MetaTagger, you have the capability of sharing, managing, and reusing content by analyzing the content and identifying key concepts from that content. You can also use MetaTagger by itself; it is not included with TeamSite but is often integrated with TeamSite.
We have already discussed the business value of correct metadata, but business applications rely heavily on this data as well, perhaps even more so than end users. Applications such as search engines, collaboration tools, document management tools, and portals rely heavily on metadata or data attributes to function properly. Metadata allows these applications to present, archive, and route this content appropriately. Using metadata captured via MetaTagger allows you to dynamically associate this metadata with your content.
Why can’t you do this manually? That’s a good question, but here is the answer: even if you had the time, financing, and expertise available to create metadata manually, you still should be concerned about the risk of human error in entering this data. In other words, entering junk metadata or metadata differently each time is useless and defeats the purpose of having that data. MetaTagger uses a combination of advanced statistical analysis and natural-language processing technologies and algorithms to generate metadata.
MetaTagger includes the following features:
Automatic categorization of content: MetaTagger automatically categorizes content from one or more taxonomies. MetaTagger supports contextual recognition and categorization by example. Contextual recognition is the categorization done by keywords, phrases, and in-context cues. Categorization by example takes place by taking documents that have to be tagged and comparing them to previously tagged documents.
Summary and keyword extraction: MetaTagger can generate content summaries from selected documents; these summaries can range from a few words to a paragraph or two in length. This information can then be syndicated to search engines and can be presented as dynamically created teaser text.
Content profiling and taxonomy creation: MetaTagger can take a representative sample of all your content and automatically create a taxonomy from the sample. You can then prune this taxonomy and enhance it manually to more accurately reflect all your content.
Entity extraction: MetaTagger has the ability to extract specific entity information including names, dates, title, publish time, and author using its recognition engine and previously compiled knowledge bases.
Document structure analysis: MetaTagger can extract specific sections of content according to formatting and structural cues.
MetaTagger includes the following three GUI interfaces: MetaTagger Studio, Content Intelligence Configuration Manager, and Content Intelligence Metadata Viewer (CIViewer).
Using MetaTagger Studio MetaTagger Studio is your interface into MetaTagger. This interface is a project-based development environment that allows you to build and test taxonomies and any associated categorization models. Studio is installed on the system where the MetaTagger server is installed, but it can also be (and usually is) installed on client machines. The Studio client interacts with the MetaTagger server. If the connection is lost to the server, MetaTagger Studio automatically shuts down, cleans up any temporary files, and closes the client application. You can find more information about MetaTagger Studio in the Interwoven MetaTagger manuals or by contacting Interwoven Technical Services.
Using the Content Intelligence Configuration Manager This interface is a web-based application that allows users to control how MetaTagger analyzes input files and generates metadata. This interface uses a wizard-like interface to assist users through the basic steps of configuring all configurable components of MetaTagger functionality. The Content Intelligence Configuration Manager is beyond the scope of this book, but you can find more information about it in the MetaTagger manuals or via Interwoven technical advisors.
Using the Content Intelligence Metadata Viewer (CIViewer) This web-based application allows users to tag content and then view the associated metadata that is generated. You can use the preconfigured MetaTagger functionality (shipped with MetaTagger) to test the types of metadata that can be generated and the major configurations that can be used to produce certain metadata.
Introducing Taxonomy
We briefly talked about taxonomies in earlier chapters, but now we’ll talk in a little more depth about what a taxonomy is and what having one can do for your organization. Taxonomy is the science of classifying, organizing, and relating terms and concepts so that the end result can be used for discussion, analysis, and information retrieval. A good taxonomy should include all possible permutations of the objects being classified. Taxonomies provide control by enforcing agreed-upon or standardized category names while associating those names with other forms and permutations of the content or concepts. Taxonomies primarily organize the following:
Content: Content can be organized by type and structure. Examples of organization by content include the following categories: formatted, unformatted, HTML, press release, news content, binary, ASCII, processed content, and unprocessed content.
Concepts: Concepts are thoughts, notions, or other generalized or abstract ideas created from common and particular instances. Concepts can organize information based on company-specific information or terms. Using concepts, you can organize and arrange information in ways that may not be familiar to external content consumers. FiCorp may have a billing application that should be associated with stock trading, for example. Stock trading has a billing fee or transaction fee associated with it. Although this may not be obvious to you, it makes complete sense to the trading desk at FiCorp. MetaTagger can associate these concepts.
Relationships: Relationships describe how concepts, terms, or elements in a taxonomy relate to each other. Relationships are usually obvious once you start to think about them. You can classify relationships in several ways. Some of you may be familiar with relationships as depicted in entity relationship diagrams (ERDs). Some examples of relationships found in ERDs include the following:
• is a
• has a
• is a member of
• is a type of
Creating a customized taxonomy for your company will enable you to enhance your metadata significantly. You can view taxonomies as frameworks or maps for organizing your metadata. Taxonomies maintain control of your metadata by using standardized category names and associating those names with other forms of the name. You can use taxonomies to associate technical terms or data with their more familiar forms and associate all of them to the same ID. Taxonomies are also useful for developing metadata that is specific to a certain content consumer. For example, when terms are associated with certain personas, content tagged with those terms can be pulled and delivered directly to the matching content consumer by matching values stored in that persona’s profile.
■Note Personas are profiles of particular users who may be using your content. Personas are not plucked out of the sky but rather are built from extensive interviews with typical content consumers of your content. Personas are built as if you are interviewing a specific person. There will be multiple “real” people to each persona. Think of this as a grouping of content consumers by their demographics, interests, and professions. Using personas effectively allows you to design content specific to a person or content consumer. By arranging content in this manner (matched to a persona), you can be certain that the persona will match multiple users of your content. Many resources are available to assist you in designing personas and using them effectively in your organization, but further persona discussion is beyond the scope of this book.
For example, while a nurse may want to look at care options for a patient with melanoma, a research nurse may want to view content that involves the latest developments in the area of melanoma prevention and cures. Of course, they may have common content that is associated to each of their profiles. Taxonomies define the depth and breadth of certain concepts. Each concept in a taxonomy is called a node. Taxonomies can be flat or hierarchical in nature. Defining a taxonomy can be an arduous process but, done correctly, can greatly increase the usability of your content.
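The persona matching described earlier in this section can be pictured as a set intersection between the taxonomy terms attached to a piece of content and the terms stored in a persona's profile. The sketch below is illustrative only; the Persona class, the term names, and the content titles are hypothetical and are not drawn from MetaTagger.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """Hypothetical persona profile: a name plus taxonomy terms of interest."""
    name: str
    interests: set = field(default_factory=set)


def content_for(persona, tagged_content):
    """Return the titles whose taxonomy terms overlap the persona's interests."""
    return sorted(
        title
        for title, terms in tagged_content.items()
        if terms & persona.interests
    )


# Illustrative content tagged with taxonomy terms (all names are made up).
TAGGED_CONTENT = {
    "Melanoma care options": {"melanoma", "patient care"},
    "Advances in melanoma prevention": {"melanoma", "research"},
    "Hospital parking guide": {"facilities"},
}

nurse = Persona("nurse", {"patient care"})
research_nurse = Persona("research nurse", {"research"})

if __name__ == "__main__":
    print(content_for(nurse, TAGGED_CONTENT))
    print(content_for(research_nurse, TAGGED_CONTENT))
```

A persona whose profile holds the shared term melanoma would receive both melanoma articles, which mirrors the point that two personas can have content in common.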
Creating a Taxonomy with MetaTagger Studio
When creating a taxonomy, we recommend letting the MetaTagger Studio tool assist you by using the Recognizer project type, which is available when you start MetaTagger. For this example, gather a representative sample of your content, and store that content in a directory. You will use MetaTagger’s Recognizer tool to build the beginning of what will be your taxonomy. You can do this by performing the following steps:
1. Start MetaTagger Studio by selecting Start ➤ Programs ➤ Interwoven ➤ MT Studio.
2. The MetaTagger Studio client will load, and the login screen will appear, as shown in Figure 10-2.
Figure 10-2. MetaTagger Studio Login dialog box
3. The default User Name and Password settings for MetaTagger Studio are admin and admin, respectively. Enter your username and password as well as the host name and port number in the Server field. Click OK to continue.
4. The New Project dialog box will appear, as shown in Figure 10-3.
Figure 10-3. New Project dialog box
5. Select Recognizer for the project type, and select Starting a New Project.
6. Give the project a name (for the example, enter FiCorp). Select the language if applicable. The languages shown in Figure 10-4 are available.
Figure 10-4. Selecting a language for your MetaTagger project
7. Click OK to continue. The MetaTagger Studio window will open and display your new, blank project, as shown in Figure 10-5.
Figure 10-5. Project window after initial creation
8. To build your taxonomy from your sample document collection, select File ➤ Generate ➤ Vocabulary from Directory, as shown in Figure 10-6.
Figure 10-6. Selecting the File ➤ Generate menu
9. The Open Directory dialog box will appear. Navigate to the directory that houses your sample content, and click Open to continue, as shown in Figure 10-7.
Figure 10-7. Navigating to the directory with your sample content
10. MetaTagger will begin to build your taxonomy index, as shown in Figure 10-8. This process may take several minutes depending on the complexity of your data and the size of your sample content.
Figure 10-8. Generate Vocabulary dialog box
11. MetaTagger will complete the process of building your vocabulary and present the generated data, as shown in Figure 10-9.
Figure 10-9. Your project window after generating the data
12. After you have built the vocabulary, you can begin the manual process of editing each generated node, adding nodes, and deleting nodes that are not relevant. This will make your taxonomy more representative of your metadata needs.
Adding Nodes
To add a node and then a subnode in MetaTagger Studio, follow these instructions:
1. Select Edit ➤ Add Node to add a top-level node. The node will be added alphabetically based on the node name.
2. To add a subnode, right-click an existing node, and select Add Node, as shown in Figure 10-10.
Figure 10-10. Adding a node to your taxonomy
3. Give the node a name, and edit its remaining attributes. The available attributes for each node are as follows:
• Label: This node attribute is the taxonomy term name.
• UID: This is the MetaTagger unique identifier for the individual associated taxonomy term.
• Definition: This is the description of the node. This attribute is for display purposes only and will not be used in metadata generation.
• Alternatives: These are alternative forms of the taxonomy term. For example, in FiCorp’s taxonomy, the term credit card exists. An alternative of this term name is tarjeta de crédito, which is the Spanish term for credit card. Another example is individual retirement account; an alternative for this term could be IRA, its three-letter acronym.
• Clues: This specifies additional words and phrases that MetaTagger can use to reduce the ambiguity that can occur when a term could apply to more than one category in the taxonomy.
• Weak: This attribute sets the term as weak, which means that MetaTagger will not use the term in metadata creation unless other attributes match in the interpreted document. This attribute can have a value of True or False.
• Parents: This attribute specifies whether this term has any related parent terms. If the node is a child of an existing node, then use this attribute. The node will be applied in the taxonomy appropriately when this is set.
• Children: This attribute specifies whether this term has any related children terms. If the node is a parent of an existing node, then use this attribute. The node will be applied in the taxonomy appropriately when this is set.
• Related Terms: This attribute specifies terms that are related to the current term but do not exist in a parent, child, or sibling relationship. For example, within FiCorp, the credit score is related to the available lines of credit, but it is not a parent, child, or sibling of available lines of credit.
• Test Files: This attribute identifies files for testing the taxonomy.
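To make the node attributes above concrete, the following sketch models a taxonomy term as a small data structure and finds the terms that occur in a piece of text. It is a rough illustration rather than MetaTagger's actual data model or matching algorithm: the class and function names, the UID values, and the clue word are hypothetical, and the handling of weak terms is a simplified reading of the description above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """Hypothetical in-memory model of one taxonomy term."""
    label: str
    uid: str
    definition: str = ""  # display only, never used for matching
    alternatives: list = field(default_factory=list)
    clues: list = field(default_factory=list)
    weak: bool = False
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)
    related_terms: list = field(default_factory=list)


def occurs(phrase, text):
    """True if phrase appears in text as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def find_terms(text, nodes):
    """Return the UIDs of nodes whose label or an alternative occurs in text.

    Simplification: a weak node counts only if one of its clues also occurs.
    """
    found = []
    for node in nodes:
        names = [node.label, *node.alternatives]
        if not any(occurs(name, text) for name in names):
            continue
        if node.weak and not any(occurs(clue, text) for clue in node.clues):
            continue
        found.append(node.uid)
    return found


# Example terms from the text; the UIDs and the clue are invented.
NODES = [
    TaxonomyNode("credit card", "T001", alternatives=["tarjeta de crédito"]),
    TaxonomyNode("individual retirement account", "T002",
                 alternatives=["IRA"], clues=["retirement"], weak=True),
]

if __name__ == "__main__":
    print(find_terms("Open an IRA to save for retirement.", NODES))
```

Because the label and every alternative resolve to the same UID, content that mentions either form of a term ends up with the same metadata value, which is the point of recording alternatives.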
Editing Nodes You can edit nodes to correct data or attributes and to add to existing data associated with each taxonomy term. To edit a node, follow these instructions: 1. Right-click a node, and select Edit Node; alternatively, click a node, and select Edit ➤ Edit node from the toolbar. The Edit Node dialog box will appear, as shown in Figure 10-11.
Figure 10-11. Edit Node dialog box 2. Navigate via the tabs, and edit any data that is required. 3. When you are finished, click the OK button.
Automating the Taxonomy After editing, deleting, and adding nodes, you must perform a few more steps to make your taxonomy ready for usage on your content. The next step is to automate the taxonomy, which means you are building a usable index from which MetaTagger can automatically generate metadata. To automate the taxonomy, follow these steps: 1. Select Build ➤ Build Index, as shown in Figure 10-12.
Figure 10-12. Selecting the Build Index menu 2. The build process will begin, as shown in Figure 10-13.
Figure 10-13. Building Indexes dialog box
3. The build process will complete, as shown in Figure 10-14.
Figure 10-14. Notification that your index is complete 4. Select OK to complete the build.
Testing the Index Once you have built your index, you should confirm that it will generate metadata appropriately by testing it. It may be necessary for you to test, modify, and test again to ensure your taxonomy is useful and correct. To test your index, follow these steps: 1. Select Test ➤ Test Index. The Test Index dialog box will appear, as shown in Figure 10-15.
Figure 10-15. Test Index dialog box
2. Click the Import button, and navigate to your sample testing document. This document should be representative of your content, as shown in Figure 10-16.
Figure 10-16. The dialog box will now show the text that has been imported. 3. Click Run Test to begin the test process. The bottom half of the window will show the terms from the taxonomy that were present in the document, as shown in Figure 10-17.
Figure 10-17. These are the terms that were presented during the test.
If the expected terms do not appear, then you should edit your index, rebuild the index, and test again. Continue these iterative processes until you are satisfied that your taxonomy is accurate.
Committing the Taxonomy
After you have successfully created your taxonomy, you should commit it to the MetaTagger server. To commit the taxonomy, follow these steps:
1. Select Build ➤ Commit Index. This action will copy the project to the MetaTagger server.
2. When the index is copied successfully, the message Index Committed will appear, as shown in Figure 10-18. Click OK to continue.
Figure 10-18. Notification that the index has been committed After you have created your taxonomy and committed it to the server, the last step to expose this taxonomy for clients to use is to deploy it. Deploying the taxonomy is the same conceptually as publishing the taxonomy to the production environment. After deployment, MetaTagger will use the taxonomy to create metadata for content.
Summary
A CMS cannot be comprehensive unless it allows content contribution to be performed with external tools such as editors. TeamSite allows any editor to be used but provides special features when using Front Office. These features allow users to contribute content by utilizing the same tools they use on a daily basis. Also, MetaTagger can be an effective tool for creating an accurate set of metadata for your search engines. Accurate metadata lets contributors perform more accurate searches, which in turn makes it easier to reuse existing content.
CHAPTER 11
■■■
Deploying Files and Data
When it is time to move your content to its final destination, whether it is static content or dynamic content, you can use OpenDeploy coupled with DataDeploy. Moving your content is only the first part, though; you also need to coordinate the move. If a piece of this move fails, it is nice to know that the rest of your content doesn’t move; in other words, all the data remains synchronized. In this chapter, you will learn how you can use OpenDeploy and DataDeploy to ensure your static content and dynamic content moves are synchronized. This chapter will also discuss many of the options that will help ensure that these products fit into your environment.
Using OpenDeploy Your CMS has many important features, and one of these aspects is how you push modified files to the servers that interface with your users. OpenDeploy gives you the flexibility of many different deployment schemes while also maintaining a security model that limits the access of the remote servers to only those directories to which you need to deliver content.
Understanding the OpenDeploy Components
The OpenDeploy product is divided into two server components: the base server and the receiver component. The base server is a fully functional component that can both receive and initiate deployments. The receiver component only has the ability to receive content files. In other words, you cannot initiate a deployment from a receiver. Using the receiver component actually achieves two goals. First, because files cannot be deployed from a receiver, you maintain the security of your files. Second, using receivers, where files can only be received, reduces the cost of your CMS deployment because the cost of a receiver is significantly less than a full-featured base server. In this chapter, we won’t cover OpenDeploy in great detail, but we will give you an overall understanding of how it works and what it can do for you.
Using the Base Server Software
The base server is the full-featured OpenDeploy server component. If you need a host to be able to initiate a deployment, you need to install a base server on that host. The host could have a shared file system that allows a server with a base to access it, but you still need a base server to be involved in the initiation of the deployment. The base component can also receive a deployment from another host or receive a deployment from itself. The base server is configured mainly through the odbase.xml configuration file and also obtains node configuration information from the nodes.xml file. The odbase.xml file, besides setting up ports and logging, also defines the nodes that are allowed to deploy to this base server. The section of the configuration file that defines this is the allowedHosts section. When you install the base server, it creates an odbase.xml file that contains the following configuration for the allowedHosts element:
The first allowed node is the TS-HOME server node. This entry of the allowedHosts section identifies that this server can deploy to itself. Within each node are parent-level paths that each of the remote servers has access to for deployment. According to the first path section, the TS-HOME host can deploy files anywhere below the C:\Interwoven\OpenDeployNG\tmp directory. The allowed hosts enable you to protect the areas of your file system that you do not want OpenDeploy to be able to write to. If a deployment targets a location outside the defined paths, the deployment is rejected. Another security benefit to this is that you have the flexibility of opening as much of the file system to a highly trusted host as you want, but you can also be much more stringent on what a less secure host can access.
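The path restriction described above amounts to a containment check: a deployment target is accepted only if it falls under one of the directories listed for the sending host. The following sketch illustrates that idea only; it is not OpenDeploy's implementation, and the ALLOWED_PATHS mapping and the is_target_allowed function are hypothetical. The TS-HOME host and its directory come from the example above.

```python
import ntpath
from pathlib import PureWindowsPath

# Hypothetical in-memory form of an allowed-hosts list: each sending host
# maps to the parent directories it may deploy beneath.
ALLOWED_PATHS = {
    "TS-HOME": [r"C:\Interwoven\OpenDeployNG\tmp"],
}


def is_target_allowed(host, target, allowed=ALLOWED_PATHS):
    """Return True if target lies at or below a directory allowed for host."""
    # Collapse ".." segments first so they cannot escape an allowed directory.
    normalized = PureWindowsPath(ntpath.normpath(target))
    for root in allowed.get(host, []):
        try:
            normalized.relative_to(PureWindowsPath(root))
            return True
        except ValueError:
            # Not under this root; try the next one.
            continue
    return False


if __name__ == "__main__":
    print(is_target_allowed("TS-HOME", r"C:\Interwoven\OpenDeployNG\tmp\site\index.html"))
    print(is_target_allowed("TS-HOME", r"C:\Windows\system32"))
```

Listing a broad directory for a trusted host and a narrow one for a less trusted host is just a matter of what appears in each host's list, which is the flexibility the text describes.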
Using the Receiver Server Software The receiver component operates much like the base server but allows deployments only to be received. The odreceiver.xml file is the receiver’s main configuration file and also configures the allowed hosts just as with the base server component.
Introducing OpenDeploy Basics In the following sections, we’ll cover two of the files that provide some of OpenDeploy’s base functionality. The first is the nodes file, which helps externalize the OpenDeploy servers to which a particular server can deploy files. The second is the configuration file, which is responsible for defining how a deployment should be performed.
Configuring the Nodes File
Think of OpenDeploy as a network consisting of computers that need to send or receive deployments. Each computer that participates in the OpenDeploy network is considered a node. A node is constructed of bits of information about one particular server such as the host name of the server and the port number on which OpenDeploy is listening. Nodes can be defined in each deployment configuration but are usually defined in the nodes.xml file on each server, and this file needs only to define nodes that this server will be in contact with during deployments. The configuration example that follows illustrates how a node element should be defined. The name attribute defines the name that is used within a deployment configuration file. The host attribute represents the host name, and the port is the port number where OpenDeploy will be listening. Make sure the host name can be resolved by the DNS server that this server is using. If it cannot, you will need to use the IP address for this machine instead of the host name. As you can see, the syntax of the node file is simple:
The advantage of defining a host in the nodes.xml file is that you can change the host names in one file without having to change them in each deployment configuration file. Once you change the nodes, the next deployment that is fired off manually by command line or one of your workflows will start using this new configuration.
Creating the Replication Farm
Typically, you will need to deploy content files to more than one server. To facilitate this process, Interwoven has created the replication farm element. Once you have configured your nodes, you can create the replication farm element inside the nodes.xml file. This may require a little planning, and you can create more at your discretion, but what you need to do first is to create a replication farm for each group of servers that this server will be deploying content files to, even if it’s only one node. The replication farm element has a name, which must be unique, just as a node name must be. This name will be used within a deployment configuration file. Inside this element is a list of node reference elements, each naming the node to include in its useNode attribute value. The following portion of the nodes.xml file should illustrate how this section will look:
Inside your deployment configuration file, you could reference the Farm_Set_1 replication farm, and the deployment would be sent to the target_server_1 node.
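The indirection that nodes and replication farms provide can be sketched as two lookups: a deployment names a farm, the farm lists node names, and each node name resolves to a host and port. The model below is a hypothetical in-memory stand-in for nodes.xml, not its actual syntax; the host name and port number are placeholders, while target_server_1 and Farm_Set_1 are the names used above.

```python
# Hypothetical stand-in for node definitions: logical name -> (host, port).
NODES = {
    "target_server_1": ("web1.example.com", 20014),  # placeholder host and port
}

# Hypothetical stand-in for replication farms: farm name -> node names.
FARMS = {
    "Farm_Set_1": ["target_server_1"],
}


def resolve_farm(farm_name, farms=FARMS, nodes=NODES):
    """Return the (host, port) pairs a deployment to farm_name would target."""
    if farm_name not in farms:
        raise KeyError(f"unknown replication farm: {farm_name}")
    targets = []
    for node_name in farms[farm_name]:
        if node_name not in nodes:
            raise KeyError(
                f"farm {farm_name} references undefined node: {node_name}"
            )
        targets.append(nodes[node_name])
    return targets


if __name__ == "__main__":
    print(resolve_farm("Farm_Set_1"))
```

Because a deployment refers only to the farm name, changing a host in the node table changes where the next deployment goes without touching any deployment configuration, which is the benefit the text attributes to defining hosts in nodes.xml.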
Configuring the Deployment The term deployment commonly describes the process of moving files from one place to another, or from a source to a target. The source and target may reside on the same machine
or multiple machines; however, in most situations, you will move content from a source machine to many target machines. OpenDeploy facilitates this moving of content. You can define many rules and conditions to ensure that only the proper pieces of content get moved and that they are moved only from sources that are allowed to deploy content to a particular target. You place information regarding these source and target locations and deployment conditions within the deployment configuration file. The deployment configuration file also contains the definition or instructions for the process that will take place for a specific deployment. You may have as many of these configurations as you need, depending on the complexity of your network and how many different content targets you will be deploying to.
To help you understand how deployments are defined, we will show how to take the default test.xml file from the OpenDeploy configuration directory, which is placed there by your initial installation, and modify it slightly. Along the way, we will explain the overall deployment process defined in test.xml. The example deployment configuration, shown in Listing 11-1, uses an internal farm set named farmset_a. You define an internal farm set directly in the deployment configuration file instead of the nodes.xml file. You might want to use an internal farm set instead of an external farm set to ensure that the farm set does not get modified. This farm set is defined on line 6. Since in this example we want to deploy to target_server_1 only, we have set this on line 7.
Listing 11-1. Example Deployment File
The next section is the deployment definition, titled test_definition. This is where you define the source and target. You want to deploy content files that are within a standard directory, so you can use a file system source, defined by the source file system element. The root of the source location is named by the area attribute and is defined as /opt/www on line 13. You can further define the source using a path element. This tag allows you to identify specific directory paths relative to the area specified in the parent element. In this case, on line 15 you are deploying everything beneath the specified root directory by using a period.
The target section defines the target servers and the location on each of these servers. As with the source, the target root location is defined by the area attribute, which in this case is set to /var/www. The files within the source directory will be compared to the remote directory based on the last date modified. OpenDeploy retains the modified date and time when it moves the files, so only files that have been modified on the source server since the last update will be transferred. The replication farm link is set to internal and points to the farmset_a defined earlier in the file. When the files are transferred to target_server_1, the permissions of the files will be set to 0664, and the permissions of the directories will be set to 0755.
The deployment element specifies the deployment actions. The action in this case uses the definition test_definition and performs the deployment with transactional turned off. This means that if the deployment fails, the files will not be rolled back to their previous states. This configuration file is relatively basic, but it gives you an idea of its general format. You can use many advanced techniques when creating a deployment configuration, and the Interwoven reference manuals describe in great detail how to use this tool effectively.
Substituting with Parameters You can make your deployment configuration files quite dynamic by using a parameter instead of hard-coding a value such as a source directory. For instance, if you needed to deploy to different groups of nodes depending on a condition in a workflow, you could create a variable for your farm set name and then be able to determine when starting the deployment which farm set to use. Refer to the following configuration line to see how to define a parameter in the configuration file: When you initialize the deployment, you need to pass only the key/value pair to the deployment such as farmset_name=my_farmset, and the $farmset_name^ parameter will be replaced by the value my_farmset. The basic idea is to create a standard deployment process and then parameterize small parts of it to make the configuration file as useful as possible. You can carry this as far as you would like, but remember that simpler is often better.
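To make the substitution idea concrete, here is a minimal Python sketch of how a key/value pair could replace a parameter token. It is only an illustration, not OpenDeploy code: the substitute helper and the sample template line are hypothetical, and the only details taken from the text are the $farmset_name^ token form and the farmset_name=my_farmset pair.

```python
import re

# Tokens look like $farmset_name^ in the text above; the function name and
# the sample line below are hypothetical illustrations.
TOKEN = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)\^")


def substitute(template: str, params: dict) -> str:
    """Replace each $name^ token with its value; fail loudly if one is missing."""
    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for parameter {name!r}")
        return params[name]

    return TOKEN.sub(lookup, template)


line = "deploy to farm set: $farmset_name^"
print(substitute(line, {"farmset_name": "my_farmset"}))
# prints: deploy to farm set: my_farmset
```

Raising an error for a missing value, rather than leaving the token in place, is a design choice in this sketch; it surfaces a forgotten parameter before any files move.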
Understanding Deployment Types In the following sections, we’ll introduce three basic deployment types: reverse, routed, and multitier. Each deployment type has unique benefits and solves different problems.
Implementing Reverse Deployments In some security models, it is necessary to pull content from a less secure environment or staging area into a more secure environment such as production. You can solve this problem by using reverse deployments, which are actually triggered by the target server. The deployment configuration file should reside on the reverse deployment target and will provide the deployment process that will be followed. Once the reverse source receives the request, it will perform the deployment as specified within the deployment configuration file located on the reverse target. Figure 11-1 shows how this reverse deployment transaction is handled.
Figure 11-1. Reverse deployment (the reverse target in Production triggers the deployment; the reverse source in Staging deploys the content)
A reverse deployment can be confusing, so you should think of the deployment as a normal deployment from the source to the target, except that the target is requesting the deployment. Once the request is made, the deployment happens as if it were initiated from the normal source.
Implementing Routed Deployments We have seen very few implementations that have actually built redundancy into their deployment process. If you would like to build in this redundancy, you can do that with a routed deployment, which allows you to have multiple paths for deployments to frontline servers, because a routed deployment specifies only the beginning and end of the deployment. It does not specify how to get there, but it depends on the OpenDeploy nodes configuration to determine its path. Figure 11-2 shows Node A, which needs to deploy to Node D.
Figure 11-2. Routed deployment (Node A can reach Node D through either Node B or Node C)
You have two ways to get to Node D, but either of them may be down at any one time. They usually do not go down together, so you want a way to route content to Node D without hard-coding the specific route. A routed deployment, configured through an element inside the odbase.xml file, can specify multiple routes. When the routed deployment is initiated, it reads the nodes configuration for each server involved to determine the available routes. The route is then calculated at deployment time; therefore, you do not depend upon any one intermediate node, and the process is much more likely to succeed.
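The idea of calculating a route at deployment time can be sketched in Python. This is not how OpenDeploy is implemented; it is a hypothetical illustration that uses a breadth-first search over a made-up link table mirroring Figure 11-2 and skips any node that is marked as down.

```python
from collections import deque

# Hypothetical link table mirroring Figure 11-2: A reaches D through B or C.
LINKS = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def find_route(links, start, end, down=frozenset()):
    """Breadth-first search for a path from start to end that avoids down nodes."""
    if start in down or end in down:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in seen and nxt not in down:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route is currently available


print(find_route(LINKS, "A", "D"))                    # ['A', 'B', 'D']
print(find_route(LINKS, "A", "D", down={"B"}))        # ['A', 'C', 'D']
print(find_route(LINKS, "A", "D", down={"B", "C"}))   # None
```

Because the route is computed on each call, losing Node B simply shifts the deployment to the path through Node C, which is the redundancy the routed deployment is meant to provide.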
Implementing Multitiered Deployments You may need to deploy content to secured network segments as part of your deployment scheme. If this is necessary, you may have to deploy your files through a gateway machine. The gateway machine may be the only way to deploy files to the secured segment, so you will have to configure a multitiered deployment. Consider a scenario in which you would like to deploy content to a testing server as well as your two frontline content servers. Figure 11-3 shows two network segments and the gateway machine that participates in both segments.
Figure 11-3. Multitiered deployments (tier 1: Source deploys to Testing and Gateway; tier 2: Gateway deploys to WWW 1 and WWW 2)
Once you have initiated the deployment from the source machine, the first step in the deployment process is to deploy the files to the first-tier servers, which are the testing server and the gateway machine. The gateway machine and the source server both have an OpenDeploy base server installed. The testing server requires only a receiver because it will not be deploying files any further. You set up the deployment configuration file with an element that instructs the gateway node to redeploy the files. The gateway machine will look to its own local nodes.xml file to determine where to send the files next, based on the deployment name specified in the deployment configuration from the source machine. The gateway machine will have a configuration file that matches the name passed from the first-tier deployment. This configuration file on the gateway machine specifies that the WWW1 and WWW2 nodes are the receiving nodes for the second-tier deployment. The second-tier deployment, which is a continuation of the first-tier deployment, is then executed.
It is important to realize that the first-tier and second-tier deployments are considered to be the same deployment. Since they are considered one deployment, multitiered deployments can be set as transactional, but each tier must have the transactional flag set. If any one of the servers (Testing, WWW1, or WWW2) fails, then the entire deployment on all tiers is rolled back.
Introducing Topologies You can configure OpenDeploy to perform deployments in any number of ways. Now we will cover the three commonly found topologies in order to show how you can configure your deployments based on each configuration. Remember, OpenDeploy is flexible, so do not limit your options to the choices we make in the following sections; this exercise will just help open your mind to the possible solutions.
Implementing a Star Topology You can set up a star topology, where the source server is actually in contact with each of the targets; Figure 11-4 shows this layout. The star topology allows your deployments to be configured simply.
Figure 11-4. Star topology (Source connects directly to Target 1, Target 2, Target 3, and Target 4)
Let’s assume you will be deploying the same content to each server every time you perform a deployment. Since you are deploying files to each machine, the replication farm element will contain four nodes. You will list each target in the replication farm set, and you will specify the deploy_to_all farm set within the deployment configuration file. This type of deployment is known as a fan-out deployment, in which one source machine is directly connected to many targets. Here’s the element for this type of deployment:
With this topology you might need to deploy to a subset of these target servers. You will need to create a farm set for each server combination to which you would like to deploy files. Suppose Target 1 and Target 2 are backups for some content, and Target 3 and Target 4 are backups for other content. This will require you to single out each of these groups in your nodes.xml file. The following configuration example demonstrates this:
Now that you have multiple farm sets, you can use a parameter inside the deployment configuration to specify the farm set you will be using for the deployment. You will also need the target and source specifications to be parameters.
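The relationship between farm set names and groups of targets can be pictured as a simple lookup table. The Python sketch below is a hypothetical stand-in for the farm sets kept in nodes.xml, not OpenDeploy syntax: only the name deploy_to_all comes from the text, while the node names and the two subset names are made up for illustration.

```python
# Hypothetical stand-in for the farm sets defined in nodes.xml. Only the
# name deploy_to_all comes from the text; every other name is made up.
FARM_SETS = {
    "deploy_to_all": ["target_1", "target_2", "target_3", "target_4"],
    "backup_group_a": ["target_1", "target_2"],
    "backup_group_b": ["target_3", "target_4"],
}


def targets_for(farmset_name: str) -> list:
    """Return the nodes in a farm set, rejecting names that were never defined."""
    try:
        return list(FARM_SETS[farmset_name])
    except KeyError:
        raise ValueError(f"unknown farm set: {farmset_name}") from None


print(targets_for("backup_group_a"))  # ['target_1', 'target_2']
```

Passing the farm set name in as a parameter, as described above, amounts to choosing the key for this lookup when the deployment starts.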
Implementing Hub-and-Spoke Topology The hub-and-spoke topology is configured so that the source does not have direct access to the targets. To access the target servers, the source will have to go through a hub. The hub server has direct contact with a specific set of targets. One reason why you may see this is to cut
down on both the bandwidth and the deployment time by moving the content closer to the destination servers and then deploying to each of the targets from there. Figure 11-5 depicts how this would work.
Figure 11-5. Hub-and-spoke topology (Source deploys to two hubs; each hub deploys to its own targets, Target Group A and Target Group B)
Two target groups appear in Figure 11-5. Target Group A is responsible for serving content to the Internet. The target servers are all part of the frontline content servers and are set up as a star topology from the hub to the targets. All the options you had under the star topology still apply for configuring the replication farms for the hubs, but you are now deploying the content from the hub. The hub will be the source for the second-tier deployment, and the source will start a multitier deployment using the hub as the gateway to reach its target nodes. The source can deploy content files to both target groups with the same deployment if that is required.
Implementing a Serial Topology You configure a serial topology in such a way that your deployments will have to hop from one server to the next until reaching the end. This deployment will take a longer time to complete because the entire deployment must happen three separate times. Each of the servers that will need to move the content to the next will also be required to have an OpenDeploy base component. Figure 11-6 illustrates how the deployments would have to take place.
Figure 11-6. Serial topology (Source, Target 1, and Target 2 each run the OpenDeploy base; Target 3 runs only the OpenDeploy receiver)
This topology also allows Target 1 and Target 2 to initiate deployments to any of the other nodes. This configuration is also much more prone to failure, because if any of the servers are down, then the deployment cannot take place.
Using Important Deployment Features You can use many deployment features; the following sections discuss some of the important ones. These features include configuring OpenDeploy to compare files, to use compression, to filter directories, to use symbolic links, and to deploy and run scripts.
Specifying Comparison Rules When performing comparisons between source areas and target areas, the modification date is the first criterion used to determine whether the file should be updated. If the source file modification date stored by the operating system is newer than the target, then the target is overwritten with the source. However, OpenDeploy also considers other factors to determine whether the file versions are the same. The following list shows each comparison that is performed and the order that OpenDeploy uses to determine whether the source and target files are different:
• The type mismatch comparison determines whether the target side object and the source side object are either both directories or both files. If a file named stores was being deployed and a stores directory existed on the target, the time stamps of each may not be enough to determine that the objects are different.
• The user difference comparison determines whether the user is the same on both the target and the source. If the owner is different, then the objects are different. This feature is supported only on the Unix operating system.
• The group difference comparison determines whether the group is the same on both the target and source. If the group owners are different, the objects are different. This feature is supported only on the Unix operating system.
• The permissions difference comparison determines whether the permissions are the same on both the target and source. If the permissions are different between the target and source, the objects are different. This feature is supported only on the Unix operating system.
• The access control list (ACL) difference comparison determines whether the user and group are the same on both the source and the target. If either the owner or the group owner is different, then the objects are different. This feature is supported only on the Windows operating system and is turned off by default.
• The size difference comparison determines whether the size is the same for both the source and the target. If there is a size difference between the source object and the target object, then the file is different.
• The checksum difference comparison determines whether the checksum is the same for both the source and the target. If the checksum differs between the source and target object, the objects are different. This feature is turned off by default.
These comparisons are followed unless specifically turned off, with one exception. If either the source or the target operating system cannot support a comparison type, then that comparison is skipped.
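The ordered comparisons above can be sketched as a chain of checks that stops at the first difference. The Python below is a simplified, hypothetical model rather than OpenDeploy's implementation: the Entry class and rule names are made up, the Windows-only ACL comparison is left out, a value of None stands for an attribute the platform cannot supply (so that check is skipped), and SHA-256 is simply one convenient checksum.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical description of one object on the source or target side."""
    is_dir: bool
    mtime: float
    owner: Optional[str]  # None models a platform without this attribute
    group: Optional[str]
    mode: Optional[int]
    data: bytes = b""


def _differs(a, b) -> bool:
    """Skip the check (report no difference) when either side lacks the value."""
    return a is not None and b is not None and a != b


# Checksum is off by default, as in the text; the ACL rule is omitted here.
DEFAULT_RULES = ("date", "type", "user", "group", "permissions", "size")


def first_difference(src: Entry, tgt: Entry, rules=DEFAULT_RULES):
    """Return the name of the first enabled rule that finds a difference."""
    checks = {
        "date": lambda: src.mtime > tgt.mtime,
        "type": lambda: src.is_dir != tgt.is_dir,
        "user": lambda: _differs(src.owner, tgt.owner),
        "group": lambda: _differs(src.group, tgt.group),
        "permissions": lambda: _differs(src.mode, tgt.mode),
        "size": lambda: len(src.data) != len(tgt.data),
        "checksum": lambda: hashlib.sha256(src.data).digest()
        != hashlib.sha256(tgt.data).digest(),
    }
    for name in rules:
        if checks[name]():
            return name
    return None  # the objects are considered the same


src = Entry(False, 100.0, "web", "www", 0o664, b"hello")
tgt = Entry(False, 100.0, "web", "www", 0o664, b"jello")
print(first_difference(src, tgt))                                # None
print(first_difference(src, tgt, DEFAULT_RULES + ("checksum",)))  # checksum
```

The two sample objects have the same date, ownership, permissions, and size, so only the optional checksum comparison notices that their contents differ.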
Setting Up Compression If you are concerned that your file deployments are consuming too much network bandwidth, then you may want to compress your deployments. OpenDeploy allows you to compress your deployments and set the level of compression. Keep in mind when setting compression levels that the higher the setting, the more CPU intensive the compression and decompression will be. Make sure you examine the impact on each of your servers after making these changes.
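To get a feel for the trade-off between compression level and payload size, you can experiment outside OpenDeploy. The Python sketch below uses the standard zlib module purely as an illustration; the compression algorithm OpenDeploy uses is not stated in this chapter, and the compressed_sizes helper and the sample data are made up.

```python
import zlib


def compressed_sizes(payload: bytes, levels=(1, 6, 9)) -> dict:
    """Compress payload at each zlib level and report the resulting sizes."""
    sizes = {}
    for level in levels:
        packed = zlib.compress(payload, level)
        # Compression is lossless at every level; only size and CPU cost change.
        assert zlib.decompress(packed) == payload
        sizes[level] = len(packed)
    return sizes


# Made-up, highly repetitive sample data standing in for deployed files.
sample = b"<p>sample page content</p>\n" * 2000
print(len(sample), compressed_sizes(sample))
```

Running a test like this against a representative set of your own files, while watching CPU time, is a reasonable way to pick a level before changing production deployments.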
Using Filters When deploying files to a target, some of the files on the target system may not be included within the CMS. These files might have been placed on the target system or deployment destination by an application group, or they may be legacy files that have not yet been brought into the CMS. These files need to be protected so that no one accidentally overwrites files that fall outside the deployment. Filters are an effective way to eliminate that possibility. We have discussed using the basic file path as the source, but in addition to this, you can also nest a filter element within a parent element. This feature allows the area to be further subdivided. If the parent area contains three directories, you can use the subpaths to remove a directory from the deployment. You can also specify the subpath using a regular expression pattern to match against subpaths. The deployment type also affects how the filter is applied. Four deployment types exist, and each one is handled in a slightly different way:
• The directory comparison allows you to set up both a source-side and a target-side set of filters. When performing directory comparison deployments with the DoDeletes feature turned on, you need to make sure the source filter matches the target filter. If you filter files or objects on the source side only, OpenDeploy thinks those files do not exist and tries to delete the files on the target side.
• The file list deployment comparison will take place only on the files that are listed in the deployment.
• The payload adapter-based comparison will take place on only those files that are returned from the payload adapter. The resulting file list will either be deleted or be deployed over the old files depending upon the action specified in the configuration file.
• The TeamSite comparison will take place only on the file paths that are returned from the two areas compared.
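The warning about mismatched filters in a directory comparison with DoDeletes is easier to see with a small model. The Python sketch below is hypothetical and greatly simplified (it looks only at file names, uses shell-style patterns, and ignores modification dates), but it shows why a source-only filter makes the filtered files look deleted.

```python
import fnmatch


def plan(source_files, target_files, source_exclude=(), target_exclude=(),
         do_deletes=True):
    """Return (to_copy, to_delete) for a simplified directory comparison."""
    def visible(files, patterns):
        # A filtered file is invisible to the comparison on that side.
        return {f for f in files
                if not any(fnmatch.fnmatchcase(f, p) for p in patterns)}

    src = visible(source_files, source_exclude)
    tgt = visible(target_files, target_exclude)
    to_copy = sorted(src - tgt)
    to_delete = sorted(tgt - src) if do_deletes else []
    return to_copy, to_delete


files = ["index.html", "legacy/app.cfg"]

# Filter on the source side only: the legacy file appears to be missing
# from the source, so it is scheduled for deletion on the target.
print(plan(files, files, source_exclude=["legacy/*"]))
# ([], ['legacy/app.cfg'])

# Matching filters on both sides: the legacy file is left alone.
print(plan(files, files, source_exclude=["legacy/*"], target_exclude=["legacy/*"]))
# ([], [])
```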
Handling Symbolic Links OpenDeploy can handle symbolic links in two ways. The first is to deploy the file specified by the link to the same location on the target server. This allows you to transfer files from directories that are not normally included in the deployment’s source. The second way is to actually move the file being pointed to by the link to the new location of the link.
Creating Scripts to Deploy and Run For some of your deployments, it may be necessary to execute a script against the files that are being deployed. OpenDeploy allows you to run a script at specific points throughout the deployment process. Each of these points is set at a particular event. You can latch onto a number of events so that your script will run at the appropriate time. Figure 11-7 shows when each of these events is triggered.
Figure 11-7. Deployment trigger events (before deployment; before connection; after connection/before compare; after compare/before transfer; after transfer/before disconnection; after disconnection; after deployment; after rollback)
A script can run as a result of any one of these events, but the script that will be run must already exist on the server where it needs to be run before the deployment is initiated.
Using DataDeploy We have been discussing static deployments thus far, so we will now cover DataDeploy to address dynamic content. DataDeploy updates databases by moving structured content in the form of XML into databases. DataDeploy allows you to define table definitions based on your XML chunks by mapping specific elements to columns in your database. The XML file can be a custom-created file or it can be a FormsPublisher DCR. We’ll go through some of the basics of using DataDeploy throughout the rest of this chapter, and we will also discuss the benefits of integrating it with OpenDeploy.
Introducing Structured Data Data that has a defined structure using XML is structured data. DataDeploy can take this XML-defined data and turn it into SQL that in turn will be used to update a database. Structured content can be a data capture template type or plain XML files.
Introducing Schemas Deploying structured content to a database requires a rule to be defined in order for DataDeploy to know how to handle the structured content. This definition is called a schema and is defined within a configuration file. This configuration file is identified as a schema definition file and can be named anything you would like, as long as it has a .cfg extension. If you have a data capture record that needs to be inserted into a database, you will need to create a schema configuration file based on this specific capture record. In the case of a capture template, the system can generate the schema for you as an alternative to a user-defined schema. The system-generated schema is usually not optimized well enough to be very usable, but you can use it for simpler data capture forms. If you need to handle more complex capture forms, you should define custom schema configuration files.
Creating System-Generated Schemas You can have a database schema created automatically by using command-line tools. This type of schema is called a wide table mapping. For each data capture form field within a FormsPublisher XML record, a column is created in the database; thus, you end up creating a very wide table. If the data capture form contains replicants, a number is appended to each column name, and that number is incremented by 1 for each additional replicant. (If you need to refresh your memory about forms and replicants, refer to Chapter 8.) The problem with creating such a schema is that the number of columns that get created in the table can easily become unmanageable. When creating numerous columns, you run the risk of reaching the database’s limit on the maximum number of columns allowed in a table. In most cases, you should create your own user-defined schemas; however, for basic forms or XML files, you can use the system-generated schemas.
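A short Python sketch shows how quickly a wide table grows. It is a hypothetical illustration of the idea only: the field names and sample values are made up, and the exact column-naming format and starting number that DataDeploy uses are assumptions here, not taken from the product.

```python
def wide_row(record: dict) -> dict:
    """Flatten one record into a single wide row; list values model replicants."""
    row = {}
    for field, value in record.items():
        if isinstance(value, list):
            # One numbered column per replicant instance (numbering is assumed).
            for i, item in enumerate(value):
                row[f"{field}{i}"] = item
        else:
            row[field] = value
    return row


# Made-up sample record with one replicant field.
record = {"Team": "Hawks", "Location": "Springfield",
          "PlayerName": ["Ann", "Bo", "Cy"]}
print(list(wide_row(record)))
# ['Team', 'Location', 'PlayerName0', 'PlayerName1', 'PlayerName2']
```

With three replicated fields and 30 replicant instances each, the same approach would already produce 90 columns for the replicants alone, which is why custom schemas are usually the better choice.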
Defining Custom Schemas The best solution to the problem of wide tables is to create a custom schema. A schema can be quite advanced in its definition, allowing table associations through primary keys. Creating your own schemas allows you to modify an existing application’s database. If you are not using existing XML data feeds, then you will want to create a data capture template that is representative of the content you need to enter. You can then create the schema based on the resulting content records. You define the schema configuration file using custom XML elements. To demonstrate how this works, we will create an XML schema definition, as shown in Listing 11-2. This schema describes how two tables will be used to contain data that is generated from a single data capture configuration. The data capture configuration has been built to capture data for a
baseball team’s roster. With this schema, say you want to create a master table that will contain columns for the team’s name and location. You will also create a child table that contains columns for each player’s name, number, and position. You will find that the schema is actually simple to create. Listing 11-2. Baseball Roster Schema
Note that the schema contains two group elements. The two groups signify to DataDeploy that you will be creating two unique tables. The first group is named baseball_roster and will create a table with the same name. This table will store the team-level information. The baseball_roster group contains two columns. The first column will hold the data from the Team field from the capture form. The second column will hold the data from the Location field from the capture form. Each of these columns will be defined as type varchar with a length of 100 characters. Neither of these fields can be left as null. Take a look at Figure 11-8 to see the fields from the data capture template. Since the baseball_roster table is the master table, you need to define one of the fields as the primary key. You may have also noticed the presence of a key-definition element, which defines a primary key or a foreign key. In this case, we have defined the Team column as the primary key. You can specify multiple columns if necessary, but it is not necessary in this case.
Figure 11-8. Baseball roster DCT
The second group is named players. The players table consists of four columns. The first column will hold the foreign key data from the master table and will be named Team. It is defined just as it was in the baseball_roster table. The second column will contain the data from the Player Name field, which is a replicant. You should take a closer look at this definition; see how the [0-30] section is included in the field name? This means you can have up to 30 players on the data capture template. The third column will contain the data from the Player Number field and is also a replicant field. The fourth column will contain the data from the
Player Position field from the data capture template. Each of the replicant fields is from the same parent field, Player, so they all have the same definition of [0-30]; if there were multiple replicants, each one might have been different. This is a fairly simple example of how you can create your own schema, but with a little bit of planning, you can capture data for any of your databases. This feature will allow you to replace some of those admin tools that are a pain to maintain and perhaps eliminate the need to create new admin tools in the future.
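The master and child tables described for the roster can be pictured with a small, self-contained Python example that uses an in-memory SQLite database. This is not DataDeploy output and not the schema file format: the table names baseball_roster and players, the Team and Location columns, and the varchar(100) type come from the text, while the player column names, the load_roster helper, and the sample data are assumptions made for illustration.

```python
import sqlite3


def load_roster(conn, team, location, players):
    """Create the master and child tables if needed and insert one roster."""
    conn.execute("""CREATE TABLE IF NOT EXISTS baseball_roster (
        Team VARCHAR(100) NOT NULL PRIMARY KEY,
        Location VARCHAR(100) NOT NULL)""")
    # Player column names below are assumed; Team is the foreign key.
    conn.execute("""CREATE TABLE IF NOT EXISTS players (
        Team VARCHAR(100) NOT NULL REFERENCES baseball_roster(Team),
        PlayerName VARCHAR(100),
        PlayerNumber VARCHAR(100),
        PlayerPosition VARCHAR(100))""")
    with conn:  # commit both inserts together, or roll back on error
        conn.execute("INSERT INTO baseball_roster VALUES (?, ?)",
                     (team, location))
        conn.executemany(
            "INSERT INTO players VALUES (?, ?, ?, ?)",
            [(team, name, number, position)
             for name, number, position in players])


conn = sqlite3.connect(":memory:")
# Made-up roster: each tuple models one replicant of the Player field.
load_roster(conn, "Hawks", "Springfield",
            [("Ann", "7", "Pitcher"), ("Bo", "12", "Catcher")])
count = conn.execute("SELECT COUNT(*) FROM players WHERE Team = ?",
                     ("Hawks",)).fetchone()[0]
print(count)  # 2
```

Each replicant becomes one row in the child table instead of one more set of columns, which is what keeps the custom schema from growing wide.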
Introducing Database Autosynchronization You will always need to keep track of who is performing which actions and when those actions are being performed. Each time someone performs any action within TeamSite, an event fires. With database autosynchronization (DAS), DataDeploy can deploy content such as structured data record content or extended attributes based on predetermined events. DAS updates can be triggered by such events as creating content, changing content, and deleting content from the TeamSite repository. Here are the events that trigger DAS updates:
• When a branch is created
• When a workarea is created
• When a branch is deleted
• When a workarea is deleted
• When a data content record is modified
• When a new content record is created
• When a data content record is deleted
• When a data content record is submitted
• When a get latest on a workarea is performed
• When a copy to area is performed
• When a workarea is renamed
• When a branch is renamed
• When a directory is renamed
• When a file is renamed
• When a file is moved
• When a file is deleted
• When extended attributes are set
• When a revert is performed
DAS data deployments are effective in keeping a database’s content up-to-date, but you will need to make sure you do not deploy development content over your live content. If you have DAS deploy content files upon modification, these files may not have been approved for your live content site. DAS is effective at logging events and creating data that can be used for tracking down harmful events and accounting for the work that has been performed by your employees. DAS also is effective at populating databases that are being used for development such as for previewing a page before it is placed into production. It is recommended that you use manually fired database updates for your live site that can be triggered via an external task from a workflow. This way, you know that the content has been approved before it gets deployed.
Keeping Your Database Synchronized From time to time, your database can become out of sync with your content store; in that case, you will want to run a couple of commands to synchronize the database with the store. The first command is iwsyncdb -resyncwa to synchronize your workareas, and the second is iwsyncdb -resyncbr to synchronize your branches.
Virtualizing Your Database You can configure DAS to create a view that isolates changes between staging and its associated workareas. This view allows a query to run against the event tables and determine whether files have been modified in a workarea but have not yet been submitted to the TeamSite staging area. When you create custom schemas, the root table will automatically have a column that signifies this change state. You can also configure other tables within the user-defined schema to track the state. If you do, each of those tables will have an additional column.
Introducing Database Deployments Deploying data to a database is much easier when you employ the services of DataDeploy. Once you have configured DataDeploy and created your schema, all you need to do is start feeding your XML files to DataDeploy for deployment. In the following sections, we will discuss the different types of deployments that you can make using DataDeploy as a stand-alone application and using it as integrated with OpenDeploy.
Implementing Stand-Alone Database Deployments Stand-alone database deployments update a database that resides on the same network as the DataDeploy source machine. With a stand-alone deployment, you can deploy structured content or extended attributes to the database at will without the assistance of any other servers. You can trigger the deployment on demand from the interface or through an external command such as a task within a workflow. Figure 11-9 illustrates that the source machine running DataDeploy can deploy directly to the database.
Figure 11-9. Stand-alone deployment (the Source machine running DataDeploy deploys directly to the Database)
Implementing OpenDeploy Triggered Deployments A useful feature of DataDeploy is that you can deploy structured content along with your normal content files. OpenDeploy has the ability to define a DataDeploy deployment in the OpenDeploy deployment configuration file. This helps you maintain your data integrity. A common use is to deploy extended attributes associated with the content files being deployed to a search engine. This way, the new search attributes can be populated at the same time the content is being moved to production. This will eliminate the chance of having your search engine return a page that has not been deployed to the site and thus create a broken link from your search results page. This can all be achieved because you can configure your deployment to be rolled back if any part of the deployment fails. If the content files are not successfully deployed, the database portion of the deployment will also be rolled back.
Implementing Target-Side Database Deployments
Many times you will need to deploy structured content across network segments where communication is limited, such as deploying content from your staging environment to a production environment. Your content server does not have direct access to the final target but does have access to a gateway machine that bridges the source with the final target. This type of deployment is called a target-side deployment. Figure 11-10 shows how this can work.
Figure 11-10. Target-side deployment (diagram labels: Segment 1, Segment 2, Source with OpenDeploy Base, Target A with OpenDeploy Receiver, Target B)
As you can see, the deployment source initiates the deployment sequence using the OpenDeploy base component. The Target A server receives the deployment and prepares to relay the structured content to the Target B server. The Target A server requires only the OpenDeploy receiver component. The Target B server does not need an OpenDeploy component installed; it needs only the database server. The Target A server will then complete the deployment process by delivering the content from the source to the database on Target B.
Implementing Synchronized Deployments
You can now deliver database content and content files in the same deployment, as shown in Figure 11-11. This type of deployment is also referred to as a synchronized deployment. A synchronized deployment can deploy files with associated metadata, structured content for a shopping cart, and code for an application.
Figure 11-11. Structured content and content files (diagram labels: Source, DataDeploy, OpenDeploy Receiver, Content Files, Structured Content, Database)
Using Advanced OpenDeploy Deployment Features
Because the DataDeploy process can take advantage of OpenDeploy, you can utilize some advanced features such as data compression, encryption, and fan-out deployments. You can use a target-side deployment along with a fan-out deployment to move structured content and content files across network segments. Figure 11-12 shows that on the target side of the deployment a fan-out deployment will occur and update Target A and Target B with content files. The database will be updated with the structured content, and if there is a failure for any of the servers, the deployment can roll back every updated server.
Figure 11-12. Fan-out deployment with synchronized content (diagram labels: Source, OpenDeploy Base, OpenDeploy Receiver, Content Files, Structured Content, Database, Target A, Target B, Target C)
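The all-or-nothing behavior described for fan-out deployments can be pictured with a short, language-neutral sketch. The Python below is only an illustration of the rollback idea and assumes nothing about how OpenDeploy or DataDeploy implement it; the step names, the deploy_all function, and the simulated failure are all hypothetical.

```python
from typing import Callable, List, Tuple

# Each step is a (name, apply, undo) triple; every name here is hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]

def deploy_all(steps: List[Step]) -> bool:
    """Apply every step in order; on any failure, undo completed steps in reverse."""
    done: List[Step] = []
    for step in steps:
        _name, apply, _undo = step
        try:
            apply()
            done.append(step)
        except Exception:
            # Roll back every target that was already updated.
            for _, _, undo_done in reversed(done):
                undo_done()
            return False
    return True

# Simulated targets: two file servers and one database.
state = {"target_a": "old", "target_b": "old", "database": "old"}

def set_to(key: str, value: str) -> Callable[[], None]:
    def run() -> None:
        state[key] = value
    return run

def fail() -> None:
    raise RuntimeError("simulated database failure")

steps = [
    ("Target A files", set_to("target_a", "new"), set_to("target_a", "old")),
    ("Target B files", set_to("target_b", "new"), set_to("target_b", "old")),
    ("Database rows", fail, set_to("database", "old")),
]
ok = deploy_all(steps)
```

In this sketch the database step fails, so the two file targets that were already updated are restored and the call reports failure.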
Summary
You can simplify deploying your content by utilizing OpenDeploy and DataDeploy. These applications are powerful engines for deploying both your static content and your dynamic content. Stand-alone, these products function very well, but when integrated, you get much more power than the two products offer individually. You can perform synchronized deployments of both your dynamic and static content within a single deployment. You also have the ability to update servers when the target and source are separated by multiple networks by hopping through a gateway server. When integrating your deployment solution with Interwoven Workflow, covered in the next chapter, you will have an end-to-end solution from creating the content to approving the content to deploying the content.
6110_Ch12_FINAL
7/2/06
1:13 PM
Page 257
CHAPTER 12
■■■
Introducing Workflow
Workflow systems can often be a CMS’s biggest selling point. Many companies have no way to enforce their content promotion process, so the workflow product is often their best hope of forcing developers to follow a more rigid process. In this chapter, you’ll get a handle on how you can turn your process into a workflow.
Understanding Workflow
We’ll first discuss some terms to set the stage for the workflow discussion. A business process is the process the business needs to perform in order to achieve the end result. At this point, you are not interested in the actual end result, but you are interested in how the company reaches the end result. A workflow defines how the system will move documents through their life cycle, from creation to final deployment. When we discuss workflows, we are not talking about business flows. A great example of a workflow system is the Interwoven Workflow product, which enforces the business processes. In other words, it defines the business process to the CMS.
The best way to introduce workflows is to show a problem that they can solve. Let’s take a problem that FiCorp is facing with one of its intranet sites. FiCorp maintains a site that allows developers to share knowledge across development groups. The site is not well organized, so even though the company is making a great effort toward collaboration and code reuse, it is still duplicating efforts. The content is getting put on the site, but the employees are all using disparate tools to deliver their content to the site. Each page on the site is formatted differently, and some pages have more useful information than others. You can solve this problem in many ways, but we are going to assume the company will be using TeamSite to help enforce some basic business rules. Here is a short list of business rules you want to enforce for this example:
• There are two kinds of pages, or content types. The first is a code snippet, and the second is an application interface.
• Each content type will have a different group of people who can approve additions or modifications to the content.
• Once the approval has been made, TeamSite copies the content to the web server.
Figure 12-1 depicts a simple business flow diagram that describes the business process behind updating this developer resource website.
Figure 12-1. Business process flow (diagram labels: Start, Create Code Snippet or an API, Code Snippet Approval, API Approval, Copy to Web Server, End)
Figure 12-1 shows each step that needs to be performed to get the content to the web server. This is a simple process flow, but it could be a complex business flow. If you were to define this business process as a workflow, the individual steps should closely relate to the defined process. A workflow acts on individual steps, and in a workflow these steps are called workflow tasks. You can define a workflow task as one step or event that must be triggered in a predetermined sequence. You can determine this sequence by following the business process like the one defined in Figure 12-1. This workflow would then automatically submit the developer’s content to the appropriate reviewer. The reviewer would approve the content, and then that content would be copied to the appropriate location on the web server. A workflow implemented properly would take this mostly manual process and automate it, thereby reducing the time needed to complete the process.
KEEP IT SIMPLE
You most likely have been told at some point to keep it simple (KIS). When people are learning something for the first time, they most often try to KIS. You need to apply that same principle to building workflows. Interwoven has put a large amount of effort into creating a system that allows complicated workflows. In some cases, this will be necessary, but in most cases you’ll want to keep your workflow as simple as possible. Your system can still be powerful, but if the users cannot figure out how to operate your workflow, they will not want to use the system. The entire idea of a CMS is to give users the power to change their own content. If you cannot make a system that a user can actually use, you have just wasted a lot of time and money. If you have two choices and you are not sure which way to go, just ask yourself which is easier for the user. You may have to put a little more effort into making it easy to use, but it will be worth it.
Building a Workflow
In the following sections, we will show how to build a simple workflow. In this scenario, pretend you need a workflow that will send content for approval and allow the approver to reject or approve the content. The content will then be submitted to the staging area. You do not need to deploy the files to the server, because a scheduled deployment will synchronize an edition with the web server. (Chapter 11 demonstrated this portion of the process.)
Mapping Out a Workflow Model
The first step is to map out how the workflow will look. You do this by creating a model of the business process you intend to automate. Figure 12-2 shows the flow for the example’s workflow.
Figure 12-2. Basic workflow (diagram labels: Start Workflow; Contribute, a usertask; Approve, a grouptask; Submit, a submittask; End Workflow; a Rejected path returns to Contribute)
Figure 12-2 shows that you have one contributor task that will be reactivated if the approver rejects the content change. There is only one approval level, so there will be no legal approvals, just a quality assurance (QA) level of approval.
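Before writing any template code, it can help to check the model as a plain transition table. The following Python sketch is only a way to reason about the flow in Figure 12-2; it is not TeamSite code, and the event names (done, approve, reject) and the run_workflow function are assumptions made for the illustration.

```python
# Hypothetical model of the Figure 12-2 flow; task names mirror the figure labels.
TRANSITIONS = {
    ("Contribute", "done"): "Approve",
    ("Approve", "approve"): "Submit",
    ("Approve", "reject"): "Contribute",  # the Rejected path reactivates the contributor
    ("Submit", "done"): "End Workflow",
}

def run_workflow(events):
    """Follow the events from the Contribute task and return the tasks visited."""
    task = "Contribute"
    visited = [task]
    for event in events:
        task = TRANSITIONS[(task, event)]
        visited.append(task)
    return visited

# One rejection, then an approval on the second pass.
path = run_workflow(["done", "reject", "done", "approve", "done"])
print(" -> ".join(path))
```

Writing the model down this way makes it easy to confirm that every task has a way forward and that the only loop is the rejection path.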
Creating a Workflow Template
The next step is to create the workflow template using the model that you created. You need to decide what options you will allow the creator of the workflow to select from the workflow capture form.
Creating the Workflow Form
To create a workflow, you first need the workflow form requirements, which you will gather during your meetings with the business owner. The requirements you will be using in this chapter are as follows:
• You need to be able to choose from a drop-down box the contributor whom you want to make the content change.
• You need to be able to set the due date value that is displayed in the workflow interface upon initiation of the workflow.
• You need to be able to enter a workflow description.
The following code is what you will be using to define the form. This does not include validation for the due date, but you could ensure the date was typed in correctly by using the valid_input property.
# Define the workflow capture form
TAG_info(
    contributor => [
        html => "" . &build_cont_opt() ."",
        label => "Select a Contributor",
        is_required => "true",
    ],
    iw_setwfarg_due_date => [
        html => "",
        label => "Enter Due Date (MM/DD/YYYY)",
    ],
    description => [
        html => "",
        label => "Enter Job Description",
        is_required => "true",
    ],
);
The previous code will render a form that allows you to enter the data that is required, as shown in Figure 12-3. There are three data fields to allow the user to enter the desired information.
Figure 12-3. Workflow form
In Figure 12-3 you will also notice that the required fields are denoted with asterisks.
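The due date field in this form is captured without any validation. As a rough illustration of the kind of check you might want, here is a small Python sketch that tests whether a string is a real calendar date in the MM/DD/YYYY layout the label asks for. It is not part of the workflow template, and the function name is hypothetical; note that this particular check also accepts single-digit months and days.

```python
from datetime import datetime

def is_valid_due_date(text: str) -> bool:
    """Return True if text parses as a real calendar date in month/day/year order."""
    try:
        datetime.strptime(text, "%m/%d/%Y")
        return True
    except ValueError:
        # Raised for the wrong layout and for impossible dates such as 02/30.
        return False

for candidate in ["07/02/2006", "02/30/2006", "2006-07-02"]:
    print(candidate, is_valid_due_date(candidate))
```

A pattern match alone would accept 02/30/2006, which is why the sketch parses the value instead of only checking its shape.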
Creating the Workflow Specification
Now that you have the form in place, you need to start working on the skeleton of the workflow specification file within the workflow template. The first element you need is the opening element. The following code will provide this opening element:
With the element, you are using several of the techniques that can define values within the workflow template. You can simply hard-code the name attribute while you are assigning the iw_user value that is provided by the workflow system. The iw_user value represents the user who is creating the workflow. For this example, you happen to want both the owner and the creator to be the person who initiates the workflow. The description is being handled with the __INSERT__ directive because you have defined the Perl variable $description and populated it with a value entered by the creator of the workflow. You use the __VALUE__ directive with the optional HTML encoding to ensure that the user’s input is HTML compliant. The following code shows how to assign the description within a <template_script> element:
# The description tag may have illegal characters
my $description = __VALUE__('description','html');
According to the earlier workflow model, the first task that should be run is the contribute task. First you need to define the element. As you can see in the following code, you can include the start attribute with the value of t to indicate this task should start when the workflow is activated:
The next step is to define the TeamSite workarea that you will be using for your work. You can use the element to do this:
You have to tell the workflow system which tasks can be called from this task. With the user task, you can activate only the approval task, so as you can see, you have only one element:
Since you want to be able to attach files that are passed to the workflow from the interface, you include the following section of code. You are looping through the files that were selected when the submit button was clicked and adding them to the task. You could make this more flexible by making sure there are files being passed in and adding this section only if files are present. This code as it is will fail if there are no files being passed from the interface because the element must have at least one file within it.
After the section, you close the user task. The next task that you have is the approval task, which is set up as a group task. The group task here allows you to assign the task to an entire group of users so that anyone who has time to approve this job can take it and make the approval. The group task has only a few differences from the user task. The group task has two additional attributes: retainowner and readonly. The retainowner attribute ensures that the same approver gets the task if the content has been rejected and reworked by the contributor. In other words, if the task is rejected, returned to the contributor for correction, and then returned for a second approval, then this second approval will be routed directly to the original approver who rejected the work the first time. The readonly attribute ensures that the approver cannot add, remove, or update any files that are attached to the workflow.
The group task is the only task that contains a element. This allows the group to be shared by many people. In this case, you have assigned the task to the administrators group. Anyone who is a member of the administrators group will be able to take the task. Once someone takes the task, only that user can work on it from then on unless it is reassigned. The code for the group task is as follows:
The next task in the workflow template is the submit task. You use this task to submit files to the TeamSite staging area. For this workflow, this will be the last step before the end of the workflow. The submit task is straightforward, as you can see. One thing that is worth mentioning is that the submit task does not have a element. This is because the submit task can have only one element, thus allowing only one group of tasks to which it can transition:
The last task in the workflow template is the end task. It is the most basic of tasks, as you can see:
Configuring a Workflow to Run
When you add a new workflow, you have to configure it within the available_templates.cfg file located in the iw-home/local/config/wft directory. The following code shows the section that configures the example workflow. The name attribute should be something that makes sense to the person instantiating the workflow because this will be displayed to the user, who will be asked to select from all the workflows available to them. The path can be absolute, but in this case it is relative to the iw-home/local/config/wft directory. The element instructs the interface where this workflow can be started from, such as from the Submit button or the New Job button. In this case, you want it to start from the Submit button only. The section allows you to give permission to different roles and specific users. In this case, you will be allowing any user to start the workflow.
Now that you have placed this in the configuration file, you can run the workflow and see what happens.
Creating the Job Specification Output
A job specification file is not the actual job instance, but the job instance will be built from this specification. The specification that you have included could be initialized as many times as needed, but this is not that useful unless you have an identical job that runs periodically. For instance, if you need to deploy the same file on a monthly basis and the process is identical, you could use this same job specification each month without re-creating it. The workflow template that you create has generated this specification file, but you could type it in yourself, as shown here:
■Tip When you try to start your workflows for the first time, you may get an error telling you that the XML is not well-formed. The line number that is returned will not make sense to you because it is actually referring to the job specification, not the workflow template. You can add the iw_debug_mode => "", line to your TAG_info tag and by doing so make this value true. This will make the specification file display instead of the workflow system trying to instantiate the job. You can then take the specification portion (the bottom part) of the debug results, paste it into a text editor, and find the specific line that is throwing the error.
Examining Workflow Template Components
The following sections describe several workflow template components that interact with each other to complete the entire workflow system. This system is a mix of different technologies, and with the description of each component, you will start to see the entire picture.
Using the Workflow Template
A workflow template is a mix of XML and Perl that together can create a job specification file dynamically. You can then turn the job specification into as many workflow jobs as needed. The workflow system allows the Perl code and other template elements to interact through specialized directives.
Setting Up the template_script Element
The <template_script> element allows you to put Perl code directly in the workflow template. Perl is what gives the workflow template dynamic flexibility. The code within this element must be surrounded by CDATA tags. Take a look at the following example to see how this tag works:
new("workflow");
my %args = $clibs->processArgs();
…
# Define subroutine
sub readData ($) {
    my $filename = shift || "";
    …
}
}
]]>
You can have as many of these elements within your workflow as you need, and when the template is processed, the code from each of these scripts merges to make one script. As you can see in the example, you can apply standard Perl script techniques, including using custom libraries and subroutines. You can use other custom tags within the element; these will be described throughout this chapter.
Using the CGI_info() Directive
You use the CGI_info() directive to change the workflow QA form’s look and feel; you must use it within a <template_script> element. You can set the following properties using this directive:
• The error_data_bgcolor property defines the background color for a data input field that will display when an invalid value is entered. You can define the color value using the name of the color or the hex value for that color.
• The error_label_bgcolor property defines the background color of the label for a data field that has had invalid data entered and then submitted. You can define the color value using the name of the color or the hex value for that color.
• The error_text_color property defines the color value of the error message text. You can define the color value using the name of the color or the hex value for that color.
• The valid_bgcolor property defines the background color used when data that is entered is valid.
• The title property defines the title of the browser window.
• The html_body_options property sets the HTML tag attributes such as bgcolor and onload. When the page is rendered, these attributes will be inserted into the actual <body> tag of the page.
• The tag_table_options property sets the attributes of the table used to display the form. You can set attributes such as border and cellpadding for this table.
• The pre_tagtable_html property sets what displays directly before the table that holds the form.
• The post_tagtable_html property sets what displays directly after the table that holds the form.
Here’s an example of the directive:
CGI_info (
    title => "Custom Workflow",
    pre_tagtable_html => "Please double-check your answers");
Using the TAG_info() Directive
The TAG_info() directive defines the form tags or fields that will be used to collect any information needed to create the workflow. If you define your fields properly, your workflows can be very dynamic. You can have many TAG_info() directives in your workflow template, and they will all be combined when the input screen is presented to the contributor.
HTML Definition
You define the form fields using a hash array that is passed to the TAG_info directive, and you can define each field in one of two ways. The first way is much easier and uses only HTML to define the field. Take a look at the following code example to see what we mean:
# HTML-only definition
TAG_info (
    name => "",
    phone_number => "",
);
In the code example, a name field and a phone number field are defined. The default value for the phone number field is set to the local area code, and the capture field type for the phone number will be a text box with a width of 30 pixels. When this example is rendered, two field/label capture combinations will display. The first will have a label of name and the other a label of phone_number; both will have a width of 30 pixels.
Advanced Definition
The advanced definition allows the developer to assign additional properties to each capture field; for instance, you can set the text for the field label and define whether the field is required. The following code example demonstrates how this looks:
# Advanced definition with validation
TAG_info(
    phone_number => [
        html => "",
        label => 'Home Phone Number (###)###-####',
        # The parentheses are escaped so that they match literal characters
        valid_input => '/^\([0-9][0-9][0-9]\)[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$/',
        error_msg => 'Enter the number in this form (###)###-####',
    ],
    name => "",
);
In the code example, the same phone number field is created, but you have added a label and some regular expression syntax to perform validation on the format. If the data input does not match the expression, the error message will appear, and the contributor will have to reenter the phone number. This example uses the name field again to demonstrate that you can use both definition types on the same form without any problems. You can set the following properties for each capture field:
• The html property is the actual HTML that will be used for the capture field. This property is required.
• The is_required property when set to true instructs the workflow system to require that this field be filled out. The default for this property is true and can be disabled by specifying the false value.
• The valid_input property defines a regular expression to validate the captured data or a FormAPI function to call with the captured data for validation.
• The error_msg property defines the message that will be displayed when the data entered is not valid. The default message is “Valid input required.”
• The label property defines the text of the label that is displayed next to the capture field.
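To see how a format check of this kind behaves, you can try the same idea outside TeamSite. The Python sketch below is an illustration only: it pairs a pattern for the (###)###-#### layout with an error message, in the spirit of the valid_input and error-message properties, and the check_phone function is hypothetical. In this sketch the parentheses are escaped so that they match literal characters rather than forming a group.

```python
import re

# Literal "(", three digits, literal ")", three digits, a hyphen, four digits.
PHONE_PATTERN = re.compile(r"^\([0-9]{3}\)[0-9]{3}-[0-9]{4}$")
ERROR_MESSAGE = "Enter the number in this form (###)###-####"

def check_phone(value: str) -> str:
    """Return "ok" for a well-formed number, otherwise the error message."""
    if PHONE_PATTERN.match(value):
        return "ok"
    return ERROR_MESSAGE

for candidate in ["(555)123-4567", "555-123-4567", "(555)1234567"]:
    print(candidate, "->", check_phone(candidate))
```

Trying a few realistic inputs against the pattern before you put it in a form is a cheap way to find out whether the rule is stricter than the business owner intended.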
■Caution A good rule when creating your input form for your workflow is to make sure you test for values that absolutely have to be exact but to be careful not to limit input so that a new workflow has to be created for every use case. The business owner will over time remove most of the validation they thought they needed in the beginning; this is because they will realize they have made it too stringent. Make sure you challenge any unusual requests for validation and remind the user they will have to adhere to this validation each time the workflow is instantiated.
iw_setwfarg_
When capturing input from the workflow form, sometimes you will need to set a workflow variable. The data being captured will not be used until the workflow has been activated and the workflow has reached a specific task. You can add the iw_setwfarg_ prefix to any tag/field name inside the TAG_info declaration, and the workflow system will create a workflow variable with the value entered in this field. Let’s take the example from earlier and have the system create a workflow variable from it:
# Set name as workflow variable
TAG_info (
    iw_setwfarg_name => "",
)
You can see how the prefix is added to the name field, and when the input is processed, a workflow variable will be created with the reference name. No additional code is required to store the value collected in the name workflow variable.
Using the __ELEM__() Directive
You can use the __ELEM__() directive to determine how many data elements are associated with a tag/field. This directive is most often used to determine whether the field has been defined. The value returned for an undefined field is 0. You can also use this tag with a multiselect box to indicate the number of selections that have been made. See the following code example to see how to use this directive:
# Returns number of items
__ELEM__('items');
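The counting rule described for __ELEM__() (0 for an undefined field, 1 for a single value, and the number of selections for a multiselect box) can be mimicked in a few lines. The Python sketch below only models that rule with an ordinary dictionary; it is not how TeamSite implements the directive, and the elem_count function and the sample form data are hypothetical.

```python
def elem_count(form_data: dict, field: str) -> int:
    """Count the values submitted for a field; an absent field counts as 0."""
    values = form_data.get(field)
    if values is None:
        return 0
    if isinstance(values, (list, tuple)):
        return len(values)  # multiselect: one entry per selection
    return 1  # a single scalar value

# Hypothetical submitted form: one multiselect field and one text field.
form = {"items": ["a.html", "b.html", "c.html"], "name": "Pat"}

print(elem_count(form, "items"))
print(elem_count(form, "name"))
print(elem_count(form, "phone_number"))
```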
Using the __TAG__() Directive
The __TAG__() directive reads the POST/GET data from the query string and outputs the value for the specified variable name in the job specification file. This directive can return values from the capture form or default data that is provided by the system. This directive can output a single value into the job specification file or can output a value from an array if more than one data element has been associated with the variable that is named. Take a look at the following code example to see how this works:
# Returns the phone number
# Returns the second item selected
The system provides a set of values by default to the workflow template based on whether the workflow template is called by a submit button action or a new job action. The following variables are provided when clicking the submit button to start a workflow:
• The iw_areaowner variable is the owner of the current workarea.
• The iw_branch variable is a virtual path within the repository to the current branch.
• The iw_home variable is the base path location where TeamSite is installed.
• The iw_role variable is the role that the current user has signed into the system as.
• The iw_session variable is a string value that the system uses to keep track of who the user is.
• The iw_template_file variable is the relative path of the current workflow template. The path is relative to iw-home/local/config/wft.
• The iw_template_name variable is the name that will be displayed in the TeamSite GUI for the current template.
• The iw_use_default variable controls whether the form is shown. If it is enabled and all workflow form variables have defaults, the defaults are used to create the job, and the form is not shown.
• The iw_user variable is the current user’s system user ID.
• The iw_workarea variable is the current workarea’s virtual path within the repository.
The following variables are provided when a user clicks the New Job button. Many of these variables are the same, but some are different. You should be careful when choosing which method you will use to start the workflow:
• The iw_home variable is the base path location where TeamSite is installed.
• The iw_role variable is the role that the current user has signed into the system as.
• The iw_session variable is a string value that the system uses to keep track of who the user is.
• The iw_template_file variable is the relative path of the current workflow template. The path is relative to iw-home/local/config/wft.
• The iw_template_name variable is the name that will be displayed in the TeamSite GUI for the current template.
• The iw_user variable is the current user’s system user ID.
Using the __INSERT__($variable) Directive
The __INSERT__($variable) directive places the output of the variable into the job specification file. This directive is most often used to output parts or even entire workflow tasks that have been generated dynamically; however, you can use this directive anywhere within the workflow template. In the following code example, it has been used in a couple of ways:
# Use it with template script
# Use it within workflow task
Using the __VALUE__($tag, $encoding) Directive
This directive is used within a <template_script> element to return the value from the tag or field that is named. The raw data is returned unless the optional encoding value is set to html. When set to html, the value is returned as an HTML-encoded value. Unlike the __TAG__ directive, the value is not output into the job specification file. This directive can return a scalar or a list, but if the field is not defined, then undef will be returned. See the following code example:
Here’s the breakdown of the attributes:
• The name attribute specifies the name/reference for this task.
• The owner attribute specifies the user who owns the right to work on this task.
• The start attribute determines whether the task should activate when the workflow is created. A value of t tells the task to activate when the workflow starts, and f instructs the task to not become active upon the creation of the workflow. The default value is f.
• The description attribute allows the task purpose to be defined.
• The lock attribute specifies whether the files should be locked upon the execution of this task. A value of t indicates that it should be locked, and a value of f indicates to not try to lock the files. The default value is f.
• The transferonly attribute instructs the task to obtain locks for only those files that are locked by the owner of the task and the owner of predecessor tasks. The default value is f.
• The immediate attribute specifies that this task should automatically start upon the activation of this task. This works only if the previous task is owned by the same person and the task is a user or group task. The default value is f.
• The readonly attribute specifies that this task will not be able to update the files that are attached to the workflow. Also, no new files can be added to the workflow. The default value is f.
Using the Dummy Task
A dummy task does no work of its own; it serves only as a placeholder. It is often used with a timer to hold the workflow at a certain point. If you needed the workflow to pause until the 13th, for example, you could trigger a dummy task with a timer.
Once the timer has elapsed, the dummy task would then transition to the appropriate task, and the workflow could continue along its next sequence of events. Here’s the DTD definition for the task:
Here’s the breakdown of the attributes:
• The name attribute specifies the name/reference for this task.
• The start attribute determines whether the task should activate when the workflow is created. A value of t tells the task to activate when the workflow starts, and f instructs the task to not become active upon the creation of the workflow. The default value is f.
• The description attribute allows the task purpose to be defined.
Using the Email Task
The email task is useful for notifying users that a task needs their attention, and most implementations rely heavily on this type of notification. Authors who work inside the CMS all day have less need for notifications. Reviewers are a different matter: one of the most time-consuming parts of your workflows will be the time content spends awaiting approval. The people responsible for reviewing content are usually your businesspeople, and they do not spend their day working in TeamSite. They are typically senior employees who have been with your company for many years. You need their experience to review content, but you do not want to waste their time. Email works well for these people, and some implementations have set up ways for this kind of contributor, often described as a casual contributor, to interface with TeamSite through email.
Here are the attributes you can use with the email task:
• The name attribute specifies the name/reference for this task.
• The start attribute determines whether the task should activate when the workflow is created. A value of t tells the task to activate when the workflow starts, and f instructs the task not to become active when the workflow is created. The default value is f.
• The description attribute allows the task purpose to be defined.
• The lock attribute specifies whether the files should be locked when this task executes. A value of t indicates that they should be locked, and a value of f indicates that no lock should be attempted. The default value is f.
• The owner attribute specifies the user who owns the right to work on this task.
• The retry attribute indicates whether the message should be re-sent if the recipient does not receive the email. A value of yes indicates that the message should be re-sent, and a value of no indicates that it should not. The default value is yes.
■Note Sometimes a dummy task is used in conjunction with a timer and an email task. The workflow transitions to the email task and then on to the dummy task, at which point the timer starts. This is useful for defining escalation paths. If a task is sitting in someone’s task list and they have not responded by the time the timer elapses, the CMS can email the appropriate escalation contact about the lack of response.
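The escalation pattern in the note can be sketched as a small decision function. This is a simplified Python model, not TeamSite configuration: the names EscalationStep, reviewer, escalation_contact, and timeout are all hypothetical, and in a real workflow the timer and transitions would be declared in the workflow itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class EscalationStep:
    """Hypothetical model of an email task followed by a timed dummy task."""
    reviewer: str
    escalation_contact: str
    timeout: timedelta

    def next_recipient(self, notified_at: datetime, now: datetime,
                       responded: bool) -> Optional[str]:
        """Decide who, if anyone, should be emailed next.

        Returns None while the workflow should keep waiting or when the
        reviewer has already responded. Returns the escalation contact once
        the dummy task's timer has elapsed without a response.
        """
        if responded:
            return None
        if now - notified_at >= self.timeout:
            return self.escalation_contact
        return None


if __name__ == "__main__":
    # Addresses and the two-day timeout are invented for the demonstration.
    step = EscalationStep("reviewer@example.com", "manager@example.com",
                          timedelta(days=2))
    sent = datetime(2006, 7, 2, 9, 0)
    for hours in (12, 48, 72):
        who = step.next_recipient(sent, sent + timedelta(hours=hours), False)
        print(f"after {hours}h: {who or 'keep waiting'}")
```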
Using the End Task
The end task ends the workflow; it has no other function. Here’s the DTD definition for the task:
Here’s the breakdown of the attributes:
• The name attribute specifies the name/reference for this task.
• The description attribute allows the task purpose to be defined.
EXTERNAL LOGGING TASKS
One of the more difficult tasks in system administration is keeping track of what is going on with your workflows, which is why it is important to log as often as possible. You may need to insert external tasks to perform logging at key times during your workflow. This will aid in debugging any problems that arise. You will spend some time tracking down why a workflow disappeared or why your content didn’t make it to its destination. If you have been logging throughout your workflow, it should be easy to determine what has gone wrong and resolve the issue in a timely manner. Although logging is a wonderful feature, you should use it wisely. When you are creating your logging routines, provide the ability to set global-level verbosity. In other words, you should be able to change the level of logging based on a global configuration file. This allows you to turn down logging when the system is performing properly and turn it up when you are trying to debug a workflow.
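One way to get a global verbosity switch is to have every logging task read its level from a single configuration file. External tasks in TeamSite are usually written in Perl; the sketch below uses Python only to illustrate the idea. The [logging] section, the level key, the logger name, and the function name are assumptions, not part of TeamSite.

```python
import configparser
import logging


def configure_workflow_logger(config_text, name="workflow"):
    """Build a logger whose verbosity comes from a global configuration.

    `config_text` stands in for the contents of a shared configuration file
    (hypothetical format) so the sketch stays self-contained.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    level_name = parser.get("logging", "level", fallback="WARNING").upper()
    # Fall back to WARNING if the file names a level that logging lacks.
    level = getattr(logging, level_name, logging.WARNING)
    if not isinstance(level, int):
        level = logging.WARNING

    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Switching "debug" to "warning" here silences the detailed message.
    log = configure_workflow_logger("[logging]\nlevel = debug\n")
    log.debug("external logging task reached")
    log.warning("content did not reach its destination")
```

Because the level lives in one place, turning logging up to debug a workflow and back down afterward is a one-line configuration change rather than an edit to every logging task.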
Using the External Task
An external task allows the workflow system to hand control to an external program, which can query the workflow system for information specific to this step of the workflow. These programs are usually written in Perl and can do anything that a normal Perl program can do. A major use for the external task is to interface between existing systems and the Interwoven application. Figure 12-4 shows how the external program interacts with the workflow.
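The handshake between the workflow and an external program (start the program, hand it data, then wait for a callback that names the next task) can be sketched as a toy model. The Python below contains no TeamSite calls: the class ExternalTask, its successors list, and the idea that the callback carries a zero-based index selecting the next task are assumptions made for illustration, and real external tasks are usually Perl programs.

```python
from typing import Callable, Dict, List


class ExternalTask:
    """Toy model of an external task: run a program, then follow its callback."""

    def __init__(self, name: str, successors: List[str],
                 program: Callable[[Dict[str, object]], int]):
        self.name = name
        self.successors = successors  # candidate next tasks, in order
        self.program = program        # stands in for the external script

    def run(self, data: Dict[str, object]) -> str:
        """Start the program with workflow data and return the next task name."""
        # The program's return value plays the role of the callback value.
        choice = self.program(data)
        if not 0 <= choice < len(self.successors):
            raise ValueError(f"callback value {choice} names no successor")
        return self.successors[choice]


def check_files(data: Dict[str, object]) -> int:
    """Hypothetical script: choose successor 0 if files are attached, else 1."""
    files = data.get("files", [])
    return 0 if files else 1


if __name__ == "__main__":
    # Task names are placeholders; the mapping of 0 and 1 to them is assumed.
    task = ExternalTask("External Task", ["Task 3", "Task 4"], check_files)
    print(task.run({"files": ["/index.html"]}))
    print(task.run({"files": []}))
```

The important property is that the workflow does not advance on its own: it stays at the external task until the program reports back, and the value reported decides which branch is taken.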
[Figure 12-4 diagram. Labels: Initialize Workflow, Initialize External Task, Task 1, External Task, 0, 1, Task 3, Task 4, Start Program, Data, Call Back, External Task Runs]
Figure 12-4. External task sequence
As you can see in Figure 12-4, the external task starts a custom script. The script is passed enough information to let it query the workflow system for additional details, such as file lists. It can then operate on this data, pull information from external sources, or just follow a simple procedure. Once the script has finished its work, it must call back to the workflow subsystem to signal that the workflow can move forward. When the script calls back, it must also tell the workflow which task to go to next. These external scripts allow you to be creative, and they provide an easy way to interface with external systems. Here’s the DTD definition for the task: