Building the Data Warehouse
The new edition of the classic bestseller that launched the data warehousing industry covers new approaches and technologies, many of which have been pioneered by Inmon himself. In addition to explaining the fundamentals of data warehouse systems, the book covers new topics such as methods for handling unstructured data in a data warehouse and storing data across multiple storage media. Discusses the pros and cons of relational versus multidimensional design and how to measure return on investment in planning data warehouse projects. Covers advanced topics, including data monitoring and testing. Although the book includes an extra 100 pages' worth of valuable content, the price has actually been reduced from $65 to $55.
Building the Data Warehouse
The data warehousing bible, updated for the new millennium. Updated and expanded to reflect the many technological advances occurring since the previous edition, this latest edition of the data warehousing "bible" provides a comprehensive introduction to building data marts, operational data stores, the Corporate Information Factory, exploration warehouses, and Web-enabled warehouses. Written by the father of the data warehouse concept, the book also reviews the unique requirements for supporting e-business and explores various ways in which the traditional data warehouse can be integrated with new technologies, including near-line data storage techniques, to provide enhanced customer service, sales, and support, both online and offline.
BUILDING THE DATA WAREHOUSE 4th Ed
Market Description: IT, database, and data warehouse managers and developers.
Special Features:
· Building the Data Warehouse has sold nearly 40,000 copies in its first 3 editions.
· Inmon is widely recognized as the Father of the Data Warehouse and remains one of the two leading authorities in the industry he helped to invent.
· The new edition covers new approaches and technologies, many of which have been pioneered by Inmon himself.
· The price of this new edition will be reduced from $65 to $55, and 100 new pages will be added.
About the Book: This book provides a high-level, conceptual overview of the major components of data warehouse systems, as well as the core approaches used to design and build data warehouses. Topics covered include methods for handling unstructured data in a data warehouse, storing data across multiple storage media, the pros and cons of relational vs. multidimensional design, and data monitoring and testing.
Building the Unstructured Data Warehouse
Learn essential techniques from data warehouse legend Bill Inmon on how to build the reporting environment your business needs now! Answers for many valuable business questions hide in text. How well can your existing reporting environment extract the necessary text from email, spreadsheets, and documents, and put it in a useful format for analytics and reporting? Transforming the traditional data warehouse into an efficient unstructured data warehouse requires additional skills from the analyst, architect, designer, and developer. This book will prepare you to successfully implement an unstructured data warehouse and, through clear explanations, examples, and case studies, you will learn new techniques and tips to successfully obtain and analyze text.
Master these ten objectives:
• Build an unstructured data warehouse using the 11-step approach
• Integrate text and describe it in terms of homogeneity, relevance, medium, volume, and structure
• Overcome challenges including blather, the Tower of Babel, and lack of natural relationships
• Avoid the Data Junkyard and combat the “Spider’s Web”
• Reuse techniques perfected in the traditional data warehouse and Data Warehouse 2.0, including iterative development
• Apply essential techniques for textual Extract, Transform, and Load (ETL) such as phrase recognition, stop word filtering, and synonym replacement
• Design the Document Inventory system and link unstructured text to structured data
• Leverage indexes for efficient text analysis and taxonomies for useful external categorization
• Manage large volumes of data using advanced techniques such as backward pointers
• Evaluate technology choices suitable for unstructured data processing, such as data warehouse appliances
The following outline briefly describes each chapter’s content:
• Chapter 1 defines unstructured data and explains why text is the main focus of this book. The sources for text, including documents, email, and spreadsheets, are described in terms of factors such as homogeneity, relevance, and structure.
• Chapter 2 addresses the challenges one faces when managing unstructured data. These challenges include volume, blather, the Tower of Babel, spelling, and lack of natural relationships. Learn how to avoid a data junkyard, which occurs when unstructured data is not properly integrated into the data warehouse. This chapter emphasizes the importance of storing integrated unstructured data in a relational structure. It also cautions about how common paper-based text is and the dangers associated with it.
• Chapter 3 begins with a timeline of applications, highlighting their evolution over the decades. Eventually, powerful yet siloed applications created a “spider’s web” environment. This chapter describes how data warehouses solved many problems, including the creation of corporate data, the ability to get out of the maintenance backlog conundrum, and greater data integrity and data accessibility. There were problems, however, with the data warehouse that were addressed in Data Warehouse 2.0 (DW 2.0), such as the inevitable data lifecycle. This chapter discusses the DW 2.0 architecture, which leads into the role of the unstructured data warehouse. The unstructured data warehouse is defined and its benefits are given. Several features of the conventional data warehouse can be leveraged for the unstructured data warehouse, including ETL processing, textual integration, and iterative development.
• Chapter 4 focuses on the heart of the unstructured data warehouse: Textual Extract, Transform, and Load (ETL). This chapter has separate sections on extracting text, transforming text, and loading text. The chapter emphasizes the issues around source data. There is a wide variety of sources, and each source has its own set of considerations. Pointers on extraction are provided, such as reading documents only once and recognizing common and different file types. Transforming text requires addressing many considerations discussed in this chapter, including phrase recognition, stop word filtering, and synonym replacement. Loading text is the final step. The chapter explains important points here, too, such as the importance of the thematic approach and knowing how to handle large volumes of data. Two ETL examples are provided, one on email and one on spreadsheets.
• Chapter 5 describes the 11 steps required to develop the unstructured data warehouse. The methodology explained in this chapter combines the traditional system development lifecycle and spiral approaches.
• Chapter 6 describes how to inventory documents for maximum analysis value, as well as link the unstructured text to structured data for even greater value. The Document Inventory is discussed, which is similar to a library card catalog used for organizing corporate documents. This chapter explores ways of linking unstructured text to structured data. The emphasis is on taking unstructured data and reducing it into a form of data that is structured. Concepts related to linking, such as probabilistic linkages and dynamic linkages, are discussed.
• Chapter 7 goes through each of the different types of indexes necessary to make text analysis efficient. Indexes range from simple indexes, which are fast to create and are good if the analyst really knows what needs to be analyzed before the indexing process begins, to complex combined indexes, which can be made up of any and all of the other kinds of indexes.
• Chapter 8 explains taxonomies and how they can be used within the unstructured data warehouse. Both simple and complicated taxonomies are discussed. Techniques to help the reader leverage taxonomies, including using preferred taxonomies, external categorization, and cluster analysis, are described. Real-world problems are raised, including the possibilities of encountering hierarchies, multiple types, and recursion. The chapter ends with a discussion comparing a taxonomy with a data model.
• Chapter 9 explains ways of coping with large amounts of unstructured data. Techniques such as keeping the unstructured data at its source and using backward pointers are discussed. The chapter explains why iterative development is so important. Ways of reducing the amount of data are presented, including screening and removing extraneous data, as well as parallelizing the workload.
• Chapter 10 focuses on challenges and some technology choices that are suitable for unstructured data processing. The traditional data warehouse processing technology is reviewed. In addition, the data warehouse appliance is discussed.
• Chapters 11, 12, and 13 put all of the previously discussed techniques and approaches in context through three case studies: the Ablatz Medical Group, the Eastern Hills Oil Company, and the Amber Oil Company.
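The textual ETL transformations named in the Chapter 4 summary above (phrase recognition, stop word filtering, and synonym replacement) can be illustrated with a minimal Python sketch. This is not taken from the book: the word lists, the two-word phrase table, and the function names are all hypothetical, chosen only to show what each step might do to a sentence.

```python
import re

# Hypothetical vocabularies, invented for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "was", "for", "in"}
SYNONYMS = {"physician": "doctor", "md": "doctor", "meds": "medication"}
PHRASES = {("blood", "pressure"): "blood_pressure"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def recognize_phrases(tokens):
    """Merge known two-word phrases into a single token."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            merged.append(PHRASES[pair])
            i += 2  # skip both words of the recognized phrase
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def transform(text):
    """Apply phrase recognition, stop word filtering, then synonym replacement."""
    tokens = recognize_phrases(tokenize(text))
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [SYNONYMS.get(t, t) for t in tokens]


if __name__ == "__main__":
    print(transform("The physician checked the blood pressure of the patient."))
```

Phrase recognition runs before stop word filtering in this sketch so that a phrase containing a stop word would still be matched; a real textual ETL tool may order or implement these steps differently.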
Building a Scalable Data Warehouse with Data Vault 2.0
The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of different sizes, from small to large corporations. Due to its simplified design, which is adapted from nature, the Data Vault 2.0 standard helps prevent typical data warehousing failures. "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations to create a technical data warehouse layer. The book discusses how to build the data warehouse incrementally using the agile Data Vault 2.0 methodology. In addition, readers will learn how to create the input layer (the stage layer) and the presentation layer (data mart) of the Data Vault 2.0 architecture, including implementation best practices. Drawing upon years of practical experience and using numerous examples and an easy-to-understand framework, Dan Linstedt and Michael Olschimke discuss how to load each layer using SQL Server Integration Services (SSIS), including automation of the Data Vault loading processes; important data warehouse technologies and practices; and Data Quality Services (DQS) and Master Data Services (MDS) in the context of the Data Vault architecture. Provides a complete introduction to data warehousing, applications, and the business context so readers can get up and running fast. Explains theoretical concepts and provides hands-on instruction on how to build and implement a data warehouse. Demystifies data vault modeling with beginning, intermediate, and advanced techniques. Discusses the advantages of the data vault approach over other techniques, including the latest updates to Data Vault 2.0 and multiple improvements to Data Vault 1.0.
The Data Webhouse Toolkit
"Ralph's latest book ushers in the second wave of the Internet. . . . Bottom line, this book provides the insight to help companies combine Internet-based business intelligence with the bounty of customer data generated from the Internet."--William Schmarzo, Director World Wide Solutions, Sales, and Marketing, IBM NUMA-Q. Receiving over 100 million hits a day, the most popular commercial Websites have an excellent opportunity to collect valuable customer data that can help create better service and improve sales. Companies can use this information to determine buying habits, provide customers with recommendations on new products, and much more. Unfortunately, many companies fail to take full advantage of this deluge of information because they lack the necessary resources to effectively analyze it. In this groundbreaking guide, data warehousing's bestselling author, Ralph Kimball, introduces readers to the Data Webhouse--the marriage of the data warehouse and the Web. If designed and deployed correctly, the Webhouse can become the linchpin of the modern, customer-focused company, providing competitive information essential to managers and strategic decision makers. In this book, Dr. Kimball explains the key elements of the Webhouse and provides detailed guidelines for designing, building, and managing the Webhouse. The result is a business better positioned to stay healthy and competitive. In this book, you'll learn methods for:
- Tracking Website user actions
- Determining whether a customer is about to switch to a competitor
- Determining whether a particular Web ad is working
- Capturing data points about customer behavior
- Designing the Website to support Webhousing
- Building clickstream datamarts
- Designing the Webhouse user interface
- Managing and scaling the Webhouse
The companion Website at www.wiley.com/compbooks/kimball provides updates on Webhouse technologies and techniques, as well as links to related sites and resources.
The Data Warehouse Toolkit
Updated new edition of Ralph Kimball's groundbreaking book on dimensional modeling for data warehousing and business intelligence! The first edition of Ralph Kimball's The Data Warehouse Toolkit introduced the industry to dimensional modeling, and now his books are considered the most authoritative guides in this space. This new third edition is a complete library of updated dimensional modeling techniques, the most comprehensive collection ever. It covers new and enhanced star schema dimensional modeling patterns, adds two new chapters on ETL techniques, includes new and expanded business matrices for 12 case studies, and more. Authored by Ralph Kimball and Margy Ross, known worldwide as educators, consultants, and influential thought leaders in data warehousing and business intelligence. Begins with fundamental design recommendations and progresses through increasingly complex scenarios. Presents unique modeling techniques for business applications such as inventory management, procurement, invoicing, accounting, customer relationship management, big data analytics, and more. Draws real-world case studies from a variety of industries, including retail sales, financial services, telecommunications, education, health care, insurance, e-commerce, and more. Design dimensional databases that are easy to understand and provide fast query response with The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.
Building a Better Data Warehouse
Just do it! Cut through the hype and get that data warehouse deployed! Based on Meyer and Cannon's extensive practical experience, "Building a Better Data Warehouse" is a systematic guide to successful data warehouse deployment. It cuts through the hype, briefing managers on exactly what to expect when building, piloting, deploying, and maintaining a data warehouse. You'll learn how to take control of the process from start to finish and discover the key success factors associated with data warehouses that deliver real business benefits. Understand the unique issues surrounding OLAP applications. Compare data warehouses and data marts. Plan your goals, architecture, infrastructure, platforms, and tools. Build the data model and the physical model. Optimize performance, security, and end-user access. Maximize data integrity. Meyer and Cannon pull no punches. They offer specific guidance for every critical decision, including hardware platforms, operating systems, databases, tools, and applications. They also provide comprehensive advice for both data modelers and DBAs, including proven techniques for completing data model deliverables and constructing the data warehouse. Building a Better Data Warehouse covers metadata, extraction programming, populating the data warehouse, end-user access tools, training, and much more. It's the one book every member of your data warehouse team should read.
Building, Using, and Managing the Data Warehouse
When it comes to making organizations smarter, faster, and more competitive, few technologies have more promise than data warehousing. This book shows you how to translate that promise into reality.
The Data Warehouse ETL Toolkit
Cowritten by Ralph Kimball, the world's leading data warehousing authority, whose previous books have sold more than 150,000 copies. Delivers real-world solutions for the most time- and labor-intensive portion of data warehousing: data staging, or the extract, transform, load (ETL) process. Delineates best practices for extracting data from scattered sources, removing redundant and inaccurate data, transforming the remaining data into correctly formatted data structures, and then loading the end product into the data warehouse. Offers proven time-saving ETL techniques, comprehensive guidance on building dimensional structures, and crucial advice on ensuring data quality.