What is Data Warehouse?
Ø Defined in many different ways,
but not rigorously
- A
decision support database that is maintained separately from the organization’s
operational database.
- A
consistent database source that bring together information from multiple
sources for decision support queries.
- Support
information processing by providing a solid platform of consolidated,
historical data for analysis.
History of Data Warehousing
Ø In the 1990’s executives became
less concerned with the day-to-day business operations and more concerned with
overall business functions
Ø The data warehouse provided the
ability to support decision making without disrupting the day-to-day
operations, because;
- Operational
information is mainly current – does not include the history for better
decision making
- Issues
of quality information
- Without
information history, it is difficult to tell how and why things change over time
Data warehouse fundamentals
Ø Data warehouse – A logical
collection of information – gathered from many different operational databases
– that supports business analysis activities and decision-making takes
Ø The primary purpose of a data
warehouse is to combined information throughout an organization into a single
repository for decision-making purposes – data warehouse support only
analytical processing
Data warehouse model
Ø Extraction, transformation and
loading (ETL) – A process that extracts information from internal and external
databases, transforms the information using a common set of enterprise
definitions, and loads the information into a data warehouse.
Ø Data warehouse then send subsets
of the information to data mart.
Ø Data mart – contains a subset of
data warehouse information.
Multidimensional Analysis and Data Mining
Ø Relational Database contains
information in a series of two-dimensional tables.
Ø In a data warehouse and data mart,
information is multidimensional, it contains layers of columns and rows
- Dimension
– A particular attribute of information
Ø Cube – common term for the
representation of multidimensional information
Ø Once a cube of information is
created, users can begin to slice and dice the cube to drill down into the
information.
Ø Users can analyze information in a
number of different ways and with number of different dimensions.
Ø Data Mining – the process of
analyzing data to extract information not offered by the raw data alone. Also
known as “knowledge discovery” – computer-assisted tools and techniques for
sifting through and analyzing vast data stores in order to finds trends, patterns
and correlations that can guide decision making and increase understanding
Ø To perform data mining users need
data-mining tools
- Data-mining
tool – uses a variety of techniques to finds patterns and relationships in
large volumes of information. Eg: retailers and use knowledge of these patterns
to improve the placement of items in the layout of a mail-order catalog page or
Web page.
Information Cleansing or Scrubbing
Ø An organization must maintain
high-quality data in the data warehouse
Ø Information cleansing or scrubbing
– A process that weeds out and fixes or discards inconsistent, incorrect or
incomplete information
Ø Occurs during ETL process and
second on the information once if is in the data warehouse
Ø Contract information in an
operational system
Ø Standardizing Customer name
from Operational Systems
Ø Information cleansing activities
- Missing
Records or Attributes
- Redundant
Records
- Missing
Keys or Other Required Data
- Erroneous
Relationships or References
- Inaccurate
Data
Ø Accurate and complete information
Business Intelligence
Ø Business Intelligence – refers to
applications and technologies that are used to gather, provides access, analyze
data and information to support decision making efforts
Ø These systems will illustrate
business intelligence in the areas of customer profiling, customer support,
market research, market segmentation, product profitability, statistical
analysis, and inventory and distribution analysis to name a few
Ø Eg; Excel, Access
Question?
1. Describe
the roles and purposes of data warehouse and data marts in an organization
2. Compare
the multidimensional nature of data warehouses (and data marts) with the
two-dimensional nature of databases
3. Identify
the information of ensuring the cleanliness of information throughout an
organization
4. Explain
the relationship between business intelligence and a data warehouse
No comments:
Post a Comment