Interview Questions and Answers
What's A Data warehouse
A Data warehouse is a repository of integrated information, available for
queries and analysis. Data and information are extracted from heterogeneous
sources as they are generated. This makes it much easier and more efficient to
run queries over data that originally came from different sources". Another
definition for data warehouse is: " A data warehouse is a logical collection of
information gathered from many different operational databases used to create
business intelligence that supports business analysis activities and
decision-making tasks, primarily, a record of an enterprise's past transactional
and operational information, stored in a database designed to favour efficient
data analysis and reporting (especially OLAP)". Generally, data warehousing is
not meant for current "live" data, although 'virtual' or 'point-to-point' data
warehouses can access operational data. A 'real' data warehouse is generally
preferred to a virtual DW because stored data has been validated and is set up
to provide reliable results to common types of queries used in a business.
Data Warehouse is a repository of integrated information, available for queries
and analysis. Data and information are extracted from heterogeneous sources as
they are generated....This makes it much easier and more efficient to run
queries over data that originally came from different sources.
Typical relational databases are designed for on-line transactional processing (OLTP)
and do not meet the requirements for effective on-line analytical processing (OLAP).
As a result, data warehouses are designed differently than traditional
What is ODS?
1. ODS means Operational Data Store.
2. A collection of operation or bases data that is extracted from operation
databases and standardized, cleansed, consolidated, transformed, and loaded into
an enterprise data architecture. An ODS is used to support data mining of
operational data, or as the store for base data that is summarized for a data
warehouse. The ODS may also be used to audit the data warehouse to assure
summarized and derived data is calculated properly. The ODS may further become
the enterprise shared operational database, allowing operational systems that
are being reengineered to use the ODS as there operation databases.
What is a dimension table?
A dimensional table is a collection of hierarchies and categories along which
the user can drill down and drill up. it contains only the textual attributes.
What is a lookup table?
A lookUp table is the one which is used when updating a warehouse. When the
lookup is placed on the target table (fact table / warehouse) based upon the
primary key of the target, it just updates the table by allowing only new
records or updated records based on the lookup condition.
Why should you put your data warehouse on a different system than your
A OLTP system is basically " data oriented " (ER model) and not " Subject
oriented "(Dimensional Model) .That is why we design a separate system that will
have a subject oriented OLAP system...
Moreover if a complex querry is fired on a OLTP system will cause a heavy
overhead on the OLTP server that will affect the daytoday business directly.
The loading of a warehouse will likely consume a lot of machine resources.
Additionally, users may create querries or reports that are very resource
intensive because of the potentially large amount of data available. Such loads
and resource needs will conflict with the needs of the OLTP systems for
resources and will negatively impact those production systems.
What are Aggregate tables?
Aggregate table contains the summary of existing warehouse data which is
grouped to certain levels of dimensions.Retrieving the required data from the
actual table, which have millions of records will take more time and also
affects the server performance.To avoid this we can aggregate the table to
certain required level and can use it.This tables reduces the load in the
database server and increases the performance of the query and can retrieve the
result very fastly.
What is Dimensional Modelling? Why is it important ?
Dimensional Modelling is a design concept used by many data warehouse
desginers to build thier datawarehouse. In this design model all the data is
stored in two types of tables - Facts table and Dimension table. Fact table
contains the facts/measurements of the business and the dimension table contains
the context of measuremnets ie, the dimensions on which the facts are
Why is Data Modeling Important?
Data modeling is probably the most labor intensive and time consuming part of
the development process. Why bother especially if you are pressed for time? A
common response by practitioners who write on the subject is that you should no
more build a database without a model than you should build a house without
The goal of the data model is to make sure that the all data objects required by
the database are completely and accurately represented. Because the data model
uses easily understood notations and natural language , it can be reviewed and
verified as correct by the end-users.
The data model is also detailed enough to be used by the database developers to
use as a "blueprint" for building the physical database. The information
contained in the data model will be used to define the relational tables,
primary and foreign keys, stored procedures, and triggers. A poorly designed
database will require more time in the long-term. Without careful planning you
may create a database that omits data required to create critical reports,
produces results that are incorrect or inconsistent, and is unable to
accommodate changes in the user's requirements.
What is data mining?
Data mining is a process of extracting hidden trends within a datawarehouse.
For example an insurance dataware house can be used to mine data for the most
high risk people to insure in a certain geographial area.
What is ETL?
ETL stands for extraction, transformation and loading.
ETL provide developers with an interface for designing source-to-target
mappings, ransformation and job control parameter.
Take data from an external source and move it to the warehouse pre-processor
Transform data task allows point-to-point generating, modifying and transforming
Load data task adds records to a database table in a warehouse.
What does level of Granularity of a fact table signify?
The first step in designing a fact table is to determine the granularity of the
fact table. By granularity, we mean the lowest level of information that will be
stored in the fact table. This constitutes two steps:
Determine which dimensions will be included.
Determine where along the hierarchy of each dimension the information will be
The determining factors usually goes back to the requirements
What is the Difference between OLTP and OLAP?
Main Differences between OLTP and OLAP are:-
1. User and System Orientation
OLTP: customer-oriented, used for data analysis and querying by clerks, clients
and IT professionals.
OLAP: market-oriented, used for data analysis by knowledge workers( managers,
2. Data Contents
OLTP: manages current data, very detail-oriented.
OLAP: manages large amounts of historical data, provides facilities for
summarization and aggregation, stores information at different levels of
granularity to support decision making process.
3. Database Design
OLTP: adopts an entity relationship(ER) model and an application-oriented
OLAP: adopts star, snowflake or fact constellation model and a subject-oriented
OLTP: focuses on the current data within an enterprise or department.
OLAP: spans multiple versions of a database schema due to the evolutionary
process of an organization; integrates information from many organizational
locations and data stores
What is SCD1 , SCD2 , SCD3?
SCD Stands for Slowly changing dimensions.
SCD1: only maintained updated values.
Ex: a customer address modified we update existing record with new address.
SCD2: maintaining historical information and current information by using
A) Effective Date
or combination of these
SCD3: by adding new columns to target table we maintain historical information
and current information.
Why are OLTP database designs not generally a good idea for a Data
Since in OLTP,tables are normalised and hence query response will be slow for
end user and OLTP doesnot contain years of data and hence cannot be analysed.
What is BUS Schema?
BUS Schema is composed of a master suite of confirmed dimension and
standardized definition if facts.
What are the various Reporting tools in the Market?
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, Power Play)
5. MS reporting services
6. Informatica Power Analyzer
8. Hyperion (BRIO)
9. Oracle Express OLAP
What is Normalization, First Normal Form, Second Normal Form , Third
1.Normalization is process for assigning attributes to entities–Reducesdata
redundancies–Helps eliminate data anomalies–Produces controlledredundancies to
2.Normalization is the analysis offunctional dependency between attributes /
data items of userviews?It reduces a complex user view to a set of small
andstable subgroups of fields / relations
1NF:Repeating groups must beeliminated, Dependencies can be identified, All key
attributesdefined,No repeating groups in table
2NF: The Table is already in1NF,Includes no partial dependencies–No attribute
dependent on a portionof primary key, Still possible to exhibit
transitivedependency,Attributes may be functionally dependent on non-keyattributes
3NF: The Table is already in 2NF, Contains no transitivedependencies
What is Fact table?
Fact Table contains the measurements or metrics or facts of business process.
If your business process is "Sales" , then a measurement of this business
process such as "monthly sales number" is captured in the Fact table. Fact table
also contains the foriegn keys for the dimension tables.
What are conformed dimensions?
Conformed dimensions mean the exact same thing with every possible fact table to
which they are joined Ex:Date Dimensions is connected all facts like Sales
Conformed dimentions are dimensions which are common to the cubes.(cubes are the
schemas contains facts and dimension tables)
Consider Cube-1 contains F1,D1,D2,D3 and Cube-2 contains F2,D1,D2,D4 are the
Facts and Dimensions here D1,D2 are the Conformed Dimensions
What are the Different methods of loading Dimension tables?
Before loading the data, all the Table constraints will be checked against the
Direct load:(Faster Loading)
All the Constraints will be disabled. Data will be loaded directly.Later the
data will be checked against the table constraints and the bad data won't be
What is conformed fact?
Conformed dimensions are the dimensions which can be used across multiple
Data Marts in combination with multiple facts tables accordingly
What are Data Marts?
Data Marts are designed to help manager make strategic decisions about their
Data Marts are subset of the corporate-wide data that is of value to a specific
group of users.
There are two types of Data Marts:
1.Independent data marts – sources from data captured form OLTP system, external
providers or from data generated locally within a particular department or
2.Dependent data mart – sources directly form enterprise data warehouses.
What is a level of Granularity of a fact table?
Level of granularity means level of detail that you put into the fact table
in a data warehouse. For example: Based on design you can decide to put the
sales data in each transaction. Now, level of granularity would mean what detail
are you willing to put for each transactional fact. Product sales with respect
to each minute or you want to aggregate it upto minute and put that data.
How are the Dimension tables designed?
Most dimension tables are designed using Normalization principles upto 2NF.
In some instances they are further normalized to 3NF.
Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension (see more on this in the
What are non-additive facts?
Non-Additive: Non-additive facts are facts that cannot be summed up for any
of the dimensions present in the fact table.
What type of Indexing mechanism do we need to use for a typical
On the fact table it is best to use bitmap indexes. Dimension tables can use
bitmap and/or the other types of clustered/non-clustered, unique/non-unique
To my knowledge, SQLServer does not support bitmap indexes. Only Oracle supports
What Snow Flake Schema?
Snowflake Schema, each dimension has a primary dimension table, to which one
or more additional dimensions can join. The primary dimension table is the only
table that can join to the fact table.