1. What is a Data Warehouse?
A Data Warehouse is a collection of
data marts representing historical data from different operational data source
(OLTP). The data from these OLTP are structured and optimized for querying and
data analysis in a Data Warehouse.
2. What is a Data mart?
A Data Mart is a subset of a data
warehouse that can provide data for reporting and analysis on a section, unit
or a department like Sales Dept, HR Dept, etc. The Data Mart are sometimes also
called as HPQS (Higher Performance Query Structure).
3. What is OLAP?
OLAP stands for Online Analytical
Processing. It uses database tables (Fact and Dimension tables) to enable
multidimensional viewing, analysis and querying of large amount of data.
4. What is OLTP?
OLTP stands for Online Transaction
Processing Except data warehouse databases the other databases are OLTPs. These
OLTP uses normalized schema structure. These OLTP databases are designed for
recording the daily operations and transactions of a business.
5. What are Dimensions?
Dimensions are categories by which
summarized data can be viewed. For example a profit Fact table can be viewed by
a time dimension.
6. What are Confirmed Dimensions?
The Dimensions which are reusable
and fixed in nature Example customer, time, geography dimensions.
7. What are Fact Tables?
A Fact Table is a table that
contains summarized numerical (facts) and historical data. This Fact Table has
a foreign key-primary key relation with a dimension table. The Fact Table
maintains the information in 3rd normal form.
A star schema is defined is defined
as a logical database design in which there will be a centrally located fact
table which is surrounded by at least one or more dimension tables. This design
is best suited for Data Warehouse or Data Mart.
8.
What are the types of Facts?
The types of Facts are as follows.
1.
Additive Facts: A Fact which can be
summed up for any of the dimension available in the fact table.
2.
Semi-Additive Facts: A Fact which can
be summed up to a few dimensions and not for all dimensions available in the
fact table.
3.
Non-Additive Fact: A Fact which cannot
be summed up for any of the dimensions available in the fact table.
9. What are the types of Fact Tables?
The types of Fact Tables are:
1.
Cumulative Fact Table: This type of
fact tables generally describes what was happened over the period of time. They
contain additive facts.
2.
Snapshot Fact Table: This type of fact
table deals with the particular period of time. They contain non-additive and
semi-additive facts.
10. What is Grain of Fact?
The Grain of Fact is defined as the
level at which the fact information is stored in a fact table. This is also
called as Fact Granularity or Fact Event Level.
11. What is Factless Fact table?
The Fact Table which does not
contains facts is called as Fact Table. Generally when we need to combine two
data marts, then one data mart will have a fact less fact table and other one
with common fact table.
12. What are Measures?
Measures are numeric data based on
columns in a fact table.
13. What are Cubes?
Cubes are data processing units
composed of fact tables and dimensions from the data warehouse. They provided
multidimensional analysis.
14. What are Virtual Cubes?
These are combination of one or more
real cubes and require no disk space to store them. They store only definition
and not the data.
15. What is a Star schema design?
A Star schema is defined as a
logical database design in which there will be a centrally located fact table
which is surrounded by at least one or more dimension tables. This design is
best suited for Data Warehouse or Data Mart.
16. What is Snow Flake schema Design?
In a Snow Flake design the dimension
table (de-normalized table) will be further divided into one or more dimensions
(normalized tables) to organize the information in a better structural format.
To design snow flake we should first design star schema design.
17. What is Operational Data Store
[ODS] ?
It is a collection of integrated
databases designed to support operational monitoring. Unlike the OLTP
databases, the data in the ODS are integrated, subject oriented and enterprise
wide data.
18. What is Denormalization?
Denormalization means a table with
multi duplicate key. The dimension table follows Denormalization method with
the technique of surrogate key.
19. What is Surrogate Key?
A Surrogate Key is a sequence
generated key which is assigned to be a primary key in the system (table).
20. What are the client components of
Informatica 7.1.1?
Informatica 7.1.1 Client Components:
1.
Informatica Designer
2.
Informatica Work Flow Manager
3.
Informatica Work Flow Monitor
4.
Informatica Repository Manager
5.
Informatica Repository Server
Administration Console.
21. What are the server components of
Informatica 7.1.1?
Informatica 7.1.1 Server Components:
1.
Informatica Server
2.
Informatica Repository Server.
22. What is Metadata?
Data about data is called as
Metadata. The Metadata contains the definition of a data.
23. What is a Repository?
Repository is a centrally stored
container which stores the metadata, which is used by the Informatica Power
center server and Power
Center client tools. The
Informatica stores Repository in relational database format.
Informatica
7.1.1 Repository has 247 database objects
Informatica
6.1.1 Repository has 172 database objects
Informatica
5.1.1 Repository has 145 database objects
Informatica
4.1.1 Repository has 111 database objects
24. What is Data Acquisition Process?
The process of extracting the data
from different source (operational databases) systems, integrating the data and
transforming the data into a homogenous format and loading into the target
warehouse database. Simple called as ETL (Extraction, Transformation and
Loading). The Data Acquisition process designs are called in different manners
by different ETL vendors.
Informatica ---->
Mapping
Data Stage ---->
Job
Abinitio ----> Graph
25. What are the GUI based ETL tools?
The following are the GUI based ETL
tools:
1.
Informatica
2.
DataStage
3.
Data Junction
4.
Oracle Warehouse Builder
5.
Abinitio
6.
Business Object Data Integrator
7.
Cognos Decision Stream.
26. What are programmatic based ETL
tools?
1. Pl/Sql
2. SAS BASE
3. SAS ACCESS
4. Tera Data Utilities
a. BTEQ
b. Fast Load
c. Multi Load
d. Fast Export
e. T (Trickle) Pump
27.
What is a Transformation?
A transformation is a repository
object that generates, modifies, or passes data. Transformations in a mapping
represent the operations the PowerCenter Server performs on the data. Data
passes into and out of transformations through ports that you link in a mapping
or mapplet. Transformations can be active or passive. An active transformation
can change the number of rows that pass through it. A passive transformation
does not change the number of rows that pass through it.
28. The following are details
description of Transformations available in Informatica.
Transformation
|
Type
|
Description
|
Aggregator
|
Active
/ Connected
|
Performs
aggregate calculations
|
Application
Source Qualifier
|
Active
/ Connected
|
Represents
the rows that the Power Center Server reads from an application, such as an
ERP source, when it runs a session.
|
Custom
|
Active
or Passive / Connected
|
Calls
a procedure in a shared library or DLL.
|
Expression
|
Passive
/ Connected
|
Calculates
a value
|
External
Procedure
|
Active
/ Connected or Unconnected
|
Calls
a procedure in a shared library or in the COM layer of windows.
|
Filter
|
Active
/ Connected
|
Filters
data
|
Input
|
Passive
/ Connected
|
Defines
mapplet input rows. Available in the Mapplet Designer
|
Joiner
|
Active
/ Connected
|
Joins
data from different databases of flat file systems.
|
Lookup
|
Passive
/ Connected or Unconnected
|
Looks
up values
|
Normalizer
|
Active
/ Connected
|
Source
qualifier for COBOL sources. Can also use in the pipeline to normalize data
from relational or flat file sources.
|
Output
|
Passive
/ Connected
|
Defines
mapplet output rows. Available in the Mapplet Designer.
|
Rank
|
Active
/ Connected
|
Limits
records to a top or bottom range.
|
Router
|
Active
/ Connected
|
Router
data into multiple transformations based on group conditions.
|
Sequence
Generator
|
Passive
/ Connected
|
Generates
primary keys.
|
Sorter
|
Active
/ Connected
|
Sorts
data base4d on a sort key.
|
Source
Qualifier
|
Active
/ Connected
|
Represents
the rows that the PowerCenter Server reads from a relational or flat file
source when it runs a session.
|
Stored
Procedure
|
Passive
/ Connected or Unconnected
|
Calls
a stored procedure.
|
Transaction
Control
|
Active
/ Connected
|
Defines
commit and rollback transactions.
|
Union
|
Active
/ Connected
|
Merges
data from different databases or flat file systems.
|
Update
Strategy
|
Active
/ Connected
|
Determines
whether to insert, delete, update, or reject rows.
|
XML
Generator
|
Active
/ Connected
|
Reads
data from one or more input ports and outputs XML through a single output
port.
|
XML
Parser
|
Active
/ Connected
|
Reads
XML from one input port and outputs data to one or more output ports.
|
XML
Source Qualifier
|
Active
/ Connected
|
Represents
the rows that the PowerCenter Server reads from an XML source when it runs a
session.
|
29. What are features of Informatica
Repository Server?
Features of Informatica Repository
Server.
1.
Informatica client application and
Informatica server access the repository database tables through the Repository
Server.
2.
Informatica client connects to the
repository server through the host name/ IP address and its port number.
3.
The Repository Server can manager
multiple repository on different machines on the network.
4.
For each repository database registered
with the Repository Server it configures and manages a Repository Agent
process.
5.
The Repository Agent is a
multi-threaded process that performs the action needed to retrieve, insert and
updated metadata in the repository database tables.
30. What is a Work Flow?
A Work Flow is a set of instructions
on how to execute tasks such as sessions, emails and shell commands. A WorkFlow
is created from Workflow Manager.
31. What is the uses of Lookup
Transformation?
The Lookup Transformation is
useful for:
1.
Getting a related value form a table
using a key column value
2.
Update slowly changing dimension table
3.
To check whether records already exists
in the table.
.
32. What are the different sources of
Source systems of Data Warehouse?
- RDBMS
- Flat Files
- XML Files
- SAP R/3
- PeopleSoft
- SAP BW
- Web Methods
- Web Services
- Seibel
- Cobol Files
- Legacy Systems.
33. Types of Slowly Changing
Dimensions:
- Type – 1 (Recent updates)
- Type – 11 (Full historical
information)
- Type – 111 (Partial historical
information)
34. What are Update Strategy’s target
table options?
- Update as Update: Updates each row
flagged for update if it exists in the table.
- Update as Insert: Inserts a new
row for each update.
- Update else Insert: Updates if row
exists, else inserts.
35. What does a Mapping document
contains?
The Mapping document contains the following information :
- Source Definition – from where the
database has to be loaded
- Target Definition – to where the
database has to be loaded
- Business Logic – what logic has to
be implemented in staging area.
36. What does the Top Down Approach
says?
The Top Down Approach is coined by Bill Immon. According to
his approach he says “First we need to implement the Enterprise
data warehouse by extracting the data from individual departments and from the Enterprise data warehouse
develop subject oriented databases called as “Data Marts”.
37. What does the Bottom Up Approach or
Ralph Kimball Approach says?
The Bottom Down Approach is coined by Ralph Kimball.
According to his approach he says “First we need to develop subject oriented
database called as “Data Marts” then integrate all the Data Marts to develop
the Enterprise
data warehouse.
38. Who is the first person in the
organization to start the Data Warehouse project?
The first person to start the Data Warehouse project in a
organization is Business Analyst.
39. What is a Dimension Modeling?
A Dimensional Modeling is a high level methodology used to
implement the start schema structure which is done by the Data Modeler.
40. What are the types of OLAPs ?
- DOLAP: The OLAP tool which words
with desktop databases are called as DOLAP. Example: Cognos EP 7 Series
and Business Objects, Micro strategy.
- ROLAP: The OLAP which works with
Relational databases are called as ROLAP. Example: Business Object, Micro
strategy, Cognos ReportNet and BRIO.
- MOLAP: The OLAP which is responsible
for creating multidimensional structures called cubes are called as MOLAP.
Example: Cognos ReportNet.
- HOLAP: The OLAP which uses the
combined features of ROLAP and MOLAP are called as HOLAP. Example Cognos
ReportNet.
41. What is worklet?
The worklet is a group of sessions. To execute the worklet
we have to create the workflow.
42. Why we use lookup transformation?
Look up Transformations can access data from relational
tables that are not sources in mapping. With Lookup transformation, we can
accomplish the following tasks.
43. What is a Power Center
Repository?
The Power Center Repository allows you to share metadata
across repositories to create a data mart domain. In a data mart domain, you
can create a single global repository to store metadata used across an
enterprise and a number of local repositories to share the global metadata as
needed.