
MSBI Training Part 8


Friends,

This is the eighth video in the series. In this video we are going to discuss the following topics (a quick T-SQL sketch follows the list) –

  • Foreign Key
  • Group By
  • Having
  • Views
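If you'd like a quick preview before watching, here is a minimal T-SQL sketch of these constructs; the table and column names (Customers, Orders, Amount) are made up purely for illustration.

    -- Foreign key: each order must reference an existing customer
    CREATE TABLE Customers (CustomerId INT PRIMARY KEY, CustomerName VARCHAR(100));
    CREATE TABLE Orders (
        OrderId    INT PRIMARY KEY,
        CustomerId INT FOREIGN KEY REFERENCES Customers(CustomerId),
        Amount     DECIMAL(10, 2)
    );
    GO
    -- GROUP BY with HAVING: customers whose total order amount exceeds 1000
    SELECT CustomerId, SUM(Amount) AS TotalAmount
    FROM Orders
    GROUP BY CustomerId
    HAVING SUM(Amount) > 1000;
    GO
    -- View: a saved query that can later be selected from like a table
    CREATE VIEW vw_CustomerTotals AS
    SELECT CustomerId, SUM(Amount) AS TotalAmount
    FROM Orders
    GROUP BY CustomerId;
    GO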

 

Hope you understood the concepts. Now go ahead and browse the next one. 😉

Regards,
Roopesh Babu V

MSBI Training Part 7


Friends,

This is the seventh video in the series. In this video we are going to discuss the following topics (a quick T-SQL sketch follows the list) –

  • Primary Key
  • Unique
  • Not Null
  • Default
  • Check
  • Alter Table
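As a quick preview, here is one illustrative T-SQL table that uses each of these constraints, followed by an ALTER TABLE; the Employees table and its columns are assumed names, not anything from the video.

    -- Illustrative table using the constraints covered in the video
    CREATE TABLE Employees (
        EmployeeId INT           NOT NULL PRIMARY KEY,
        Email      VARCHAR(200)  NOT NULL UNIQUE,
        HireDate   DATE          NOT NULL DEFAULT GETDATE(),
        Salary     DECIMAL(10,2) CHECK (Salary > 0)
    );
    GO
    -- ALTER TABLE: add a new column to the existing table
    ALTER TABLE Employees ADD Department VARCHAR(50) NULL;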

 

Hope you understood the concepts. Now go ahead and browse the next one. 😉

Regards,
Roopesh Babu V

MSBI Training Part 6


Friends,

This is the sixth video in the series. In this video we are going to discuss the following topics (a quick T-SQL sketch follows the list) –

  • Joins
  • Unions
  • Constraints
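For a quick taste, here is a minimal T-SQL sketch of a join and a union; the Customers, Orders and Suppliers tables are assumed for illustration only.

    -- INNER JOIN: orders together with the matching customer names
    SELECT c.CustomerName, o.OrderId
    FROM Customers AS c
    INNER JOIN Orders AS o ON o.CustomerId = c.CustomerId;

    -- UNION: combine two result sets that have the same column structure
    SELECT CustomerName AS Name FROM Customers
    UNION
    SELECT SupplierName AS Name FROM Suppliers;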

 

Hope you understood the concepts. Now go ahead and browse the next one. 😉

Regards,
Roopesh Babu V

MSBI Training Part 5


Friends,

This is the fifth video in the series. In this video we are going to discuss the following topics (a quick T-SQL sketch follows the list) –

  • Top
  • Wild Characters
  • IN Operator
  • Between
  • Aliases
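Here is a small illustrative T-SQL query that touches each of these; the Customers table and its columns are assumptions for the example.

    -- TOP, LIKE (wildcards), IN, BETWEEN and aliases in a single query
    SELECT TOP (10)
           c.CustomerName AS Name              -- column alias
    FROM Customers AS c                        -- table alias
    WHERE c.CustomerName LIKE 'A%'             -- wildcard: names starting with A
      AND c.Country IN ('India', 'USA')        -- IN: match any value in the list
      AND c.CustomerId BETWEEN 100 AND 200;    -- BETWEEN: inclusive range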

 

Hope you understood the concepts. Now go ahead and browse the next one. 😉

Regards,
Roopesh Babu V

MSBI Training Part 4


Friends,

This is the fourth video in the series. In this video we are going to discuss the following topics (a quick T-SQL sketch follows the list) –

  • Distinct
  • Filtering Columns
  • Filtering Rows
  • Order By
  • Delete
  • Truncate
  • Drop
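As a quick sketch (the table names are illustrative), these statements show the constructs in action, including how DELETE, TRUNCATE and DROP differ.

    -- DISTINCT, column filtering (select list), row filtering (WHERE) and ORDER BY
    SELECT DISTINCT Country       -- pick only the columns you need
    FROM Customers
    WHERE Country IS NOT NULL     -- filter rows
    ORDER BY Country;

    DELETE FROM Orders WHERE OrderId = 42;  -- removes specific rows (logged row by row)
    TRUNCATE TABLE Orders;                  -- removes all rows but keeps the table structure
    DROP TABLE Orders;                      -- removes the table itself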

 

Hope you understood the concepts. Now go ahead and browse the next one. 😉

Regards,
Roopesh Babu V

MSBI Training Part 3


Friends,

This is the third video in the series. In this video we are going to discuss the following topics (a quick T-SQL sketch follows the list) –

  • What is Database
  • Creating Database
  • Creating tables
  • Insert
  • Select
  • Update
  • Taking backup of tables
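Here is a minimal end-to-end T-SQL sketch of these steps; the SalesDemo database and Customers table are invented names for illustration.

    CREATE DATABASE SalesDemo;   -- create a database
    GO
    USE SalesDemo;
    GO
    CREATE TABLE Customers (CustomerId INT PRIMARY KEY, CustomerName VARCHAR(100));

    INSERT INTO Customers (CustomerId, CustomerName) VALUES (1, 'Roopesh');
    SELECT CustomerId, CustomerName FROM Customers;
    UPDATE Customers SET CustomerName = 'Roopesh Babu' WHERE CustomerId = 1;

    -- One common way to take a quick backup copy of a table
    SELECT * INTO Customers_Backup FROM Customers;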

 

Hope you understood the concepts. Now go ahead and browse the next one. 😉

Regards,
Roopesh Babu V

MSBI Training Part 2


Friends,

This is the second video in the series. In this video we are going to discuss the following topics –

  • SQL Server Services and Properties
  • Installing SQL Server with MSBI

 

Hope you understood the concepts. Now go ahead and browse the next one. 😉

Regards,
Roopesh Babu V

MSBI Training Part 1


Friends,

In this series of videos we are going to cover the complete set of MSBI concepts. We will cover the following –

  • SQL Server Basics
  • SSAS
  • MDX
  • SSRS
  • SQL Server Advanced
  • SSIS

In this video we are going to discuss the following topics –

  • What is Data Warehousing
  • What is BI
  • Different tools in MSBI
  • Where MSBI fits

 

Hope you understood the concepts. Now go ahead and browse the next one. 😉

Regards,
Roopesh Babu V

How to restart the Package from where it failed?


Friends,

In DTS there was NO option to restart a package from the point where it failed. In SQL Server 2005 and later versions Microsoft added this capability, which is a real boon for anyone working in SSIS. If a package fails and cannot simply be rerun, it is really painful to start from scratch when some of the processes inside the package are expensive in terms of resources or time. That is why we say this option is a real BOON. For this, Microsoft introduced “Package Restartability”, or checkpoints, in SQL Server 2005 and has continued it in later versions too.

Checkpoints are the basis for restarting a package from where it failed. They store the STATE of each task in the package to a file; when we restart the package, SSIS uses the information in that file to work out which tasks need to be run again and which do not, based on the status written to the file.

To create the “CheckPointFile” the following properties need to be set.

  • Three Package level properties
  • One Task level property for each task.

Please find below all the properties in detail.

  • CheckpointFilename: This is the file name of the checkpoint file, which must be provided. There are no specific conventions or requirements for the file name.
  • CheckpointUsage: There are three values, which describe how a checkpoint file is used during package execution:
    • Never: The package will not use a checkpoint file and therefore will never restart.
    • If Exists: If a checkpoint file exists in the place you specified for the CheckpointFilename property, then it will be used, and the package will restart according to the checkpoints written.
    • Always: The package will always use a checkpoint file to restart, and if one does not exist, the package will fail.
  • SaveCheckpoints: This is a simple Boolean to indicate whether checkpoints are to be written. Obviously this must be set to true for this scenario.
  • FailPackageOnFailure: This is the TASK level property and this has to be set to TRUE in order to create CheckPointFile else you cannot restart the package from where it failed. If you do not set this property to true and the task fails, no file will be written, and the next time you invoke the package, it will start from the beginning again.

One important point to remember here is that checkpoints can only be set at the Control Flow level, not at the Data Flow level. The Data Flow Task can be a checkpoint, but it is treated like any other task; you cannot set checkpoints for the individual components inside the Data Flow Task.

Also remember that if nothing fails in your package then NO FILE will be created. Let’s see a simple example here.

In this example I am taking a simple package with THREE Execute SQL Tasks and intentionally writing a faulty statement in the third one so that it fails.

  • Took a new package.
  • Dropped three Execute SQL Tasks into the package control flow.
  • Edited the Execute SQL Tasks and set the connection property; set the SQLStatement to “Select 1” for the first TWO and to “Select 1/0” for the third one, as shown below.

 

  • Let’s run the package to see the output. As expected it failed at the THIRD task. Now if you rerun the package it will start from the FIRST task, which is not what we are looking for.

  • Now, to make the package run from the task where it failed, we have to set the properties discussed above. First, let me set the package-level properties.

  • I also set the task-level property, i.e. FailPackageOnFailure, to TRUE on all tasks. Then I executed the package with the same error and it failed at TASK 3 as expected.

  • Now the checkpoint file is saved with the state of the tasks. I edited the third task, changed the SQL Statement from “Select 1/0” to “Select 1”, and re-triggered the package. You can see the result below.

  • From the above pic you can understand that the package restarted from the task that failed. The SSIS engine first checks for the checkpoint file and, based on the STATE of each task in it, starts the package from the failed task.

Super cool option, right? That’s it for now. Try it out and implement it; it is a very, very useful option in SSIS.

Happy Coding !!

Regards,
Roopesh Babu V

SSRS Interview Questions and Answers


Friends,

Let’s spend some time here before attending an interview on SSRS. This will surely help you. Questions will keep getting added, so try to visit this page whenever you have an SSRS interview coming up.

• What is Query parameter in SSRS?
Query parameters are referenced in the dataset query, typically in the WHERE clause of a SQL statement that can accept parameters. Query parameters begin with the symbol @. The name should not contain spaces and cannot begin with a numeral; for clarity, use only letters.
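For example, a dataset query that accepts a query parameter might look like the sketch below; the Customers table and the @Country parameter are illustrative.

    -- Dataset query with a query parameter; @Country is supplied by the report at run time
    SELECT CustomerId, CustomerName, Country
    FROM Customers
    WHERE Country = @Country;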

• What are the Reporting Service Components in SSRS?
Report Designer: A place where we can create reports.
Report Server: Provides services for the implementation and delivery of reports.
Report Manager: A web-based administration tool for managing the Report Server.

• What is a matrix in SSRS?
A matrix is a data region linked to a report dataset. A matrix allows us to create crosstab reports with the report variables displayed on rows and columns, and it allows us to drag and drop fields into it.

• What are sub reports and how to create them?
A sub report is like any other report; it can be called from a main report and generated through the main report. Parameters can be passed from the main report to the sub report, and the sub report is generated on that basis.

• What is the report model project?
A report model project is for creating ad hoc reports. You can create the ad hoc reports through Report Builder. A report model project can be created in BIDS or on the report server, and the model exposes a simplified view of the data for end users.

• What is report server project?
A Report Server project contains the RDL files, and it needs to be deployed on the report server to make the reports available to applications and users.
It is the solution where we design our reports. You can add one by going into BIDS, choosing New, and then selecting Report Server Project. Once the solution is created you can start creating reports.

• What is the report builder?
Report Builder is used to create small reports and it has a predefined interface. You can’t change the report interface in Report Builder; it is pre-designed, and you can just drag columns into the report. Report Builder creates reports on the database objects available with a report model project.

• In which SQL Server version report builder introduced?
Report Builder was introduced in SQL Server 2005. While creating or deploying a report model project on the report server you can get errors, or it might not get created; in that case you need to check whether the required service pack is installed or not.

• How to deploy the Report?
A report can be deployed in three ways.
1. Using Visual Studio: In Visual Studio you can directly deploy the report through Solution Explorer by providing the report server URL in the project properties at Target Server URL. This will deploy the entire project or a single report, as per your selection.
2. Using the report server: You can go directly to the report server and deploy the report by browsing to the report file from its disk location on the server.
3. Creating a utility: SQL Server provides a utility (RS.exe) that can be used to create a customized deployment process for your reports in bulk.

• What is RS.exe utility?
The RS.exe utility is used for deploying reports to the report server. It comes with the report server, and the deployment scripts it runs can be customized accordingly.

• What is the name of reporting services config file and what’s it’s used for?
The Reporting Services configuration file (RSReportServer.config) stores the report server configuration details, including the supported rendering formats and delivery settings. The config file resides with the report server under IIS.

• What are the three different part of RDL file explain them?
In Visual Studio an RDL file has three parts.
1. Data: It contains the datasets on which we write the query. A dataset is connected to a data source.
2. Design: In the Design view you design the report; you can create tables and matrix reports and drag column values from the source.
3. Preview: Used to check the preview after the report runs.

• Which language are RDL files written in?
RDL files are written in XML.

• What is the chart in report?
Chart reports are for graphical representation. You can get pie charts, column charts, and various other options. 3D charts are also available in Reporting Services.

• What is Data Set in report?
A dataset is the set of data that we want to show in a report. A dataset is created on a data source; the data source is the source of the data, i.e. the database server and database name in the connection string.

• What are the different types of data sources in SSRS?
SSRS can use different data sources. Some of them are listed below.
1. Microsoft SQL Server
2. OLEDB
3. Oracle
4. ODBC
5. SQL Server Analysis Service
6. Report Server Model
7. SAP Net weaver BI
8. Hyperion
9. Teradata
10. XML

• What is the web service used for reporting services?
The Reporting Services Web service is used in SSRS. By accessing this web service you can access all report server components and also the reports deployed on the report server.

• How to add the custom code in Report?
To add custom code to a report, go to the Report menu at the top, then Report Properties, and there you will find the option for custom code.

• What is a cache in SSRS?
The report server can keep a copy of a processed report in memory and return that copy when a user opens the report. This server memory is known as the cache and the process is called caching.

• What is report snapshot in SSRS?
Report snapshot is a report which contains layout information and a dataset that is extracted at a particular point of time. When the new snapshot is created the previous report snapshot will be overwritten.

• What is bookmark link in SSRS?
Bookmark is a link which a person clicks to shift to a different area or page in a report. We can insert bookmarks links only to textboxes and images.

• What is Command parameter in SSRS?
A Command parameter is used to forward protocol to the Report Server regarding the item that has been retrieved. Command parameter is also used for rearranging a user’s session information.

• What is Format parameter in SSRS?
Format parameter is used to control report output. Every delivering format on Report Server has to pass through this parameter.

• What is Snapshot parameter in SSRS?
When a report is saved in a snapshot history, it is allocated a time or date to uniquely identify that report. Snapshot parameter is used to get back these historical reports by passing this time or date to get proper report.

• What are the rendering extensions of SSRS?
Rendering extensions manage the category of document produced when a report is processed. Rendering Extensions are: HTML, MHTML, EXCEL, CSV, IMAGE, PDF, and XML.

• What is a ReportItem in SSRS?
A ReportItem is one of the subsequent types of items: Rectangle, Textbox, Image, Subreport, Line, CustomReportItem, List, Table, Matrix or Chart.

• What is Datasets in SSRS?
Dataset elements have information about the sets of data recovered as a part of the report.

• What are the main components of reporting services?
The main components of Reporting Services are

  1. Report designer
  2. Report server
  3. Report manager
  4. Report user.

• What is Report Designer?
Report Designer is a collection of graphical query and design tools that are hosted within the Microsoft Visual Studio environment.

Report Designer provides a Report Data pane to organize data used in your report, and tabbed views for Design and Preview so that you can design a report interactively.
Report Designer also provides query designers to help specify data to retrieve from data sources and the Expression dialog to specify report data to use in the report layout.
When the report is complete, you can use Report Designer to preview the report and publish it directly to the report server.

• What is Report manager?
Report Manager is a Web-based report access and management tool that you use to administer a single report server instance from a remote location over an HTTP connection. You can also use Report Manager for its report viewer and navigation features.
You can use Report Manager to browse the report server folders or search for specific reports. You can view a report, its general properties, and past copies of the report that are captured in report history.

• What is Report Builder?
  • A web-based tool to create reports.
  • Comes free with SQL Server 2005.
  • Launch Report Builder 1.0 to create ad hoc reports that you can save and run on the report server.
  • Users build the desired reports themselves and can export them to Word, Excel, PDF, and more.

• What is Report Server?
The report server is the central component of a Reporting Services installation. It consists of a pair of core processors plus a collection of special-purpose extensions that handle authentication, data processing, rendering, and delivery operations.

SSIS Interview Questions and Answers


Friends,

In this post we are going to discuss the different interview questions you may face in interviews. I keep adding questions and answers to this post, so please visit this page regularly to see more and more questions and answers.

• What do we mean by dataflow in SSIS?
Data flow is nothing but the flow of data from the corresponding sources to the referred destinations. In this process, the data transformations make changes to the data to make it ready for the data warehouse.

• What is a breakpoint in SSIS? How is it setup? How do you disable it?
A breakpoint is a stopping point in the code. The breakpoint can give the Developer\DBA an
opportunity to review the status of the data, variables and the overall status of the SSIS package.
10 unique conditions exist for each breakpoint.
Breakpoints are set up in BIDS: navigate to the control flow interface, right-click on the object where you want to set the breakpoint, and select the ‘Edit Breakpoints…’ option.

• Can you name 5 or more of the native SSIS connection managers?

1) OLEDB connection – Used to connect to any data source requiring an OLEDB connection (i.e.,
SQL Server 2000)
2) Flat file connection – Used to make a connection to a single file in the File System. Required for reading information from a File System flat file
3) ADO.Net connection – Uses the .Net Provider to make a connection to SQL Server 2005 or other
connection exposed through managed code (like C#) in a custom task
4) Analysis Services connection – Used to make a connection to an Analysis Services database or project. Required for the Analysis Services DDL Task and Analysis Services Processing Task
5) File connection – Used to reference a file or folder. The options are to either use or create a file or folder
6) Excel

• What is the use of Bulk Insert Task in SSIS?
The Bulk Insert Task is used to upload large amounts of data from flat files into SQL Server. It supports only OLE DB connections for the destination database.
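As a rough sketch, what the task does can be expressed as the equivalent T-SQL BULK INSERT statement; the table name, file path and delimiters below are assumptions for illustration.

    -- Roughly what the task does, expressed in T-SQL (table name, path and delimiters are illustrative)
    BULK INSERT dbo.SalesStaging
    FROM 'C:\Data\sales.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);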

• What is Conditional Split transformation in SSIS?
This is just like an IF condition which checks the given condition, and based on the condition evaluation the row is sent to the appropriate OUTPUT path. It has ONE input and MANY outputs. The Conditional Split transformation is used to send rows to different outputs based on conditions. For example, we can send the students in a class who have marks greater than 40 to one path and the students who score less than 40 to another path (see the sketch below).
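The component itself is configured with SSIS expressions rather than SQL, but a T-SQL analogue of the two output paths in the students example would look like this (the Students table is assumed):

    -- "Marks > 40" output path
    SELECT StudentId, Marks FROM Students WHERE Marks > 40;
    -- "Marks < 40" output path (rows matching neither condition go to the default output)
    SELECT StudentId, Marks FROM Students WHERE Marks < 40;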

• How do you eliminate quotes from being uploaded from a flat file to SQL Server? 
This can be done using TEXT QUALIFIER property. In the SSIS package on the Flat File Connection Manager Editor, enter quotes into the Text qualifier field then preview the data to ensure the quotes are not included.

• Can you explain how to setup a checkpoint file in SSIS?
The following items need to be configured on the properties tab for SSIS package:
CheckpointFileName – Specify the full path to the Checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path, it’s a good idea to use an expression that concatenates a path defined in a package variable and the package name.
CheckpointUsage – Determines if/how checkpoints are used. Choose from these options: Never(default), IfExists, or Always. Never indicates that you are not using Checkpoints. IfExists is the typical setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the package starts execution with the first task. The Always choice raises an error if the Checkpoint file does not exist.
SaveCheckpoints – Choose from these options: True or False (default). You must select True to implement the Checkpoint behavior.

• What are the different values you can set for CheckpointUsage property ?
There are three values, which describe how a checkpoint file is used during package execution:
1) Never: The package will not use a checkpoint file and therefore will never restart.
2) If Exists: If a checkpoint file exists in the place you specified for the CheckpointFilename property, then it will be used, and the package will restart according to the checkpoints written.
3) Always: The package will always use a checkpoint file to restart, and if one does not exist, the package will fail.

• What is the ONLY Property you need to set on TASKS in order to configure CHECKPOINTS to RESTART package from failure?
The one property you have to set on the task is FailPackageOnFailure. This must be set for each task or container that you want to be the point for a checkpoint and restart. If you do not set this property to true and the task fails, no file will be written, and the next time you invoke the package, it will start from the beginning again.

• Where can we set the CHECKPOINTS, in DataFlow or ControlFlow ?
Checkpoints only happen at the Control Flow; it is not possible to checkpoint transformations or restart inside a Data Flow. The Data Flow Task can be a checkpoint, but it is treated as any other task.

• Can you explain different options for dynamic configurations in SSIS?
1) XML file
2) custom variables
3) Database per environment with the variables
4) Use a centralized database with all variables

• What is the use of Percentage Sampling transformation in SSIS?
Percentage Sampling transformation is generally used for data mining. This transformation builds a random sample of set of output rows by choosing specified percentage of input rows. For example if the input has 1000 rows and if I specify 10 as percentage sample then the transformation returns 10% of the RANDOM records from the input data.

• What is the use of Term Extraction transformation in SSIS?
The Term Extraction transformation is used to extract nouns, noun phrases, or both, from English text only. It extracts terms from text in a transformation input column and then writes the terms to a transformation output column. It can also be used to find out the content of a dataset.

• What is Data Viewer and what are the different types of Data Viewers in SSIS?
A Data Viewer allows viewing data at a point of time at runtime. If data viewer is placed before and after the Aggregate transform, we can see data flowing to the transformation at the runtime and how it looks like after the transformation occurred. The different types of data viewers are:

1. Grid
2. Histogram
3. Scatter Plot
4. Column Chart.

• What is Ignore Failure option in SSIS?
With the Ignore Failure option, the error is ignored and the data row is allowed to continue on to the next transformation. Say you have some JUNK data (wrong data types or bad values) flowing from the source: instead of FAILING the package, you can ignore such rows, or use the related Redirect Row option to send the junk records to an error output. This helps to MOVE only valid data to the destination while the JUNK is captured into a separate file.

• Which are the different types of Control Flow components in SSIS?
The different types of Control Flow components are: Data Flow Tasks, SQL Server Tasks, Data Preparation Tasks, Work flow Tasks, Scripting Tasks, Analysis Services Tasks, Maintenance Tasks, Containers.

• What are containers? What are the different types of containers in SSIS?
Containers are objects that provide structures to packages and extra functionality to tasks. There are four types of containers in SSIS, they are: Foreach Loop Container, For Loop Container, Sequence Container and Task Host Container.

• What are the different types of Data flow components in SSIS?
There are 3 data flow components in SSIS.
1. Sources
2. Transformations
3. Destinations

• What are the different types of data sources available in SSIS?
There are 7 types of data sources provided by SSIS: a.) Data Reader source b.) Excel source c.) Flat file source d.) OLEDB source e.) Raw file source f.) XML source g.) Script component

• What is SSIS Designer?
It is a graphical tool for creating packages. It has 4 tabs: Control Flow, Data Flow, Event Handlers and Package Explorer.

• What is Control Flow tab?
It is the tab in SSIS designer where various Tasks can be arranged and configured. This is the tab where we provide and control the program flow of the project.

• What is Data Flow tab?
This is the tab where we do all the work related to ETL job. It is the tab in SSIS Designer where we can extract data from sources, transform the data and then load them into destinations.

• What is the function of control flow tab in SSIS?
On the control flow tab, the tasks including dataflow task, containers and precedence constraints that connect containers and tasks can be arranged and configured.

• What is the function of Event handlers tab in SSIS?
On the Event handlers tab, workflows can be configured to respond to package events.
For example, we can configure a workflow to run when ANY task Fails, Stops, or Starts.

• What is the function of Package explorer tab in SSIS?
This tab provides an explorer view of the package. You can see what is happening in the package. The Package is a container at the top of the hierarchy.

• What is Solution Explorer?
It is a place in SSIS Designer where all the projects, Data Sources, Data Source Views and other miscellaneous files can be viewed and accessed for modification.

• How do we convert data type in SSIS?
The Data Conversion Transformation in SSIS converts the data type of an input column to a different data type.

• How are variables useful in ssis package?
Variables can provide communication among objects in the package. Variables can provide communication between parent and child packages. Variables can also be used in expressions and scripts. This helps in providing dynamic values to tasks.

• Explain Aggregate Transformation in SSIS?
It aggregates data, similar to applying T-SQL functions like GROUP BY, MIN, MAX, AVG, and COUNT. For example, you can get the total quantity and total line items for each product in the Aggregate Transformation Editor. First you determine the input columns, then the output column name in the Output Alias column of the data grid, and also the operation for each Output Alias in the Operation column of the same data grid. Some of the operation functions are listed below:
• Group By
• Average
• Count
• Count Distinct : count distinct and non null column value
• Min, Max, Sum
In the Advanced tab you can do some optimization, such as setting the Key Scale option (low, medium, high), the Count Distinct scale option (low, medium, high), the Auto Extend factor, and Warn On Division By Zero. If you check Warn On Division By Zero, the component will give a warning instead of an error. The Key Scale option optimizes the transformation cache to a certain key threshold: if you set it to low, optimization targets 500,000 keys written to cache, medium can handle up to 5 million keys, and high can handle up to 25 million keys, or you can specify a particular number of keys; the default value is unspecified. The Count Distinct scale option works the same way for the number of distinct values written to memory, and its default is also unspecified. The Auto Extend Factor is used when you want a certain portion of memory to be reserved for this component; the default value is 25% of memory.
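As a rough T-SQL analogue of the product example above (table and column names are assumptions), the transformation behaves like a GROUP BY:

    -- Total quantity and total line items per product
    SELECT ProductId,
           SUM(Quantity) AS TotalQuantity,
           COUNT(*)      AS TotalLineItems
    FROM OrderLines
    GROUP BY ProductId;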

• Explain Audit Transformation ?
It allows you to add auditing information, as required in the auditing world by regulations such as HIPAA and Sarbanes-Oxley (SOX). The auditing options that you can add to transformed data through this transformation are:
1. Execution Instance GUID : ID of the execution instance of the package
2. PackageID : ID of the package
3. PackageName
4. VersionID : GUID version of the package
5. Execution StartTime
6. MachineName
7. UserName
8. TaskName
9. TaskID : uniqueidentifier type of the data flow task that contains audit transformation.

• Explain Character Map Transformation?
It transforms character data. It gives you the option of whether the output will overwrite the existing column or be added as a new column; if you define it as a new column, specify the new column name. The operations available here are:
1. Uppercase
2. Lowercase
3. Byte reversal : such as from 0x1234 to 0x4321
4. Full width
5. Half width
6. Hiragana/katakana/traditional Chinese/simplified Chinese
7. Linguistic casing

• Explain Conditional split Transformation ?
It functions as an if…then…else construct. It sends each input row to the conditional branch whose condition it satisfies. For example, you might want to split product quantities between less than 500 and greater than or equal to 500. You can give each condition a name that easily identifies its purpose; the 'else' part is covered by the Default Output Column name.
After you configure the component and connect it to a subsequent transformation or destination, a dialog box pops up to let you choose which conditional output applies to that transformation or destination.

• Explain Copy column Transformation?
This component simply copies a column to another new column. Just like ALIAS Column in T-Sql.

• Explain Data conversion Transformation?
This component converts data types, similar to the T-SQL functions CAST or CONVERT. If you wish to convert data from one type to another then this is the best bet, but please make sure that you have COMPATIBLE data in the column.
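The T-SQL counterparts it mirrors look like this (the literal values are just for illustration):

    -- CAST and CONVERT: the T-SQL counterparts of the Data Conversion transformation
    SELECT CAST('2024-01-15' AS DATE)         AS CastResult,
           CONVERT(DECIMAL(10, 2), '123.45')  AS ConvertResult;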

• Explain Data Mining query Transformation?
This component does prediction on the data or fills gaps in it. Some good scenarios for using this component are:
1. Taking input columns such as number of children, domestic income, and marital status to predict whether someone owns a house or not.
2. Predicting what a customer would buy based on analysis of the buying patterns in their shopping cart.
3. Filling in blank data or default values when a customer doesn’t complete some items in a questionnaire.

• Explain Derived column Transformation?
Derived Column creates a new column or puts a manipulation of several columns into a new column. You can directly copy an existing column, or create a new column using more than one existing column.

• Explain Merge Transformation?
The Merge transformation merges two paths into a single path. It is useful when you want to break data out into a path that handles errors; after the errors are handled, the data is merged back into the downstream path. It can also be used to merge two data sources. It is similar to the Union All transformation, but Merge has some restrictions:
1. The data should be in sorted order.
2. The data type, data length, and other metadata attributes must be similar before being merged.

• Explain Merge Join Transformation?
The Merge Join transformation merges the output from two inputs and performs an INNER or OUTER join on the data. But if the data comes from a single OLE DB data source, it is better to join through a SQL query rather than using Merge Join; Merge Join is intended to join two different data sources. A sketch of the SQL alternative is below.
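For the single-source case mentioned above, the SQL-side join might look like this sketch (the tables and columns are illustrative):

    -- Joining in the source query instead of using Merge Join
    SELECT c.CustomerId, c.CustomerName, o.OrderId, o.Amount
    FROM Customers AS c
    INNER JOIN Orders AS o ON o.CustomerId = c.CustomerId;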

• Explain Multicast Transformation?
This transformation sends output to multiple output paths with no conditions, unlike Conditional Split. It takes ONE input, makes copies of the data, and passes the same data through many outputs. In simple terms: give it one input and take many outputs of the same data.

• Explain Percentage and row sampling Transformations?
These transformations take data from the source and randomly sample it. Each gives you two outputs: the first is the selected data and the second is the unselected data. They are used in situations where you train a data mining model; both are used to take a SAMPLE of data from the input.

• Explain Sort Transformation?
This component will sort data, similar in TSQL command ORDER BY. Some transformations need sorted data.

• Explain Union all Transformation?
It works much like the Merge transformation, but without the sorting requirement. It can take output from more than two input paths and combines them into a single output path.

• What are the possible locations to save an SSIS package?
You can save a package to any of the following locations:
SQL Server
Package Store
File System

• What is a package?
A discrete executable unit of work composed of a collection of control flow and other objects, including data sources, transformations, process sequence, and rules, errors and event handling, and data destinations.

• What is a workflow in SSIS?
A workflow is a set of instructions on how to execute tasks.
(It is a set of instructions on how to execute tasks such as sessions, emails and shell commands. a workflow is created form work flow mgr.

• What is the diff between control flow Items and data flow Items?
The control flow is the highest-level control process; it allows you to manage the run-time activities of the data flow and other processes within a package.
When we want to extract, transform, and load data within a package, we add an SSIS Data Flow Task to the package control flow.

• What are the main component of SSIS(project-architecture)?
SSIS architecture has 4 main components:
1. SSIS Service
2. SSIS runtime engine and runtime executables
3. SSIS data flow engine and data flow components
4. SSIS clients

• Different components in SSIS package?
Control flow
Data flow
Event handler
Package explorer

• What are Connection Managers?
It is a bridge between a package object and physical data. It provides a logical representation of a connection at design time; the properties of the connection manager describe the physical connection that Integration Services creates when the package is run.

• What is environment variable in SSIS?
An environment variable configuration sets a package property equal to the value in an environment variable.
Environment variable configurations are useful for configuring properties that depend on the computer that is executing the package.

• How to provide security to packages?
We can provide security in two ways
1. Package encryption
2. Password protection.

• What are Precedence constraints?
Constraints that link executables, containers, and tasks within the package control flow, and that specify the conditions determining the sequence and whether executables run.

• What is Design time Deployment in SSIS ?

When you run a package from within BIDS, it is built and temporarily deployed to a folder. By default the package is deployed to the BIN folder in the package’s project folder, and you can configure a custom folder for deployment. When the package’s execution is completed and stopped in BIDS, the deployed package is deleted; this is called Design Time Deployment.

SSAS Interview Questions and Answers

  •  What is the difference between SSAS 2005 and SSAS 2008?
  1. In 2005 it is not possible to create an empty cube, but in 2008 we can create an empty cube.
  2. A new feature in Analysis Services 2008 is the Attribute Relationships tab in the Dimension Designer; implementing attribute relationships was complex in SSAS 2005.
  3. We could create ONLY 2000 partitions per measure group in SSAS 2005, and that partition limit is removed in SSAS 2008.
You can answer with more, but if you end with these points the interviewer will feel that you are really experienced.
  •  What is datawarehouse in short DWH?

The datawarehouse is an informational environment that

  • Provides an integrated and total view of the enterprise
  • Makes the enterprise’s current and historical information easily available for decision making
  • Makes decision-support transactions possible without hindering operational systems
  • Renders the organization’s information consistent
  • Presents a flexible and interactive source of strategic information

OR a warehouse is a

  • Subject oriented
  • Integrated
  • Time variant
  • Non volatile for doing decision support

OR

“A data warehouse is a collection of data in support of management’s decision-making process” (Bill Inmon’s classic definition). The terms used in this definition are explained as follows.

Subject oriented:

It defines the specific business domain, e.g. banking, retail, insurance, etc.

Integrated:

It should be in a position to integrate data from various source systems.

Ex: SQL Server, Oracle, DB2, etc.

Time variant:

It should be in a position to maintain the data for various time periods.

Non volatile:

Once data is inserted it can’t be changed

  • What is data mart?

A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject that may be distributed to support business needs. Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization.

Data marts are often derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology the data warehouse is created from the union of organizational data marts.

There are 3 types of data marts:

  1. Dependent
  2. Independent
  3. Logical data mart
  •  What is the difference between a data mart and a data warehouse?

A data warehouse holds the complete data, whereas a data mart is a subset of the same.

Ex:

All of the organisation's data, whether it relates to the finance department, HR, or the banking department, is stored in the data warehouse, whereas a data mart stores only the finance data or only the HR department data. So a data warehouse can be seen as a collection of different data marts.

  •  Have you ever worked on performance tuning, if yes what are the steps involved in it?

We need to identify the bottlenecks to tune the performance; to overcome the bottlenecks we do the following.

  1. Avoid named queries
  2. Avoid unnecessary relationships between tables
  3. Proper attribute relationships to be given
  4. Proper aggregation design
  5. Proper partitioning of data
  6. Proper dimension usage design
  7. Avoid unnecessary many to many relationships
  8. Avoid unnecessary measures
  9. Set AttributeHierarchyEnabled = FALSE to Attributes that is not required
  10. Do not include even a single measure that is not necessary.
  • What are the difficulties faced in cube development?

This question is either to test whether you are really experienced, or is asked when the interviewer does not have any other questions to ask .. 😉

You can tell any area where you feel difficult to work. But always the best answers will be the following.

  1. Giving attribute relationships
  2. Calculations
  3. Giving dimension usage (many to many relationship)
  4. Analyzing the requirements
  •  Explain the flow of creating a cube?

Steps to create a cube in ssas

  1. Create  a data source.
  2. Create a datasource view.
  3. Create Dimensions
  4. Create a cube.
  5. Deploy and Process the cube.
  • What is a datasource or DS?

The data source is the physical connection information that Analysis Services uses to connect to the database that hosts the data. The data source contains the connection string, which specifies the server and the database hosting the data as well as any necessary authentication credentials.

  • What is datasourceview or DSV?

A data source view is a persistent set of tables from a data source that supply the data for a particular cube. BIDS also includes a wizard for creating data source views, which you can invoke by right-clicking on the Data Source Views folder in Solution Explorer.

  1. Datasource view is the logical view of the data in the data source.
  2. Data source view  is the only thing a cube can see.
  •  What is named calculation?

A named calculation is a SQL expression represented as a calculated column. This expression appears and behaves as a column in the table. A named calculation lets you extend the relational schema of existing tables or views in a data source view without modifying the tables or views in the underlying data source.

Named calculation is used to create a new column in the DSV using hard coded values or by using existing columns or even with both.
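For example, the expression you type into a named calculation is plain SQL evaluated against the underlying table; the column names below are assumptions for illustration.

    -- Full name built from two existing columns
    FirstName + ' ' + LastName
    -- A calculation mixing an existing column with a hard-coded value
    Salary * 0.10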

  • What is named query?

A named query in the DSV is similar to a view in a database. It is used to create a virtual table in the DSV that does not impact the underlying database. A named query is mainly used to merge two or more tables in the data source view or to filter the columns of a table.

  • Why we need named queries?

A named query is used to join multiple tables or to remove unnecessary columns from a table of a database. You can achieve the same thing in the database using views, but named queries are the best bet when you don’t have access to create views in the database. A sketch follows below.
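A named query is simply a SELECT statement stored in the DSV; for instance, a sketch that merges two tables and trims the columns (table and column names assumed):

    SELECT e.EmployeeId, e.EmployeeName, d.DepartmentName
    FROM dbo.Employee AS e
    INNER JOIN dbo.Department AS d ON d.DepartmentId = e.DepartmentId;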

  •  How will you add a new column to an existing table in data source view?

By using named calculations we can add a new column to an existing table in the data source view. Named Calculation is explained above.

  •  What is dimension table?

A dimension table contains the hierarchical data by which you’d like to summarize. A dimension table contains specific business information, including the specific name of each member of the dimension; the name of a dimension member is called an “attribute”.

The key attribute in the dimension must contain a unique value for each member of the dimension. This key attribute is called “primary key column”

The primary key column of each dimension table corresponds to one of the key columns in any related fact table.

  • What is fact table?

A fact table contains the basic information that you wish to summarize. The table that stores the detailed value for measure is called fact table. In simple and best we can define as “The table which contains METRICS” that are used to analyse the business.

It consists of 2 sections

1) Foreign keys to the dimensions

2) Measures/facts (numerical values that are used to monitor business activity)

  •  What is Factless fact table?

This is a very important interview question. A “factless fact table” is a table which is similar to a fact table except that it does not have any measures; the table just has links to the dimensions. These tables enable you to track events; indeed, they are for recording events.

Factless fact tables are used for tracking a process or collecting stats. They are called so because the fact table does not have aggregatable numeric values or information; they hold mere key values with reference to the dimensions from which the stats can be collected.

  •  What is attribute relationships, why we need it?

Attribute relationships are the way of telling the Analysis Services engine how the attributes are related to each other. They help relate two or more attributes to each other. Processing time is decreased if proper relationships are given; this improves both cube processing performance and MDX query performance.

In Microsoft SQL Server Analysis Services, attributes within a dimension are always related either directly or indirectly to the key attribute. When you define a dimension based on a star schema, which is where all dimension attributes are derived from the same relational table, an attribute relationship is automatically defined between the key attribute and each non-key attribute of the dimension. When you define a dimension based on a snowflake schema, which is where dimension attributes are derived from multiple related tables, an attribute relationship is automatically defined as follows:

  • Between the key attribute and each non-key attribute bound to columns in the main dimension table.
  • Between the key attribute and the attribute bound to the foreign key in the secondary table that links the underlying dimension tables.
  • Between the attribute bound to foreign key in the secondary table and each non-key attribute bound to columns from the secondary table.

 

  • How many types of attribute relationships are there?

There are 2 types of attribute relationships:

  1. Rigid
  2. Flexible

Rigid: In a rigid relationship the relationship between the attributes is fixed; members will not change levels or their respective attribute relationships.

Example: the time dimension. We know that the month “January 2009” will ONLY belong to the year “2009” and it won’t be moved to any other year.

Flexible: In a flexible relationship, the relationship between the attributes can change over time.

Example: an employee and a department. An employee can be in the accounts department today, but it is possible that the employee will be in the marketing department tomorrow.

  •  How many types of dimensions are there and what are they?

There are 3 types of dimensions:

  1. Conformed dimension
  2. Junk dimension
  3. Degenerate dimension
  • What are conformed dimensions, junk dimensions and degenerate dimensions?

Conformed dimension: It is a dimension which is shareable across multiple fact tables or data models. (A dimension that is reused several times within the same cube is usually called a role-playing dimension.)

Junk dimension: A number of very small dimensions might be lumped together (a small, irregularly shaped group) to form a single dimension, a junk dimension; the attributes are not closely related. Grouping random flags and text attributes into a dimension and moving them to a separate sub-dimension is known as a junk dimension.

Degenerate dimension: A degenerate dimension stores its values in the fact table, and the dimension is not available as a separate dimension table. A degenerate dimension is a dimension key without a corresponding dimension table.

Example: In the PointOfSale Transaction Fact table, we have:

Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), and POS Transaction Number

The Date Dimension corresponds to the Date Key, and the Product Dimension corresponds to the Product Key. In a traditional parent-child database, the POS Transaction Number would be the key to the transaction header record that contains all the info valid for the transaction as a whole, such as the transaction date and store identifier. But in this dimensional model, we have already extracted this info into other dimensions. Therefore, the POS Transaction Number looks like a dimension key in the fact table but does not have a corresponding dimension table.

  • What are the types of database schema?

There are 3 types of database schemas:

  1. Star
  2. Snowflake
  3. Starflake
  •  What is star, snowflake and star flake schema?

Star schema: In star schema fact table will be directly linked with all dimension tables. The star schema’s dimensions are denormalized with each dimension being represented by a single table. In a star schema a central fact table connects a number of individual dimension tables.

Snowflake: The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimensional table, whereas in a snowflake schema, that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy. In a snowflake schema the fact table is still linked directly to dimension tables, but there are also intermediate dimension (lookup) tables between the fact table and the outermost dimension tables.

Star flake: A hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) schemas. A sample star-schema query is sketched below.
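To make the star/snowflake distinction concrete, here is an illustrative query against a star schema (AdventureWorks-style names, assumed for the example):

    -- Star schema: the fact table joins directly to each denormalized dimension table
    SELECT d.CalendarYear, p.ProductName, SUM(f.SalesAmount) AS SalesAmount
    FROM FactSales AS f
    INNER JOIN DimDate    AS d ON d.DateKey    = f.DateKey
    INNER JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
    GROUP BY d.CalendarYear, p.ProductName;
    -- In a snowflake schema, DimProduct would itself join out to normalized lookup tables
    -- (for example DimProductSubcategory and DimProductCategory).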

  • How will you hide an attribute?

We can hide the attribute by selecting “AttributeHierarchyVisible = False” in properties of the attribute.

  •  How will you make an attribute not process?

By setting “AttributeHierarchyEnabled = False”, we can prevent an attribute from being processed.

  •  What is use of IsAggregatable property?

In Analysis Services we generally see that every attribute hierarchy has an All member; this is because of the IsAggregatable property of the attribute. You can set its value to False so that it will not show the All member. The All member is the default member for that attribute, so if you hide it you will have to set another attribute value as the default member; otherwise some value will be picked as the default, and this will create confusion when browsing data for anyone who is not aware of the change in default member.

  •  What are key, name and value columns of an attribute?

Key column of any attribute: Contains the column or columns that represent the key for the attribute, which is the column in the underlying relational table in the data source view to which the attribute is bound. The value of this column for each member is displayed to users unless a value is specified for the NameColumn property.

Name  column of an attribute: Identifies the column that provides the name of the attribute that is displayed to users, instead of the value in the key column for the attribute. This column is used when the key column value for an attribute member is cryptic or not otherwise useful to the user, or when the key column is based on a composite key. The NameColumn property is not used in parent-child hierarchies; instead, the NameColumn property for child members is used as the member names in a parent-child hierarchy.

Value columns of an attribute: Identifies the column that provides the value of the attribute. If the NameColumn element of the attribute is specified, the same DataItem values are used as default values for the ValueColumn element. If the NameColumn element of the attribute is not specified and the KeyColumns collection of the attribute contains a single KeyColumn element representing a key column with a string data type, the same DataItem values are used as default values for the ValueColumn element.

  • What is hierarchy, what are its types and difference between them?

A hierarchy is a very important part of any OLAP engine; it allows users to drill down from summary levels to detail. Hierarchies represent the way users expect to explore data at a more detailed level.

A hierarchy is made up of multiple levels, creating a structure based on end-user requirements.

Year -> Quarter -> Month -> Week are all levels of a calendar hierarchy.

There are 2 types of hierarchies:

  1. Natural hierarchy
  2. Unnatural hierarchy

 Natural hierarchy: This means that the attributes are intuitively related to one another. There is a clear relationship from the top of the hierarchy to the bottom.

Example: An example of this would be date: year, quarter and month follow from each other, and in part, define each other.

Unnatural hierarchy: This means that the attributes are not clearly related.

Example: An example of this might be geography; we may have country -> state -> city, but it is not clear where Province might sit.

  •  What is Attribute hierarchy?

An attribute hierarchy is created for every attribute in a dimension, and each hierarchy is available for dimensioning fact data. This hierarchy consists of an “All” level and a detail level containing all members of the hierarchy.

you can organize attributes into user-defined hierarchies to provide navigation paths in a cube. Under certain circumstances, you may want to disable or hide some attributes and their hierarchies.

  •  What is use of AttributeHierarchyDisplayFolder property ?

AttributeHierarchyDisplayFolder: Identifies the folder in which to display the associated attribute hierarchy to end users. For example if I set the property value as “Test” to all the Attributes of a dimension then a folder with the name “Test” will be created and all the Attributes will be placed into the same.

  •  What is use of AttributeHierarchyEnabled?

AttributeHierarchyEnabled: Determines whether an attribute hierarchy is generated by Analysis Services for the attribute. If the attribute hierarchy is not enabled, the attribute cannot be used in a user-defined hierarchy and the attribute hierarchy cannot be referenced in Multidimensional Expressions (MDX) statements.

  •  What is use of AttributeHierarchyOptimizedState?

AttributeHierarchyOptimizedState: Determines the level of optimization applied to the attribute hierarchy. By default, an attribute hierarchy is FullyOptimized, which means that Analysis Services builds indexes for the attribute hierarchy to improve query performance. The other option, NotOptimized, means that no indexes are built for the attribute hierarchy. Using NotOptimized is useful if the attribute hierarchy is used for purposes other than querying, because no additional indexes are built for the attribute. Other uses for an attribute hierarchy can be helping to order another attribute.

  •   What is use of AttributeHierarchyOrdered ?

AttributeHierarchyOrdered: Determines whether the associated attribute hierarchy is ordered. The default value is True. However, if an attribute hierarchy will not be used for querying, you can save processing time by changing the value of this property to False.

  •  What is the use of AttributeHierarchyVisible ?

AttributeHierarchyVisible : Determines whether the attribute hierarchy is visible to client applications. The default value is True. However, if an attribute hierarchy will not be used for querying, you can save processing time by changing the value of this property to False.

  •  What are types of storage modes?

There are three standard storage modes in OLAP applications

  1. MOLAP
  2. ROLAP
  3. HOLAP
  • Compare the Three Storage Modes ?

Summary and comparison

Basic comparison of the three storage modes:

  • MOLAP: detail data and aggregations both in multidimensional format; medium storage space (detail data is stored in compressed format); fast query response time; fast processing time; high latency.
  • HOLAP: detail data in the relational database, aggregations in multidimensional format; small storage space; medium query response time; fast processing time; medium latency.
  • ROLAP: detail data and aggregations both in the relational database; large storage space; slow query response time; slow processing time; low latency.

 

  • What is MOLAP and its advantage?

MOLAP (Multidimensional Online Analytical Processing): MOLAP is the most used storage type. It is designed to offer maximum query performance to users. The data and aggregations are stored in a multidimensional format, compressed and optimized for performance. This is both good and bad. When a cube with MOLAP storage is processed, the data is pulled from the relational database, the aggregations are performed, and the data is stored in the AS database. The data inside the cube is refreshed only when the cube is processed, so latency is high.

Advantages:

  1. Since the data is stored on the OLAP server in optimized format, queries (even complex calculations) are faster than ROLAP.
  2. The data is compressed so it takes up less space.
  3. And because the data is stored on the OLAP server, you don’t need to keep the connection to the relational database.
  4. Cube browsing is fastest using MOLAP.
  •  What is ROLAP and its advantage?

ROLAP (Relational Online Analytical Processing) : ROLAP does not have the high latency disadvantage of MOLAP. With ROLAP, the data and aggregations are stored in relational format. This means that there will be zero latency between the relational source database and the cube.

Disadvantage of this mode is the performance, this type gives the poorest query performance because no objects benefit from multi dimensional storage.

Advantages:

  1. Since the data is kept in the relational database instead of on the OLAP server, you can view the data in almost real time.
  2. Also, since the data is kept in the relational database, it allows for much larger amounts of data, which can mean better scalability.
  3. Low latency.
  •   What is HOLAP and its advantage?

Hybrid Online Analytical Processing (HOLAP): HOLAP is a combination of MOLAP and ROLAP. HOLAP stores the detail data in the relational database but stores the aggregations in multidimensional format. Because of this, the aggregations need to be reprocessed when changes occur. With HOLAP you get medium query performance: not as slow as ROLAP, but not as fast as MOLAP. If, however, you are only querying aggregated data or using a cached query, query performance is similar to MOLAP; but when you need to get the detail data, performance is closer to ROLAP.

Advantages:

  1. HOLAP is best used when large amounts of aggregations are queried often with little detail data, offering high performance and lower storage requirements.
  2. Cubes are smaller than MOLAP since the detail data is kept in the relational database.
  3. Processing time is less than MOLAP since only aggregations are stored in multidimensional format.
  4. Low latency since processing takes place when changes occur and detail data is kept in the relational database.
  • What are Translations and its use?

Translation: The translation feature in analysis service allows you to display caption and attributes names that correspond to a specific language. It helps in providing GLOBALIZATION to the Cube.

  • What is Database dimension?

All the dimensions that are created using NEW DIMENSION Wizard are database dimensions. In other words, the dimensions which are at Database level are called Database Dimensions.

  • What is Cube dimension?

A cube dimension is an instance of a database dimension within a cube. A database dimension can be used in multiple cubes, and multiple cube dimensions can be based on a single database dimension.

  • Difference between Database dimension and Cube dimension?
  1. The Database dimension has only Name and ID properties, whereas a Cube dimension has several more properties.
  2. A Database dimension is created once, whereas a Cube dimension is a reference to a database dimension.
  3. A Database dimension exists only once, whereas more than one Cube dimension can be created from it using the ROLE PLAYING Dimensions concept.
  •  How will you add a dimension to cube?

To add a dimension to a cube follow these steps.

  1. In Solution Explorer, right-click the cube, and then click View Designer.
  2. In the Design tab for the cube, click the Dimension Usage tab.
  3. Either click the Add Cube Dimension button, or right-click anywhere on the work surface and then click Add Cube Dimension.
  4. In the Add Cube Dimension dialog box, use one of the following steps:
  5. To add an existing dimension, select the dimension, and then click OK.
  6. To create a new dimension to add to the cube, click New dimension, and then follow the steps in the Dimension Wizard.
  • What is SCD (slowly changing dimension)?

Slowly changing dimensions (SCD) determine how the historical changes in the dimension tables are handled. Implementing the SCD mechanism enables users to know which category an item belonged to on any given date.

  • What are types of SCD?

It is a concept of STORING historical changes, and whenever someone finds a new way to store them, a new type comes into the picture. Basically there are 3 types of SCD; they are given below.

  1. SCD type1
  2. SCD type2
  3. SCD type3
  • What  is Type1, Type2, Type3 of SCD?

Type 1: In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.

In our example, recall we originally have the following table:

Customer Key | Name | State
1001 | Christina | Illinois

After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:

Customer Key | Name | State
1001 | Christina | California

Advantages: This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.

Disadvantages: All history is lost. By applying this methodology, it is not possible to trace back in history.

Usage:  About 50% of the time.

When to use Type 1: Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.

 Type 2: In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.

In our example, recall we originally have the following table:

Customer Key | Name | State
1001 | Christina | Illinois

After Christina moved from Illinois to California, we add the new information as a new row into the table:

Customer Key | Name | State
1001 | Christina | Illinois
1005 | Christina | California

Advantages: This allows us to accurately keep all historical information.

Disadvantages:

  1. This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
  2. This necessarily complicates the ETL process.

Usage: About 50% of the time.

Type3 : In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.

In our example, recall we originally have the following table:

Customer Key | Name | State
1001 | Christina | Illinois

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:

Customer Key, Name, OriginalState, CurrentState, Effective Date

After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):

Customer Key | Name | OriginalState | CurrentState | Effective Date
1001 | Christina | Illinois | California | 15-JAN-2003

Advantages:

  1. This does not increase the size of the table, since the new information overwrites existing columns rather than adding new rows.
  2. This allows us to keep some part of history.

Disadvantages: Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.

Usage: Type 3 is rarely used in actual practice.
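
For illustration, a minimal T-SQL sketch of the three approaches for the Christina example is given below. The DimCustomer table and its columns are assumptions for this sketch, not objects defined in the post, and each block shows one approach in isolation.

-- A hypothetical dimension table used by all three sketches
CREATE TABLE DimCustomer (
    CustomerKey   INT PRIMARY KEY,   -- surrogate key (1001, 1005, ... in the example)
    CustomerId    INT,               -- business (natural) key of the customer
    Name          VARCHAR(50),
    State         VARCHAR(50),
    OriginalState VARCHAR(50),       -- used only by the Type 3 approach
    EffectiveDate DATE,              -- used by the Type 2 and Type 3 approaches
    IsCurrent     BIT                -- used only by the Type 2 approach
);

INSERT INTO DimCustomer (CustomerKey, CustomerId, Name, State, IsCurrent)
VALUES (1001, 1, 'Christina', 'Illinois', 1);

-- Type 1: overwrite the old value in place, no history is kept
UPDATE DimCustomer SET State = 'California' WHERE CustomerKey = 1001;

-- Type 2: flag the old row as no longer current and add a new row with its own surrogate key
UPDATE DimCustomer SET IsCurrent = 0 WHERE CustomerKey = 1001;
INSERT INTO DimCustomer (CustomerKey, CustomerId, Name, State, EffectiveDate, IsCurrent)
VALUES (1005, 1, 'Christina', 'California', '2003-01-15', 1);

-- Type 3: keep the previous value in a separate column of the same row
UPDATE DimCustomer
SET OriginalState = State, State = 'California', EffectiveDate = '2003-01-15'
WHERE CustomerKey = 1001;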

  •  What is role playing dimension with two examples?

Role-playing dimensions: We already discussed this. A single database dimension can play different roles in a fact table. You can recognize a role-playing dimension when there are multiple columns in a fact table that each have a foreign key to the same dimension table.

Ex1: In the AdventureWorks sample, the FactInternetSales and FactResellerSales tables each contain three dimension keys that all refer to the DimTime table. The same time dimension is used to track sales, and in cubes that contain either of these fact tables the corresponding role-playing dimensions are automatically added to the cube.

Ex2 : In retail banking, for checking account cube we could have transaction date dimension and effective date dimension. Both dimensions have date, month, quarter and year attributes. The formats of attributes are the same on both dimensions, for example the date attribute is in ‘dd-mm-yyyy’ format. Both dimensions have members from 1993 to 2010.
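
A rough T-SQL sketch of what this looks like at the schema level is given below; the table and column names are illustrative, but the pattern of several foreign keys pointing at the same date dimension is exactly what identifies a role-playing dimension.

-- One physical date dimension ...
CREATE TABLE DimTime (
    TimeKey      INT PRIMARY KEY,
    FullDate     DATE,
    CalendarYear INT
);

-- ... referenced three times by the fact table, once per role
CREATE TABLE FactInternetSales (
    SalesOrderNumber VARCHAR(20),
    OrderDateKey     INT REFERENCES DimTime (TimeKey),  -- "Order Date" role
    DueDateKey       INT REFERENCES DimTime (TimeKey),  -- "Due Date" role
    ShipDateKey      INT REFERENCES DimTime (TimeKey),  -- "Ship Date" role
    SalesAmount      MONEY
);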

  •  What is measure group, measure?

Measure groups: A cube can contain more than one measure group. These measure groups can contain different dimensions and be at different granularity, but so long as you model your cube correctly, your users will be able to use measures from each of these measure groups in their queries easily and without worrying about the underlying complexity.

Creating multiple measure groups : To create a new measure group in the Cube Editor, go to the Cube Structure tab and right-click on the cube name in the Measures pane and select ‘New Measure Group’. You’ll then need to select the fact table to create the measure group from and then the new measure group will be created; any columns that aren’t used as foreign key columns in the DSV will automatically be created as measures, and you’ll also get an extra measure of aggregation type Count. It’s a good idea to delete any measures you are not going to use at this stage.

 Measures :  Measures are the numeric values that our users want to aggregate, slice, dice and otherwise analyze, and as a result, it’s important to make sure they behave the way we want them to. One of the fundamental reasons for using Analysis Services is that, unlike a relational database it allows us to build into our cube design business rules about measures: how they should be formatted, how they should aggregate up, how they interact with specific dimensions and so on.

  •  What is attribute?

An attribute is a specification that defines a property of an object, element, or file. It may also refer to or set the specific value for a given instance of such.

  •  What is surrogate key?

A surrogate key is a system-generated key which acts like an alternate primary key for the table in the database. Data warehouses commonly use a surrogate key to uniquely identify an entity. A surrogate key is not supplied by the user but generated by the system. A primary difference between a primary key and a surrogate key in a few databases is that the primary key uniquely identifies a record while the surrogate key uniquely identifies an entity.

Ex: An employee may be recruited before the year 2000 while another employee with the same name may be recruited after the year 2000. Here, the primary key will uniquely identify the record while the surrogate key will be generated by the system (say a serial number) since the SK is NOT derived from the data.
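
A minimal T-SQL sketch of this idea, using a hypothetical DimEmployee table where an IDENTITY column plays the role of the system-generated surrogate key alongside the business key:

CREATE TABLE DimEmployee (
    EmployeeSK   INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key generated by the system
    EmployeeId   VARCHAR(10),                    -- business (natural) key coming from the source
    EmployeeName VARCHAR(100),
    HireDate     DATE
);

-- Two employees with the same name get different surrogate keys automatically
INSERT INTO DimEmployee (EmployeeId, EmployeeName, HireDate) VALUES ('E001', 'Ravi Kumar', '1998-06-01');
INSERT INTO DimEmployee (EmployeeId, EmployeeName, HireDate) VALUES ('E145', 'Ravi Kumar', '2004-03-15');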

  • How many types of relations are there between dimension and measure group?

There are six relationships between the dimension and measure group; they are

  1. No Relationship
  2. Regular
  3. Referenced
  4. Many to Many
  5. Data Mining
  6. Fact
  •  What is regular type, no relation type, fact type, referenced type, many-to-many type with example?

No relationship: The dimension and measure group are not related.

Regular: The dimension table is joined directly to the fact table.

Referenced: The dimension table is joined to an intermediate table, which in turn is joined to the fact table.

Many to many: The dimension table is joined to an intermediate fact table; the intermediate fact table is joined, in turn, to an intermediate dimension table to which the fact table is joined.

Data mining: The target dimension is based on a mining model built from the source dimension. The source dimension must also be included in the cube.

Fact table: The dimension table is the fact table.

  •  What are calculated members and what is its use?

Calculations are items in the cube that are evaluated at runtime.

Calculated members: You can create customized measures or dimension members, called calculated members, by combining cube data, arithmetic operators, numbers, and/or functions.

Example: You can create a calculated member called Marks that converts dollars to marks by multiplying an existing dollar measure by a conversion rate. Marks can then be displayed to end users in a separate row or column. Calculated member definitions are stored, but their values exist only in memory. In the preceding example, values in marks are displayed to end users but are not stored as cube data.

  •  What are KPIs and what is its use?

In Analysis Services, a KPI is a collection of calculations that are associated with a measure group in a cube and that are used to evaluate business success. We use KPIs to see how the business is doing at a particular point in time; a KPI is represented with graphical indicators such as traffic signals, gauges etc.

  • What are actions, how many types of actions are there, explain with example?

Actions are a powerful way of extending the value of SSAS cubes for the end user. Users can click on a cube or a portion of a cube to start an application with the selected item as a parameter, or to retrieve information about the selected item.

One of the objects supported by a SQL Server Analysis Services cube is the action. An action is an event that a user can initiate when accessing cube data. The event can take a number of forms. For example, a user might be able to view a Reporting Services report, open a Web page, or drill through to detailed information related to the cube data

Analysis Services supports three types of actions..

Report action: Returns a Reporting Services report that is associated with the cube data on which the action is based.

Drillthrough: Returns a result set that provides detailed information related to the cube data on which the action is based.

Standard: Has five action subtypes that are based on the specified cube data.

Dataset: Returns a multidimensional dataset.

Proprietary: Returns a string that can be interpreted by a client application.

Rowset: Returns a tabular rowset.

Statement: Returns a command string that can be run by a client application.

URL:  Returns a URL that can be opened by a client application, usually a browser.

  • What is partition, how will you implement it?

You can use the Partition Wizard to define partitions for a measure group in a cube. By default, a single partition is defined for each measure group in a cube. Access and processing performance, however, can degrade for large partitions. By creating multiple partitions, each containing a portion of the data for a measure group, you can improve the access and processing performance for that measure group.
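
Each partition is typically bound to a slice of the fact table, for example one partition per year. A hedged sketch of the source queries two such partitions might use is shown below; the FactSales table and OrderDateKey column are assumed names, not objects from this post.

-- Source query for a "FactSales 2010" partition
SELECT * FROM dbo.FactSales WHERE OrderDateKey BETWEEN 20100101 AND 20101231;

-- Source query for a "FactSales 2011" partition
SELECT * FROM dbo.FactSales WHERE OrderDateKey BETWEEN 20110101 AND 20111231;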

  • What is the minimum and maximum number of partitions required for a measure group?

In 2005 a MAX of 2000 partitions can be created per measure group and that limit is lifted in later versions.

In any version the MINIMUM is ONE Partition per measure group.

  •  What are Aggregations and its use?

Aggregations provide performance improvements by allowing Microsoft SQL Server Analysis Services (SSAS) to retrieve pre-calculated totals directly from cube storage instead of having to recalculate data from an underlying data source for each query. To design these aggregations, you can use the Aggregation Design Wizard. This wizard guides you through the following steps:

  1. Selecting standard or custom settings for the storage and caching options of a partition, measure group, or cube.
  2. Providing estimated or actual counts for objects referenced by the partition, measure group, or cube.
  3. Specifying aggregation options and limits to optimize the storage and query performance delivered by designed aggregations.
  4. Saving and optionally processing the partition, measure group, or cube to generate the defined aggregations.

After you use the Aggregation Design Wizard, you can use the Usage-Based Optimization Wizard to design aggregations based on the usage patterns of the business users and client applications that query the cube.
  •  What is perspective, have you ever created perspective?

Perspectives are a way to reduce the complexity of cubes by hiding elements like measure groups, measures, dimensions, hierarchies etc. It's nothing but slicing of a cube; for example, if we have both retail and hospital data and the end user is supposed to see only hospital data, then we can create a perspective accordingly.

  •  What is deploy, process and build?

Build: Verifies the project files and creates several local files.

Deploy: Deploys the structure of the cube (skeleton) to the server.

Process: Reads the data from the source and builds the dimensions and cube structures.

Elaborating the same is given below.

Build: A build is a version of a program. As a rule, a build is a pre-release version and as such is identified by a build number, rather than by a release number. Reiterative (repeated) builds are an important part of the development process. Throughout development, application components are collected and repeatedly compiled for testing purposes, to ensure a reliable final product. Build tools, such as make or Ant, enable developers to automate some programming tasks. As a verb, to build can mean either to write code or to put individual coded components of a program together.

Deployment: During development of an Analysis Services project in Business Intelligence Development Studio, you frequently deploy the project to a development server in order to create the Analysis Services database defined by the project. This is required to test the project, for example, to browse cells in the cube, browse dimension members, or verify key performance indicator (KPI) formulas.

  •  What is the maximum size of a dimension?

The maximum size of a dimension is 4 GB.

  •  What are the types of processing and explain each?

There are several processing options in SSAS; the main ones are:

  • Process Full
  • Process Data
  • Process Index
  • Process Incremental
  • Process Structure
  • Unprocess
  • Process Default

 Process Full: Processes an Analysis Services object and all the objects that it contains. When Process Full is executed against an object that has already been processed, Analysis Services drops all data in the object, and then processes the object. This kind of processing is required when a structural change has been made to an object, for example, when an attribute hierarchy is added, deleted, or renamed. This processing option is supported for cubes, databases, dimensions, measure groups, mining models, mining structures, and partitions.

Process Data: Processes data only, without building aggregations or indexes. If there is data in the partitions, it will be dropped before re-populating the partition with source data. This processing option is supported for dimensions, cubes, measure groups, and partitions.

Process Index: Creates or rebuilds indexes and aggregations for all processed partitions. This option causes an error on unprocessed objects. This processing option is supported for cubes, dimensions, measure groups, and partitions.

Process Incremental: Adds newly available fact data and processes only the relevant partitions. This processing option is supported for measure groups and partitions.

Process Structure: If the cube is unprocessed, Analysis Services will process, if it is necessary, all the cube’s dimensions. After that, Analysis Services will create only cube definitions. If this option is applied to a mining structure, it populates the mining structure with source data. The difference between this option and the Process Full option is that this option does not iterate the processing down to the mining models themselves. This processing option is supported for cubes and mining structures.

Unprocess : Drops the data in the object specified and any lower-level constituent objects. After the data is dropped, it is not reloaded. This processing option is supported for cubes, databases, dimensions, measure groups, mining models, mining structures, and partitions.

Process Default: Detects the process state of an object, and performs processing necessary to deliver unprocessed or partially processed objects to a fully processed state. This processing option is supported for cubes, databases, dimensions, measure groups, mining models, mining structures, and partitions.

  • What is a cube?

The basic unit of storage and analysis in Analysis Services is the cube. A cube is a collection of data that’s been aggregated to allow queries to return data quickly.

For example, a cube of order data might be aggregated by time period and by title, making the cube fast when you ask questions concerning orders by week or orders by title.

  • What is AMO?

The full form of AMO is Analysis Management Objects. This is used to create or alter cubes from .NET code.

  • After creating the cube, if  we added a new column to the OLTP table then how you add this new attribute to the cube?

Just open the Data Source View and on right-click you will find the REFRESH option. Click REFRESH and the new column will be added to the table in the DSV, from where it can be added to the cube.

REAL TIME INTERVIEW QUESTIONS – 

  •  What is the size of the Cube in your last Project?

The answer to this question varies from project to project and mainly depends on how BIG your database is and how COMPLEX the database design is. Generally, for a database with a TRANSACTION TABLE of 50 crore records, the cube size will be around 100GB. So, better go with 100GB as the answer to this question.

  • What is size of the database in your last Project?
You can expect this question immediately after you answer 100GB to the last question. The database size will be 600 to 800GB for which the cube will come to 100 GB. So go with 800GB for this question.
  • What is size of the fact(Transaction) table in your last Project?
This will be the next question if you answer 800GB as your database size. Here the interviewer is not expecting the SIZE in GBs but the NUMBER OF ROWS in the Transaction table. Go with 57 crore records for this question.
  • How frequently you process the cube?
You have to be very careful here. The frequency of processing the cube depends on HOW FREQUENTLY YOU ARE GETTING NEW DATA. Once the new data comes, the SSIS team loads it and sends a mail to the SSAS team after the load completes successfully. Once the SSAS team receives the mail, they will look for the best time to PROCESS.
Typically we get data either Weekly or Monthly. So you can say that the processing of the cube will be done either Weekly or Monthly.
  • How frequently you get DATA from clients?
This answer should be based on your last answer. IF you answered WEEKLY to last question then the Answer to this question also should be WEEKLY. IF MONTHLY for last question then this answer also should be MONTHLY.
  • What type of Processing Options you used to process the cube in your Project?
This is the toughest question to answer. This depends on DATA you have and CLIENTS requirements. Let me explain here.
  1. If the database is SMALL, let's say it has only 1 crore records, then people do a FULL PROCESS as it won't take much time.
  2. If the database is MEDIUM, let's say it has around 15 crore records, then people prefer to do an INCREMENTAL PROCESS unless CLIENTS ask us to do a FULL PROCESS, as it takes a little bit of time.
  3. If the database is HUGE, let's say it has more than 35 to 40 crore records, then people prefer to do an INCREMENTAL PROCESS unless CLIENTS ask us to do a FULL PROCESS, as it takes a lot of time. In this case we TRY to convince clients to go INCREMENTAL, and if they don't agree then we don't have any other option.
  4. Incremental processing comes into the picture ONLY when there are no updates to the OLD data, i.e. no changes to already existing data; otherwise there is NO OTHER OPTION than a FULL PROCESS.
  • How you provide security to cube?
By defining roles we provide security to cubes. Using roles we can restrict users from accessing restricted data. Procedure as follows – 
  1. Define Role
  2. Set Permission
  3. Add appropriate Users to the role
  • How you move the cube from one server to another?
There are many ways to do the same. Let me explain four here and cleverly you can say “I worked on 4 SSAS projects till date and implemented different types in all the four.”
  1. Backup and restore – This is the simplest way. Take the Backup from development server and copy the backup to FTP folder of clients. After doing this drop a mail to Client’s Admin and he will take care of RESTORE part.
  2. Directly PROCESS the cube in the PRODUCTION environment. For this you need access to Production, which will not be given by clients unless the clients are *********. One of the clients I worked for gave FULL access to me .. 😉
  3. Under Start –> All Programs –> SQL Server –> Analysis Services you can see the Deployment Wizard. This is one way of moving the cube. This method has some steps to follow. First deploy your cube and FOUR files will be created in the BIN folder of the PROJECT folder. Copy those FOUR files and paste them on the Production server in any directory. Then OPEN the DEPLOYMENT Wizard in production and when it asks for the Database file, point to the location where you copied the files. After that NEXT, NEXT, NEXT … OK .. the cube will be deployed and processed.
  4. This way is the most beautiful one: Synchronization. In this we will first deploy and process the cube in the STAGING ENVIRONMENT and then we will go to the production server. Connect to Analysis Services in SSMS and select Synchronize by right-clicking on the Databases folder in SSMS of Analysis Services. Then select the source as the STAGING SERVER and click OK. The changes in the cube present on the Staging server will be copied to the production server.
  • What is the toughest challenge you face in your Project?
There are a couple of areas where we face difficulty.
  1. While working on RELATIONSHIPS between Measure Groups and Dimensions.
  2. Working on Complex calculations
  3. Performance tuning
  • How you created Partitions of the cube in your Last Project?
Partitions can be created on different data. A few people do it PRODUCT NAME wise and many prefer to do it DATE wise. You go with DATE wise.
For dates, we can create partitions MONTH wise, WEEK wise, QUARTER wise and sometimes YEAR wise. This all depends on how much data is coming per WEEK or MONTH or QUARTER. If you are getting 50 lakh records per month then tell them you do it MONTH wise.
  • How many dimensions in your last cube?
47 to 50.
  • How many measure groups in your last cube?
Total 10, and of those 4 are Fact tables and the remaining 6 are Factless fact tables.
  • What is the Schema of your last cube?
Snowflake
  • Why not STAR Schema ?

My database design doesn't support a STAR schema.

  • What are the different relationships that you have used in your cube?
  1. Regular
  2. Referenced
  3. Many to Many
  4. Fact
  5. No Relationship
  • Have you created KPIs? If so, explain.
Don't add much to this as the follow-up questions here can be tricky. Just say that you worked on a couple of KPIs and you have basic knowledge of them. (Don't worry, this is not MANDATORY.)
  • How you define Aggregations in your Project?
We defined the aggregations for MOST FREQUENTLY USED data in SSRS reports.
  • Size of SSAS team in your last Project?

Just 2 guys, as SSAS guys are really in demand and there is a lot of scarcity :)

  • How many Resources worked on same Cube in your Project?
Only 2, one in the morning shift and another in the evening shift.
  • How much time does it take to Process the Cube?
This is a very, very important question. This again depends on the SIZE of the database, the complexity of the database and your server settings. For a database with 50 crore transaction records, it generally takes around 3.5 hrs.
  • How many Calculations have you done in your Project?
If you answer something like "more than 5000" then you are caught, unless you are super good at MDX. The best answer for you is "Worked on around 50 calculations".

Conditional Split Transformation in SSIS with Example


Friends,

We come across many situations where we need to split data based on some conditions, like we mostly do with switch case statements or if-else conditions in other programming languages, and for doing the same we have the Conditional Split task in SSIS. Whenever you need to split the data coming from a source into many output paths based on some condition, CONDITIONAL SPLIT is the best bet.

It is used to transfer data to different output files or locations based on the specified condition and the content of the data. In this post let's see a simple example which explains how to configure the CONDITIONAL SPLIT transformation.

  • Start a new Integration Services project. On the Package1.dtsx file, add a Data Flow task as shown.

  • Edit the Data Flow task by selecting the EDIT button on the right-click context menu. Once the Data Flow page is opened, drag and drop a Flat File source from the toolbox to the Data Flow pane.
  • Now edit the Flat File source as shown below to set the connection properties.

  • When the popup opens, select the NEW button to set a new Flat File connection to the FILE in which we have the SAMPLE data. Now provide all the connection properties like path, delimiters etc.

  • Select the Columns page to check whether the DATA is being read as expected or not.

  • If you see Column3 in the above screenshot, it shows the GENDER of Employees as either M or F. Now add a Conditional Split transformation to the Data Flow pane and provide the output of the Flat File source as input to the Conditional Split transformation.
  • Now edit the Conditional Split to configure it: write the below CONDITIONS under the CONDITION column of the transformation and provide a proper OUTPUT Name as shown below.

 

For filtering MALE Employees – UPPER([Column 3]) == "M"

For filtering FEMALE Employees – UPPER([Column 3]) == "F"
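
For comparison only, here is a rough T-SQL analogue of what these two outputs represent, assuming the flat file were loaded into a hypothetical Employees table with a Gender column (the table and column names are illustrative, not part of the package above):

-- Rows routed to the "Males" output
SELECT * FROM dbo.Employees WHERE UPPER(Gender) = 'M';

-- Rows routed to the "Females" output
SELECT * FROM dbo.Employees WHERE UPPER(Gender) = 'F';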

 

  • Now we need to add two Flat File Destination components, and we can attach the Conditional Split output to them using the green arrow line. Whenever you drag and drop onto a destination it will ask WHICH output we need to set as the source, as shown below; select the correct one for each destination.
  • I selected Males as input to the first Flat File destination and Females as input to the second destination file. (I know I have used Males and Females, which is WRONG, but what to do, I made this mistake and captured the screenshots .. Please adjust .. :))

 

  • Configure the Flat File destinations for the Males and Females data. Double click on the component, click on New, select the format as Delimited, and click OK. Set the output file path, then click on the Mappings tab and finally click OK. Repeat the same for the second file too.

  • After configuring Click OK.

 

  • Now execute the package and see the output. You can see the data moved to both the files by applying the given condition, i.e. Male Employees to ONE FILE and Female Employees to ANOTHER FILE.

This is it .. As simple as this .. You can easily split the data coming from an input into many outputs based on a condition. You can also use functions while providing the condition; you can find all the functions on the TOP RIGHT side of the Conditional Split Transformation editor. Also remember that you can use VARIABLE values as part of the SPLIT condition. Hope you understood it .. Enjoy Coding !!

Regards,
Roopesh Babu V

GRID Data Viewers in SSIS with Example


Friends,

Data viewers are a very important feature in SSIS for debugging your data pump pipeline. They allow you to view the data at points in the pipeline at runtime. If you place a data viewer before and after the Aggregate transform, you can see the data flowing into the transform at runtime and what it looks like after the transform happens. One important point about Data Viewers is that once you deploy your package and run it on the server as a job or with the service, the data viewers do not show, because they are only a debug feature. Anytime the package is executed outside the designer, the data viewers won't show. There are four types of data viewers:

  • Grid
  • Histogram
  • Scatter Plot
  • Column Chart

1. Grid:

  • The GRID type displays the columns in a grid format.

2. Histogram:

  • Select Histogram when you want to display a column as a histogram.
  • The Histogram graph works with only 1 column at a time.
  • The Histogram graph works only with numeric data.

3. Scatter Plot:

  • Scatter Plot (x.y) works only with numeric data.

4. Column Chart:

  • Select Column Chart when you want to display a column as a column chart.
  • This graph works with only 1 column at a time.

In this post, let us see a simple example to configure GRID Data Viewers in a Data Flow task.

  • Open a new project and drag a Data Flow task from toolbox in Control Flow.
  • Edit the Data Flow task by double clicking the object or by selecting EDIT button on Right click on the object. Make sure the Data Flow Page is opened as shown below.

  • Select an OLE DB data sources from data flow sources and drag and drop it in the data flow.
  • Double click on the OLE DB data source to open a new window where we can set the properties of the connection.
  • Select the connection manager and click on new button to set the connection string to the table of your wish as shown below.

  • Select  the required columns from the COLUMNS TAB(Page) as shown below.

  • Drag and Drop the Flat File Destination, make a connection between the OLE DB Source and Flat File Destination and configure it.
  • Double click on the pipeline connecting OLE DB Source and Flat File destination to configure Data Viewers properties and click on ADD to add a new Data Viewer as shown below.

  • You can find all FOUR different types of Data Viewers that you can set; select GRID as we are covering only GRID in this post.

  • Select the GRID tab and select the columns for which we wish to see the data in Data Viewers. By default all the columns will be selected and we can deselect any column which we don’t want to be part of Data Viewer and then click OK as shown below.

  • Now you can see an ICON showing the Data Viewer is created for a path as shown below.

  • Now the package is all set to test the Data Viewer functionality, so trigger the package. Once execution reaches the point of the Data Viewer, the GRID will be opened and shown as below.

In the above picture, the highlighted part is the Data Viewer where you can see the columns configured as part of GRID Data Viewer configuration.

That is it .. In the next post we will see the remaining types of Data Viewers: Histogram, Scatter Plot and Column Chart. Till then enjoy Coding !!

Regards,
Roopesh Babu V

SQL Server Jobs Execution Status


Friends,

Whenever you are working with SQL Server Agent and dealing with lots of jobs, you need the query given below. It will give the status of each and every job. This can also be provided as a VALUE ADD to the client (which will impress the client).

Generally we configure E-Mail notifications which say whether a JOB succeeded or failed, but the problem is: "You have to check thousands of mails to get the status of all jobs that run in a day". Using the below query we can get the info about all the jobs, and if you can provide a GUI for the same then it will become a VALUE ADD to the client ..

SELECT [JobName]  = JOB.name,
       [Step]     = HISTORY.step_id,
       [StepName] = HISTORY.step_name,
       [Message]  = HISTORY.message,
       [Status]   = CASE WHEN HISTORY.run_status = 0 THEN 'Failed'
                         WHEN HISTORY.run_status = 1 THEN 'Succeeded'
                         WHEN HISTORY.run_status = 2 THEN 'Retry'
                         WHEN HISTORY.run_status = 3 THEN 'Canceled'
                    END,
       [RunDate]  = HISTORY.run_date,
       [RunTime]  = HISTORY.run_time,
       [Duration] = HISTORY.run_duration
FROM msdb..sysjobs JOB
INNER JOIN msdb..sysjobhistory HISTORY ON HISTORY.job_id = JOB.job_id
WHERE HISTORY.run_date = CONVERT(varchar(8), GETDATE(), 112)  -- today's runs only (run_date is stored as yyyymmdd)
ORDER BY HISTORY.run_date, HISTORY.run_time

Try running this query and see the result. Remember that the job history tables live in the msdb database (the query references them with the msdb.. prefix).

This is it .. As simple as this .. Hope this works .. 😛

Regards,
Roopesh Babu V

How to View all the rows in a Data Viewer ?


Friends,

A couple of my students asked me this question. Even though the answer is simple, I am adding it here thinking the same question may come to many others who are beginners in SSIS.

In a Data Flow, SSIS pulls the data from your source into separate buffers which it manages separately down the data pipeline. In order to see the next buffer, you click the little play button in the top left-hand corner, which will then give you the second buffer. The number of rows displayed can be the same or different. By doing some quick math, SSIS probably generates about 2-3 buffers for you. You can easily copy and paste the data contained in each buffer by clicking the Copy Data button in the top right-hand corner. PFB the screenshot showing the PLAY button highlighted.

 

Hope this helps .. Enjoy Coding !!

Regards,
Roopesh Babu V

MERGE Transformation in SSIS with Example


Friends,

In this post we are gonna discuss the MERGE transformation. MERGE in SSIS is similar to UNION ALL in SQL Server: this transformation unions two datasets/tables. The Merge transformation combines two sorted datasets into a single dataset. The word SORTED is highlighted in the last statement because it is not possible to use MERGE when the inputs are NOT SORTED. There is one more transformation which is very similar to this, i.e. UNION ALL. Use the Union All transformation instead of the Merge transformation in the following situations.

  • The transformation inputs are not sorted
  • The combined output does not need to be sorted.
  • The transformation has more than two inputs.

MERGE takes ONLY TWO inputs whereas UNION ALL can take more than two. Now let's see how to configure the MERGE transformation with an example.

I created TWO tables with names MergeA and MergeB and inserted a few records into each table as shown below.
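
The original post showed these tables in a screenshot. As a hedged sketch only, they could look something like the following; the EmpNo and EmpName columns are assumptions (the example later sorts on EmpNo), and the final query expresses in T-SQL what the Sort + Merge combination effectively produces.

-- Two source tables with the same structure (columns assumed for illustration)
CREATE TABLE MergeA (EmpNo INT, EmpName VARCHAR(50));
CREATE TABLE MergeB (EmpNo INT, EmpName VARCHAR(50));

INSERT INTO MergeA VALUES (1, 'Ravi');
INSERT INTO MergeA VALUES (3, 'Kiran');
INSERT INTO MergeB VALUES (2, 'Suma');
INSERT INTO MergeB VALUES (4, 'Ajay');

-- The sorted union that the package reproduces
SELECT EmpNo, EmpName FROM MergeA
UNION ALL
SELECT EmpNo, EmpName FROM MergeB
ORDER BY EmpNo;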
  • Open a new project and drag a Data Flow task from toolbox in Control Flow.
  • Edit the Data Flow task by double clicking the object or by selecting EDIT button on Right click on the object. Make sure the Data Flow Page is opened as shown below.

  • Select TWO OLE DB data sources from data flow sources and drag and drop it in the data flow.
  • Double click on the OLE DB data source to open a new window where we can set the properties of the connection.
  • Select the connection manager and click on the New button to set the connection string to the TWO tables as shown below.
  • Set the connections to the TWO tables created, using the TWO OLE DB data sources.
  • As said earlier, we need to SORT the data before giving as input to the MERGE transformation and hence drag and drop TWO SORT transformations into Data Flow pane and provide output of each OLE DB source to each Sort transformation as shown below.
  • Open the SORT configuration and select the checkbox for EmpNo, which means the SORT will happen on the EmpNo column as shown below; apply the same in both SORT transformations. Also provide the sort type as either Ascending or Descending as per your requirement.
  • The Input data from both the sources is SORTED now and hence add MERGE transformation to the pane and provide OUTPUT of both sort transformations as input to MERGE transformation as shown below.
  • Now drag and drop Flat File Destination to see the output on file, make connection between the Merge and the Flat File Destination as shown above.
  • Double click on the Flat File Destination to configure it. In the CONNECTION MANAGER select the NEW option to set the file path and click OK. If you select a file which already exists then it will use that file, else a NEW file will be created.
  • Check the mappings between Available Input Columns and Available Destination Columns and click OK.
  • Now the package is ready: it pulls the data from the TWO tables, then sorts and merges the data coming from the two sources before copying it to the destination file. Trigger the package and make sure everything turns GREEN.
  • Now open the file and see the data copied into the file which is coming from TWO sources.
In the above example I have taken OLE DB sources and a Flat File destination to explain the MERGE transformation. You can use any SOURCE and DESTINATION types depending on your requirements. The key things to keep in mind while using the MERGE transformation –
  • Same number of columns should be there in both the sources.
  • Same data types should be present for mapping columns.
  • Data should be sorted before giving as input to MERGE.
That’s it .. as simple as this .. Try and Enjoy .. 🙂

 

Regards,
Roopesh Babu V

 

Derived Column Transformation in SSIS with Example


Friends,

SSIS has many transformation tasks and Derived Column is one of them. It is used to derive new columns by using any of the following.

  • Constants or static values
  • Other columns
  • Variable values

This is powerful because we can generate new columns at runtime and make use of them. Let's see a couple of examples in this post where we can use the DERIVED COLUMN transformation.

We are gonna create a table with the columns Name, Salary and Bonus, and we will use our SSIS package to generate a new file which will have a derived field Gross Salary (sum of Salary and Bonus) and also display the Name field in UPPER CASE. PFB the steps followed.
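
As a rough sketch of the source data and of what the derived columns compute, here is some hedged T-SQL; the Emp_Salary table and its sample rows are assumptions for illustration, and the final SELECT shows in SQL terms what the Derived Column transformation will produce inside the package.

-- Source table with the columns described above (name and rows are illustrative)
CREATE TABLE Emp_Salary (
    Name   VARCHAR(50),
    Salary MONEY,
    Bonus  MONEY
);

INSERT INTO Emp_Salary VALUES ('sita', 30000, 3000);
INSERT INTO Emp_Salary VALUES ('ramu', 60000, 8000);

-- Name in UPPER CASE plus a derived Gross Salary column (Salary + Bonus)
SELECT UPPER(Name) AS Name,
       Salary,
       Bonus,
       Salary + Bonus AS GrossSalary
FROM Emp_Salary;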

  • First create a table and insert data into it, as sketched above.

  • Open a new project and drag a Data Flow task from toolbox in Control Flow.
  • Edit the Data Flow task by double clicking the object or by selecting EDIT button on Right click on the object. Make sure the Data Flow Page is opened as shown below.

  • Select OLE DB data source from data flow sources and drag and drop it in the data flow.
  • Double click on the OLE DB data source to open a new window where we can set the properties of the connection.
  • Select the connection manager and click on new button to set the connection string as shown below.

  • Drag and drop Derived Column Transformation in to the Data Flow Pane and provide the connection between OLE DB Source and Derived column Transformation and open the properties pane by simply double clicking Derived Column transformation.
  • In the Derived Column Transformation Editor expand the columns folder and drag and drop Column 1 as shown below.

  • Use the UPPER function to convert the data of the desired column to UPPER CASE as shown above. It is as simple as this to add a new column which converts the data of a column to UPPER CASE.
  • Now set the destination as explained in the post here. Once the destination is set, execute the package.

This is it !! As simple as this. The power of this comes when we make use of the DIFFERENT FUNCTIONS available in the Derived Column transformation. We can use all kinds of Arithmetic, String and Date functions to achieve the required output. You need to use the EXPRESSION column to define the derived column expression. You can use all the functions shown in the pic below.

This is it Guys ! Hope you understood the concepts. Happy Coding !!

Regards,
Roopesh Babu V

 

Sort Transformation in SSIS with Example


Friends,

This is one of the transformations which doesn’t need much explanation. Sort is used to SORT the input data in ascending or descending order. You can apply sort on multiple columns of the input data. Let us see an example of the same.

  • Open a new project and drag a Data Flow task from toolbox in Control Flow.
  • Edit the Data Flow task by double clicking the object or by selecting EDIT button on Right click on the object. Make sure the Data Flow Page is opened as shown below.

 

  • Select OLE DB data source from data flow sources and drag and drop it in the data flow.
  • Double click on the OLE DB data source to open a new window where we can set the properties of the connection.
  • Select the connection manager and click on new button to set the connection string as shown below.
  • Set the connection to the database by providing the Server name,database name and authentication details if required.
  • After the connection is set, select the Data Access mode as "Table or View" as shown below and then select the table containing the data you want to sort.

  • Drag and drop the Sort transformation into the Data Flow pane and provide a connection between the OLE DB Source and the Sort transformation.
  • Now select the transformation and edit it for setting sort properties. Select the column names on which we are gonna sort the data.

  • You can use the Sort Type option to SORT the data either in Ascending or in Descending order. Sort Order gives an order to the columns when we have more than one column in the SORT transformation.
  • Now drag and drop an OLE DB Destination and give the output of the Sort transformation as input to the OLE DB destination, then EDIT the destination transformation to set the connection properties.
  • Once the Destination is SET then trigger the package and you can see the data ordered in the order specified in SORT Transformation.
  • If we have the same value more than once and you wish to eliminate the duplicates, you can make use of the "Remove rows with duplicate sort values" option in the Sort transformation.

  • Now if you trigger the package, you can see the duplicates will get removed as shown below.

This is how we can use the Sort transformation to SORT the data in any order. You can also remove duplicate records if there are any in the columns specified in the Sort transformation.
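
For reference, a rough T-SQL analogue of what the Sort transformation does is shown below; the Emp table and the EmpNo/EmpName columns are assumed names, not objects defined in this post.

-- Plain sort, equivalent to the Sort transformation with EmpNo as the sort column
SELECT EmpNo, EmpName
FROM dbo.Emp
ORDER BY EmpNo;

-- With "Remove rows with duplicate sort values" checked, only one row is kept
-- per distinct value of the sort column, roughly like this:
SELECT EmpNo, EmpName
FROM (
    SELECT EmpNo, EmpName,
           ROW_NUMBER() OVER (PARTITION BY EmpNo ORDER BY EmpNo) AS rn
    FROM dbo.Emp
) AS t
WHERE rn = 1
ORDER BY EmpNo;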

Happy Coding !!!

Regards,
Roopesh Babu V

Multicast Transformation in SSIS with Example


Friends,

Multicast transformation is useful whenever you wish to make many copies of the same data or you need to move the same data to different pipelines. It has one input and many outputs. It replicates whatever input it takes and returns many copies of the same data. Let's take a simple example to demonstrate this.

For this I am creating a table and inserting data into it. If you wish to use existing table then you can go ahead with that.

Note – I am creating 5 more tables with the same structure which I will use as Destination tables.

create table emp
(
emp_name varchar(100),
emp_id int,
sal money,
desig varchar(50),
joindate datetime,
enddate datetime
)

Inserting data into the same table.

-- enddate is left NULL for current employees, so the column list names only the five populated columns
insert into emp (emp_name, emp_id, sal, desig, joindate) values ('sita', 123, 30000, 'software engineer', '2011-03-12')
insert into emp (emp_name, emp_id, sal, desig, joindate) values ('ramu', 345, 60000, 'team lead', '2009-06-15')
insert into emp (emp_name, emp_id, sal, desig, joindate) values ('lakshman', 567, 25000, 'analyst', '2011-02-18')
insert into emp (emp_name, emp_id, sal, desig, joindate) values ('charan', 789, 40000, 'administrator', '2011-05-27')
insert into emp (emp_name, emp_id, sal, desig, joindate) values ('akhil', 234, 30000, 'software engineer', '2011-07-24')
insert into emp (emp_name, emp_id, sal, desig, joindate) values ('kaveri', 456, 50000, 'hr', '2009-12-26')
insert into emp (emp_name, emp_id, sal, desig, joindate) values ('nimra', 678, 50000, 'administrator', '2010-06-19')
insert into emp (emp_name, emp_id, sal, desig, joindate) values ('lathika', 891, 35000, 'system analyst', '2010-05-23')
insert into emp (emp_name, emp_id, sal, desig, joindate) values ('yogi', 423, 70000, 'tech lead', '2009-09-12')
insert into emp (emp_name, emp_id, sal, desig, joindate) values ('agastya', 323, 70000, 'team lead', '2008-04-23')
insert into emp (emp_name, emp_id, sal, desig, joindate) values ('agastya', 235, 50000, 'hr', '2009-12-21')

Now the source data is ready on which we can apply Multicast transformation and move to multiple destinations. PFB the steps to be followed.

  • Open a new project and drag a Data Flow task from toolbox in Control Flow.
  • Edit the Data Flow task by double clicking the object or by selecting EDIT button on Right click on the object. Make sure the Data Flow Page is opened as shown below.

 

  • Select OLE DB data source from data flow sources and drag and drop it in the data flow.
  • Double click on the OLE DB data source to open a new window where we can set the properties of the connection.
  • Select the connection manager and click on new button to set the connection string as shown below.

  • Set the connection to the database by providing the Server name,database name and authentication details if required.
  • After the connection is set, select Data Access mode as “Table or View” as shown below and then select the table which we created just now.

  • Drag and drop MultiCast Transformation into the Data Flow Pane and provide connection between OLE DB Source and Multicast Transformation as shown below.

  • Now from the Multicast we can take many outputs; to demonstrate this, drag and drop an OLE DB Destination, create a connection between the Multicast transformation and the OLE DB destination, then EDIT the destination transformation to provide the connection details for ONE of the FIVE tables created with the same structure.
  • Check the mappings and click OK.

  • Like the above, we can create N number of destinations; for this demo I created 5 similar destinations pointing to the different tables we created, as shown below.

  • In the above pic, you can see that many outputs are coming from one Multicast transformation and each is pointed to a different OLE DB destination. Now the package is all set to go. Trigger it and wait till all the items turn GREEN.

That is it !! You can see Multicast taking one input of 11 rows, converting it into many copies and sending them to different destinations.

This is how you can configure and use the Multicast transformation. You can also give the output of Multicast to another transformation before passing it to a destination.

Happy Coding guys !!

Regards,
Roopesh Babu V