
Change Data Capture in SQL Server by Srikanth Manda


Friends,

Before Microsoft introduced Change Data Capture in SQL Server, developers used to build custom solutions using DML triggers and additional audit tables to track data modifications. DML triggers are expensive and execute as part of the original transaction, which degrades the performance of the application or server. Triggers do let us track changes to the data, but to store those changes we also have to create additional tables with a similar column structure.

Drawbacks:
1) Takes time in developing/creating DML triggers and additional tables.
2) Performance hit.
3) Very complex process.

Do you need to know which records are being inserted, updated and deleted in one or more SQL Server tables? Microsoft addressed this with a feature called Change Data Capture (CDC). We will focus on how to implement Change Data Capture and how to review the captured information to produce an audit trail of the changes to a database table.
When you enable Change Data Capture on a database table, a shadow of the tracked table is created with the same column structure as the existing table, plus a few additional columns that summarize the nature of the change to each row.
Once you enable Change Data Capture, a process is automatically generated and scheduled to collect and manage the information. By default, change data capture information is kept for only three days.
Enabling Change Data Capture on a Database
Change Data Capture is a table-level feature: it has to be enabled on each table whose changes you want to track. Before enabling it on a table, you must enable Change Data Capture on the database.
To check whether Change Data Capture is enabled on a database, run the script below.

USE master
GO
SELECT name, database_id, is_cdc_enabled FROM sys.databases
GO

Run the following script to enable CDC at the database level. (This script enables CDC in the ChangeDataCapture database.)
USE ChangeDataCapture
GO
EXEC sys.sp_cdc_enable_db
GO

Check whether CDC is enabled on the “ChangeDataCapture” Database

Once CDC is enabled on the database, several system tables are created in the database under the cdc schema.

The tables that get created are listed here; a sample query against one of them follows the list.
• cdc.captured_columns – This table lists the columns captured for each capture instance.
• cdc.change_tables – This table lists all the tables that are enabled for capture.
• cdc.ddl_history – This table contains the history of all DDL changes since change data capture was enabled.
• cdc.index_columns – This table contains the index columns associated with each change table.
• cdc.lsn_time_mapping – This table maps LSN numbers to times.
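For example, once CDC is enabled you can query cdc.change_tables to see which capture instances exist in the database. A minimal sketch using only columns from the CDC catalog tables:

-- List capture instances registered in the current database
USE ChangeDataCapture
GO
SELECT capture_instance,
       OBJECT_NAME(source_object_id) AS source_table,
       create_date
FROM cdc.change_tables
GO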
Additionally, you will see that the cdc schema has been created in the ChangeDataCapture database.

Creating a table:
USE ChangeDataCapture

Create Table dbo.Employee
(
EmpId BigInt Primary Key,
EmpName Varchar(50),
EmpSal Decimal(18,2),
EmpDeptNo Int
)

use ChangeDataCapture
insert into dbo.Employee values(1,'sreekanth',1000,10)
insert into dbo.Employee values(2,'sagar',2000,20)
insert into dbo.Employee values(3,'bala',3000,30)
insert into dbo.Employee values(4,'rama',4000,10)
insert into dbo.Employee values(5,'sudhakar',5000,20)
insert into dbo.Employee values(6,'ramana',6000,30)
insert into dbo.Employee values(7,'ravi',7000,10)
insert into dbo.Employee values(8,'satyadev',8000,20)
insert into dbo.Employee values(9,'venkat',9000,30)
insert into dbo.Employee values(10,'prashanth',10000,10)

USE ChangeDataCapture
select * from dbo.Employee

Enabling Change Data Capture on one or more Database Tables:
The CDC feature is enabled at the table level once CDC has been enabled for the database: it must be enabled for each table that needs to be tracked. First, run the following query to see which tables in the database have already been enabled for CDC.
Check Whether CDC is enabled on the Employee Table

USE ChangeDataCapture
Select name,object_id,is_tracked_by_cdc from Sys.tables

From the result above, we can see that CDC is not yet enabled on the table.
To Enable CDC on the Table
You run the following stored procedure to enable CDC on each table. Before enabling CDC at the table level, make sure SQL Server Agent is running. When CDC is enabled on a table, two CDC-related jobs specific to the database are created and executed using SQL Server Agent; without SQL Server Agent running, these jobs will not execute.
• Additionally, it is very important to understand the role of the required parameter @role_name. @role_name is a database role which will be used to determine whether a user can access the CDC data; the role will be created if it doesn’t exist. You can add users to this role as required; you only need to add users that aren’t already members of the db_owner fixed database role.
Run the below script to enable CDC on the table dbo.Employee.
USE ChangeDataCapture
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Employee',
@role_name = NULL
GO


In the current context, when we try to enable CDC on the table, the system throws an error stating that
SQL Server Agent is not currently running.


First we need to start SQL Server Agent, and then enable CDC on the table.


Run the following script to enable CDC on the table dbo.Employee.
USE ChangeDataCapture
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Employee',
@role_name = NULL
GO


The sys.sp_cdc_enable_table system stored procedure has several parameters; let's describe each one (only the first three are required; the rest are optional, and only the ones used are shown above). A hedged example that uses a few of the optional parameters follows the list.
• @source_schema is the schema name of the table that you want to enable for CDC
• @source_name is the table name that you want to enable for CDC
• @role_name is a database role which will be used to determine whether a user can access the CDC data; the role will be created if it doesn’t exist. You can add users to this role as required; you only need to add users that aren’t already members of the db_owner fixed database role.
• @supports_net_changes determines whether you can summarize multiple changes into a single change record; set to 1 to allow, 0 otherwise.
• @capture_instance is a name that you assign to this particular CDC instance; you can have up to two instances for a given table.
• @index_name is the name of a unique index to use to identify rows in the source table; you can specify NULL if the source table has a primary key.
• @captured_column_list is a comma-separated list of column names that you want to enable for CDC; you can specify NULL to enable all columns.
• @filegroup_name allows you to specify the FILEGROUP to be used to store the CDC change tables.
• @allow_partition_switch specifies whether the ALTER TABLE ... SWITCH PARTITION command is allowed against the CDC-enabled table (TRUE or FALSE).
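Here is a sketch of how some optional parameters might be used. Since a table can have up to two capture instances, this example creates a second instance for dbo.Employee with a custom name; the instance name and column list are illustrative, not from the original post.

USE ChangeDataCapture
GO
-- Second capture instance with net-change support and a restricted column list
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Employee',
@role_name = NULL,
@capture_instance = N'dbo_Employee_v2',
@supports_net_changes = 1,
@captured_column_list = N'EmpId, EmpName, EmpSal',
@filegroup_name = NULL
GO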

Once we enable Change Data Capture on a table, SQL Server creates SQL Server Agent jobs with the following names.
1. cdc.ChangeDataCapture_capture – When this job is executed it runs the system stored procedure sys.sp_MScdc_capture_job, which internally calls sys.sp_cdc_scan. This procedure cannot be executed explicitly when a change data capture log scan operation is already active or when the database is enabled for transactional replication. This job reads the changes from the transaction log and writes them to the change tables.
2. cdc.ChangeDataCapture_cleanup – When this job is executed it runs the system stored procedure sys.sp_MScdc_cleanup_job, which cleans up the database change tables.


When everything has completed successfully, check the system tables again and you will find a new table called cdc.dbo_Employee_CT. This table will contain all the changes made to the table dbo.Employee. If you expand cdc.dbo_Employee_CT, you will find five additional columns as well.
As you will see, there are five additional columns beyond the mirrored columns of the original table:
• __$start_lsn
• __$end_lsn
• __$seqval
• __$operation
• __$update_mask
The two columns most important to us are __$operation and __$update_mask.
The column __$operation contains a value which corresponds to the DML operation. Following is a quick list of the values and their meanings.
• __$operation = 1: Delete
• __$operation = 2: Insert
• __$operation = 3: Values before update
• __$operation = 4: Values after update
The column __$update_mask shows, via a bitmap, which columns were updated in the DML operation specified by __$operation. For a DELETE or INSERT operation, all columns are considered updated, so the mask contains all 1s. The mask is a value formed from bit positions, one per captured column.
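To translate the bitmap into column names, you can combine sys.fn_cdc_get_column_ordinal with sys.fn_cdc_is_bit_set. A minimal sketch against the dbo_Employee capture instance used in this article (run it after CDC has captured at least one change):

USE ChangeDataCapture
GO
-- For each captured row, check whether the EmpName column's bit is set in the mask
DECLARE @EmpNameOrdinal INT = sys.fn_cdc_get_column_ordinal(N'dbo_Employee', N'EmpName');
SELECT __$operation,
       __$update_mask,
       sys.fn_cdc_is_bit_set(@EmpNameOrdinal, __$update_mask) AS EmpName_changed
FROM cdc.dbo_Employee_CT;
GO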
Example of Change Data Capture
We will test this feature by doing DML operations such as INSERT, UPDATE and DELETE on the table dbo.Employee which we have set up for CDC. We will observe the effects on the CDC table cdc.dbo_Employee_CT.
Before we start let’s first SELECT from both tables and see what is in them.
USE ChangeDataCapture
select * from [dbo].[Employee]
GO

USE ChangeDataCapture
select * from [cdc].[dbo_Employee_CT]
GO

Insert Statement:
Let us execute Insert Operation on the dbo.Employee Table
USE ChangeDataCapture

insert into [dbo].[Employee] values (11,'Krishnaveni',11000,20)
insert into [dbo].[Employee] values (12,'Mahathi',12000,30)
insert into [dbo].[Employee] values (13,'Suma',13000,10)
insert into [dbo].[Employee] values (14,'Jabeen',14000,20)
insert into [dbo].[Employee] values (15,'Ambily',15000,30)

Once the Insert Script is executed, let us query both the tables
USE ChangeDataCapture
select * from [dbo].[Employee]
GO

USE ChangeDataCapture
select * from [cdc].[dbo_Employee_CT]
GO

Because of the INSERT operation, five newly inserted rows appear in the tracked table dbo.Employee. The same rows are also visible in the tracking table. The value of __$operation is 2, which means these are INSERT operations.

Update Statement:
In the Update Operation, we will update a newly inserted row.
USE ChangeDataCapture

Update dbo.Employee
set
EmpName = 'Sumala Yeluri'
where
EmpId = 13

After executing the above script, let us query content of both the tables
USE ChangeDataCapture
select * from [dbo].[Employee]
GO

USE ChangeDataCapture
select * from [cdc].[dbo_Employee_CT]
GO

Executing the UPDATE script results in two different entries in the cdc.dbo_Employee_CT tracking table: one entry contains the values before the UPDATE was executed, and the second contains the new values after the UPDATE. The Change Data Capture mechanism always captures all the columns of the table unless it is restricted to track only a few columns.
Delete Statement:
In this Delete Operation Scenario, we will run a DELETE operation on a newly inserted row.
USE ChangeDataCapture

Delete from
[dbo].[Employee]
where
EmpId = 15

Once again, let us check the content of both the tables
USE ChangeDataCapture
select * from [dbo].[Employee]
GO

USE ChangeDataCapture
select * from [cdc].[dbo_Employee_CT]
GO

Due to the DELETE operation, one row was deleted from the table dbo.Employee. The deleted row is visible in the tracking table cdc.dbo_Employee_CT as a new record. The value of __$operation is 1, meaning that this is a delete operation.
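Rather than reading the change table directly, CDC also generates a table-valued function per capture instance for querying changes over an LSN range. A minimal sketch that pulls every change captured so far for dbo.Employee:

USE ChangeDataCapture
GO
-- Get the full LSN range available for the dbo_Employee capture instance
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn(N'dbo_Employee');
DECLARE @to_lsn BINARY(10) = sys.fn_cdc_get_max_lsn();
-- 'all' returns one row per change; 'all update old' also returns the pre-update images
SELECT __$operation, __$update_mask, EmpId, EmpName, EmpSal, EmpDeptNo
FROM cdc.fn_cdc_get_all_changes_dbo_Employee(@from_lsn, @to_lsn, N'all');
GO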

Disabling CDC on a table:
CDC was enabled in two steps – first at the database level and then at the table level. Similarly, if we want to disable it, we work at the same two levels.
Let's look at them one after the other.
In order to disable Change Data Capture on a table we need three values: the source schema, the source table name, and the capture instance. In our case the schema is dbo and the table name is Employee, but we don't yet know the capture instance. To find the capture instance, run the following script.
USE ChangeDataCapture;
GO
EXEC sys.sp_cdc_help_change_data_capture
GO
This returns a result that contains all three pieces of information required to disable CDC on a table.

The system procedure sys.sp_cdc_help_change_data_capture provides a lot of other useful information as well. Once we have the name of the capture instance, we can disable tracking of the table by running this T-SQL.

USE ChangeDataCapture;
GO
EXECUTE sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name = N'Employee',
@capture_instance = N'dbo_Employee';
GO

Once Change Data Capture is disabled on a table, SQL Server drops the change table, its query functions, and the associated data from the system tables.
From the above screenshot, we can see that the system capture table cdc.dbo_Employee_CT has been dropped.

Disable CDC on Database:
Run following script to disable CDC on whole database.
USE ChangeDataCapture
GO
EXEC sys.sp_cdc_disable_db
GO

The above stored procedure deletes all CDC data and the related system functions and tables. If you need this data for any other purpose, take a backup before disabling CDC on the database.

Automatic Cleanup Process:
If we kept tracked data indefinitely, the change tables would grow enormously on the server's disk, leading to maintenance and I/O issues.
CDC therefore has an automatic cleanup mechanism that runs on a regular schedule. By default, captured data is retained for three days. The cleanup job internally uses the system procedure sys.sp_cdc_cleanup_change_table, which takes care of removing tracked data at regular intervals.
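The retention period is a property of the cleanup job and can be adjusted with sys.sp_cdc_change_job. A minimal sketch that raises retention from the default of 4320 minutes (3 days) to 7 days:

USE ChangeDataCapture
GO
-- Retention is specified in minutes; 10080 minutes = 7 days
EXEC sys.sp_cdc_change_job
@job_type = N'cleanup',
@retention = 10080;
GO
-- Verify the current capture and cleanup job settings
EXEC sys.sp_cdc_help_jobs;
GO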

Hope this helps !!

Best Regards,
Srikanth Manda

Non-Clustered Column Store Index in SQL Server 2012 by Srikanth Manda


Hi Friends,

I would like to add a pretty good article on “Non-Clustered Column Store Index in SQL Server 2012” from my friend “Srikanth Manda”. Hope you will enjoy this.

Everyone agrees that hardware speed and capacity have increased over the past two or three decades, but disk I/O (data access and transfer rate) has not grown to the same extent and is still comparatively slow. One key point to remember is that, as time moves forward, databases keep getting larger; the data we have today may grow by almost 10 times over the next two to three years, so SQL Server needs a technology that can address this kind of data growth in data warehouses. Secondly, as the size grows, query performance becomes critical: customers want an interactive experience – they want to hold large amounts of data, process it, and get query results back quickly. Thirdly, data warehousing has become more of a commodity, and the technology needs to reach the masses. Finally, the amount of data in a data warehouse (DWH) grows tremendously day by day, and querying it can take a huge amount of time, which degrades the performance of the data warehouse. All of these issues can be addressed by the Non-Clustered Column Store Index.
In this article, we will learn about this new feature: how to build it, how it works in SQL Server, how the data is stored, what happens underneath the engine, and how it improves the performance of data warehousing queries.
In a traditional relational DBMS, data is stored as rows (in a B-tree format). Microsoft SQL Server, for example, stores rows in pages of 8 KB. If you have a row of 10 columns, you store row 1, row 2, and so on; when the 8 KB page becomes full, the next row goes to a second page. This storage format works well for OLTP workloads. In the image below, the data for the ten columns of each row is stored together contiguously on the same page, and once the page is full the next row goes to a second page.

What has changed is that, instead of storing data in row format, we can store it in column store format. For example, if I have a table with columns C1 to C10, instead of storing rows we store columns: C1, then C2, and so on up to C10. When we store data in column store format we get very good compression, because data from the same column is of the same type and domain, so it compresses very well. For example, consider a company operating globally: all employees from India have the Country column set to 'India', and all employees from the US have it set to 'US'. The Country column compresses well because its values form a repetitive pattern. This kind of opportunity is available in the column store format rather than the row store format.
In the row store format, data is stored for all ten columns C1 through C10. If we want to retrieve only columns C1, C3 and C5, the row store format has to read the data for the entire 10-column row and then apply the predicate to the specified columns. In the column store format, we can fetch only the required columns. This reduces I/O and lets the data fit in memory, which gives much improved performance. Query processing with column store technology therefore delivers much better response times.
If we create Non-Clustered Column Store Index, the data is stored in column format.

If we store data in column format and have, say, 10 million rows, we cannot store all 10 million rows of column C1 as a single storage unit. Instead we break those rows into smaller chunks, which we call row groups.

The rows are grouped, typically 1 million at a time; each group is called a row group. Within each row group, each of the 10 columns here is stored in its own segment, giving 10 segments per row group. The benefit of storing each column in its own segment is that when I only want columns C1 and C2, I just read the segment for column C1 and the segment for column C2.

Note: the blue boxes in the diagram are the segments.
Important points to remember:
1) Row group
• A set of rows (typically 1 million)
2) Column segment
• Contains the values of one column for a row group
3) Segments are individually compressed
4) Each segment is stored separately as a LOB in binary format
5) The segment is the unit of transfer between disk and memory (see the query after this list)
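You can inspect the segments SQL Server has created for a columnstore index through the sys.column_store_segments catalog view. A minimal sketch, joining to sys.partitions to resolve the table name:

-- One row per column segment per row group
SELECT OBJECT_NAME(p.object_id) AS table_name,
       s.column_id,
       s.segment_id,
       s.row_count,
       s.on_disk_size
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p ON s.hobt_id = p.hobt_id;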

New Batch Processing Mode
1) Some of the more expensive operators (Hash Match for joins and aggregations) utilize a new execution mode called Batch Mode
2) Batch mode takes advantage of advanced hardware architectures, processor cache and RAM, and improves parallelism
3) Packets of about 1000 rows are passed between operators, with column data represented as a vector
4) Reduces CPU usage by a factor of 10 (sometimes up to a factor of 40)
5) Much faster than row-mode processing
6) Other execution plan operators that use batch processing mode are bitmap filter, filter, compute scalar
7) Include all columns in a ColumnStore Index

Batch Mode restrictions:
1) Queries using an OUTER JOIN directly against columnstore data, NOT IN (subquery), or UNION ALL won't leverage batch mode and will revert to row processing mode
Examples:
1) In this demo, we create two tables: one with a regular index and the other with a Non-Clustered ColumnStore Index. Below is the script to create the two tables
Table with Regular Index
CREATE TABLE [dbo].[FactInternetSalesWithRegularIndex](
[DummyIdentity] [int] IDENTITY(1,1) NOT NULL,
[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[OrderQuantity] [smallint] NULL,
[SalesAmount] [money] NULL,
CONSTRAINT [PK_FactInternetSalesWithRegularIndex_ProductKey_OrderDateKey]
PRIMARY KEY CLUSTERED
(
[DummyIdentity] ASC,
[ProductKey] ASC
)) ON [PRIMARY]

Table with Non-Clustered ColumnStore Index

CREATE TABLE [dbo].[FactInternetSalesWithColumnStoreIDX](
[DummyIdentity] [int] IDENTITY(1,1) NOT NULL,
[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[OrderQuantity] [smallint] NULL,
[SalesAmount] [money] NULL,
CONSTRAINT [PK_FactInternetSalesWithColumnStoreIDX_ProductKey_OrderDateKey]
PRIMARY KEY CLUSTERED
(
[DummyIdentity] ASC,
[ProductKey] ASC
)) ON [PRIMARY]

GO

2) Insert data into both tables. Here is the insert script
Insert Script for FactInternetSalesWithRegularIndex Table
INSERT INTO FactInternetSalesWithRegularIndex
(
ProductKey, OrderDateKey,
OrderQuantity,SalesAmount
)
SELECT
ProductKey,OrderDateKey,
OrderQuantity,SalesAmount
FROM [AdventureWorksDW2012].dbo.[FactInternetSales]

GO 50

Insert Script for FactInternetSalesWithColumnStoreIDX Table
INSERT INTO FactInternetSalesWithColumnStoreIDX
(
ProductKey, OrderDateKey,
OrderQuantity,SalesAmount
)
SELECT
ProductKey,OrderDateKey,
OrderQuantity,SalesAmount
FROM [AdventureWorksDW2012].dbo.[FactInternetSales]

GO 50

3) And finally, I create a regular non-clustered index (on the ProductKey and SalesAmount columns) on the first table, and a columnstore index on the second table, which also includes the ProductKey and SalesAmount columns.

CREATE NONCLUSTERED INDEX [NC_FactInternetSalesWithRegularIndex_ProductKey_Salesamount]
ON FactInternetSalesWithRegularIndex
(ProductKey,Salesamount)
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX [CS_FactInternetSalesWithColumnStoreIDX_ProductKey_Salesamount]
ON FactInternetSalesWithColumnStoreIDX
(ProductKey,Salesamount)

GO

4) Execution of Queries

When I ran the queries with STATISTICS IO ON, I found striking results showing the significant performance advantage of the columnstore index over the regular index, as you can see below:

SET STATISTICS IO ON

Select ProductKey,sum(Salesamount)
from FactInternetSalesWithRegularIndex
GROUP BY ProductKey
ORDER BY ProductKey

Select ProductKey,sum(Salesamount)
from FactInternetSalesWithColumnStoreIDX
GROUP BY ProductKey
ORDER BY ProductKey

SET STATISTICS IO OFF

Result:

(158 row(s) affected)
Table ‘FactInternetSalesWithRegularIndex’. Scan count 5, logical reads 4339, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1342 ms, elapsed time = 504 ms.

(158 row(s) affected)
Table ‘FactInternetSalesWithColumnStoreIDX’. Scan count 4, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 27 ms.

The time required to run these two queries also varied greatly: the query against the regular index took 1342 ms of CPU time and 504 ms elapsed time, versus just 47 ms of CPU time and 27 ms elapsed time for the second query, which uses the columnstore index.

The relative cost of the second query (which uses column store index) is just 11% as opposed to the relative cost of first query (which uses regular index) which is 89%.

For columnstore indexes exclusively, SQL Server 2012 introduces a new execution mode called Batch Mode, which processes batches of rows (as opposed to the row-by-row processing of a regular index) and is optimized for multicore CPUs and the increased memory throughput of modern hardware. It also introduces a new execution plan operator for columnstore index processing, as shown below:

Restrictions:
1) Cannot be clustered
2) Cannot act as PK or FK
3) Does not include sparse columns
4) Can’t be used with tables that are part of Change Data Capture or FileStream data
5) Cannot be used with certain data types, such as binary, text/image, rowversion/timestamp, CLR data types (hierarchyid/spatial), nor with data types declared with the MAX keyword, e.g. varchar(max)
6) Cannot be modified with an Alter – must be dropped and recreated
7) Can’t participate in replication
8) It's a read-only index
a. You cannot insert rows and expect the column store index to be maintained (a common workaround sketch follows this list)
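In SQL Server 2012, the usual way to load new data into a table that has a nonclustered columnstore index is to disable the index, load the data, and then rebuild it. A minimal sketch using the demo index created above:

-- Disable the columnstore index so the table becomes writable again
ALTER INDEX [CS_FactInternetSalesWithColumnStoreIDX_ProductKey_Salesamount]
ON dbo.FactInternetSalesWithColumnStoreIDX DISABLE;
GO
-- ... load the new rows here ...
-- Rebuild the index so the new data is available to columnstore queries
ALTER INDEX [CS_FactInternetSalesWithColumnStoreIDX_ProductKey_Salesamount]
ON dbo.FactInternetSalesWithColumnStoreIDX REBUILD;
GO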

What’s New in SQL Server 2014
Columnstore index has been designed to substantially increase performance of data warehouse queries, which require aggregation and filtering of large amounts of data or joining multiple tables (primarily performs bulk loads and read-only queries).
There were several limitations in SQL Server 2012 that SQL Server 2014 overcomes:
1) In SQL Server 2012 we can create only one non-clustered columnstore index per table, which can include all or some of the table's columns in that single index.
2) SQL Server 2014 adds the ability to create a Clustered Column Store Index.
3) In SQL Server 2012, creating a Non-Clustered Column Store Index makes the table read-only.
4) With SQL Server 2014, you can create a Clustered Column Store Index without any impact on writes to the table: you can issue INSERT, UPDATE and DELETE statements against a table with a clustered columnstore index. No workaround (such as dropping and re-creating the index) is required for writing data to the table.
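A minimal sketch of the SQL Server 2014 syntax; the table dbo.FactSalesCCI is hypothetical and used only to illustrate the statement (a clustered columnstore index is the table's primary storage, so the table must not already have a clustered index):

-- Hypothetical fact table used only to illustrate the 2014 syntax
CREATE TABLE dbo.FactSalesCCI
(
ProductKey INT NOT NULL,
OrderDateKey INT NOT NULL,
OrderQuantity SMALLINT NULL,
SalesAmount MONEY NULL
);
GO
-- SQL Server 2014: the clustered columnstore index becomes the table's storage
-- and the table remains updatable (INSERT, UPDATE and DELETE are allowed)
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSalesCCI
ON dbo.FactSalesCCI;
GO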

Hope you enjoyed the post ..
Thanks,
Srikanth Manda

SSAS DBA Tips & Tricks – How to change the default data folder location in SSAS


Friends – One of the very helpful properties of SSAS is discussed in this video. In a few cases the data folder needs to be moved to a new location, and that is discussed in this post.

Regards,
Roopesh Valluru

SSAS DBA Tips & Tricks – Where the data will be stored in MOLAP Storage mode


Friends,

In this post/video, we are gonna explore the DATA folder and its uses in SSAS. Hope this will help you in understanding the MOLAP storage mode concept very clearly.

 

Regards,
Roopesh Valluru

SSAS DBA Tips & Tricks – How to delete Huge Cube Databases


Friends – If you wish to know how I delete huge SSAS cube databases, then jump into the video given below.

 

Regards,
Roopesh Valluru

SSAS DBA Tips & Tricks – How to copy ONLY Structure or Schema of Cube database to new server


Friends –

To copy ONLY the structure or schema of a cube database to a new server, we normally need the solution file. But even without the solution you can do it, using the database deployed on one server and XMLA. Check out how.

 

Regards,
Roopesh Valluru

SSAS DBA Tips & Tricks – Use of REPLAY option in Sql Server Profiler Trace


Friends –

The REPLAY option in Sql Server Profiler Trace is one of my favorite options. This option helps you to replicate the load of one server onto another. Check out how ..

Regards,

Roopesh Valluru

 

SSAS DBA Tips & Tricks – How to ProcessAdd or ProcessIncremental of Dimension from SSMS


Friends – When you try to ProcessAdd a dimension from SSMS, you will not find any such option. Is it really available for us from SSMS or not? Check it out here ..

 

Regards,
Roopesh Valluru

SSAS DBA Tips & Tricks – SSAS database backup using XMLA


Hello Friends,

Back with another video, this time on SSAS cube database backups using XMLA. Generally people prefer to use the Wizard to take backups of databases, but the XMLA option has more advantages over the Wizard. This is a very interesting and useful option to know. So, why wait .. just jump into the video ..

 

Regards,
Roopesh Valluru

SSAS DBA Tips & Tricks – Flight Recorder in SSAS


Guys – In this post/video we are gonna discuss the Flight Recorder and its use in SSAS. I feel most of us are not even aware of this (I came to know about it only recently), and hence I am adding this video to share my knowledge, which will help you guys.

 

Regards,
Roopesh Valluru

SSAS DBA Tips & Tricks – How to process multiple dimensions or measures from SSMS


Friends,

In this Post/Video we are gonna discuss HOW TO PROCESS MULTIPLE DIMENSIONS AT A TIME FROM SSMS. This might be a very simple topic but a very useful one for SSAS developers to know. Let me add a couple of points for those who prefer READING to WATCHING a video .. 🙂

  • Connect to the Analysis services using SSMS as shown below.
  • Select any dimension of the cube for which you want to process multiple dimensions as shown below.
  • Select Object Explorer Details options from View menu.
  • Once the details of the dimensions open then you can select multiple dimensions to process at a time. Please watch the video of the same below

[youtube]N9XePaWB90k[/youtube]

Hope this helps you guys !! Do post a comment if this helped you ..
Regards,
Roopesh Valluru

Sort SSAS dimension attribute based on another attribute with example


Guys – In this post/video we are gonna discuss the steps to be followed to use a different attribute in the OrderBy property of a dimension's attribute.

Problem – How to use Attribute Key in OrderBy property of ssas dimension’s attribute.

 

Regards,
Roopesh Valluru

How to implement OR condition in MDX


Friends,

This post explains about “How to implement OR condition in MDX”. This is a youtube video and if you are unable to see the same then youtube might be blocked in your company .. 😉

Best Regards,

Roopesh Valluru

 

Distinct values for SSRS Parameter when query returns Duplicates


Friends,

This issue is very common when you are pulling data from Sharepoint list to load parameter values. In this post let us discuss the solution for the same.

Step 1 : Create a report and Go to the Report Tab at the top.

Step 2 : Go to the Report properties, then Code, and write the following code in it.

Public Function RemoveDuplicates(parameter As String) As String

Dim value As String = ""

Dim items As Object() = Split(parameter, "~")

System.Array.Sort(items)

For i As Integer = 0 To items.Length - 1

If i > 0 AndAlso items(i).Equals(items(i - 1)) Then

Continue For

End If

If items(i) <> "" Then

If value = "" Then

value = items(i)

Else

value = value + "," + items(i)

End If

End If

Next

Return value

End Function

Step 3 : Create a dataset which contains all the values that need to be displayed in the parameter, including duplicates. Let's name it the Color dataset.

Step 4 : Now create a parameter that should take values from the Color dataset. Make it a hidden parameter. Let’s name it Hidden_Param.

Step 5 : Create another parameter that will be used to show only the unique values in the report parameter. Let's name it Main_Param, and also select the Allow multiple values option.

Use the following expression under Available Values (Specify values) for both the Label and Value fields.

=Split(Code.RemoveDuplicates(Join(Parameters!Hidden_Param.Value,"~")),",")

Step 6 : Use this Main_Param parameter in your main dataset: map it under Dataset Properties > Parameters.

Step 7 : See the preview and it's done.

Thanks Swati for the document !!

Regards,
Roopesh Babu V

MDX Detailed Training Videos by Roopesh Valluru (7 Videos)


Friends,

I am bringing together all the MDX Videos in this post. In this post we are gonna discuss about all the MDX topics in details. Hope you will enjoy the journey through MDX in detail. My best Wishes !!

 

Regards,
Roopesh Babu V

SSRS with MDX Detailed Training Videos by Roopesh Valluru (11 Videos)


Friends,

This post is very useful to those who want to learn SSRS with MDX. I have covered almost all concepts, and please do drop a message if you are happy with the tutorials.

SSRS with MDX Detailed Training playlist

 

 

My Best Wishes !!

Regards,
Roopesh Babu V

SSAS Detailed Training Videos by Roopesh Valluru (15 Videos)


Friends,

I am bringing together all the SSAS Videos. In this post we are gonna discuss about all the SSAS topics in details. Hope you will enjoy the journey through SSAS in detail. My best Wishes !!

Regards,
Roopesh Babu V

IsBig Property of Aggregate Transformation in SSIS 2008 R2


Friends,
I don't know how many of you guys are aware of the IsBig property in the SSIS Aggregate transformation. If you are aware of it then just skip this post, else it is worth reading. As you all know, using Aggregate we can calculate the SUM, AVG, COUNT and COUNT DISTINCT of a column or columns. In some cases the return value will be very big, like more than 4 billion, and to handle those kinds of outputs we need the IsBig property. It can be set on output columns to invoke special handling of big or high-precision numbers. If a column value may exceed 4 billion, or a precision beyond the float data type is required, IsBig should be set to 1. To set the property, follow these steps.

  1. Right click on the Aggregate transformation and select Show Advanced Editor.
  2. Go to Input and Output columns tab.
  3. Expand the Output columns and select the column to which you wish to set this property.

Using this property will have the following impact on Output data types flowing through Aggregate transformation.

  • DT_UI8 data type will be used for COUNT results.
  • DT_UI4 data type will be used for DISTINCT COUNT result.
  • DT_R8 data type will be used for remaining operations.

As I mentioned above, this can be used only for the operations given above. For Group By, Maximum and Minimum we can't set this property.
Try it guys !!

Regards,
Roopesh Babu V

Difference between Keys Scale and Number of Keys properties in SSIS Aggregate Transformation


Friends,

You might have seen the properties "Keys Scale" and "Number of Keys" in the SSIS Aggregate transformation. What is the difference, and in which scenarios do we use these properties? Would you like to know? Then continue ..

The above two properties are provided to improve the performance of the transformation. We set these TWO properties for the columns with a GROUP BY operation in the transformation. Both deal with the number of groups that are expected to result from a Group By operation. For example, if I use the Color column from the DimProduct table in the AdventureWorks database with a GROUP BY operation, then by setting these properties we give the server a HINT about how many KEYS (values) are expected. When you specify an appropriate value for Keys or Keys Scale, you improve performance because the transformation is able to allocate adequate memory for the data that it caches. Now, what is the difference between these two when both serve the same purpose?

Keys Scale – This specifies the approximate number of keys that the aggregation expects, using the following predefined values. The default is Unspecified.

  • Unspecified – Keys Scale value is not specified.
  • Low – approximately 500,000 keys.
  • Medium – approximately 5,000,000 keys.
  • High – approximately 25,000,000 keys.

Number of Keys – This is an editable field which accepts a numeric value. If you think the number of keys will not be close to any of the Keys Scale ranges above, you can set the value manually using this property.

You only need to set one of the two properties. If both Keys Scale and Number of Keys are specified, Number of Keys takes precedence.
Hope this helps guys !!

Regards,
Roopesh Babu V

Merge Join Transformation in SSIS 2008R2 Example


The Merge Join Transformation is one of the useful tools in SSIS to join two sources. The Merge Join transformation provides an output that is generated by joining two sorted datasets using a FULL, LEFT, or INNER join. Let us see how it works.
You can configure the Merge Join transformation in the following ways:
• Specify the join as FULL, LEFT, or INNER join.
• Specify the columns the join uses.
• Specify whether the transformation handles null values as equal to other nulls.
NOTE: If null values are not treated as equal values, the transformation handles null values like the SQL Server Database Engine does.

This transformation has two inputs and one output. It does not support an error output. Let’s discuss the same with an example. We will start out with a connection manager that is created for the Adventure works database

Now drag and drop a Data Flow Task from the toolbox. Now Double click on Data flow Task.

Now we will create an OLE DB Source called “Products”, in which we will select all of the products (see query below).

Double click on the Products OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the ProductID, Name, ProductNumber, and ProductSubcategoryID from the Product table.

Now create an OLE DB Source called "Product Subcategory", in which we will select all of the product subcategories.

Now double click on the Product Subcategory OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the ProductSubcategoryID and Name from the ProductSubcategory table.

Now Create an OLE DB Source called “Purchase Order Detail”, in which we will select all of the Details of the Purchase Orders.

Double click on the Purchase Order Detail OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the PurchaseOrderID, ProductID, and UnitPrice from the PurchaseOrderDetail table.
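The three source queries appear as screenshots in the original post; below is a hedged reconstruction based on the column lists described above, assuming the standard AdventureWorks schema names Production.Product, Production.ProductSubcategory and Purchasing.PurchaseOrderDetail.

-- "Products" OLE DB Source
SELECT ProductID, Name, ProductNumber, ProductSubcategoryID
FROM Production.Product;

-- "Product Subcategory" OLE DB Source
SELECT ProductSubcategoryID, Name
FROM Production.ProductSubcategory;

-- "Purchase Order Detail" OLE DB Source
SELECT PurchaseOrderID, ProductID, UnitPrice
FROM Purchasing.PurchaseOrderDetail;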

Now we will create two Sort components and join the pipeline from Products to one of the sort transformations and join the pipeline from Product Subcategory to the other sort transformation. Remember that both datasets that you are joining must be sorted the same before joining with a Merge Join.

Now Click on the Sort that we connected to the Products source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductSubcategoryID, so click on the checkbox to the left of ProductSubcategoryID. This will put ProductSubcategoryID in the lower box. Now we can click OK when we are complete.

Click on the Sort that we connected to the Product Subcategory source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductSubcategoryID, so click on the checkbox to the left of ProductSubcategoryID. This will put ProductSubcategoryID in the lower box. Now we can click OK when we are complete.

Now add a Merge Join transformation. This will allow us to join the Products and Product Subcategory sources together. Drag the pipeline arrow from the Products Sort to the Merge Join.

The Input Output Selection window will appear. Select Merge Join Left Input. This will mean that we are using the Products on the Left hand side of the join. If you are familiar with Left and Right joins in SQL this is a familiar concept. Choosing Left Input doesn’t mean we are necessarily doing an outer or inner join (we define that later), it just defines what side of the join this input will be used as.

Now drag the pipeline arrow from the Product Subcategory Sort to the Merge Join. This time it will most likely not prompt you for which side of the join you want to add this input, as we already selected Left Input for the previous input.

Now we will double click on the Merge Join and the Merge Join Transformation Editor will appear. You will notice that ProductSubcategoryID from both datasets has a line connecting them. This means that this is the column that we are joining on. You will also notice that the Join Type can be "Inner Join", "Left Outer Join" or "Full Outer Join". In order to retain the other columns in the pipeline, click on the checkboxes to the left of the column names. If any of the boxes to the left of the columns are not checked, it means that column will no longer be in the pipeline after the Merge Join. Notice that below there is a column called "Output Alias". This column allows us to rename columns to new names, so we can rename "Name" from Products to "ProductName" and rename "Name" from Product Subcategory to "CategoryName". After the Merge Join, these columns will be known by these alias names. When completed, click OK.

Now we will create two Sort components and join the pipeline from the Merge Join to one of the sorts and join the pipeline from the Purchase Order Detail source to the other sort.

Now Click on the Sort that we connected to the Merge Join output and the Sort Transformation Editor will appear. We want to sort this dataset by ProductID, so click on the checkbox to the left of ProductID. This will put ProductID in the lower box. Now we can click OK when we are complete.

Click on the Sort that we connected to the Purchase Order Detail source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductID, so click on the checkbox to the left of ProductID. This will put ProductID in the lower box. Now we can click OK when we are complete.

Now we will add a Merge Join transformation. This will allow us to join the Results of the first Merge Join and the Purchase Order Detail source together. Drag the pipeline arrow from the sort transformation of the first Merge Join to the Merge Join. The Input Output Selection window will appear. Select Merge Join Left Input. Then drag the pipeline arrow from the Purchase Order Detail sort to the Merge Join.

Now we will double click on the Merge Join and the Merge Join Transformation Editor will appear. You will notice that ProductID from both datasets has a line connecting them. This means that this is the column that we are joining on. You will also notice that the Join Type can be "Inner Join", "Left Outer Join" or "Full Outer Join". In order to retain the other columns in the pipeline, click on the checkboxes to the left of the column names. If any of the boxes to the left of the columns are not checked, it means that column will no longer be in the pipeline after the Merge Join. Click OK when completed.

Instead of using an OLE DB destination, I used an Audit transformation for testing purposes and a Data Viewer to show the output data produced by the Merge Join transformation. After testing our package successfully, we can delete the Audit transformation and Data Viewer, bring in our destination, and connect the Merge Join transformation to it as the input for our destination.

Now Execute the package to see the Results.

Now we can output our joined dataset to any destination we desire. I hope you are now able to successfully join datasets using SSIS.

Hope you will give a try !!

Thanks Aparna for the document.

Regards,
Roopesh Babu V