Row Sampling Transformation in SSIS with Example

Friends,

This is very similar to Percentage Sampling. The only difference is that Row Sampling limits the output to the integer number of rows we specify, whereas Percentage Sampling takes a percentage of the records.

If there are 1000 records in my source, then:

Row Sampling – If I take 10 as the value, then the output is 10 records.

Percentage Sampling – If I take 10 as the value, then 10% of 1000, i.e. roughly 100 records, will be the output.
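
To make the arithmetic concrete, here is a small Python sketch of the two ideas. It is only an illustration and not how SSIS works internally: the function names row_sample and percentage_sample are made up for this post, and the percentage version computes an exact count just to keep the example simple.

```python
import random

def row_sample(rows, n, seed=None):
    """Pick n rows at random (or every row if fewer than n exist)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

def percentage_sample(rows, percent, seed=None):
    """Pick percent% of the rows at random (exact count, for simplicity)."""
    rng = random.Random(seed)
    count = len(rows) * percent // 100
    return rng.sample(rows, count)

# Hypothetical source of 1000 records, identified by the numbers 1..1000.
source = list(range(1, 1001))

print(len(row_sample(source, 10)))         # 10
print(len(percentage_sample(source, 10)))  # 100
```

The same value, 10, gives 10 rows in one case and 100 rows in the other, which is the whole difference between the two transformations.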

Let's see an example. You will feel like you are going through the same Percentage Sampling post again if you have already read it.

  • Open a new project and drag a Data Flow task from the toolbox onto the Control Flow.
  • Edit the Data Flow task by double-clicking it, or by right-clicking it and selecting Edit.
  • Make sure the Data Flow page is open as shown below.

  • Select the OLE DB data source from the data flow sources and drag and drop it into the data flow.
  • Double-click the OLE DB data source to open a new window where we can set the properties of the connection.
  • Select the connection manager and click the New button to set the connection string as shown below.

  • Set the connection to the database by providing the server name, database name and authentication details if required.
  • After the connection is set, select the Data Access mode “Table or View” as shown below, and then select the table which we are going to use as input to the Row Sampling transformation.

  • Now select the columns that need to be present as part of the source by going to the Columns page in the OLE DB data source as shown below.

  • Now drag and drop the Row Sampling transformation and connect the OLE DB source output as input to this transformation as shown below.

  • Now edit the Row Sampling transformation and enter the number of rows you want in the sample, out of the total records in the source table, in “Number of rows”.
  • Give meaningful names to the Sample output and the Unselected output. The transformation picks rows at random either way, not the TOP records. Check the “Use the following random seed” option only when you want the same sample on every run (handy while testing). If it is left unchecked, a new seed is generated for each run.
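
What a fixed seed gives you can be sketched in Python. This is only an analogy: Python's random module is not the generator SSIS uses, and sample_ids and the seed value 42 are made up for illustration.

```python
import random

# Hypothetical source of 1000 records, identified by the numbers 1..1000.
source = list(range(1, 1001))

def sample_ids(seed=None):
    """Pick 10 ids at random; a fixed seed makes the choice repeatable."""
    rng = random.Random(seed)
    return sorted(rng.sample(source, 10))

first_run = sample_ids(seed=42)
second_run = sample_ids(seed=42)
print(first_run == second_run)  # True: same seed, same sample

# With no seed the generator is seeded differently on each run,
# so the 10 ids will usually differ from run to run.
print(len(sample_ids()))        # 10
```

So a fixed seed is about getting a repeatable sample, which helps when you are testing a package and want to compare one run with another.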

  • These are all the properties we can set for the Row Sampling transformation. Now let's create a couple of destinations to store the sampled output and the unselected output. I have taken an OLE DB destination for the sampled output and a Flat File destination for the unselected output.
  • Now drag the output of the Row Sampling transformation to the OLE DB destination. It will prompt us to select the output to use (we have two, one sampled and one not sampled); select the sampled output.
  • Connect the unselected output to the Flat File destination and set the connection settings for both the OLE DB and Flat File destinations. (You can see how to configure destinations in the post here.)
  • Now the package is ready, so execute it. Make sure all the items turn GREEN.

  • You can observe that the records from the source got split into two different pipelines based on the integer we have given.
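
The split into two pipelines can be sketched in Python as well. Again this is just an illustration under assumptions: split_rows, the "id" column and the seed value 7 are made up, and SSIS does the equivalent work inside the data flow engine.

```python
import random

def split_rows(rows, n, seed=None):
    """Split rows into (sampled, unselected), with n rows picked at random."""
    rng = random.Random(seed)
    picked = set(rng.sample(range(len(rows)), min(n, len(rows))))
    sampled = [row for i, row in enumerate(rows) if i in picked]
    unselected = [row for i, row in enumerate(rows) if i not in picked]
    return sampled, unselected

# Hypothetical source table with 1000 rows and a single "id" column.
source = [{"id": i} for i in range(1, 1001)]

sampled, unselected = split_rows(source, 10, seed=7)
print(len(sampled), len(unselected))  # 10 990
```

Every source row lands in exactly one of the two outputs, so the two row counts always add up to the source row count.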

This is it !! This is one of the simplest transformations (to configure) available in SSIS, and it is useful whenever you wish to limit the records flowing to the destination.

Happy Coding !!

Regards,

Roopesh Babu V
