Thursday, March 14, 2013

Gotcha: BizTalk WCF-SQL Adapter & Table Variables

By now you probably know that the BizTalk WCF-SQL adapter doesn't like temporary tables in stored procedures when generating metadata (e.g. see Thiago Almeida's post here: http://connectedthoughts.wordpress.com/2011/01/18/wcf-sql-versus-the-old-sql-adapter/).

There are a couple of widely documented workarounds, the two most common being the use of the FMTONLY setting in SQL Server (as described in Thiago's post) and replacing the temporary tables with table variables.

If you do decide to head down the path of replacing temporary tables in your stored procedures with table variables, something to be aware of is how variables are scoped within T-SQL.

Unlike other languages you might be familiar with, such as C#, variables in T-SQL are scoped to the batch (or, in the case of a stored procedure, to the entire stored procedure). So if, say, you have a loop in your stored procedure and you declare a variable inside the loop, you may not get the results you expected. Because the scope of the variable is the entire stored procedure, it's not re-initialised each time around the loop (unless you do that explicitly); it's only re-declared - and in T-SQL, if the variable has already been declared the first time around, there's nothing more to do.

Where this ties into the WCF-SQL adapter and the approach of replacing temporary tables in stored procedures with table variables is a common scenario: a T-SQL loop where, inside the loop's block, you create a temporary table, do some work with it, and then drop the temporary table at the end of the block (prior to the next iteration of the loop).
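With temporary tables, that pattern looks something like this minimal sketch (the table and column names are purely illustrative):

DECLARE @i INT = 1;

WHILE @i <= 3
BEGIN
    -- Working table created fresh for this iteration
    CREATE TABLE #work (Iteration INT);

    INSERT INTO #work (Iteration) VALUES (@i);

    -- ... do some stuff with #work ...
    SELECT * FROM #work;

    -- Dropped before the next iteration comes around
    DROP TABLE #work;

    SET @i = @i + 1;
END

Each iteration's SELECT returns just the single row inserted during that iteration, because the temporary table only ever lives for one pass through the block.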

Unfortunately, when you replace this with a table variable, you end up with a similar statement to create the table variable at the start of the loop's block (a DECLARE @table TABLE... in place of the CREATE TABLE...) - but there's no explicit way (at least that I know of) to do the equivalent of a DROP TABLE at the end of the block. Combine this with T-SQL's scoping rules and you can end up with some unexpected results.

For example:
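Something along these lines (a minimal sketch - the names are illustrative, but the behaviour is the point):

DECLARE @i INT = 1;

WHILE @i <= 3
BEGIN
    -- "Re-declared" each time around the loop, but scoped to the whole batch
    DECLARE @table TABLE (Iteration INT);

    INSERT INTO @table (Iteration) VALUES (@i);

    SELECT * FROM @table;

    SET @i = @i + 1;
END

Run that and the first SELECT returns one row (1), the second returns two rows (1 and 2), and the third returns three rows (1, 2 and 3) - the rows accumulate across iterations.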
 

Hmmm... If I were coming from a C# background, I might have expected the results to be somewhat different - more along the lines of a single record containing the iteration number each time around the loop. That's certainly what we would have got with the previous approach using CREATE/DROP of temporary tables.

The results are due to the @table table variable being re-declared but not re-initialised each time around the loop. So how can we reproduce the behaviour of the previous temporary table approach without the ability to explicitly drop ("undeclare") the table variable? Well, the best I've come up with is to declare the table variable and then, immediately after the declaration, include a DELETE statement to remove any records from the table variable. The first time around the loop this of course does nothing, but on subsequent iterations it deletes any records still present in the table variable from the previous iteration.

Using this approach we get the results expected:
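Continuing the same sketch, with the DELETE added straight after the declaration:

DECLARE @i INT = 1;

WHILE @i <= 3
BEGIN
    DECLARE @table TABLE (Iteration INT);

    -- Clear out any rows left over from the previous iteration
    DELETE FROM @table;

    INSERT INTO @table (Iteration) VALUES (@i);

    SELECT * FROM @table;

    SET @i = @i + 1;
END

This time each SELECT returns a single row containing the current iteration number - the same behaviour we had with the original create/drop temporary table approach.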


HTH!

Tuesday, March 12, 2013

Using a custom SSIS script component source to process NetCDF files

In case you were interested...

Recently I was involved in a data warehousing project with a number of SSIS-based ETL processes loading data into a dimensional model in the data warehouse from a variety of data sources. One of these sources was weather forecast data (daily minimum & maximum temperatures, average rainfall) supplied by a government bureau in NetCDF format on a daily basis.

Prior to this project I'd never even heard of NetCDF, but nevertheless we needed to be able to work with data in this format as part of our ETL processes. We discovered that the Microsoft SDS libraries are the preferred mechanism for working with NetCDF data from the .NET platform. Although these libraries are part of a Microsoft Research initiative and are not licensed for commercial use, we obtained permission from Microsoft to use them for our purposes.
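For the curious, reading a NetCDF file with SDS ends up looking roughly like the sketch below. I'm quoting the SDS API from memory, so treat the provider URI, the Variables collection / GetData() calls and the "max_temp" variable name as assumptions rather than a definitive reference.

// Rough sketch only: opening a NetCDF file with the Microsoft SDS libraries
// and pulling out one variable's data. The URI format, the API calls and the
// "max_temp" variable name are assumptions for illustration.
using System;
using Microsoft.Research.Science.Data;

class NetCdfSketch
{
    static void Main()
    {
        using (DataSet ds = DataSet.Open(@"msds:nc?file=C:\forecasts\forecast.nc&openMode=readOnly"))
        {
            Variable maxTemp = ds.Variables["max_temp"];
            Array values = maxTemp.GetData();

            Console.WriteLine("Read {0} forecast values", values.Length);
        }
    }
}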
 

We considered the following alternatives:
  1. Custom .NET web service leveraging the Microsoft SDS libraries (T-SQL INSERT)
  2. Custom .NET web service leveraging the Microsoft SDS libraries (SqlBulkCopy)
  3. Custom SSIS script component source leveraging the Microsoft SDS libraries
Originally it was proposed to utilise the Custom .NET web service leveraging the Microsoft SDS libraries (T-SQL INSERT) alternative, with an SSIS package downloading and extracting each forecast NetCDF file prior to invoking the web service for subsequent processing.

A prototype ASP.NET (ASMX) web service was constructed with a single LoadNetCDF service operation. The service used the Microsoft SDS libraries to read the source NetCDF file and dynamically constructed T-SQL INSERT statements to insert the data into a specified table within a specified database on a specified SQL Server instance.
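Conceptually the prototype's insert path amounted to something like this (a simplified sketch - the connection handling, table and column names are illustrative, and the real service built the statement dynamically from the NetCDF data):

// Sketch of the prototype's per-record approach: one INSERT executed for each
// forecast value read from the NetCDF file. Table and column names are illustrative.
using System;
using System.Data.SqlClient;

static class PerRecordInsertSketch
{
    public static void InsertForecastRecord(string connectionString, DateTime forecastDate, double maxTemp)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "INSERT INTO staging.Forecast (ForecastDate, MaxTemp) VALUES (@ForecastDate, @MaxTemp)",
            connection))
        {
            command.Parameters.AddWithValue("@ForecastDate", forecastDate);
            command.Parameters.AddWithValue("@MaxTemp", maxTemp);

            connection.Open();
            command.ExecuteNonQuery();   // one round trip per record - hence the poor performance
        }
    }
}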

While this alternative provided a potentially reusable web service that could fit within a service-oriented architecture, it introduced an additional dependency upon the infrastructure required to host the web service. Additionally, during testing the web service was observed to perform relatively poorly, with the SSIS package unable to be configured with a timeout large enough to await the web service response (> 300 seconds).

As a result, a number of additional alternative approaches were considered.

The Custom .NET web service leveraging the Microsoft SDS libraries (SqlBulkCopy) alternative retained the use of the custom web service from SSIS, simply replacing the dynamic T-SQL INSERT statements with the .NET SqlBulkCopy class. Rather than connecting to the database and executing an INSERT statement for each candidate record extracted from the source NetCDF file, this approach progressively constructed an in-memory .NET DataTable containing all of the data extracted from the source NetCDF file, then used the SqlBulkCopy class to perform a bulk insert of the data into the database. This alternative exhibited significantly improved performance (from > 300 seconds to < 30 seconds).
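In outline, the SqlBulkCopy version boiled down to something like the following sketch (the destination table, column names and sample values are illustrative, and the loop that reads values out of the NetCDF file via SDS is elided):

// Sketch of the SqlBulkCopy approach: accumulate all records in a DataTable,
// then push them to SQL Server in a single bulk operation.
// Destination table and column names are illustrative only.
using System;
using System.Data;
using System.Data.SqlClient;

static class BulkLoadSketch
{
    public static void Load(string connectionString)
    {
        var table = new DataTable("Forecast");
        table.Columns.Add("ForecastDate", typeof(DateTime));
        table.Columns.Add("MaxTemp", typeof(double));

        // ... for each forecast value read from the NetCDF file via SDS ...
        table.Rows.Add(new DateTime(2013, 3, 12), 31.5);

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "staging.Forecast";
                bulkCopy.WriteToServer(table);   // one bulk insert instead of one INSERT per record
            }
        }
    }
}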

The Custom SSIS script component source leveraging the Microsoft SDS libraries alternative replaced the SSIS web service call with a standard SSIS data flow task. The data flow task was responsible for extracting the forecast data from its source NetCDF file and inserting it into the target database. The key to this alternative was the use of an SSIS script component as the data source for the data flow task. The script component accepted a single variable parameter containing the path to the source NetCDF file. It then utilised .NET code extracted from the prototype web service to read data from the NetCDF file using the Microsoft SDS libraries and write this data to an SSIS output buffer, making it available for stream-based consumption within the data flow task. This alternative also exhibited significantly improved performance (from > 300 seconds to ≈ 10 seconds).
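The interesting part of the script component is the CreateNewOutputRows override, which pushes each forecast record into the component's output buffer. A rough sketch is below - the output column names, the User::NetCDFFilePath variable and the sample values are assumptions for illustration, and the SDS reading code is elided.

// Sketch of the script component source (body of the generated ScriptMain class).
// Assumes the component defines an output with ForecastDate and MaxTemp columns
// and exposes the read-only SSIS variable User::NetCDFFilePath.
public override void CreateNewOutputRows()
{
    string filePath = Variables.NetCDFFilePath;

    // ... open filePath with the SDS libraries and read the forecast values
    //     (much as in the earlier sketch), then for each value: ...

    Output0Buffer.AddRow();
    Output0Buffer.ForecastDate = new DateTime(2013, 3, 12);
    Output0Buffer.MaxTemp = 31.5;
}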

The Custom SSIS script component source leveraging the Microsoft SDS libraries alternative was selected as the preferred alternative as it removed the infrastructure dependency for hosting the custom web service and significantly improved performance. In addition, although the web service approach may have formed a potentially reusable component within a service-oriented architecture, the design of the prototype web service did not meet the level of reusability typically associated with such goals, and providing that was not within the scope of the project.