September 03, 2010

"For successful technology, reality must take precedence over public relations, for nature cannot be fooled."    --  Richard Feynman
Moving Sql Server Code
Author:RBarryYoungCreated:8/11/2008 12:08 AM
My occasional adventures in performance, administration, secuirty and other database topics. -- RBarryYoung

Wow.  I was following a link from a forum post on SQL Injecitn that I was reaidng and it led me here. Wow.  What a great resource!  This is the kind of tutorial that Microsoft should have written years ago.  And then started promoting it to address Injection, and started following it themselves, and encouraged SW vendors to follow.  And encourage customers to require from their SW vendors.

The catch?  It's from ... Oracle.    That also means that the recommended solutions are very Oracle-specific.  The good news?  I did not see anything in there that could not be easily translated to Transact-SQL.  Happy reading!

(trying to get better about just blogging in the moment, instead of saving it up ...)

I saw a post abut the next release of ClearTrace at http://weblogs.sqlteam.com/billg/archive/2009/05/27/ClearTrace-2008.34.aspx that mentioned its heavy use of SQLBulkCopy to speed up Trace loading, and I was reminded about my own experiences with it:

SQLBulkCopy is part of SQLClient and is pretty cool.  I did some performance testing of it last year, trying to match BCP and BULK INSERT's speed with it.  Could only get about 50% as fast because of the amount of CPU time it was spending in type conversions. 

Turns out that both the .Net and the default SQL Server text-to-numeric (and text-to-datetime) conversion routines have a lot of overhead, I suspect to handle the zillions of different text-numeric formats.  Since I knew exactly what my text-numeric formats were I hand coded my own type-conversion routines (in VB no less!) and almost doubled its speed.  I was curious if anyone else had noticed the same thing?

(I have attached the code in a .ZIP file here).

It may seem boring, but this is actually big news.  The US Federal government has just launched a website that allows anyone to download any of a large number of databases from various federal agencies.  The new site is http://www.data.gov/, try it out and let me know what you think!  (Hmm, formatting seems to be less than I would want...).

So I'm a twit.. or a twiteratii, or a twitron, or whatever the digest-sized denzens of Twitter call themselves.  I've resisted the tempatation for a long time because it really seemed like short attention-span blogging and why feed that particular beast?

Then it occured to me that maybe I would find those short tweets (or twix? or twonks?) easier than writing whole paragraphs, so I singed up.  And I have to say, for SQL Server stuff it's twitterlicious, lots of high profile and interesting SQL Server MVPs and experts out there.

So my Twit-Name is (you guessed it) @RBarryYoung.  And has anybody else noticed how much the twitern language resembles Smurfish?  :-)

Just a follow up to me earlier post on reading the SQL 2006 Draft Standards (here), Glenn Pauley of Sybase pointed out to me that you download the standards for much less from the ANSI Store at http://webstore.ansi.org. The ANSI/ISO SQL Standards are all code with "9075" so the easiest way is to use the Document Number search for "IEC 9075".

And the good news is that the SQL:2003 standards are just $30 for each section.  The bad news is that there are 14 sections and the SQL:2008 documents are $180 apeice.

I am developing a T-SQL test harness for automated performance tests. The testing procedure will receive a string to execute as Dynamic SQL. I am going to add a bunch of standard initialization and measurement stuff to it, that's all easy. However, one additional thing that I want to do is to capture the Execution Plan (the actual plan is preferred) into an XML variable or column to be saved as part of the test header record.

I puzzled over how to do this for while this morning, until I came up with the following approach:

Declare @qp as XML

SELECT @qp = query_plan

 From sys.dm_exec_requests

  Cross Apply sys.dm_exec_query_plan(plan_handle)

 Where session_id = @@spid

 

-- Put Code to Test Here

 

select @qp

True, this will include some extra/redundant stuff like the QP for the Test-Harness statments themselves, but I think that I can live with that.

I added my first Suggestion to Microsoft CONNECT yesterday, you can find it here.  Please support this suggestion by rating and/or voting for it.  Thanks.

For convenience, I have copied it below:

Please add a query-based bulk EXECUTE command feature, with following (or similar) syntax:

[WITH [,...n]]
[ { EXEC | EXECUTE } ]
    {
     [ @return_status = ]
     { module_name [ ;number ] | @module_name_var | (string_expression) }
        [ [ @parameter = ] { value
                         | @variable [ OUTPUT ]
                         | column
                         | ( expression )
                         | [ DEFAULT ]
        ]     [ ,...n ]
    FROM
    [ WHERE ]
    [ ORDER BY { order_by_expression | column_position [ ASC | DESC ] } ]
    }[;]

The execution procedure or command string will be executed once for every row returned from the query (may be zero) using the parameters or string value(s) calculated from each row's content.

If ORDER BY is specified then each EXECUTE instance will be executed sequentially in the order specified. If ORDER BY is not specified then the EXECUTE instances will be eligible to be executed in parallel (per thresholds, MAXDOP, available cores, etc). The command will be complete only once all instances have completed.

Despite the many possible multiple batches executed by this command, it is still considered a single command following the same rules of implicit/explicit transactions. Thus all instances will rollback or commit together. TRY..CATCH over this command will only be able to catch the first error. @return_status will only contain the value returned by the last completed instance.

------------------

 This provides a direct REDUCE-like capability for T-SQL. Although this ability can be approximated by WHILE loops or Dynamic SQL execution of aggregated strings, this would be significantly faster, more declarative than either, more set-oriented, more readable and maintainable. Additionally, this would be able to aggressively leverage parallelism, which no other simple or direct approach could do in SQL Server.

Benefits:

  • Faster Development
    Improved User Interface
    Improved Administration
    Improved Performance
    More readable; more set-oriented, more declarative.

 

Well, I submitted my sessions to PASS from my hospital bed Friday a week ago.  So I figured today, I would just go to PASS, copy my submissions and post them here, like Gail Shaw, Grant Fritchey, Jack Cobett and others have done.  Unfortunately, now that the call for speakers has been closed it seems that not only is the list of all submissions no longer avaialabe, I cannot even find my own submissions!   Anyone who knows how I can get to my submissions please let me know.  Yeah, I know I should have saved them off myself before hand, but I was larboring under gallstones, pancreatitis, an IV od Dilaudin and a hospital supplied wireless keyboard/trackpad wrapped in a teflon baggy for sanitation.  I was amazed I could get it done at all as any kind of cut and paste took me literally minutes.

So from memory, here are my submissions:

  • There Must Be 15 Ways to Lose Your Cursors.
  • The Top Ten Reasons You Aren't already using Service Broker
  •  Injected, Inspected, Detected, Infected, Neglected and Selected!

 

It has always bothered me that in this age of virtually all computer reference material being available online, dynamically and free, the ANSI/ISO committees still follow the 20th century "Brick and Mortar" practices of charging exorbiant fees for their standards documents and then has made itself financially dependent on this revenue.  The ANSI SQL:2008 standard is broken into the following part(with Wikipedia links):

The part 2: Foundation, the core of the standard costs about $450, even just to download the PDF.  Getting all of them would probably cost about $1500.  Now I ask you, how can they even pretend to aspire to be a unversal standard for SQL when the average professional would have to pay almost 2 weeks salary to have the right to read and reference it?...

Anyway, the SQL:2008 standard came out last summer, Sybase has a nice summary of the new features here.  Every couple of months I debate with myself whether or not to pay the $450 to get the Foundation PDF, but I just cannot justify it.  Finally it occured to me that I could acheive a kind of compromise by downloading the free PDFs of the draft of the standard from 2006, here.  Although only a draft, it is pretty close to that standard that was adopted (or so I here).  More importantly, it does incorporate everything that was already in the SQL:2003 standard. 

Si I have it now and will be slowly reading through it, making the occasional post about interesting things as I go.

For anyone who has been wondering where I've been, let's jsut say that it has been an eventful two weeks. I am writing this now from my hospistal bed, which it turns out is very difficult, so please forgive any errors.

I gave my SQL Injection speech two weeks ago (which I think that I did blog about). Then part one of my series "There must be fifteen ways to lose your cursors" was published over at SqlServerCentral.com to a firestorm of response (over 300 posts so far). Then I gave speech on the same topic to Philly.net code camp to very favoarable reviews (so far anyway).

Then last monday I had to go to the ER for intense stomach pains that were diagnosed a Gallstones. Tuesday I go to see a Surgeon about getting my Gall bladder removed. He schedules me for surgery on May 4 and puts me on an "Absolutely no Fat or Glycol" diet, meaning fruits, vegetables, water and skim milk only. Joy!

Thursday I have to leave work with stomach pains so bad I almost don't make it home. Then I have my younger son, Chris drive me to the ER again. They say that is is another Gallstone attack but as an added bonus I now have pancreaitis too.

They say they cannot remove the gall bladder until the pancreaitis goes away, but it is taking its own sweet time. So on monday they will do some surgical procedure to relieve the pancreaitis and assuming that goes OK remove my gall bladder on tuesday.

Of course part 2 of mu series comes out on Monday, so I guess that Jeff and Lynn and Gus and the others will have to field the questions.

Hopefully, I will be back soon ...

The downlod link in the prior update was pointing to the wrong presentation.  It has been corrected there and you can get it here as well.

I gave this presentation to PSSUG at SCP last night to very favorable reviews.  Nonetheless IMHO it is only 2/3rds done, so I hope to refine it quite a bit by the next time that I give it.  You can download the slides and code here.

 

Note:  Link was bad (old presentation), it has been fixed.

Halfway though the most overloaded two weeks I have had in a long time, ... I have finally finished installment #2 of my series "There Must Be 15 Ways to Lose Your Cursors" for SQLServerCentral and I will submit it to Steve soon (lat minut formatting). 

Next up is my presentation to PSSUG (Philadelphia Sql Server Users Group, a PASS chapther) the Wednesday evening entitled "Injected, Inspected, Detected, Infected, Neglected and Selected!  Using Dynamic SQL Safely, without SQL Injection".  (Hmm, I notice that my titles keep getting longer and longer, I wonder if that's a trend?) 

I'll be working on this brand new Injection speech over the next few days and then I have the PASS speaker proposals due by midnight Friday.  Never mind work and three Easter week services this week.  Whew!

Wow, what a time for SQL books.  First I received my pre-ordered SQL Server 2008 Query Performance Tuning Distilled by Grant Fritchey last week, about a week earlier than expected.  Grant is a friend from SQLServerCentral and PASS 2008 and a great guy to boot.  Follow the link to his blog where you will find more links to order his book online.

Then yesterday, I received my pre-ordered SQL Server 2008 Internals by Kalen Delaney, Paul S. Randal, Kimberly L. Tripp, Connor Cunningham, Adam Machanic, and Ben Nevarez two weeks ahead of schedule.  The incredible line up of writers on this book has made it the most hotly anticipated new SQL book in many years.

The irony is that becuase of the demands of my current project and because I am already behind in sumbitting my PASS 2009 proposals, I probable won't be able to get to them for a while yet.    Oh well.  Just thumbing through, both look great though ...

Heh.  Well I forgot to post the best version of the XML aggregator when I posted the other day.  Then Ward Pond tagged in a post on it (here), and pointed out that I was "improving" on an old version that had been fixed by Adam Machanic among others...

Here is my "good" version that fixes the entitization problem:

 

SELECT (

SELECT n + ','
FROM (
  SELECT 'a<b' AS n
UNION ALL
  SELECT 'b>a'
UNION ALL
  SELECT 'b&a'
UNION ALL
  SELECT 'b

a') r

FOR XML PATH(''), TYPE

).value('.[1]','varchar(max)')
-- =====

 And here is the version currenlty being used by Ward and company...:

 

;WITH ColumnToPivot ([data()]) AS (
    SELECT p.ParentString + N', '
    FROM Parent p
    JOIN Child c
    ON c.ParentId = p.ParentId
    WHERE c.ChildId = 2
    ORDER BY p.ParentId
    FOR XML PATH(''), TYPE
),
    XmlRawData (CSVString) AS (
        SELECT (SELECT [data()] AS mydata FROM ColumnToPivot AS d FOR XML RAW, TYPE).value( '/row[1]/mydata[1]', 'NVARCHAR(max)') AS CSV_Column
)
SELECT
LEFT(CSVString, LEN(CSVString)-1) AS CSVList
FROM XmlRawData

 

Finally, I heard yesterday that Adam is has a new challenge, involving, you guessed it, grouping concatenated strings.  Guess I better get busy...

 

Just a quick note ...

Upon further testing and analysis, I feel confident that the best solution for string concatenation in SQL Server 2005+ is the FOR XML method, like so:

SELECT CAST(
   (SELECT TABLE_NAME+', '
     From INFORMATION_SCHEMA.TABLES
     Order By TABLE_NAME
     FOR XML PATH('')
) as VARCHAR(MAX))

It has a lot of advantages, which I will go into later.  It also has one big problem(Entitization) which I will also demonstrate how to fix...

Last night I premiered my presentation "There Must Be 15 Ways To Lose Your Cursors" to the South Jersey division of the Philadelphia SQL Server Users Group.

As promised, I have uploaded the presentation and all of the SQL Scripts as a ZIP it to this site.  You can download it here.

Just a quick note on string concatenation:

 

  Concatenation(appending) a table of many small strings together into one big string is easy in SQL Server, but slow.  Fixng it is hard for the reasons that I list here.  I beleive that I have "solved" this, reducing the O(n2) operation to effectively O(n*Log(n)).

A preliminary version of this technique is demonstrated here.  The approach seen in these forum posts can be greatly improved by "stacking" the 2k roll-ups concatenating all of the lower level strings together at once, instead of "cacading" them up one level at a time, as demonstrated.

I will post a more complete version later...

A fast and accurate way to calculate the distance in miles between two points, based on the latitudes and longitudes.

Read More »

There are basically three (actually four) strategies for protecting against SQL Injection while allowing some use of Dynamic SQL:

  1.  BlackListing
  2.  Firewalling
  3.  Parametizing
  4.  Parsing*

"Blacklisting" is the practice of scanning the client-supplied text for certain "blacklisted" strings, such as semicolons or words like "EXEC" and "DROP", as a way of detecting injection attacks.  Blacklisting is a bad practice and fails for two simple reasons, first, it is not possible to detect every possible injection attack without actually fully parsing the final Dynamic SQL execution string.  No matter how many banned strings you included in your scanner, you still cannot be sure that you have accounted for every posibility.  For instance, consider the alternate quotation characters in the upper half of the ASCII character set (and many more in Unicode).  Do you know which ones of them can be used to close a quotation?  Neither do I, and neither does anybody that I know.

Secondly, blacklisting invariably breaks the application that it is trying to protect by preventing perfectly valid queries from being executed.  For instance, a query for an employee named "Sally Waldrop" is unlikley to make it through the blacklist.  A report query on a client called "Executive Cleaning" is also likely to be rejected.  And the more that you try to fix the first problem by adding more and more strings to the blacklist, the worse you will be making the second problem.  So blacklisting is a bad choice both because you cannot be sure that it will work and because you can be sure that it will break your application.

"Firewalling" is the practice of using security settings to restrict the Dynamic SQL's execution context permissions down to only what that Dynamic SQL statement is supposed to do.  Thus, for a procedure that is supposed to execute a dynamic SQL string to produce a report on the CLIENTS table, a special database User called "CLIENTS_Reports" would be created that had SELECT permission granted to the CLIENTS table, Like this:

    Grant Select on OBJECT::dbo.CLIENTS to CLIENTS_Reporter

 but no rights to anything else, nor any other rights on the CLIENTS table.  Then the procedure would execute the dynamic SQL like this:

    Execute(@query) As USER='CLIENTS_Reporter'


The weaknesses of firewalling are first, that it is a point solution only, as a separate database User needs to be created for each separate stored procedure that uses client text in Dynamic SQL.  Using a single User for multiple stored procedures can comprise the integrity of this strategy because you may have to combine different permission sets to be sure that this one user can do everything that it might be legitimatelly called on to do.  Seperate User accounts for each one is fine if you only have a few of these, but quickly becomes unmanageble with larger numbers.  More discussion on this appraoch by me can be found here.

Secondly, it can be difficult with just security settings and permissions to get the fine control that you may need.  For instance, an application that supports bank tellers may use dynamic SQL to access account information.  While this could be managed with firewalling, you would probably also want to insure that the query could only return information for a single account at a time as any query from a teller that lists information for ALL accounts is likely to be something illegitmate.  In this case, firewalling cannot do much for you.

"Parametizing" is the practice of only allowing client-supplied text to be passed through parameters and variables and never actualized as executable code.  Thus, instead of code like this:

    Select @query = 'SELECT Column FROM table WHERE key=''' + @UserInput + ''''
Execute(@query)


The code would be rewritten like this:

    Select @query = 'SELECT Column FROM table WHERE key=@p1'
Set @P1Def = N'@P1 NVarchar(255)';
EXEC sp_executesql @query, @P1Def, @UserQuery;


In the first case, a client string of " '; DELETE FROM CLIENTS; --" could spell disaster, however, in the parametized instance, it would just return no records.

The shortcomings of parametization are first, it requires more work and thought than other approaches.  Secondly, it initially seems to be applicable to only the simplest cases of dynamic SQL (as above).  However, it turns out that there are some tricky ways to use parametization that allow it to be applied safely and sucessfully to all but the most complex cases of dynamic SQL.  These methods (variable execution, reconstitution, and directed expression) are beyond the scope of this article, however, it should be noted that they have some disadvantages of their own: primarily that they typically require a stricter and more complex interface between the client and the server procedure and thus cannot usually be implemented with making changes in the client as well.

The fourth stategy, "Parsing" is really only included for completeness as it is not actually implmented in any case that I know of.  If it were used, parsing would invovle completely parsing the SQL syntax of dynamic text to be executed and then insuring that it conforms to restrictions defined for that particular use of dynamic SQL (for instance, that it consist of only a single SQL statement).  Parsing is not used primarily because it would be just too hard to implement on SQL Server.  SQL is not an easy language to parse completely and correctly and SQL is not a language that is well suited the task of syntactically and semantically parsing a string of text.  Even setting aside the difficulty of it, the overhead of doing it repeatedly for a heavily used procedure could be severe.  Consequently, without some substantial help in the future from SQL Server itself (or some other Microsoft facility), this approach will probably always remain impractical.

 

Copyright 2008 by R. Barry Young
 RBarryYoung.net  |  Terms Of Use  |  Privacy Statement