Thursday, 1 December 2011

Cool CROSS APPLY Tricks, Part 2

In Part 1, we looked at some fairly typical examples of using CROSS APPLY.

But the best is yet to come.

Consider this problem that Adam Haines brought up in his blog entry entitled Converting a Delimited String of Values into Columns. He presents the following table.
declare @t table(  ProductId int ,ProductName varchar(25) ,SupplierId int ,Descr varchar(50)
) insert @t select 1,'Product1',1,'A1,10in,30in,2lbs'union all select 2,'Product2',2,'T6,15in,30in'union all select 3,'Product3',1,'A2,1in,,0.5lbs'union all select 4,'Product4',1,'X5,3in'
Note the column called Descr. That column contains 4 attributes of a product (Type, Length, Height, and Width) in a comma-delimited format. Your job: extract those values. Adam uses a very nice XML method to extract the values from the Descr field. Unfortunately the problem with using XML is that one of the attributes might contain a character that would break it, like an ampersand (&) or a less-than sign (<) for example.

So CROSS APPLY to the rescue!

The Descr field may not have all the commas in place (see row 4 in the table), so we will artifically add 4 of them to the end, creating a string to work with:

select ProductID
      ,SupplierID
      ,stringfrom @tcross apply (select string=Descr+',,,,') f1/*
  ProductID  SupplierID string
----------- ----------- ---------------------
          1           1 A1,10in,30in,2lbs,,,,
          2           2 T6,15in,30in,,,,
          3           1 A2,1in,,0.5lbs,,,,
          4           1 X5,3in,,,,
*/
You see how I used CROSS APPLY to introduce a new column called string? Now that we have that string to work with, we need to find where the commas are. The first comma is easy. We just use CHARINDEX. But the second and third and fourth commas all depend on the location of the previous comma. So we must use CROSS APPLY 4 times:

select ProductID
      ,SupplierID
      ,string
      ,p1,p2,p3,p4from @tcross apply (select string=Descr+',,,,') f1cross apply (select p1=charindex(',',string)) f2cross apply (select p2=charindex(',',string,p1+1)) f3cross apply (select p3=charindex(',',string,p2+1)) f4cross apply (select p4=charindex(',',string,p3+1)) f5/*
  ProductID  SupplierID string                p1 p2 p3 p4
----------- ----------- --------------------- -- -- -- --
          1           1 A1,10in,30in,2lbs,,,,  3  8 13 18
          2           2 T6,15in,30in,,,,       3  8 13 14
          3           1 A2,1in,,0.5lbs,,,,     3  7  8 15
          4           1 X5,3in,,,,             3  7  8  9
*/
Now that we have all the locations of the commas, it’s easy to pull out the attributes that we want:

select ProductID
      ,SupplierID
      ,[Type]
      ,Length
      ,Height
      ,Weightfrom @tcross apply (select string=Descr+',,,,') f1cross apply (select p1=charindex(',',string)) f2cross apply (select p2=charindex(',',string,p1+1)) f3cross apply (select p3=charindex(',',string,p2+1)) f4cross apply (select p4=charindex(',',string,p3+1)) f5cross apply (select [Type]=substring(string,1,p1-1)                   ,Length=substring(string,p1+1,p2-p1-1)                   ,Height=substring(string,p2+1,p3-p2-1)                   ,Weight=substring(string,p3+1,p4-p3-1)) f6/*
  ProductID  SupplierID Type Length Height Weight
----------- ----------- ---- ------ ------ ------
          1           1 A1   10in   30in   2lbs
          2           2 T6   15in   30in   
          3           1 A2   1in           0.5lbs
          4           1 X5   3in     
*/
The terrific thing about this is that it doesn’t involve anything in terms of cost. If you look at a query plan, all it consists of is a Table Scan followed by a Compute Scalar, the latter of which has an estimated operator cost of 0.0000001… in other words, pretty much no cost at all! The Query Optimizer combined all the CROSS APPLY stuff I introduced and essentially does the following query, (which you can see if you look at the properties of the Compute Scalar icon in the plan):

select ProductID
      ,SupplierID
      ,[Type]=substring([Descr]+',,,,',1,charindex(',',[Descr]+',,,,')-1)      ,Length=substring([Descr]+',,,,',charindex(',',[Descr]+',,,,')+1,               (charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,')+1)-               charindex(',',[Descr]+',,,,'))-1)      ,Height=substring([Descr]+',,,,',charindex(',',[Descr]+',,,,',               charindex(',',[Descr]+',,,,')+1)+1,(charindex(',',[Descr]+',,,,',               charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,')+1)+1)-               charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,')+1))-1)      ,Weight=substring([Descr]+',,,,',charindex(',',[Descr]+',,,,',               charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,')+1)+1)+1,               (charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,',               charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,')+1)+1)+1)-               charindex(',',[Descr]+',,,,',charindex(',',[Descr]+',,,,',               charindex(',',[Descr]+',,,,')+1)+1))-1)from @t 
How would you like to parse through and troubleshoot that? What a nightmare! But you don’t have to… our CROSS APPLY query is very readable and easy to understand, and it's fast.

Here’s one more example…

Here’s a table of messages:

declare @t table (Message varchar(100))insert @t select 'cross apply is really cool'union all select 'and ntile is really cool too'union all select 'and iced tea is really cool too'
Your job is to pull out all the word pairs and report on how often they occur throughout the table. Here’s how to do it with our new pal CROSS APPLY:

;with Numbers as(  select N=Number 
  from master..spt_values
  where Type='P' and Number>0)select WordPair,Occurrences=count(*)from @tjoin Numbers on substring(' '+Message,N,1)=' ' and N<len(Message)+1cross apply (select string=substring(' '+Message+' ',N+1,len(Message)+1)) f1cross apply (select p1=charindex(' ',string)) f2cross apply (select p2=charindex(' ',string,p1+1)) f3cross apply (select WordPair=case when p1<p2 then left(string,p2-1) end) f4where WordPair is not nullgroup by WordPairorder by count(*) desc/*
WordPair    Occurrences
----------- -----------
is really             3
really cool           3
cool too              2
cross apply           1
iced tea              1
and iced              1
and ntile             1
apply is              1
tea is                1
ntile is              1
*/
I use a table of numbers (I “cheated” and just used the numbers in master..spt_values) and JOIN that to the Message column based on the location of its space characters. The first CROSS APPLY (f1) creates a string to work with, which is a SUBSTRING of the Message, starting at a word boundary (and ending with a space I artificially put at the end). The second and third CROSS APPLYs (f2 and f3) find the location of the first and second space characters in that string. The final CROSS APPLY creates the word pair, if one was found. If no word pair is found, then WordPair is set to NULL (because there is no ELSE in the CASE). That is filtered out by the WHERE clause.

(Edit on July11,2009: For a more in-depth discussion of processing the above word-pair query, see my blog post entitled Word Pairs Revisited (Cool CROSS APPLY Tricks, Part 2 ½)).

Note that all these new columns that we introduced (string, p1, p2, WordPair) can be used in a WHERE clause and a GROUP BY clause or pretty much anywhere within the query. That’s the best part.

The query below is what is really calculated behind the scenes:

;with Numbers as(  select N=Number 
  from master..spt_values
  where Type='P' and Number>0)select WordPair=case when charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))<                  charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),                  charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1) 
                then left(substring(' '+Message+' ',N+1,len(Message)+1),                  charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),                  charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1)-1) 
                end      ,count(*)from @tjoin Numbers on substring(' '+Message,N,1)=' ' and N<len(Message)+1where case when charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))<        charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),        charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1) 
      then left(substring(' '+Message+' ',N+1,len(Message)+1),        charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),        charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1)-1) 
      end is not nullgroup by case when charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))<           charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),           charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1) 
         then left(substring(' '+Message+' ',N+1,len(Message)+1),           charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1),           charindex(' ',substring(' '+Message+' ',N+1,len(Message)+1))+1)-1) 
         end
order by count(*) desc
Yeesh!

So after all this, I hope I have converted you to the beauty of the APPLY operator. Go spread the word!

No comments:

Post a Comment