Monday, 28 November 2011

Cool CROSS APPLY Tricks, Part 1

In most SQL Server books or tutorials, the APPLY operator would get no more than a 1 or 2 page treatment, if at all. (A notable exception is Itzik Ben-Gan’s book Inside Microsoft SQL Server 2005: T-SQL Querying, which goes into some depth). But in my opinion it’s a really cool and extremely useful operator.

The example below is representative of about 97% of the examples you see in books and tutorials regarding APPLY in that it uses a table-valued function. Using the AdventureWorks database, it finds all Vista credit cards that expire in June of 2008, and gives information about the contact associated with the credit card.
select f.FirstName
      ,f.LastName
      ,f.JobTitle
      ,f.ContactType
      ,cc.CardNumberfrom Sales.CreditCard ccjoin Sales.ContactCreditCard ccc on cc.CreditCardID=ccc.CreditCardIDcross apply dbo.ufnGetContactInformation(ccc.ContactID) fwhere cc.ExpYear=2008
  and cc.ExpMonth=6
  and cc.CardType='Vista'/*
FirstName LastName JobTitle         ContactType   CardNumber
--------- -------- ---------------- ------------- --------------
Peggy     Justice  Owner            Store Contact 11119759315644
John      McClane  Purchasing Agent Store Contact 11119490672347
Laura     Cai      NULL             Consumer      11112813884091
Natalie   Gonzales NULL             Consumer      11114369564985
Jarrod    Sara     NULL             Consumer      11116045498593
Katherine Bailey   NULL             Consumer      11119100149656
Stephanie Gray     NULL             Consumer      11112324154556
Shawna    Sharma   NULL             Consumer      11116413893110
Mindy     Rai      NULL             Consumer      11115163407997
Jackson   Jai      NULL             Consumer      11112011871602
And so on... (74 rows total)
*/
The SQL Engine first must evaluate everthing on the left-hand side of the APPLY, and then, for each row, it is able to process (or apply) the right-hand side to produce some kind of a (table-valued) result.

So in the example above, it first evaluates the left-hand side (i.e. JOINs the CreditCard and ContactCreditCard tables and filters through only the Vista credit cards that expire in June2008), and then for each row, it will apply the right-hand side (i.e. call the function,passing it the ContactID) to produce a result (the function returns a 1-row table consisting of FirstName, LastName, JobTitle, and ContactType).

That’s kind of cool, and it was something that you were not able to do prior to SQL2005, when the APPLY operator was introduced.

But the right-hand side doesn’t have to be a function.

Let’s take a look at all the Stores in AdventureWorks whose Main Office is in Wisconsin (we have to trudge through a lot of tables to get that information):

select c.CustomerID
      ,s.Name
from Sales.Customer cjoin Sales.Store s on c.CustomerID=s.CustomerIDjoin Sales.CustomerAddress ca on c.CustomerID=ca.CustomerIDjoin Person.Address a on ca.AddressID=a.AddressIDjoin Person.StateProvince sp on a.StateProvinceID=sp.StateProvinceIDwhere ca.AddressTypeID=3 --MainOffice  and sp.StateProvinceCode='WI'/*
CustomerID Name
----------- ---------------------------
        418 Good Bike Shop
        453 Unique Bikes
        543 Friendly Neighborhood Bikes
        606 Little Bicycle Supply Shop
*/
And for each of those customers, let’s find out the top 3 products that they bought in terms of dollars (and while we’re at it, let’s show the dollars they spent on the product):

select c.CustomerID
      ,s.Name      ,f1.ProductID
      ,f1.PurchaseAmtfrom Sales.Customer cjoin Sales.Store s on c.CustomerID=s.CustomerIDjoin Sales.CustomerAddress ca on c.CustomerID=ca.CustomerIDjoin Person.Address a on ca.AddressID=a.AddressIDjoin Person.StateProvince sp on a.StateProvinceID=sp.StateProvinceIDcross apply (select top (3) 
                    ProductID
                   ,PurchaseAmt=sum(LineTotal)             from Sales.SalesOrderHeader soh
             join Sales.SalesOrderDetail sod
               on soh.SalesOrderID=sod.SalesOrderID
             where CustomerID=c.CustomerID
             group by ProductID
             order by sum(LineTotal) desc) f1where ca.AddressTypeID=3 --MainOffice  and sp.StateProvinceCode='WI'/*
 CustomerID Name                        ProductID PurchaseAmt
----------- --------------------------- --------- -----------
        418 Good Bike Shop                    795  30367.3500
        418 Good Bike Shop                    794  24136.8075
        418 Good Bike Shop                    792  23508.5175
        453 Unique Bikes                      773  16319.9520
        453 Unique Bikes                      771  12239.9640
        453 Unique Bikes                      772  12239.9640
        543 Friendly Neighborhood Bikes       782   9638.9580
        543 Friendly Neighborhood Bikes       868    671.9040
        543 Friendly Neighborhood Bikes       869    335.9520
        606 Little Bicycle Supply Shop        717   1717.8000
        606 Little Bicycle Supply Shop        838    858.9000
        606 Little Bicycle Supply Shop        738    809.3280
*/
Notice that the right-hand side of the CROSS APPLY is a correlated query where we refer to c.CustomerID, a column from the left-hand side. So each of the 4 customers from the left-hand side were applied to our correlated query on the right-hand side to produce 3 rows each.

But let’s not stop there… We can keep chaining additional information. For each of those products we found, let’s find out the top customer for that product (i.e. who placed the most orders on the product?):

select c.CustomerID
      ,s.Name      ,f1.ProductID
      ,f1.PurchaseAmt
      ,f2.TopCustomerID
      ,f2.NumOrdsfrom Sales.Customer cjoin Sales.Store s on c.CustomerID=s.CustomerIDjoin Sales.CustomerAddress ca on c.CustomerID=ca.CustomerIDjoin Person.Address a on ca.AddressID=a.AddressIDjoin Person.StateProvince sp on a.StateProvinceID=sp.StateProvinceIDcross apply (select top (3) 
                    ProductID
                   ,PurchaseAmt=sum(LineTotal)             from Sales.SalesOrderHeader soh
             join Sales.SalesOrderDetail sod
               on soh.SalesOrderID=sod.SalesOrderID
             where CustomerID=c.CustomerID
             group by ProductID
             order by sum(LineTotal) desc) f1cross apply (select top (1)                    TopCustomerID=CustomerID
                   ,NumOrds=count(*)             from Sales.SalesOrderHeader soh
             join Sales.SalesOrderDetail sod
               on soh.SalesOrderID=sod.SalesOrderID
             where ProductID=f1.ProductID
             group by CustomerID
             order by count(*) desc) f2where ca.AddressTypeID=3 --MainOffice  and sp.StateProvinceCode='WI'/*
 CustomerID Name                        ProductID PurchaseAmt TopCustomerID NumOrds
----------- --------------------------- --------- ----------- ------------- -------
        418 Good Bike Shop                    795  30367.3500             4       8
        418 Good Bike Shop                    794  24136.8075             4       8
        418 Good Bike Shop                    792  23508.5175            16       8
        453 Unique Bikes                      773  16319.9520            75       4
        453 Unique Bikes                      771  12239.9640            20       4
        453 Unique Bikes                      772  12239.9640           118       4
        543 Friendly Neighborhood Bikes       782   9638.9580            10       8
        543 Friendly Neighborhood Bikes       868    671.9040            23       4
        543 Friendly Neighborhood Bikes       869    335.9520            10       4
        606 Little Bicycle Supply Shop        717   1717.8000           166       8
        606 Little Bicycle Supply Shop        838    858.9000            24       6
        606 Little Bicycle Supply Shop        738    809.3280           166      12
*/
This second CROSS APPLY (f2) took the f1.ProductID that resulted from the first CROSS APPLY (f1), and it applied it to the correlated query on its right-hand side to produce a single row of information, namely the TopCustomerID and NumOrds columns.

Note that if we wanted to add product descriptions to this query, which we would get from the Production.Product table, we must perform the JOIN to that table AFTER the CROSS APPLY, because everything with APPLY is processed left-to-right.

select c.CustomerID
      ,s.Name      ,f1.ProductID
      ,ProductName=p.Name      ,f1.PurchaseAmt
      ,f2.TopCustomerID
      ,f2.NumOrdsfrom Sales.Customer cjoin Sales.Store s on c.CustomerID=s.CustomerIDjoin Sales.CustomerAddress ca on c.CustomerID=ca.CustomerIDjoin Person.Address a on ca.AddressID=a.AddressIDjoin Person.StateProvince sp on a.StateProvinceID=sp.StateProvinceIDcross apply (select top (3) 
                    ProductID
                   ,PurchaseAmt=sum(LineTotal)             from Sales.SalesOrderHeader soh
             join Sales.SalesOrderDetail sod
               on soh.SalesOrderID=sod.SalesOrderID
             where CustomerID=c.CustomerID
             group by ProductID
             order by sum(LineTotal) desc) f1cross apply (select top (1)                    TopCustomerID=CustomerID
                   ,NumOrds=count(*)             from Sales.SalesOrderHeader soh
             join Sales.SalesOrderDetail sod
               on soh.SalesOrderID=sod.SalesOrderID
             where ProductID=f1.ProductID
             group by CustomerID
             order by count(*) desc) f2join Production.Product p on f1.ProductID=p.ProductIDwhere ca.AddressTypeID=3 --MainOffice  and sp.StateProvinceCode='WI'/*
 CustomerID Name                        ProductID ProductName                PurchaseAmt TopCustomerID NumOrds
----------- --------------------------- --------- -------------------------- ----------- ------------- -------
        606 Little Bicycle Supply Shop        717 HL Road Frame - Red, 62      1717.8000           166       8
        606 Little Bicycle Supply Shop        738 LL Road Frame - Black, 52     809.3280           166      12
        453 Unique Bikes                      771 Mountain-100 Silver, 38     12239.9640            20       4
        453 Unique Bikes                      772 Mountain-100 Silver, 42     12239.9640           118       4
        453 Unique Bikes                      773 Mountain-100 Silver, 44     16319.9520            75       4
        543 Friendly Neighborhood Bikes       782 Mountain-200 Black, 38       9638.9580            10       8
        418 Good Bike Shop                    792 Road-250 Red, 58            23508.5175            16       8
        418 Good Bike Shop                    794 Road-250 Black, 48          24136.8075             4       8
        418 Good Bike Shop                    795 Road-250 Black, 52          30367.3500             4       8
        606 Little Bicycle Supply Shop        838 HL Road Frame - Black, 44     858.9000            24       6
        543 Friendly Neighborhood Bikes       868 Women's Mountain Shorts, M    671.9040            23       4
        543 Friendly Neighborhood Bikes       869 Women's Mountain Shorts, L    335.9520            10       4
*/
If we had tried to place that JOIN with all the other JOINs (i.e. BEFORE the first CROSS APPLY), we would get an error: The multi-part identifier “f1.ProductID” could not be bound.

This is all interesting, but the real magic of CROSS APPLY is yet to come in Part 2.

No comments:

Post a Comment