Friday, 18 November 2011

You May Be Interested In The Following Products…

In my last blog entry, we came up with an inline table-valued function in AdventureWorks (AW) that accepts a ProductID and CustomerID and will return 3 “suggested” ProductID’s. The idea was that a Customer would go to AW’s website and look at a Product, and the 3 suggested products would be presented under the title “Customers who bought this item also bought…”

We took a random customer from the AW database named Zachary Hughes (CustomerID 12203) in order to build and test our function. We’ll continue to use our pal Zack in this blog entry also.

We left off with looking at the products suggested to Zack when he’s looking at ProductID 777:
select * from dbo.ufn_GetSuggProducts(777,12203)
/*
 SuggProdID
-----------
        872
        934
        878
*/
So it occurred to me that Zack has bought several products in the past. Just because I’m a data nerd, I was wondering what kind of suggested products come up if we were to “feed” those previously-purchased products to our function for Zack?

First we have to find out what Zack has purchased. Let’s use a Common Table Expression (CTE) called CustProds to get all the distinct products that Zack has bought, and we’ll JOIN in the Product table so we can see the names of those products.

declare @CustomerID int
set @CustomerID=12203  --Our pal Zack!;with CustProds as(  select distinct ProductID
  from Sales.SalesOrderHeader soh
  join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
  where CustomerID=@CustomerID)select CustProdID=cp.ProductID
      ,CustProdName=p1.Name
from CustProds cpjoin Production.Product p1 on cp.ProductID=p1.ProductID/*
 CustProdID CustProdName
----------- --------------------------
        711 Sport-100 Helmet, Blue
        712 AWC Logo Cap
        713 Long-Sleeve Logo Jersey, S
        870 Water Bottle - 30 oz.
        871 Mountain Bottle Cage
        873 Patch Kit/8 Patches
        874 Racing Socks, M
        921 Mountain Tire Tube
        929 ML Mountain Tire
        930 HL Mountain Tire
*/
Now for each of those products, we’ll “feed” that product into our function via a CROSS APPLY operator. And we’ll JOIN in the Product table once again so we can also see the names of those suggested products:

declare @CustomerID int
set @CustomerID=12203  --Our pal Zack!;with CustProds as(  select distinct ProductID
  from Sales.SalesOrderHeader soh
  join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
  where CustomerID=@CustomerID)select CustProdID=cp.ProductID
      ,CustProdName=p1.Name      ,SuggProdID
      ,SuggProdName=p2.Name
from CustProds cpjoin Production.Product p1 on cp.ProductID=p1.ProductIDcross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) spjoin Production.Product p2 on sp.SuggProdID=p2.ProductID/*
 CustProdID CustProdName                SuggProdID SuggProdName
----------- -------------------------- ----------- -----------------------
        711 Sport-100 Helmet, Blue             922 Road Tire Tube
        711 Sport-100 Helmet, Blue             923 Touring Tire Tube
        711 Sport-100 Helmet, Blue             878 Fender Set - Mountain
        712 AWC Logo Cap                       707 Sport-100 Helmet, Red
        712 AWC Logo Cap                       922 Road Tire Tube
        712 AWC Logo Cap                       878 Fender Set - Mountain
        713 Long-Sleeve Logo Jersey, S         922 Road Tire Tube
        713 Long-Sleeve Logo Jersey, S         878 Fender Set - Mountain
        713 Long-Sleeve Logo Jersey, S         707 Sport-100 Helmet, Red
        870 Water Bottle - 30 oz.              707 Sport-100 Helmet, Red
        870 Water Bottle - 30 oz.              708 Sport-100 Helmet, Black
        870 Water Bottle - 30 oz.              878 Fender Set - Mountain
        871 Mountain Bottle Cage               878 Fender Set - Mountain
        871 Mountain Bottle Cage               707 Sport-100 Helmet, Red
        871 Mountain Bottle Cage               708 Sport-100 Helmet, Black
        873 Patch Kit/8 Patches                877 Bike Wash - Dissolver
        873 Patch Kit/8 Patches                878 Fender Set - Mountain
        873 Patch Kit/8 Patches                707 Sport-100 Helmet, Red
        874 Racing Socks, M                    878 Fender Set - Mountain
        874 Racing Socks, M                    922 Road Tire Tube
        874 Racing Socks, M                    707 Sport-100 Helmet, Red
        921 Mountain Tire Tube                 708 Sport-100 Helmet, Black
        921 Mountain Tire Tube                 707 Sport-100 Helmet, Red
        921 Mountain Tire Tube                 878 Fender Set - Mountain
        929 ML Mountain Tire                   708 Sport-100 Helmet, Black
        929 ML Mountain Tire                   707 Sport-100 Helmet, Red
        929 ML Mountain Tire                   878 Fender Set - Mountain
        930 HL Mountain Tire                   708 Sport-100 Helmet, Black
        930 HL Mountain Tire                   707 Sport-100 Helmet, Red
        930 HL Mountain Tire                   878 Fender Set - Mountain
*/
Each of Zack’s original products produced its own list of 3 suggested products. So, for example, looking at the Racing Socks (ProductID 874) that Zack had purchased, it looks like Customers who bought that item also bought a Mountain Fender Set and/or a Road Tire Tube, and/or a Red Sport Helmet. (Remember that the suggested products that our function produces are items that Zack himself has never purchased before).

Note that there are many repeated suggestions here. That makes me think of something. If we take a Customer’s purchasing history, we can come up with a list of (distinct) suggested products based on that and promote them to the customer when he logs into the AW website under a heading “You may be interested in the following products…” We’ve got to let the customer know that he’s sort of “missing out” on all this wonderful stuff that other people have bought but he hasn’t.

So let’s come up with yet another table function that will accept a CustomerID and will come up with some items to promote to the customer based on his purchasing history.

Continuing on with Zack as our guinea pig test case, let’s GROUP BY the suggested products we produced in our last query and see how many times they occur, showing the most-occurring products at the top. Note that I created a second CTE (called SuggProds) that produces our Suggested Products and we do the GROUP BY query off of that CTE:

declare @CustomerID int
set @CustomerID=12203  --Our pal Zack!;with CustProds as(  select distinct ProductID
  from Sales.SalesOrderHeader soh
  join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
  where CustomerID=@CustomerID)
,SuggProds as(  select SuggProdID
  from CustProds cp
  cross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) sp)select SuggProdID
      ,SuggProdName=p.Name      ,NumOccurs=count(*)from SuggProds spjoin Production.Product p on sp.SuggProdID=p.ProductIDgroup by SuggProdID
        ,p.Name
order by count(*) desc/*
 SuggProdID SuggProdName              NumOccurs
----------- ----------------------- -----------
        878 Fender Set - Mountain            10
        707 Sport-100 Helmet, Red             9
        708 Sport-100 Helmet, Black           5
        922 Road Tire Tube                    4
        923 Touring Tire Tube                 1
        877 Bike Wash - Dissolver             1
*/
Looks like that Mountain Fender Set is something that Zack really needs, huh?

But notice that the list came up with 2 different Helmets… one red and one black. There are also two kinds of Tire Tubes. We don’t really want our list of promo products to be repetitive like that. In general, I’m guessing that we don’t really want to repeat items of the same Product SubCategory.

Let’s add the Product SubCategory to our query so we can analyze this further:

declare @CustomerID int
set @CustomerID=12203  --Our pal Zack!;with CustProds as(  select distinct ProductID
  from Sales.SalesOrderHeader soh
  join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
  where CustomerID=@CustomerID)
,SuggProds as(  select SuggProdID
  from CustProds cp
  cross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) sp)select SuggProdID
      ,SuggProdName=p.Name      ,SuggSubCatg=p.ProductSubCategoryID
      ,NumOccurs=count(*)from SuggProds spjoin Production.Product p on sp.SuggProdID=p.ProductIDgroup by SuggProdID
        ,p.Name        ,p.ProductSubCategoryIDorder by count(*) desc/*
 SuggProdID SuggProdName            SuggSubCatg   NumOccurs
----------- ----------------------- ----------- -----------
        878 Fender Set - Mountain            30          10
        707 Sport-100 Helmet, Red            31           9
        708 Sport-100 Helmet, Black          31           5
        922 Road Tire Tube                   37           4
        923 Touring Tire Tube                37           1
        877 Bike Wash - Dissolver            29           1
*/
Yep, just as we thought… the two helmets are from SubCategory #31 and the Tire Tubes are from SubCategory #37. I think we should pull out the top product (based on the highest number of occurrences) from each SubCategory.

We can easily do that using the ROW_NUMBER() function. Let’s just see what happens when we add that window function to our query, doing a PARTITION BY the SubCategoryID and ORDERing BY the Number of Occurrences (which is just COUNT(*)) in descending order:

declare @CustomerID int
set @CustomerID=12203  --Our pal Zack!;with CustProds as(  select distinct ProductID
  from Sales.SalesOrderHeader soh
  join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
  where CustomerID=@CustomerID)
,SuggProds as(  select SuggProdID
  from CustProds cp
  cross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) sp)select SuggProdID
      ,SuggProdName=p.Name      ,SuggSubCatg=p.ProductSubCategoryID
      ,NumOccurs=count(*)      ,RowNum=row_number() over (partition by p.ProductSubCategoryID
                                 order by count(*) desc)from SuggProds spjoin Production.Product p on sp.SuggProdID=p.ProductIDgroup by SuggProdID
        ,p.Name        ,p.ProductSubCategoryIDorder by count(*) desc/*
 SuggProdID SuggProdName            SuggSubCatg   NumOccurs RowNum
----------- ----------------------- ----------- ----------- ------
        878 Fender Set - Mountain            30          10      1
        707 Sport-100 Helmet, Red            31           9      1
        708 Sport-100 Helmet, Black          31           5      2
        922 Road Tire Tube                   37           4      1
        923 Touring Tire Tube                37           1      2
        877 Bike Wash - Dissolver            29           1      1
*/
All the items that got assigned RowNum=1 are the ones we should present as our Promo Products. So let’s put our main query into its own CTE (which we’ll call SuggSummary), and we’ll pull out all rows with RowNum=1 from that:

declare @CustomerID int
set @CustomerID=12203  --Our pal Zack!;with CustProds as(  select distinct ProductID
  from Sales.SalesOrderHeader soh
  join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
  where CustomerID=@CustomerID)
,SuggProds as(  select SuggProdID
  from CustProds cp
  cross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) sp)
,SuggSummary as(  select SuggProdID
        ,SuggProdName=p.Name        ,SuggSubCatg=p.ProductSubCategoryID
        ,NumOccurs=count(*)        ,RowNum=row_number() over (partition by p.ProductSubCategoryID
                                   order by count(*) desc)  from SuggProds sp
  join Production.Product p on sp.SuggProdID=p.ProductID
  group by SuggProdID
          ,p.Name          ,p.ProductSubCategoryID)select SuggProdID
      ,SuggProdName=SuggProdName
      ,SuggSubCatgfrom SuggSummarywhere RowNum=1order by NumOccurs desc/*
 SuggProdID SuggProdName            SuggSubCatg
----------- ----------------------- -----------
        878 Fender Set - Mountain            30
        707 Sport-100 Helmet, Red            31
        922 Road Tire Tube                   37
        877 Bike Wash - Dissolver            29
*/
Looks good! We’ve got all of our Promo Products from different SubCategories, so there’s a good variety of items here.

I think we can go ahead and create our function now. Just so we don’t overwhelm the customer once he signs on to the AW website, we’ll limit our Promo Products to 5 items. So that means adding a TOP (5) to our main query. Since our function is only going to return a single column of the Promo ProductID’s, we don’t need the Product.Name column in our SuggSummary CTE anymore… it was just there so we could see the descriptions as we were building our solution. Here’s the final function:

if object_id('ufn_GetPromoProducts') is not null drop function dbo.ufn_GetPromoProducts
gocreate function dbo.ufn_GetPromoProducts(  @CustomerID int)returns table
as
return
with CustProds as(  select distinct ProductID
  from Sales.SalesOrderHeader soh
  join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
  where CustomerID=@CustomerID)
,SuggProds as(  select SuggProdID
  from CustProds cp
  cross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) sp)
,SuggSummary as(  select SuggProdID
        ,SuggSubCatg=p.ProductSubCategoryID
        ,NumOccurs=count(*)        ,RowNum=row_number() over (partition by p.ProductSubCategoryID
                                   order by count(*) desc)  from SuggProds sp
  join Production.Product p on sp.SuggProdID=p.ProductID
  group by SuggProdID
          ,p.ProductSubCategoryID)select top (5) PromoProdID=SuggProdIDfrom SuggSummarywhere RowNum=1order by NumOccurs desc
So now let’s test out the function with Zack’s CustomerID:

select * from dbo.ufn_GetPromoProducts(12203)/*
PromoProdID
-----------
        878
        707
        922
        877
*/
Yep, those are the ones. And what if we take another customer… any customer… how about Elizabeth Johnson (Customer 11004)? What would we promote to her when she logs on to the AW website?

select * from dbo.ufn_GetPromoProducts(11004)/*
PromoProdID
-----------
        870
        921
        711
*/
So we built a pretty sophisticated and complicated function here in a step-by-step manner, building from the ground up. Our new function involves multiple CTEs and multiple JOINs and a CROSS APPLY function call to yet another function that employs many JOINs and WHERE predicates.

If we were to expand everything we did into one giant query (converting the CTEs to derived tables), this is what it looks like:

select top (5) PromoProdID=SuggProdIDfrom(  select SuggProdID
        ,SuggSubCatg=p.ProductSubCategoryID
        ,NumOccurs=count(*)        ,RowNum=row_number() 
                over (partition by p.ProductSubCategoryID
                      order by count(*) desc)  from 
  (    select SuggProdID
    from 
    (      select distinct ProductID
      from Sales.SalesOrderHeader soh
      join Sales.SalesOrderDetail sod 
        on soh.SalesOrderID=sod.SalesOrderID
      where CustomerID=@CustomerID
    ) cp 
    cross apply 
    (      select top (3) SuggProdID=d2.ProductID
      from Sales.SalesOrderDetail d1 
      join Sales.SalesOrderHeader h1 
        on d1.SalesOrderID=h1.SalesOrderID
      join Sales.Customer c 
        on h1.CustomerID=c.CustomerID
      join Sales.SalesOrderHeader h2 
        on h1.CustomerID=h2.CustomerID 
      join Sales.SalesOrderDetail d2 
        on h2.SalesOrderID=d2.SalesOrderID
      join Production.Product p 
        on d2.ProductID=p.ProductID
      join Production.ProductSubCategory sc 
        on p.ProductSubCategoryID=sc.ProductSubCategoryID
      where d1.ProductID=cp.ProductID
        and h1.CustomerID<>@CustomerID
        and c.CustomerType='I' 
        and p.ProductSubCategoryID<>(select ProductSubCategoryID
                                     from Production.Product
                                     where ProductID=cp.ProductID)        and sc.ProductCategoryID in (3,4) 
        and d2.ProductID not in (select ProductID
                                 from Sales.SalesOrderDetail sod
                                 join Sales.SalesOrderHeader soh
                                   on sod.SalesOrderID=soh.SalesOrderID
                                 where CustomerID=@CustomerID)      group by d2.ProductID
      having count(distinct h2.SalesOrderID)>5
      order by count(distinct h2.SalesOrderID) desc    ) ufn_GetSuggProds
  ) sp
  join Production.Product p on sp.SuggProdID=p.ProductID
  group by SuggProdID
          ,p.ProductSubCategoryID) SuggSummarywhere RowNum=1order by NumOccurs desc
That looks pretty intimidating, but the path to getting there was quite easy, taking it one step at a time.

No comments:

Post a Comment