Market Basket Analysis: Market basket analysis uses affinity analysis to understand customer purchase behavior. For example, if customers regularly buy cereal and milk together, discounting both items makes little sense, but discounting one of them can drive sales of the other.
Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.
Association rules are widely used to analyze retail basket or transaction data. The aim is to identify strong rules in the transaction data using measures of interestingness.
Let me introduce you to the world of cross-selling. Cross-selling is the process through which marketers sell additional products to their existing customers, thus banking on customer lifetime value. Cross-selling is one of the most widely used techniques to increase revenue and generate ROI from marketing efforts. With the ever-increasing cost of acquiring new customers and increasing competition, leveraging existing customers is often the best option at the disposal of businesses, especially retailers. But how do you figure out which products to offer for cross-selling? One solution is market basket analysis (MBA) using big data analysis.
What is Market Basket Analysis?
Market basket analysis (MBA) is a business intelligence technique for predicting customers' future purchase decisions. It studies customers' buying patterns and preferences to predict what else they are likely to purchase along with the items already in their cart.
Market Basket Analysis is a technique that identifies the strength of association between pairs of products purchased together and identifies patterns of co-occurrence. A co-occurrence is when two or more things take place together.
Market Basket Analysis creates If-Then scenario rules: for example, if item A is purchased, then item B is likely to be purchased. The rules are probabilistic in nature; in other words, they are derived from the frequencies of co-occurrence in the observations. Frequency is the proportion of baskets that contain the items of interest.
The rules can be used in pricing strategies, product placement, and various types of cross-selling strategies.
How Market Basket Analysis Works
To make it easier to understand, think of Market Basket Analysis in terms of shopping at a supermarket. Market Basket Analysis takes data at transaction level, which lists all items bought by a customer in a single purchase. The technique determines which products were purchased with which other product(s). These relationships are then used to build profiles containing If-Then rules of the items purchased.
The rules could be written as:
If {A} Then {B}
The If part of the rule (the {A} above) is known as the antecedent, and the Then part of the rule (the {B} above) is known as the consequent. The antecedent is the condition and the consequent is the result. Three measures express the degree of confidence in an association rule: Support, Confidence, and Lift.
One of the key techniques used by large retailers is Market Basket Analysis (MBA), which uncovers associations between products by looking for combinations of products that frequently co-occur in transactions. In other words, it allows supermarkets to identify relationships between the products that people buy. For example, customers who buy a pencil and paper are likely to buy a rubber or ruler.
Retailers can use the insights gained from MBA in a number of ways, including:
- Grouping products that co-occur in the design of a store’s layout to increase the chance of cross-selling;
- Driving online recommendation engines (“customers who purchased this product also viewed this product”); and
- Targeting marketing campaigns by sending out promotional coupons to customers for products related to items they recently purchased.
How does Market Basket Analysis work?
To carry out an MBA you’ll first need a data set of transactions. Each transaction represents a group of items or products that have been bought together, often referred to as an “itemset”. For example, one itemset might be {pencil, paper, staples, rubber}, in which case all of these items have been bought in a single transaction.
In an MBA, the transactions are analysed to identify rules of association. For example, one rule could be: {pencil, paper} => {rubber}. This means that if a customer has a transaction that contains a pencil and paper, then they are likely to be interested in also buying a rubber.
Before acting on a rule, a retailer needs to know whether there is sufficient evidence to suggest that it will result in a beneficial outcome. We therefore measure the strength of a rule by calculating the following three metrics (note other metrics are available, but these are the three most commonly used):
Support: the percentage of transactions that contain all of the items in an itemset (e.g., pencil, paper and rubber). The higher the support, the more frequently the itemset occurs. Rules with a high support are preferred since they are likely to be applicable to a large number of future transactions.
Confidence: the probability that a transaction that contains the items on the left-hand side of the rule (in our example, pencil and paper) also contains the item on the right-hand side (a rubber). The higher the confidence, the greater the likelihood that the item on the right-hand side will be purchased or, in other words, the greater the return rate you can expect for a given rule.
Lift: the probability of all of the items in a rule occurring together (otherwise known as the support) divided by the product of the probabilities of the items on the left- and right-hand sides, which is what we would expect if there were no association between them. For example, if pencil, paper and rubber occurred together in 2.5% of all transactions, pencil and paper in 10% of transactions and rubber in 8% of transactions, then the lift would be: 0.025/(0.1*0.08) = 3.125. A lift of more than 1 suggests that the presence of pencil and paper increases the probability that a rubber will also occur in the transaction. Overall, lift summarises the strength of association between the products on the left- and right-hand sides of the rule; the larger the lift, the greater the link between the two products.
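The three metrics can be computed directly from a list of transactions. The following is a minimal sketch in Python (the post itself works in R; Python is used here only for illustration), with hypothetical helper names `support`, `confidence` and `lift`. The 200 toy transactions are invented so that the percentages match the pencil/paper/rubber example.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions: List[Set[str]], lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """P(RHS | LHS) = support(LHS and RHS together) / support(LHS)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions: List[Set[str]], lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """support(LHS and RHS together) / (support(LHS) * support(RHS))."""
    joint = support(transactions, set(lhs) | set(rhs))
    return joint / (support(transactions, lhs) * support(transactions, rhs))


# Hypothetical data: 200 transactions built to match the example percentages.
transactions = (
    [{"pencil", "paper", "rubber"} for _ in range(5)]  # 5/200 = 2.5% contain all three
    + [{"pencil", "paper"} for _ in range(15)]         # pencil+paper: 20/200 = 10%
    + [{"rubber"} for _ in range(11)]                  # rubber: 16/200 = 8%
    + [{"stapler"} for _ in range(169)]
)
lhs, rhs = {"pencil", "paper"}, {"rubber"}
print(support(transactions, lhs | rhs))              # 0.025
print(round(confidence(transactions, lhs, rhs), 3))  # 0.25
print(round(lift(transactions, lhs, rhs), 3))        # 3.125
```

The confidence of 0.25 says that a quarter of pencil-and-paper baskets also contain a rubber, while the lift of 3.125 says that this is roughly three times the rate expected if the items were unrelated.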
To perform a Market Basket Analysis and identify potential rules, a data mining algorithm called the ‘Apriori algorithm’ is commonly used, which works in two steps:
- Systematically identify itemsets that occur frequently in the data set with a support greater than a pre-specified threshold.
- Calculate the confidence of all possible rules given the frequent itemsets and keep only those with a confidence greater than a pre-specified threshold.
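The two steps can be sketched in plain Python. This is a simplified, unoptimised illustration of the Apriori idea (level-wise candidate generation, pruning of candidates that have an infrequent subset, then rule generation from the frequent itemsets), not the arules implementation; the function names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Step 1: find every itemset whose support is at least min_support."""
    n = len(transactions)
    counts: Dict[Itemset, int] = {}
    for t in transactions:  # level 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(current)
    k = 2
    while current:
        candidates = set()
        for a, b in combinations(list(current), 2):  # join step
            union = a | b
            # Prune step: keep a k-itemset only if all its (k-1)-subsets are frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = {}
        for c in candidates:  # count the support of each surviving candidate
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


def association_rules(
    freq: Dict[Itemset, float], min_confidence: float
) -> List[Tuple[Itemset, Itemset, float, float, float]]:
    """Step 2: split each frequent itemset into LHS => RHS and keep confident rules."""
    rules = []
    for itemset, supp in freq.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), r):
                lhs = frozenset(lhs_items)
                rhs = itemset - lhs
                # Subsets of a frequent itemset are frequent, so both lookups succeed.
                conf = supp / freq[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, rhs, supp, conf, conf / freq[rhs]))
    return rules


# Hypothetical stationery transactions.
transactions = [
    {"pencil", "paper", "rubber"},
    {"pencil", "paper"},
    {"pencil", "paper", "rubber", "ruler"},
    {"paper", "ruler"},
    {"pencil", "rubber"},
]
freq = frequent_itemsets(transactions, min_support=0.4)
rules = association_rules(freq, min_confidence=0.6)
for lhs, rhs, supp, conf, lift in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

The pruning step is what makes Apriori practical: {pencil, paper, ruler} is never counted, because its subset {pencil, ruler} already falls below the support threshold. Production implementations also avoid rescanning the data for every candidate; this sketch favours readability over speed.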
Performing Market Basket Analysis in R
To demonstrate how to carry out an MBA we’ve chosen to use R and, in particular, the arules package. For those that are interested we’ve included the R code that we used at the end of this blog.
Here, we follow the same example used in the arulesViz Vignette and use a data set of grocery sales that contains 9,835 individual transactions with 169 items. The first thing we do is look at the items in the transactions and, in particular, plot the relative frequency of the 25 most frequent items in Figure 1. This is equivalent to the support of each of these items treated as a single-item itemset. This bar plot illustrates the groceries that are frequently bought at this store, and it is notable that the support of even the most frequent items is relatively low (for example, the most frequent item occurs in only around 2.5% of transactions). We use these insights to inform the minimum threshold when running the Apriori algorithm; for example, we know that in order for the algorithm to return a reasonable number of rules we’ll need to set the support threshold well below 0.025.
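Because the relative frequency of an item is just its support as a single-item itemset, a top-N frequency table can be sketched with Python's `collections.Counter`. The baskets below are hypothetical, not the grocery data set; the function name `item_frequencies` is also illustrative.

```python
from collections import Counter
from typing import List, Set, Tuple


def item_frequencies(transactions: List[Set[str]], top_n: int = 25) -> List[Tuple[str, float]]:
    """Relative frequency (the single-item support) of the top_n most frequent items."""
    counts = Counter(item for t in transactions for item in t)
    n = len(transactions)
    return [(item, c / n) for item, c in counts.most_common(top_n)]


# Hypothetical baskets, not the grocery data set used in the text.
baskets = [
    {"whole milk", "rolls/buns"},
    {"whole milk", "yogurt"},
    {"soda"},
    {"whole milk", "soda", "yogurt"},
]
top = item_frequencies(baskets, top_n=3)
for item, freq in top:
    print(f"{item}: {freq:.2f}")
# No itemset can have a higher support than its most frequent member, so
# top[0][1] is an upper bound on the support of every itemset in the data.
```

That upper bound is why a low single-item support pushes the minimum support threshold even lower.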
By setting a support threshold of 0.001 and a confidence threshold of 0.5, we can run the Apriori algorithm and obtain a set of 5,668 rules. These threshold values are chosen so that the number of rules returned is high; the number would fall if we increased either threshold. We would recommend experimenting with these thresholds to obtain the most appropriate values. Whilst there are too many rules to look at individually, we can look at the five rules with the largest lift:
Market Basket Analysis
What is it?
Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. For example, if you are in an English pub and you buy a pint of beer and don't buy a bar meal, you are more likely to buy crisps (US: chips) at the same time than somebody who didn't buy beer.
The set of items a customer buys is referred to as an itemset, and market basket analysis seeks to find relationships between purchases.
Typically the relationship will be in the form of a rule:
IF {beer, no bar meal} THEN {crisps}.
The proportion of transactions in which the whole rule holds (beer and crisps bought without a bar meal) is referred to as the support for the rule. The conditional probability that a customer will purchase crisps, given that they bought beer without a bar meal, is referred to as the confidence. The algorithms for performing market basket analysis are fairly straightforward (Berry and Linoff is a reasonable introductory resource for this). The complexities mainly arise in exploiting taxonomies, avoiding combinatorial explosions (a supermarket may stock 10,000 or more line items), and dealing with the large amounts of transaction data that may be available.
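A condition such as "no bar meal" can be handled by testing for the absence of an item. The sketch below (Python, with hypothetical function names and invented bar tabs) reports the share of transactions that match the antecedent, the share that match the whole rule, and the rule's confidence.

```python
from typing import List, Set, Tuple


def matches(transaction: Set[str], present: Set[str], absent: Set[str]) -> bool:
    """True if the transaction has every `present` item and none of the `absent` ones."""
    return present <= transaction and not (absent & transaction)


def rule_stats(
    transactions: List[Set[str]], present: Set[str], absent: Set[str], consequent: Set[str]
) -> Tuple[float, float, float]:
    """(antecedent share, whole-rule share, confidence) for IF present AND NOT absent THEN consequent."""
    n = len(transactions)
    antecedent = [t for t in transactions if matches(t, present, absent)]
    whole_rule = [t for t in antecedent if consequent <= t]
    conf = len(whole_rule) / len(antecedent) if antecedent else 0.0
    return len(antecedent) / n, len(whole_rule) / n, conf


# Hypothetical bar tabs.
tabs = [
    {"beer", "crisps"},
    {"beer", "bar meal"},
    {"beer"},
    {"beer", "crisps"},
    {"cola", "crisps"},
]
ante_share, rule_share, conf = rule_stats(tabs, {"beer"}, {"bar meal"}, {"crisps"})
print(ante_share, rule_share, round(conf, 2))  # 0.6 0.4 0.67
```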
A major difficulty is that a large number of the rules found may be trivial for anyone familiar with the business. Although the volume of data has been reduced, we are still asking the user to find a needle in a haystack. Requiring rules to have a high minimum support level and a high confidence level risks missing any exploitable result we might have found. One partial solution to this problem is differential market basket analysis, as described below.
How is it used?
In retailing, most purchases are made on impulse. Market basket analysis gives clues as to what a customer might have bought if the idea had occurred to them. (For some real insights into consumer behavior, see Why We Buy: The Science of Shopping by Paco Underhill.)
As a first step, therefore, market basket analysis can be used in deciding the location and promotion of goods inside a store. If, as has been observed, purchasers of Barbie dolls are more likely to buy candy, then high-margin candy can be placed near the Barbie doll display. Customers who would have bought candy with their Barbie dolls had they thought of it will now be suitably tempted.
But this is only the first level of analysis. Differential market basket analysis can find interesting results and can also eliminate the problem of a potentially high volume of trivial results.
In differential analysis, we compare results between different stores, between customers in different demographic groups, between different days of the week, different seasons of the year, etc.
If we observe that a rule holds in one store, but not in any other (or does not hold in one store, but holds in all others), then we know that there is something interesting about that store. Perhaps its clientele are different, or perhaps it has organized its displays in a novel and more lucrative way. Investigating such differences may yield useful insights which will improve company sales.
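Differential analysis amounts to computing the same rule's metrics separately for each segment and comparing them. A minimal Python sketch, assuming that a rule "holds" in a store when its lift there is above 1 (the text does not fix a criterion); the function names and store data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rule_lift(transactions: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Lift of LHS => RHS within one segment (0.0 if either side never occurs)."""
    n = len(transactions)
    s_lhs = sum(1 for t in transactions if lhs <= t) / n
    s_rhs = sum(1 for t in transactions if rhs <= t) / n
    s_both = sum(1 for t in transactions if (lhs | rhs) <= t) / n
    if s_lhs == 0 or s_rhs == 0:
        return 0.0
    return s_both / (s_lhs * s_rhs)


def compare_segments(
    segments: Dict[str, List[Set[str]]], lhs: Set[str], rhs: Set[str]
) -> Tuple[Dict[str, float], List[str], List[str]]:
    """Split segments into those where the rule holds (lift > 1) and those where it does not."""
    lifts = {name: rule_lift(txns, lhs, rhs) for name, txns in segments.items()}
    holds = sorted(name for name, value in lifts.items() if value > 1.0)
    fails = sorted(name for name, value in lifts.items() if value <= 1.0)
    return lifts, holds, fails


# Hypothetical per-store transactions.
stores = {
    "store_a": [{"barbie", "candy"}, {"barbie", "candy"}, {"candy"}, {"crayons"}],
    "store_b": [{"barbie"}, {"candy"}, {"barbie", "crayons"}, {"candy", "crayons"}],
    "store_c": [{"barbie"}, {"candy"}, {"crayons"}, {"barbie", "crayons"}],
}
lifts, holds, fails = compare_segments(stores, {"barbie"}, {"candy"})
print(holds, fails)  # ['store_a'] ['store_b', 'store_c']
```

A store that ends up alone in either list is the kind of outlier worth investigating. The same function works for any segmentation (demographic group, day of week, season) as long as the transactions are grouped by that key first.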
Terminology
Items are the objects between which we are identifying associations. For an online retailer, each item is a product in the shop. For a publisher, each item might be an article, a blog post, a video, etc. A group of items is an item set.
Transactions are instances of groups of items co-occurring together. For an online retailer, a transaction is generally just that: a purchase transaction. For a publisher, a transaction might be the group of articles read in a single visit to the website. (It is up to the analyst to define over what period to measure a transaction.) For each transaction, then, we have an item set.
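For a publisher, turning raw page views into transactions requires a rule for where one visit ends. The Python sketch below assumes a visit ends after 30 minutes of inactivity by the same visitor; both the 30-minute gap and the `(visitor_id, timestamp, item)` event format are illustrative assumptions, not a fixed standard.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set, Tuple


def sessionize(
    events: List[Tuple[str, datetime, str]], gap: timedelta = timedelta(minutes=30)
) -> List[Set[str]]:
    """Group (visitor_id, timestamp, item) events into one item set per visit.

    A new visit starts when the same visitor has been inactive for longer than `gap`.
    """
    by_visitor: Dict[str, List[Tuple[datetime, str]]] = {}
    for visitor, ts, item in events:
        by_visitor.setdefault(visitor, []).append((ts, item))
    transactions: List[Set[str]] = []
    for views in by_visitor.values():
        views.sort()  # chronological order for this visitor
        current: Set[str] = set()
        last_ts: Optional[datetime] = None
        for ts, item in views:
            if last_ts is not None and ts - last_ts > gap:
                transactions.append(current)  # close the previous visit
                current = set()
            current.add(item)
            last_ts = ts
        if current:
            transactions.append(current)
    return transactions


# Hypothetical page-view log.
t0 = datetime(2024, 1, 1, 9, 0)
events = [
    ("v1", t0, "article-1"),
    ("v1", t0 + timedelta(minutes=5), "article-2"),
    ("v1", t0 + timedelta(hours=2), "article-3"),
    ("v2", t0, "article-1"),
]
print(sessionize(events))
```

Visitor v1 produces two transactions because of the two-hour gap; changing `gap` changes what counts as a transaction, which is exactly the analyst's choice described above.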
Rules are statements of the form $\{i_1, i_2, \ldots\} \Rightarrow \{i_k\}$, i.e. if a transaction contains the items in the item set on the left-hand side (LHS) of the rule, $\{i_1, i_2, \ldots\}$, then it is likely that the visitor will also be interested in the item on the right-hand side (RHS), $\{i_k\}$.
The output of a market basket analysis is generally a set of rules that we can then exploit to make business decisions (related to marketing or product placement, for example).
The support of an item or item set is the fraction of transactions in our data set that contain that item or item set. In general, it is desirable to identify rules that have a high support, as these will be applicable to a large number of transactions. For supermarket retailers, this is likely to involve basic products that are popular across an entire user base (e.g. bread, milk). A printer cartridge retailer, by contrast, may not have products with a high support, because each customer only buys cartridges that are specific to their own printer.
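The confidence of a rule is the likelihood that it is true for a new transaction that contains the items on the LHS of the rule. (I.e. it is the probability that the transaction also contains the item(s) on the RHS.) Formally:
$$\mathrm{confidence}(\mathrm{LHS} \Rightarrow \mathrm{RHS}) = \frac{\mathrm{support}(\mathrm{LHS} \cup \mathrm{RHS})}{\mathrm{support}(\mathrm{LHS})}$$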
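The lift of a rule is the support of the items on the LHS co-occurring with the items on the RHS, divided by the probability that the LHS and RHS would co-occur if the two were independent:
$$\mathrm{lift}(\mathrm{LHS} \Rightarrow \mathrm{RHS}) = \frac{\mathrm{support}(\mathrm{LHS} \cup \mathrm{RHS})}{\mathrm{support}(\mathrm{LHS}) \times \mathrm{support}(\mathrm{RHS})}$$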
If the lift is greater than 1, it suggests that the presence of the items on the LHS has increased the probability that the items on the RHS will occur in the transaction. If the lift is below 1, it suggests that the presence of the items on the LHS makes it less likely that the items on the RHS will be part of the transaction. If the lift is 1, it suggests that the items on the LHS and RHS are independent: knowing that the items on the LHS are present makes no difference to the probability that the items on the RHS will occur.
When we perform market basket analysis, then, we are looking for rules with a lift of more than 1. Rules with higher confidence are ones where the probability of an item appearing on the RHS is high given the presence of the items on the LHS. It is also more valuable to act on rules that have a high support, as these will be applicable to a larger number of transactions. However, for long-tail retailers, this may not be possible.
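One way to put these preferences into practice is to discard rules with a lift of 1 or less and rank the rest. The Python sketch below orders the remaining rules by lift, then confidence, then support; that ordering is an assumption, and the rule metrics are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    lhs: FrozenSet[str]
    rhs: FrozenSet[str]
    support: float
    confidence: float
    lift: float


def rank_rules(rules: List[Rule], top_n: int = 5) -> List[Rule]:
    """Keep rules with lift > 1, then order by lift, confidence and support (all descending)."""
    useful = [r for r in rules if r.lift > 1.0]
    useful.sort(key=lambda r: (r.lift, r.confidence, r.support), reverse=True)
    return useful[:top_n]


# Hypothetical rule metrics, for illustration only.
rules = [
    Rule(frozenset({"pencil", "paper"}), frozenset({"rubber"}), 0.025, 0.25, 3.125),
    Rule(frozenset({"bread"}), frozenset({"milk"}), 0.20, 0.50, 0.95),
    Rule(frozenset({"paper"}), frozenset({"ruler"}), 0.04, 0.30, 1.5),
]
for r in rank_rules(rules):
    print(sorted(r.lhs), "=>", sorted(r.rhs), r.lift)
```

The {bread} => {milk} rule is dropped despite its high support because its lift is below 1. In R, the arules package offers comparable ranking with `sort(rules, by = "lift")`.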
Other Application Areas
Although Market Basket Analysis conjures up pictures of shopping carts and supermarket shoppers, it is important to realize that there are many other areas in which it can be applied. These include:
- Analysis of credit card purchases.
- Analysis of telephone calling patterns.
- Identification of fraudulent medical insurance claims (consider cases where common rules are broken).
- Analysis of telecom service purchases.
Once it is known that customers who buy one product are likely to buy another, the company can market the products together, or make the purchasers of one product the target prospects for the other. If customers who purchase diapers are already likely to purchase beer, they’ll be even more likely to if there happens to be a beer display just outside the diaper aisle. Likewise, if it’s known that customers who buy a sweater and casual pants from a certain mail-order catalog tend to buy a jacket from the same catalog, sales of jackets can be increased by having the telephone representatives describe and offer the jacket to anyone who calls in to order the sweater and pants. Better still, the catalog company can offer an additional 5% discount on a package containing the sweater, pants, and jacket together and promote the complete package well. The dollar amount of sales is then likely to go up. By targeting customers who are already known to be likely buyers, the effectiveness of marketing is significantly increased, whether the marketing takes the form of in-store displays, catalog layout design, or direct offers to customers. This is the purpose of market basket analysis: to improve the effectiveness of marketing and sales tactics using customer data already available to the company.
Minimum support / Minimum association volume: This control is used to fine-tune the basket analysis. For yes/no data it takes the form of a minimum support level in percent: the minimum fraction of transactions that must contain a basket of products for that basket to be considered a distinct, stable product group. By default this value is 10%. If it is set high, only products that co-occur in a very large number of transactions will be considered. This results in a small number of product clusters being found, each containing only a few products (often 2). A high value is desirable when looking for 1-to-1 rules ("If Product A sold in this transaction, Product B will probably sell also."). A lower minimum association volume forces products that occur together less frequently to be considered as market baskets (product clusters). This results in many, larger clusters, which can be desirable if, for example, you are organizing a store or catalog and simply wish to know which products to place in the same area of the store.
For numeric data, Minimum association volume should be set either in the corresponding currency units or in units of products purchased. In this case the default value is 0, and the user should change it to achieve better results.
In the considered example, we first leave the Basket Analysis default value of 10% of all transactions. Thus, only products that occur together in 10% or more of all transactions will be included in some basket in this analysis.
Minimum improvement: Improvement indicates how much better the confidence of the found directed association rule is than that obtained by random guessing. Improvement roughly corresponds to how much more money a retailer might generate by exploiting the association rule found by PolyAnalyst Market Basket Analysis. The default value of Minimum improvement is 2.
Minimum confidence: Confidence is the probability that, if a customer purchases a considered group of items, they will also purchase the other considered item. The higher the confidence of a rule, the more value the rule has for real-world applications. Yet if Minimum confidence is set too high, we might find no association rules with that confidence. The user should determine an optimal value for Minimum confidence experimentally. The default value is 65%.
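The three thresholds can be applied together to simple one-item rules. This Python sketch uses the defaults from the text (10% support, improvement of 2, 65% confidence). The exact formula PolyAnalyst uses for improvement is not given here, so the sketch assumes improvement = confidence / P(consequent), i.e. how many times better the rule predicts the consequent than guessing from its overall frequency; the function name and data are hypothetical.

```python
from itertools import permutations
from typing import List, Set, Tuple


def pair_rules(
    transactions: List[Set[str]],
    min_support: float = 0.10,
    min_improvement: float = 2.0,
    min_confidence: float = 0.65,
) -> List[Tuple[str, str, float, float, float]]:
    """Rules 'if A then B' for single items that pass all three thresholds.

    Assumption: improvement = confidence / P(B), i.e. the rule's confidence
    relative to guessing B from its overall frequency.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    freq = {i: sum(1 for t in transactions if i in t) / n for i in items}
    results = []
    for a, b in permutations(items, 2):
        both = sum(1 for t in transactions if a in t and b in t) / n
        if both < min_support:
            continue
        confidence = both / freq[a]
        improvement = confidence / freq[b]
        if confidence >= min_confidence and improvement >= min_improvement:
            results.append((a, b, both, confidence, improvement))
    return results


# Hypothetical catalog orders.
orders = [
    {"sweater", "jacket"},
    {"sweater", "jacket"},
    {"jacket"},
    {"socks"},
    {"socks"},
    {"socks", "pants"},
    {"pants"},
    {"pants"},
    {"socks"},
    {"pants", "socks"},
]
for a, b, supp, conf, imp in pair_rules(orders):
    print(f"if {a} then {b}: support={supp:.2f} confidence={conf:.2f} improvement={imp:.2f}")
```

The socks/pants pair meets the support threshold but is rejected on confidence, which shows how the thresholds interact: support alone is not enough for a rule to be reported.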
Given a set of transactions, association rule mining aims to find the rules which enable us to predict the occurrence of a specific item based on the occurrences of the other items in the transaction.
Practical Applications of Market Basket Analysis
When one hears Market Basket Analysis, one thinks of shopping carts and supermarket shoppers. It is important to realize that there are many other areas in which Market Basket Analysis can be applied. An example familiar to most Internet users is Amazon's list of potentially interesting products: Amazon informs customers that people who bought the item they are purchasing also reviewed or bought a list of other items. Applications of Market Basket Analysis in various industries are listed below:
- Retail. In Retail, Market Basket Analysis can help determine what items are purchased together, purchased sequentially, and purchased by season. This can assist retailers to determine product placement and promotion optimization (for instance, combining product incentives). Does it make sense to sell soda and chips or soda and crackers?
- Telecommunications. In Telecommunications, where high churn rates continue to be a growing concern, Market Basket Analysis can be used to determine what services are being utilized and what packages customers are purchasing. Companies can use that knowledge to direct marketing efforts at customers who are more likely to follow the same path.
For instance, telecommunications companies now also offer TV and Internet services. An analysis of what customers purchase can guide the creation of bundles and give the company an idea of how to price them. This analysis might also help determine capacity requirements.
- Banks. In financial services (banking, for instance), Market Basket Analysis can be used to analyze customers' credit card purchases to build profiles for fraud detection and to identify cross-selling opportunities.
- Insurance. In Insurance, Market Basket Analysis can be used to build profiles to detect medical insurance claim fraud. Once profiles of claims are built, they can be used to determine whether more than one claim belongs to a particular claimant within a specified period of time.
- Medical. In Healthcare or Medical, Market Basket Analysis can be used to analyze comorbid conditions and symptoms, helping to build a better profile of an illness. It can also be used to reveal biologically relevant associations between different genes or between environmental effects and gene expression.