Friday, July 13, 2018

Market Basket Analysis: Understanding Customer Behaviour

Market basket analysis uses affinity analysis methods to understand customer purchase behavior. If a customer regularly purchases cereal and milk together, for example, discounting both items is not very logical, but discounting one of them can drive sales of the other.

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.
Association rules are widely used to analyze retail basket or transaction data. They are intended to identify strong rules in that data using measures of interestingness.

Let me introduce you to the world of cross-selling. Cross-selling is the process through which marketers sell a number of products to their existing customers, thus banking on their customer lifetime value. Cross-selling is one of the most used techniques to increase revenues and generate ROI from marketing efforts. With the ever-increasing cost of acquiring new customers and increasing competition, leveraging existing customers is the best option at the disposal of businesses, especially retailers. But how do you figure out which products to offer for cross-selling? The solution to your problem is market basket analysis (MBA) using big data analysis.

What is Market Basket Analysis?


Market basket analysis (MBA) is a business intelligence technique to predict future purchase decisions of the customers. It studies customers' buying patterns and preferences to predict what they will prefer to purchase along with the existing items in their cart.

What Is Market Basket Analysis?
Market Basket Analysis is a technique which identifies the strength of association between pairs of products purchased together and identifies patterns of co-occurrence. A co-occurrence is when two or more things take place together.
Market Basket Analysis creates If-Then scenario rules, for example, if item A is purchased then item B is likely to be purchased. The rules are probabilistic in nature or, in other words, they are derived from the frequencies of co-occurrence in the observations. Frequency is the proportion of baskets that contain the items of interest. The rules can be used in pricing strategies, product placement, and various types of cross-selling strategies.
How Market Basket Analysis Works
In order to make it easier to understand, think of Market Basket Analysis in terms of shopping at a supermarket. Market Basket Analysis takes data at transaction level, which lists all items bought by a customer in a single purchase. The technique determines relationships of what products were purchased with which other product(s). These relationships are then used to build profiles containing If-Then rules of the items purchased.
The rules could be written as:
If {A} Then {B}
The If part of the rule (the {A} above) is known as the antecedent and the Then part of the rule (the {B} above) is known as the consequent. The antecedent is the condition and the consequent is the result. The association rule has three measures that express the degree of confidence in the rule: Support, Confidence, and Lift.

For example, if 3 out of 5 times a customer purchases eggs along with flour and sugar (probably for baking a cake), then market basket analysis can predict the likelihood of that customer buying eggs when they are offered along with these two items. Market basket analysis is mostly described in the form of associations, for example:

  • If flour is purchased then sugar is also purchased
  • If sugar is purchased then flour is also purchased
  • If both flour and sugar are purchased then eggs are purchased 60% of the time.
Here we use the term “antecedent” for the IF part and “consequent” for the THEN part of the statement. Thus, market basket analysis helps in making decisions regarding placement of goods, marketing communications, inventory maintenance etc.

Some Essentials of Market Basket Analysis:

Support - The support is the probability of the event under analysis, i.e. the fraction of transactions that contain all the items in the rule. Rules with low support apply to few transactions and are generally less useful; a suitable minimum threshold depends on the data set.

Confidence - It expresses how reliably the rule holds. It is calculated as the ratio of the probability that the antecedent and the consequent occur together to the probability of the occurrence of the antecedent.

Lift Ratio - The lift ratio measures how much better the rule is at finding consequents than a random selection of transactions. Generally, a lift ratio greater than one suggests some applicability of the rule.

Role of Big Data in Market Basket Analysis:


All of this sounds easy with an example of 3 items, but think how complicated it gets when you combine data on items from grocery, personal hygiene, clothing, food and beverages, bathroom accessories, stationery, electronics, bags and wallets, and the many other categories found in a common retail store. According to Walmart’s official website, a Walmart Superstore has 142,000 different items in its store. These items can result in a tremendous number of possible subsets. If we start forming sets of 3 from a small data set of 100 items, then 161,700 combinations are possible. Think how massive the amount of data is that needs to be analyzed to figure out the best combinations from 142,000 items. Additionally, the item sets in this calculation can range in size from 2 items to 2,000 items. In e-commerce, this problem can grow even larger because of the wider range of items. According to export-x, as of 2015 Amazon had 488 million items in store.
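To make the combinatorial growth concrete, here is a minimal Python sketch (Python 3.8 or later is assumed, for math.comb) that reproduces the 161,700 figure and applies the same count to the 142,000-item store. The variable names are illustrative only.

```python
from math import comb

# Number of distinct 3-item sets that can be formed from 100 items.
triples_from_100 = comb(100, 3)
print(triples_from_100)  # 161700

# The same count for the 142,000-item store quoted above.
triples_from_store = comb(142_000, 3)
print(f"{triples_from_store:,} possible 3-item sets")
```

Even when only sets of three are considered, the count for the full store runs to hundreds of trillions, which is why practical algorithms prune candidates rather than enumerate them.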
In order to perform market basket analysis using big data, you need sophisticated analysis and modeling tools, which can be hard to master. By ingesting the data from point of sale systems (offline stores) and carts (online stores) you can collect insights that help you increase the efficiency of your cross-selling efforts. Big data analysis tools like Hadoop, Hive, Pig etc. make analysis of these huge data sets possible, and data visualization tools like Tableau and QlikView present the insights in the form of graphs that you can use to understand the data and make decisions accordingly.

Benefits of Market Basket Analysis:


1. Store Layout:

Based on the insights from market basket analysis you can organize your store to increase revenues. Items that go along with each other should be placed near each other to help consumers notice them. These insights guide how a store should be organized to maximize revenue, and they let you eliminate the guesswork when determining the optimal store layout.

2. Marketing Messages:

Whether it is email, phone, social media or an offer by a direct salesman, market basket analysis can improve the efficiency of all of them. By using data from MBA you can suggest the next best product which a customer is likely to buy. Hence you will help your customers with fruitful suggestions instead of annoying them with marketing blasts.

3. Maintain Inventory:

Based on the inputs from MBA you can also predict future purchases of customers over a period of time. Using your initial sales data, you can predict which items are likely to run short and maintain stocks in optimal quantity. This will help you improve the allocation of resources to different items of the inventory.

4. Content Placement:

In the case of e-commerce businesses, website content placement is very important. If goods are displayed in the right order then it can help boost conversions. MBA can also be used by online publishers and bloggers to display the content which a consumer is most likely to read next. This will reduce bounce rate, improve engagement and result in better performance in search results.

5. Recommendation Engines:

Recommendation engines are already used by some popular companies like Netflix, Amazon, Facebook, etc. If you want to create an effective recommendation system for your company then you will also need market basket analysis to efficiently maintain one. MBA can be considered as the basis for creating a recommendation engine.
As we have seen, market basket analysis can help companies, especially retailers, analyze buying behavior and predict customers’ next purchases. If used effectively, this can significantly improve cross-selling and, in turn, help you increase your customers’ lifetime value.

One of the key techniques used by the large retailers is called Market Basket Analysis (MBA), which uncovers associations between products by looking for combinations of products that frequently co-occur in transactions. In other words, it allows the supermarkets to identify relationships between the products that people buy. For example, customers that buy a pencil and paper are likely to buy a rubber or ruler.
Retailers can use the insights gained from MBA in a number of ways, including:
  1. Grouping products that co-occur in the design of a store’s layout to increase the chance of cross-selling;
  2. Driving online recommendation engines (“customers who purchased this product also viewed this product”); and
  3. Targeting marketing campaigns by sending out promotional coupons to customers for products related to items they recently purchased.
Given how popular and valuable MBA is, we thought we’d produce the following step-by-step guide describing how it works and how you could go about undertaking your own Market Basket Analysis.

How does Market Basket Analysis work?

To carry out an MBA you’ll first need a data set of transactions. Each transaction represents a group of items or products that have been bought together and is often referred to as an “itemset”. For example, one itemset might be: {pencil, paper, staples, rubber} in which case all of these items have been bought in a single transaction.
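Transaction data often arrives as one row per item sold rather than as ready-made itemsets. The following Python sketch shows one way such rows might be grouped into itemsets. The row layout (transaction id, item) and the sample data are assumptions made for illustration, not a format prescribed by any particular point of sale system.

```python
from collections import defaultdict

# Hypothetical point-of-sale rows: (transaction_id, item). Illustrative data only.
rows = [
    (1, "pencil"), (1, "paper"), (1, "staples"), (1, "rubber"),
    (2, "pencil"), (2, "paper"),
    (3, "paper"), (3, "ruler"),
]

def build_itemsets(rows):
    """Group line items by transaction id into one set of items per transaction."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    return dict(baskets)

itemsets = build_itemsets(rows)
for transaction_id, items in itemsets.items():
    print(transaction_id, sorted(items))
```

Using a set per transaction means an item bought twice in the same basket is counted once, which matches the yes/no view of co-occurrence used in the rest of this guide.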
In an MBA, the transactions are analysed to identify rules of association. For example, one rule could be: {pencil, paper} => {rubber}. This means that if a customer has a transaction that contains a pencil and paper, then they are likely to be interested in also buying a rubber.
Before acting on a rule, a retailer needs to know whether there is sufficient evidence to suggest that it will result in a beneficial outcome. We therefore measure the strength of a rule by calculating the following three metrics (note other metrics are available, but these are the three most commonly used):
Support: the percentage of transactions that contain all of the items in an itemset (e.g., pencil, paper and rubber). The higher the support the more frequently the itemset occurs. Rules with a high support are preferred since they are likely to be applicable to a large number of future transactions.
Confidence: the probability that a transaction that contains the items on the left hand side of the rule (in our example, pencil and paper) also contains the item on the right hand side (a rubber). The higher the confidence, the greater the likelihood that the item on the right hand side will be purchased or, in other words, the greater the return rate you can expect for a given rule.
Lift: the probability of all of the items in a rule occurring together (otherwise known as the support) divided by the product of the probabilities of the items on the left and right hand side occurring as if there was no association between them. For example, if pencil, paper and rubber occurred together in 2.5% of all transactions, pencil and paper in 10% of transactions and rubber in 8% of transactions, then the lift would be: 0.025/(0.1*0.08) = 3.125. A lift of more than 1 suggests that the presence of pencil and paper increases the probability that a rubber will also occur in the transaction. Overall, lift summarises the strength of association between the products on the left and right hand side of the rule; the larger the lift the greater the link between the two products.
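The three metrics can be sketched in a few lines of Python. The function names and the tiny transaction list below are illustrative assumptions; the first part simply re-computes the worked lift example above from its stated percentages.

```python
def confidence(support_both, support_lhs):
    """P(RHS | LHS): support of the whole rule divided by support of the LHS."""
    return support_both / support_lhs

def lift(support_both, support_lhs, support_rhs):
    """Support of the whole rule divided by what independence would predict."""
    return support_both / (support_lhs * support_rhs)

# Figures from the worked example: rule {pencil, paper} => {rubber}.
print(round(confidence(0.025, 0.10), 3))   # 0.25
print(round(lift(0.025, 0.10, 0.08), 3))   # 3.125

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

# Hypothetical transactions, for illustration only.
transactions = [
    {"pencil", "paper", "rubber"},
    {"pencil", "paper"},
    {"paper"},
    {"rubber"},
]
s_both = support(transactions, {"pencil", "paper", "rubber"})
s_lhs = support(transactions, {"pencil", "paper"})
s_rhs = support(transactions, {"rubber"})
print(s_both, confidence(s_both, s_lhs), lift(s_both, s_lhs, s_rhs))
```

In the small hypothetical data set the lift comes out at exactly 1, so there the rule adds nothing over chance, whereas the figures from the worked example give a lift well above 1.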
To perform a Market Basket Analysis and identify potential rules, a data mining algorithm called the ‘Apriori algorithm’ is commonly used, which works in two steps:
  1. Systematically identify itemsets that occur frequently in the data set with a support greater than a pre-specified threshold.
  2. Calculate the confidence of all possible rules given the frequent itemsets and keep only those with a confidence greater than a pre-specified threshold.
The thresholds at which to set the support and confidence are user-specified and are likely to vary between transaction data sets. R does have default values, but we recommend that you experiment with these to see how they affect the number of rules returned (more on this below). Finally, although the Apriori algorithm does not use lift to establish rules, you’ll see in the following that we use lift when exploring the rules that the algorithm returns.
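The two steps can be sketched in plain Python as follows. This is a simplified teaching version written for this guide, not the arules implementation: the function names are hypothetical, thresholds are applied as "greater than or equal to", and no attempt is made at the optimisations a production library would use.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search for itemsets whose support meets min_support."""
    transactions = [set(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep the size-k candidates that are frequent enough.
        level = {}
        for candidate in current:
            s = sum(candidate <= t for t in transactions) / n
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        # Join frequent size-k itemsets into size-(k+1) candidates and prune any
        # candidate that has an infrequent size-k subset (the Apriori property).
        k += 1
        current = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent

def association_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into LHS => RHS, keep confident rules."""
    rules = []
    for itemset, s in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(itemset, size):
                lhs = frozenset(lhs)
                conf = s / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, s, conf))
    return rules

# Hypothetical transactions, for illustration only.
transactions = [
    {"pencil", "paper", "rubber"},
    {"pencil", "paper", "rubber"},
    {"pencil", "paper"},
    {"paper", "ruler"},
    {"rubber"},
]
frequent = frequent_itemsets(transactions, min_support=0.3)
rules = association_rules(frequent, min_confidence=0.6)
for lhs, rhs, s, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), round(s, 2), round(conf, 2))
```

Raising either threshold shrinks the list of rules returned, which is the behaviour to expect when experimenting with the thresholds on real data.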

Performing Market Basket Analysis in R

To demonstrate how to carry out an MBA we’ve chosen to use R and, in particular, the arules package. For those that are interested we’ve included the R code that we used at the end of this blog.
Here, we follow the same example used in the arulesViz Vignette and use a data set of grocery sales that contains 9,835 individual transactions with 169 items. The first thing we do is have a look at the items in the transactions and, in particular, plot the relative frequency of the 25 most frequent items in Figure 1. This is equivalent to the support of these items where each itemset contains only the single item. This bar plot illustrates the groceries that are frequently bought at this store, and it is notable that the support of even the most frequent items is relatively low (for example, the most frequent item occurs in only around 2.5% of transactions). We use these insights to inform the minimum threshold when running the Apriori algorithm; for example, we know that in order for the algorithm to return a reasonable number of rules we’ll need to set the support threshold at well below 0.025.
Figure 1 A bar plot of the support of the 25 most frequent items bought.
By setting a support threshold of 0.001 and confidence of 0.5, we can run the Apriori algorithm and obtain a set of 5,668 results. These threshold values are chosen so that the number of rules returned is high, but this number would reduce if we increased either threshold. We would recommend experimenting with these thresholds to obtain the most appropriate values. Whilst there are too many rules to be able to look at them all individually, we can look at the five rules with the largest lift:

Market Basket Analysis

What is it?

Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. For example, if you are in an English pub and you buy a pint of beer and don't buy a bar meal, you are more likely to buy crisps (US. chips) at the same time than somebody who didn't buy beer.
The set of items a customer buys is referred to as an itemset, and market basket analysis seeks to find relationships between purchases.
Typically the relationship will be in the form of a rule:
IF {beer, no bar meal} THEN {crisps}.
The probability that a customer will buy beer without a bar meal (i.e. that the antecedent is true) is referred to here as the support for the rule. (Other definitions in this post instead take support to be the fraction of transactions containing every item in the rule.) The conditional probability that a customer will purchase crisps is referred to as the confidence. The algorithms for performing market basket analysis are fairly straightforward (Berry and Linoff is a reasonable introductory resource for this). The complexities mainly arise in exploiting taxonomies, avoiding combinatorial explosions (a supermarket may stock 10,000 or more line items), and dealing with the large amounts of transaction data that may be available.
A major difficulty is that a large number of the rules found may be trivial for anyone familiar with the business. Although the volume of data has been reduced, we are still asking the user to find a needle in a haystack. Requiring rules to have a high minimum support level and a high confidence level risks missing any exploitable result we might have found. One partial solution to this problem is differential market basket analysis, as described below.

How is it used?

In retailing, most purchases are made on impulse. Market basket analysis gives clues as to what a customer might have bought if the idea had occurred to them. (For some real insights into consumer behavior, see Why We Buy: The Science of Shopping by Paco Underhill.)
As a first step, therefore, market basket analysis can be used in deciding the location and promotion of goods inside a store. If, as has been observed, purchasers of Barbie dolls are more likely to buy candy, then high-margin candy can be placed near to the Barbie doll display. Customers who would have bought candy with their Barbie dolls had they thought of it will now be suitably tempted.
But this is only the first level of analysis. Differential market basket analysis can find interesting results and can also eliminate the problem of a potentially high volume of trivial results.
In differential analysis, we compare results between different stores, between customers in different demographic groups, between different days of the week, different seasons of the year, etc.
If we observe that a rule holds in one store, but not in any other (or does not hold in one store, but holds in all others), then we know that there is something interesting about that store. Perhaps its clientele are different, or perhaps it has organized its displays in a novel and more lucrative way. Investigating such differences may yield useful insights which will improve company sales.
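A differential comparison of this kind might be sketched as follows in Python. The store names, baskets, and the choice of "largest distance from the median confidence" as the way to flag an unusual store are all assumptions made for illustration; the text does not prescribe a particular test.

```python
from statistics import median

def rule_confidence(transactions, lhs, rhs):
    """Confidence of lhs => rhs within one group of transactions.

    Returns None when no transaction in the group contains lhs.
    """
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = [t for t in transactions if lhs <= t]
    if not with_lhs:
        return None
    return sum(rhs <= t for t in with_lhs) / len(with_lhs)

# Hypothetical baskets per store, for illustration only.
stores = {
    "store_a": [{"doll", "candy"}, {"doll", "candy"}, {"doll"}, {"candy"}],
    "store_b": [{"doll"}, {"doll"}, {"doll", "candy"}, {"candy"}],
    "store_c": [{"doll", "candy"}, {"doll", "candy"}, {"doll", "candy"}, {"doll"}],
}

# Evaluate the same rule, {doll} => {candy}, separately in every store.
# (Every store here contains the LHS, so no None values need handling.)
by_store = {
    name: rule_confidence(baskets, {"doll"}, {"candy"})
    for name, baskets in stores.items()
}

# Flag the store whose confidence sits furthest from the typical value.
typical = median(by_store.values())
outlier = max(by_store, key=lambda name: abs(by_store[name] - typical))
print(by_store, outlier)
```

The same function can be applied to any other split of the data, such as demographic group, day of the week or season, by changing how the transactions are grouped.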


Terminology

Items are the objects that we are identifying associations between. For an online retailer, each item is a product in the shop. For a publisher, each item might be an article, a blog post, a video etc. A group of items is an item set.

Transactions are instances of groups of items co-occurring together. For an online retailer, a transaction is, generally, a single purchase. For a publisher, a transaction might be the group of articles read in a single visit to the website. (It is up to the analyst to define over what period to measure a transaction.) For each transaction, then, we have an item set.

Rules are statements of the form

{i_1, i_2, ...} => {i_k}

i.e. if a transaction contains the items in the item set on the left hand side (LHS) of the rule, {i_1, i_2, ...}, then it is likely that a visitor will be interested in the item on the right hand side (RHS), {i_k}. In our example above, our rule would be:

The output of a market basket analysis is generally a set of rules, that we can then exploit to make business decisions (related to marketing or product placement, for example).
The support of an item or item set is the fraction of transactions in our data set that contain that item or item set. In general, it is nice to identify rules that have a high support, as these will be applicable to a large number of transactions. For super market retailers, this is likely to involve basic products that are popular across an entire user base (e.g. bread, milk). A printer cartridge retailer, for example, may not have products with a high support, because each customer only buys cartridges that are specific to his / her own printer.
The confidence of a rule is the likelihood that it is true for a new transaction that contains the items on the LHS of the rule. (I.e. it is the probability that the transaction also contains the item(s) on the RHS.) Formally:
confidence(LHS => RHS) = support(LHS and RHS together) / support(LHS)
The lift of a rule is the support of the items on the LHS co-occurring with the items on the RHS, divided by the probability that the LHS and RHS would co-occur if the two were independent:
lift(LHS => RHS) = support(LHS and RHS together) / (support(LHS) × support(RHS))
If lift is greater than 1, it suggests that the presence of the items on the LHS has increased the probability that the items on the right hand side will occur in this transaction. If the lift is below 1, it suggests that the presence of the items on the LHS makes it less likely that the items on the RHS will be part of the transaction. If the lift is 1, it suggests that the items on the LHS and RHS really are independent: knowing that the items on the LHS are present makes no difference to the probability that the items on the RHS will occur.
When we perform market basket analysis, then, we are looking for rules with a lift of more than one. Rules with higher confidence are ones where the probability of an item appearing on the RHS is high given the presence of the items on the LHS. It is also preferable (higher value) to action rules that have a high support - as these will be applicable to a larger number of transactions. However, in the case of long-tail retailers, this may not be possible.

Other Application Areas

Although Market Basket Analysis conjures up pictures of shopping carts and supermarket shoppers, it is important to realize that there are many other areas in which it can be applied. These include:
  • Analysis of credit card purchases.
  • Analysis of telephone calling patterns.
  • Identification of fraudulent medical insurance claims.
    (Consider cases where common rules are broken).
  • Analysis of telecom service purchases.
Note that despite the terminology, there is no requirement for all the items to be purchased at the same time. The algorithms can be adapted to look at a sequence of purchases (or events) spread out over time. A predictive market basket analysis can be used to identify sets of item purchases (or events) that generally occur in sequence — something of interest to direct marketers, criminologists and many others.
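A very simple way to look at sequences, sketched below in Python, is to count for how many customers one item was bought before another. The customer histories are hypothetical, and counting ordered pairs is only one basic form of sequence analysis, chosen here for brevity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, ordered oldest to newest. Illustrative only.
histories = {
    "customer_1": ["phone", "case", "charger"],
    "customer_2": ["phone", "charger"],
    "customer_3": ["case", "phone"],
}

def ordered_pair_counts(histories):
    """Count, per customer, each (earlier, later) pair of distinct items once."""
    counts = Counter()
    for events in histories.values():
        seen = set()
        # combinations() keeps the original order, so `earlier` precedes `later`.
        for earlier, later in combinations(events, 2):
            pair = (earlier, later)
            if earlier != later and pair not in seen:
                seen.add(pair)
                counts[pair] += 1
    return counts

counts = ordered_pair_counts(histories)
print(counts.most_common(3))
```

Unlike the co-occurrence counts used for ordinary rules, these counts are directional: ("phone", "case") and ("case", "phone") are tallied separately.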





Once it is known that customers who buy one product are likely to buy another, it is possible for the company to market the products together, or to make the purchasers of one product the target prospects for another.  If customers who purchase diapers are already likely to purchase beer, they’ll be even more likely to if there happens to be a beer display just outside the diaper aisle.  Likewise, if it’s known that customers who buy a sweater and casual pants from a certain mail-order catalog have a propensity toward buying a jacket from the same catalog, sales of jackets can be increased by having the telephone representatives describe and offer the jacket to anyone who calls in to order the sweater and pants.  Still better, the catalog company can provide an additional 5% discount on a package containing the sweater, pants, and jacket together and promote the complete package well.  The dollar amount of sales is then likely to go up.  By targeting customers who are already known to be likely buyers, the effectiveness of marketing is significantly increased, regardless of whether the marketing takes the form of in-store displays, catalog layout design, or direct offers to customers.  This is the purpose of market basket analysis: to improve the effectiveness of marketing and sales tactics using customer data already available to the company.


Minimum support/ Minimum association volume:  This control is used to fine-tune the basket analysis.  For yes/no data this control comes in the form of Minimum support level in percent.  It sets the minimum part of transactions that should contain a basket of products in order to consider this basket as a distinct stable product group.  By default this value is equal to 10%. If this number is set high, only products whose co-occurrence is in a very large number of transactions will be considered.  This will result in a small number of product clusters being found, and each cluster containing only a few products (often 2).  A high value is desirable when looking for 1-to-1 rules ("If Product A sold in this transaction, Product B will probably sell also.") A lower minimum association volume will force those products which occur together less frequently, to be considered as market baskets (product clusters).  This results in many, larger clusters.  This can be desirable, for example, if you are organizing a store or catalog and simply wish to know which products to place in the same area of the store.

For numeric data, Minimum association volume should be set either in the corresponding currency units or in units of products purchased.  The default value is 0 in this case – and should be changed by the user in order to achieve better results.

In the example considered here, we first leave the Basket Analysis default value of 10% of all the transactions.  Thus, only products that occur together in 10% or more of all transactions will be included in some basket in this analysis.

Minimum improvement:  Improvement indicates how much better the confidence of the found directed association rule is than that obtained by random guessing. Improvement roughly corresponds to how much more money a retailer might generate by exploiting the association rule found by PolyAnalyst Market Basket Analysis.  The default value of Minimum improvement is set to 2.

Minimum confidence:  Confidence is the probability that, if a customer purchases a considered group of items, he or she is going to purchase the other considered item.  The higher the confidence for a rule, the more value this rule has for real world applications.  Yet, if the Minimum confidence is set too high, we might find no association rules providing such confidence.  The user should experimentally determine an optimal value for Minimum confidence. The default value is 65%.
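The effect of the three controls can be sketched in Python as a filter over candidate rules. This is not PolyAnalyst code: the Rule class, the function name and the sample rules are hypothetical, and "improvement" is assumed here to be the ratio of the rule's confidence to the confidence expected from random guessing. The default values are the ones quoted above (10%, 2 and 65%).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: frozenset
    rhs: frozenset
    support: float       # share of transactions containing LHS and RHS together
    confidence: float    # P(RHS | LHS)
    improvement: float   # assumed: confidence relative to random guessing

def filter_rules(rules, min_support=0.10, min_improvement=2.0, min_confidence=0.65):
    """Keep only the rules that meet all three minimum thresholds."""
    return [
        r for r in rules
        if r.support >= min_support
        and r.improvement >= min_improvement
        and r.confidence >= min_confidence
    ]

# Hypothetical candidate rules, for illustration only.
candidates = [
    Rule(frozenset({"sweater", "pants"}), frozenset({"jacket"}), 0.12, 0.70, 2.5),
    Rule(frozenset({"diapers"}), frozenset({"beer"}), 0.15, 0.40, 3.0),
    Rule(frozenset({"pencil"}), frozenset({"paper"}), 0.05, 0.90, 4.0),
]

kept = filter_rules(candidates)
relaxed = filter_rules(candidates, min_support=0.05, min_confidence=0.40)
print(len(kept), len(relaxed))
```

With the defaults only the first rule survives; relaxing the support and confidence thresholds lets all three through, which illustrates why these values are best determined experimentally.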


 What Market Basket Analysis can guarantee is the quality, user-controlled flexibility, and speed of the performed analysis.

 Given a set of transactions, association rule mining aims to find the rules which enable us to predict the occurrence of a specific item based on the occurrences of the other items in the transaction.


Practical Applications of Market Basket Analysis
When one hears Market Basket Analysis, one thinks of shopping carts and supermarket shoppers. It is important to realize that there are many other areas in which Market Basket Analysis can be applied. The example most familiar to a majority of Internet users is Amazon’s list of potentially interesting products: Amazon informs the customer that people who bought the item they are purchasing also viewed or bought another list of items. Applications of Market Basket Analysis in various industries are listed below:
  • Retail. In Retail, Market Basket Analysis can help determine what items are purchased together, purchased sequentially, and purchased by season. This can assist retailers to determine product placement and promotion optimization (for instance, combining product incentives). Does it make sense to sell soda and chips or soda and crackers?
  • Telecommunications. In Telecommunications, where high churn rates continue to be a growing concern, Market Basket Analysis can be used to determine what services are being utilized and what packages customers are purchasing. They can use that knowledge to direct marketing efforts at customers who are more likely to follow the same path.
    For instance, telecommunications companies these days also offer TV and Internet. Which bundles to create can be determined from an analysis of what customers purchase, thereby giving the company an idea of how to price the bundles. This analysis might also help determine capacity requirements.
  • Banks. In financial services (banking, for instance), Market Basket Analysis can be used to analyze credit card purchases of customers to build profiles for fraud detection purposes and cross-selling opportunities.
  • Insurance. In Insurance, Market Basket Analysis can be used to build profiles to detect medical insurance claim fraud. By building profiles of claims, you are able to then use the profiles to determine if more than 1 claim belongs to a particular claimant within a specified period of time.
  • Medical. In Healthcare or Medical, Market Basket Analysis can be used for comorbid conditions and symptom analysis, with which a profile of illness can be better identified. It can also be used to reveal biologically relevant associations between different genes or between environmental effects and gene expression.
