Jonathan Rosen
Lecture Notes - Prof. Narahari
-Data Mining & Personalization-
Table of Contents
Predictive Modeling
CRM Goal
Personalization in E-commerce
Personalization Processes Chain
B2C Objectives
Impact of Personalization
Privacy Issues
Diagram of the Personalization Process
Knowing the Customer
Broadvision.com
How does this all work?
Knowledge Discovery in Databases (KDD)
Data Mining Problems
Data Mining Tasks
Predictive Modeling
·Predictive modeling is a technique that helps an organization predict what one of its users will do next. Multiple actions by the user are considered in determining the eventual outcome.
·Microstrategy (http://www.microstrategy.com) is a company well known for its software suite, which helps companies employ predictive modeling through data mining.
Data Mining
·Data
mining is “the
exploration and analysis by automatic or semiautomatic means, of large
quantities of data in order to discover meaningful patterns and rules.”
·Companies
use data mining in order to determine what their customers will do next.
·Data
mining is not limited to the web.
·Companies like Structure (a men’s clothing vendor) use data mining on their store credit cards to see how much and when their customers purchase. They then use that information to make incentive offers to customers based on their buying patterns.
Why Data Mine?
·Customer
service has become more important to the consumer
·Price
differentiation is no longer enough
·Currently
the trend in business
·Know
more about the consumer, to better serve the consumer
·Can
use increased intelligence as leverage and also in B2B marketplaces
CRM (Customer Relationship Management)
·Getting
a new customer today costs more than keeping one
·This
is directly related to the high acquisition costs associated with today’s
market. One source contends that it costs five times as much to acquire
a new customer (Peppers & Rogers), although an actual figure is highly
variable depending on the market and point in time
·By
getting to know a customer better, one can serve them better and make more
money
·An
example of this might occur when Customer A has been buying grass seed
from Company X for 3 years. A good CRM mechanism would determine that Customer
A might also be interested in buying other lawn care products such as fertilizer,
garden hose, etc.
·A 5% increase in retention reduces overhead by up to 10%
·CRM
allows for increasingly sophisticated mass-market offers through better
segmentation
CRM Goal
·Define
the customer segments
·Carefully
address legal and ethical concerns
·Lights-out (fully automated) execution against segments
·Attribution
& evaluation of responses
Personalization in E-commerce
·Positive:
·Easier to personalize by gathering aggregate data
·Can literally ‘follow’ a customer around through a digital store (but obviously not through a brick-and-mortar store)
·Much easier to experiment with the placement of goods, etc. in a digital store than in a brick-and-mortar store
·Negative:
·Web-based shopping has not been proven to be a better way to sell items
·Concerns have surfaced over how willing people are to buy higher-margin items or ‘touchy-feely’ items online.
·Difficult to differentiate prices by geography
·For example, how do you tell someone in Kansas that an item costs $X and someone in New York that it costs $Y? The difference may be justified by shipping costs, but the consumer may not see it that way.
Personalization Processes Chain
B2C Objectives
·KNOW
THE CUSTOMER
·Gather
data during registration
·Use
cookies to remember the customer
·Possibly even show the customer a specialized storefront, based on the cookie, that reflects their interests when they visit the site
·Find
out what the customer wants and deliver
·Use
questionnaires
·Collect
data, click streams, etc.
·Review
histories (visits, orders, what they look at, etc.)
·Amazon.com
(http://www.amazon.com) has excelled
in the area of following customers around to see what they look at and
what else they might be interested in
·Amazon
automatically generates a “My Page” which has items listed on it based
on your past purchases, searches and items you have viewed
·Deliver:
·Customized
promotions
·Customized
products (!!)
·For example, sell someone who likes action movies a special DVD box set designed for them, featuring their favorite action stars
Impact of Personalization
·Able
to learn more about customers
·Happens
invisibly, which could not take place in a store (especially in terms of
collecting aggregate data of click streams)
·Helps
to decide how to improve the system
Privacy Issues
·Large
numbers of customers are concerned about what data is collected about them
and who sees it.
·Customers
give more info to a trusted site.
·Making one’s site secure and trustworthy is key to being able to collect accurate data about consumers
·Untrusting
consumers may be unwilling to give their personal information to a site
or may even enter false information to a site.
·Sites
must have clear privacy statements
·Third-party evaluators such as the Better Business Bureau (BBB) evaluate sites for a fee and assure consumers of the site’s integrity. (http://www.bbb.com)
·BBB.com
is more geared towards small business
·TRUSTe.com
provides a similar service, and has better internet brand recognition,
but because of their costs is geared more towards larger businesses. (http://www.truste.com)
Diagram of the Personalization Process
Process:
1.Get
to know the customer through the gathering of data
2.Create
a profile for the user, extrapolate from past histories
3.Segment
the user based on his profile
4.Extrapolate predictions based on the user’s data, as well as the data of peers in the user’s segment
5.Deliver
customized content, offers, etc. to the consumer
6.Allow
the consumer to log in/access the customized content directly.
·Examples:
send users to categories they frequently visit.
·Make
sure the user can still easily access
the entire site
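Steps 2 through 4 of the process above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the click history in HISTORY, the function names, and the rule "segment = most-visited category" are all hypothetical and are not taken from the lecture or from any vendor's product.

```python
from collections import Counter

# Hypothetical click history: user -> categories visited (assumed data).
HISTORY = {
    "alice": ["lawn", "lawn", "garden", "lawn"],
    "bob": ["movies", "movies", "music"],
    "carol": ["lawn", "lawn", "garden", "hose"],
}


def build_profile(visits):
    """Step 2: summarize past visits as per-category counts."""
    return Counter(visits)


def assign_segment(profile):
    """Step 3: segment the user by their most-visited category."""
    return profile.most_common(1)[0][0]


def predict_interests(user, history):
    """Step 4: suggest categories that peers in the same segment
    visit but that this user has not visited yet."""
    profiles = {name: build_profile(v) for name, v in history.items()}
    segment = assign_segment(profiles[user])
    peer_counts = Counter()
    for other, profile in profiles.items():
        if other != user and assign_segment(profile) == segment:
            peer_counts.update(profile)
    return [c for c, _ in peer_counts.most_common() if c not in profiles[user]]
```

Calling predict_interests("alice", HISTORY) looks at carol, the only other user in the "lawn" segment, and suggests the one category alice has not visited. A user with no peers, such as bob here, gets an empty list, so the site would fall back to its normal storefront.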
Knowing the Customer
·Cookies
present a problem because many customers do not trust them
·Also, when you give a cookie to a user who shares a terminal, there is no way to make sure that the next visitor from that terminal is the same user; i.e. cookies are machine specific, not user specific.
·The
workaround is secure logins
·OPS:
The Open Profiling Standard is a proposed standard for how Web users can
control the personal information they share with Web sites. OPS has a dual
purpose: (1) to allow Web sites to personalize their pages for the individual
user and (2) to allow users to control how much personal information they
want to share with a Web site. OPS was proposed to the Platform for Privacy
Preferences Project (P3P) of the World Wide Web Consortium (W3C)
in 1997 by Netscape Communications (now part of America Online), Firefly
Network, and VeriSign.
(copied from
WhatIS.com; to read more about OPS go to:
http://whatis.techtarget.com/definition/0,289893,sid9_gci214208,00.html)
·Manage customers, not products. Stay attentive to the customer’s needs and try to build a relationship with the actual customer.
·A
Microstrategy (http://www.microstrategy.com)
white paper states that 66% of all information traded between consumers
and businesses will be non-commercial
in nature. This means that by nurturing the relationship between the business
and the consumer (not always forcing products down their throat) businesses
will be able to excel.
Broadvision.com
·Broadvision
develops and delivers an integrated suite of packaged applications for
personalized enterprise portals. Global enterprises and government entities
use these applications to sell, buy, and exchange information over the
web and on wireless devices. The BroadVision e-commerce application suite
enables companies to become more competitive and profitable by establishing
and sustaining high-yield relationships with customers, suppliers, and
employees. (copied from http://www.broadvision.com)
·Software
allows for the intuitive
management and collection of data on customers on a given web site. Includes
real-time applications. Essentially makes relating data to other data much
easier for users of the system.
How does this all work?
·Data
Mining applications use intuitive methods including decision trees, neural
networks and other permutations of business rules to target a specific
customer.
·Through the definition of the business rules, a certain customer can be targeted (e.g. an 18-year-old man with a fast computer for computer games). Then, specific marketing can be used to maximize the value of that customer to the company.
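As a rough illustration of targeting through business rules, the sketch below pairs each rule with the offer it triggers. The Customer fields, the age band, and the offer name are hypothetical choices made for this example; a real system would derive its rules from mined data or from methods such as decision trees.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    age: int
    owns_fast_computer: bool


# Each rule is (predicate, offer). Both the thresholds and the offer
# text are assumptions for illustration only.
RULES = [
    (lambda c: 16 <= c.age <= 24 and c.owns_fast_computer, "computer games"),
]


def offers_for(customer):
    """Return every offer whose rule matches this customer."""
    return [offer for matches, offer in RULES if matches(customer)]
```

With these rules, offers_for(Customer("Sam", 18, True)) yields the computer games offer, while a customer who matches no rule gets an empty list and sees no targeted marketing.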
Knowledge Discovery in Databases (KDD)
·It
is the process of identifying valid, novel, potentially useful, and understandable
patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth)
·It
involves data preparation, pattern extraction, knowledge evaluation, and
refinement, in iteration (Narahari)
·This is essentially a data mining process that results in business knowledge. Specific business rules and algorithms are applied to extract data. However, the data has to be cleaned significantly before it can be mined, generally because the collected data is very raw. It has to be filtered to remove outliers and extraneous data, and put into a uniform, manageable state.
·Process:
1.Select
Data
2.Data
Cleansing and Pre-processing (80-90% of time spent on this process)
3.Data
Mining
4.Results
interpretation
5.Implementation
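The first three steps of this process can be sketched on a toy data set. Everything here is an assumption made for illustration: the RAW records, the "more than 10 times the median" outlier rule, and the use of a simple average as the mined "pattern". The point is only that cleansing (parsing, dropping bad entries, removing outliers) has to happen before mining gives a sensible answer.

```python
import statistics

# Hypothetical raw records: (customer_id, purchase amount as raw text).
RAW = [
    ("c1", "20.0"), ("c2", "25.5"), ("c3", ""), ("c4", "22.0"),
    ("c5", "9999"), ("c6", " 18.5 "), ("c7", "n/a"),
]


def select(records):
    """Step 1: keep only the field being studied."""
    return [amount for _, amount in records]


def cleanse(values, max_ratio=10.0):
    """Step 2: parse text into floats, drop unparseable entries, and
    drop outliers larger than max_ratio times the median."""
    parsed = []
    for value in values:
        try:
            parsed.append(float(value))
        except ValueError:
            continue  # blank or non-numeric entry
    median = statistics.median(parsed)
    return [x for x in parsed if x <= max_ratio * median]


def mine(values):
    """Step 3: a deliberately trivial pattern, the average purchase."""
    return statistics.mean(values)
```

Running mine(cleanse(select(RAW))) averages only the four plausible purchases. Without the cleansing step, the blank and "n/a" entries would make parsing fail and the 9999 entry would dominate the average.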
Data Mining Problems
·There are inherent problems any time you segment a group. Thus, when one begins creating numerous segments automatically and electronically, there are bound to be some segmentations made which should not have been. Essentially, it is very difficult to fit everyone into certain categories. Some people may fit better in more than one category, or even a category that does not exist.
·The collection of extraneous data can also lead to improper segment assignment. For example, if someone bought diapers and lots of other baby products at the supermarket because they had family visiting, and as a result was segmented as a customer with an interest in baby products, the segmentation would be faulty.
·Similarly,
some product associations are faulty. For example, the beer and diapers
example discussed in class. A man goes to the store to buy diapers and
decides to pick up a 6-pack. If this happens numerous times, then it would
show the vendor (at first glance) a strong trend toward a product association
between beer and diapers. However, common sense says this is coincidental
and these are not co-marketable products. Thus, not all relationships in
data always exist as they appear to.
·Further, forecasting is troublesome because conditions can change very quickly. If mined data shows an increasing trend in the purchase of suntan lotion, that does not mean the trend will continue to grow through the year. Controls must be placed on the process to let the system know that the lotion is a seasonal item.
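The suntan lotion problem can be made concrete with two toy forecasts. The sales figures below are invented for illustration, and both functions are deliberately naive: one extends the latest month-over-month change in a straight line, the other scales last year's same months by this year's growth so far. Neither is presented as the method a real data mining suite uses.

```python
# Hypothetical monthly unit sales of suntan lotion, Jan..Dec of last year.
LAST_YEAR = [5, 6, 10, 20, 40, 70, 90, 85, 45, 15, 8, 5]
# Hypothetical sales observed so far this year, Jan..Jun.
THIS_YEAR = [6, 7, 12, 24, 46, 80]


def trend_forecast(series, steps):
    """Extend the most recent month-over-month change in a straight line."""
    slope = series[-1] - series[-2]
    return [series[-1] + slope * (i + 1) for i in range(steps)]


def seasonal_forecast(last_year, this_year, steps):
    """Scale last year's matching months by this year's growth so far.
    Assumes len(this_year) + steps does not exceed len(last_year)."""
    n = len(this_year)
    growth = sum(this_year) / sum(last_year[:n])
    return [round(last_year[n + i] * growth) for i in range(steps)]
```

For the remaining six months, trend_forecast(THIS_YEAR, 6) keeps climbing toward December, while seasonal_forecast(LAST_YEAR, THIS_YEAR, 6) peaks in summer and falls away. Telling the system that the item is seasonal amounts to choosing the second kind of model.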
Data Mining Tasks
·Intelligent
data mining suites can make fair predictions. By mining historical data,
along with current data, it is possible to make reasonable forecasts as
to how many of a given product a vendor can expect to sell on a given day.
·Further
mining the data to determine what the drivers of purchase volume are can
be beneficial to companies looking to increase sales.
·Data mining could be used to show trends as to which products have spikes in demand together. Then, these products could be co-marketed to increase overall sales volume for each item.
·Thus,
determining which products go together, both naturally and by consumer
demand, is an inherent ability of good data mining.
·The
process of grouping like things together is called affinity grouping.
It is directed by human involvement.
·There
is an inherent problem with affinity grouping, an example of which is shown
in the preceding example of grouping diapers and beer.
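One common way to examine a candidate product pairing is to count how often the two items share a basket and compare that with what chance alone would give (often called lift). The baskets below are invented, and lift is brought in here as an assumption for illustration; the lecture does not name a specific measure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (assumed data).
BASKETS = [
    {"diapers", "beer"},
    {"diapers", "beer", "wipes"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"chips", "salsa"},
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def lift(baskets, a, b):
    """Ratio of the observed co-occurrence rate to the rate expected if
    a and b were bought independently. Values near 1 suggest no real
    association. Assumes both items appear in at least one basket."""
    n = len(baskets)
    p_a = sum(a in basket for basket in baskets) / n
    p_b = sum(b in basket for basket in baskets) / n
    p_ab = sum(a in basket and b in basket for basket in baskets) / n
    return p_ab / (p_a * p_b)
```

In this toy data, beer and diapers share two baskets, the same raw count as diapers and wipes, yet the lift for beer and diapers is much closer to 1. Raw co-occurrence counts alone can therefore overstate an association, which is why a person still needs to judge whether the pairing makes sense.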
·Cluster Grouping differs because it is not a directed task. Clusters are formed of products and customers with similar demand patterns.
·This
method of grouping is preferred by many experts.
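To show what an undirected grouping looks like, here is a minimal k-means sketch. k-means is one standard clustering technique, chosen here as an assumption since the lecture does not name an algorithm. The customer points (visits per month, average basket in dollars) and the starting centroids are hypothetical. No labels are supplied; the groups emerge from the data.

```python
# Hypothetical customers: (visits per month, average basket in dollars).
POINTS = [(1, 10), (2, 12), (1, 11), (9, 50), (10, 55), (8, 52)]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it.
    Returns the final centroids and the last assignment of points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else c  # keep an empty cluster's centroid in place
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters
```

Starting from kmeans(POINTS, [POINTS[0], POINTS[3]]), the six customers separate into an occasional, low-spend group and a frequent, high-spend group without anyone defining those segments in advance.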