Do you think using data mining as a black box tool is justifiable?

“These users [marketers] either know nothing about the techniques of data mining or do not need to know anything about data mining to reap its benefits.” I remember taking a doctoral course on data mining in which the teacher, a professor of machine learning, claimed that people should not use data mining as a black box tool. When they do, the so-called garbage in, garbage out situation is likely to happen. Do you think using data mining as a black box tool is justifiable? What is your opinion on that?

Comment from Erik:

Being one of the authors of the mentioned book, I thought I would try to present the other side of the coin… it is no fun when everybody agrees on a blog ;-)
At the last KDD conference, a panel was organized to try to understand why data mining is not a multi-billion-dollar business as, for example, Business Intelligence is (dealing with reports and OLAP: what I call the ‘low end’ of analytics, or human powered analytics as opposed to maths powered analytics).
It was striking to see that almost all the data mining experts present (and there are a lot at KDD) were claiming that you do need statisticians or data miners to do data mining, and the KXEN representative (Rob Cooley) was almost the only one claiming that automated data mining is possible. So, the point of view of the people participating in this blog is ‘mainstream’… But progress has always been made because, one day, one guy stands up and says: “Wait a minute, is this really true?”
I agree with all that has been said on problems linked with missing values, outliers, good performance indicators, overfitting and underfitting, imbalanced classes, model validation, variable selection, leak variable detection, variable encoding, curse of dimensionality, deviation detection when applying a model, and descriptive power (ouch… This list is a good start for your other blog topic).
But I do not agree when people (experts) claim that all these topics cannot be solved with automated processes providing good solutions in 95% of the cases, because that is what we (KXEN) have done (for each of the topics mentioned above). And the solution is very simple: even the experts use books, articles, and techniques that they have been trained on, and I do not see any reason why software could not use the same techniques in an automated way…
The real question is: ‘Is data mining technology mature enough in 2006 to automatically solve 95% of business situations?’ My answer is clearly yes.
And this is linked with: where is data mining used today? I have read the example of the meteorologist, and I was thinking that, for each meteorological mathematical model, there must be 1 million models developed on Earth to answer “what will customers buy next month?”, “will my customer leave in the next 3 months?”, “how many customers will not repay their credit?”, “is this credit card transaction fraudulent?”, “how many products will I sell next week?”. If you count the number of mathematical models produced per year on our small planet, the vast majority are in CRM, Risk, Quality, and any kind of forecasts/predictions. There is growing interest in the bio sphere, but, in terms of volume, it is nothing compared with CRM (a single KXEN customer generates thousands of mathematical models per year). For this, see http://www.kdnuggets.com/polls/2005/successful_data_mining_applications.htm
So the sentence in the book relates to the fact that, yes, we can fully automate Marketing Campaign Optimization; we can fully automate computation of Credit Risk probability of default. Wherever we can translate a business problem into a series of data mining tasks, we can automate now, today… with results comparable to those of a very good expert.

To be clear, nobody claims that, if you put very bright people on the same problem for one year, they will not get better results. Of course they will! But there are not enough experts on Earth for all the problems that can be optimized today using state of the art automated data mining techniques…

Now, that said, there are phases which will always be ‘human powered’ (I am waiting for a guy to stand up and say: “Wait a minute…”):
1/ How to go from a (business) problem definition to a decomposition into data mining tasks using data mining functions (I am NOT talking about algorithms here, but functions as defined in JDM).
2/ How to perform a business validation of the findings of the maths powered engines. This is why all useful data mining function implementations must be verbose and ‘speak’ to business users in a way they can understand.
3/ How to access the data in a way a normal person (and not a database administrator) would like to. This is where collaboration with BI solutions is very interesting.
4/ How to discover new algorithms and techniques that will work in 97% of the cases…

OK. I am finished. I hope that I have managed to convey the passion of the other side of the force…

P.S. By the way, besides this one sentence, how was the book?? ;-)