CRISP-DM: The Standard

CRISP-DM is the de facto standard for Data Mining (DM) & Knowledge Discovery (KD) projects and the most widely used methodology for them.
It arose after a group of prominent enterprises (Teradata, SPSS, …) analyzed the problems and obstacles that occurred during DM & KD projects. They then proposed a reference guide for developing projects of this nature, which became CRISP-DM (CRoss Industry Standard Process for Data Mining). Because it is vendor-independent, it can be applied to any DM-related problem.


Six phases

CRISP-DM defines six phases that need to be carried out during a Data Mining project.


Business understanding: Understand the project objectives & requirements from a business perspective, then convert this knowledge into a DM problem definition and a preliminary plan to achieve these objectives.

Data understanding: Discover first insights into the data and detect interesting subsets to form hypotheses about hidden information.

Data preparation: Transform your data into a usable form. This phase contains all the activities required to construct the final dataset from the initial raw data. If you proceed to the next phase without proper data preparation, your results will never reach the quality you are aiming for (garbage in, garbage out).

Modeling: Select and apply various modeling techniques to your dataset. During this phase, you usually take a step back to the data preparation phase, because some techniques have specific requirements on the form of the data.

Evaluation: Evaluate the results of your model thoroughly and review the steps taken to build it, to be certain that it properly achieves the business objectives you defined in phase one.

Deployment: Deploy the model effectively, automate it, plug it into business processes and discuss it with the people who will be using it.
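The flow through the six phases can be sketched in a few lines of Python. This is only a minimal illustration, not part of CRISP-DM itself: the names `Phase`, `allowed_next` and `is_valid_path` are hypothetical, and the sketch encodes only the transitions described above (the forward order plus the step back from Modeling to Data preparation).

```python
from enum import Enum
from typing import List


class Phase(Enum):
    """The six CRISP-DM phases, numbered in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def allowed_next(phase: Phase) -> List[Phase]:
    """Return the phases that may follow `phase` in this simplified sketch."""
    options = []
    if phase is not Phase.DEPLOYMENT:
        # Normal forward step to the next phase in the list.
        options.append(Phase(phase.value + 1))
    if phase is Phase.MODELING:
        # Some techniques need the data in a specific form,
        # so modeling may send you back to data preparation.
        options.append(Phase.DATA_PREPARATION)
    return options


def is_valid_path(path: List[Phase]) -> bool:
    """Check that every consecutive pair of phases is an allowed transition."""
    return all(b in allowed_next(a) for a, b in zip(path, path[1:]))
```

For example, a path that goes Data preparation, Modeling, Data preparation, Modeling, Evaluation is accepted by `is_valid_path`, while jumping straight from Business understanding to Modeling is not.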


Room for improvement

However, the CRISP-DM model still has room for improvement. Other models based on CRISP-DM propose alternative or additional phases, such as the Automate phase, which focuses on building a tool that helps non-experts perform Data Mining & Knowledge Discovery tasks.

Another example of a phase that is not covered by CRISP-DM is the On-going support phase. It is very important to take this phase into account, as DM & KD projects require support and maintenance. Maintenance can range from creating and maintaining backups of the data used in the project to regularly rebuilding the DM models. Rebuilding is needed because the patterns a model has learned may no longer hold when new data emerges, which makes the model less applicable.
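One way to picture the "regular rebuilding" part of on-going support is a simple check that compares a model's quality on newly arrived data with the quality it had when it was built. The sketch below is an assumption-laden toy example: the function names, the accuracy metric, the 0.05 tolerance and the data are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy data format: (feature value, class label).
Example = Tuple[float, int]


def accuracy(model: Callable[[float], int], data: Sequence[Example]) -> float:
    """Fraction of examples for which the model predicts the right label."""
    if not data:
        raise ValueError("need at least one example")
    correct = sum(1 for x, y in data if model(x) == y)
    return correct / len(data)


def needs_rebuild(
    model: Callable[[float], int],
    new_data: Sequence[Example],
    baseline: float,
    tolerance: float = 0.05,  # assumed acceptable drop in accuracy
) -> bool:
    """Flag the model for rebuilding when it degrades on new data."""
    return accuracy(model, new_data) < baseline - tolerance


def threshold_model(x: float) -> int:
    """Stand-in for a trained model: predicts 1 when x is at least 0.5."""
    return 1 if x >= 0.5 else 0


# Data the model was built on, and data that emerged later.
old_data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
new_data = [(0.1, 0), (0.4, 1), (0.6, 1), (0.3, 1)]

baseline = accuracy(threshold_model, old_data)
rebuild = needs_rebuild(threshold_model, new_data, baseline)
```

In a real project the check would run on a schedule, and the metric and tolerance would follow from the business objectives defined in phase one.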

Changes like these (e.g. adding, renaming or eliminating phases) are being considered for the new version (CRISP-DM 2.0).



Comparing this process model to others (especially Software Engineering process models) leads to the conclusion that CRISP-DM covers many project management, organization and quality-related tasks either not at all or not thoroughly enough. Nowadays, covering these tasks has become a must because of how complex projects have become.
Data Mining projects have become more complex, as they now not only deal with huge streams of data but also require managing and organizing large collaborating teams.

It remains to be seen whether a DM engineering process model can be put together that, in combination with CRISP-DM, addresses the gaps mentioned above and adapts it to the most recent DM and KD processes.


Lode Wouters

15 January 2020
