By Ajay Sharma

What Is the CRISP-DM Methodology?

CRISP-DM stands for Cross-Industry Standard Process for Data Mining. The CRISP-DM methodology is practical, flexible, and useful for solving business problems with analytics.

CRISP-DM is a data mining methodology, a process that provides a blueprint for conducting a data mining project. It was conceived in 1996 by a consortium of companies including Daimler-Benz, ISL, NCR, and OHRA, who drew on the experience of around 200 data mining users and tool providers to shape the model. The result is a non-proprietary, documented, and freely available process, which is exactly what they set out to design, so everybody can use it.


How does it help?

CRISP-DM provides a roadmap, a set of best practices, and a structure that leads to better and faster results from data mining. In that way it gives a business a framework to follow while planning and carrying out a data mining project.



Business Understanding

Business Understanding is the first phase, where we look at the project from a business perspective and translate the business objective into a data mining objective, that is, into data mining sub-tasks to which we can apply modeling techniques. Four major tasks to focus on in Business Understanding:

1. Determine the business objective – Understand the true goal of the project and the important factors we need to know about the business.
2. Assess the situation – List the assumptions we need to make and carry out the cost-benefit analysis.
3. Determine the data mining goals – Translate the business objective into concrete objectives for the team.
4. Produce a project plan – Set specific outlines, propose a timeline, and list the tools and techniques that will be used.

A small sketch of how these outputs can be written down follows the list.
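The outputs of this phase are usually plain documents, but as an illustration they can also be recorded in a structured form. The Python sketch below is hypothetical; none of the field names or values are prescribed by CRISP-DM.

```python
# Hypothetical record of Business Understanding outputs.
# Every name and value here is a placeholder, not part of CRISP-DM itself.
business_understanding = {
    "business_objective": "Reduce customer churn over the next year",
    "assumptions": [
        "Historical churn labels are reliable",
        "Twelve months of customer data are available",
    ],
    "cost_benefit": {"estimated_cost": 50_000, "expected_saving": 200_000},
    "data_mining_goal": "Predict which customers will churn next quarter "
                        "with at least 80% recall",
    "project_plan": {
        "timeline_weeks": 12,
        "tools": ["Python", "pandas", "scikit-learn"],
    },
}

print(business_understanding["data_mining_goal"])
```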

Data Understanding

Data Understanding is the second phase, which starts with the initial collection of data. Here we increase our familiarity with the data and form an initial hypothesis based on the quality of the data we already have; if we find any interesting data sets, we can frame an initial hypothesis about the hidden information we have collected. Four major tasks to focus on in Data Understanding:

1. Collect the data – Acquire the data and make a note of any problems encountered during collection.
2. Describe the data – Examine the surface properties of the data: the formats, the quantity and quality available, and the records and fields in each table, and note any issues found while acquiring it.
3. Explore the data – Create a data exploration report that captures our first findings and initial hypotheses.
4. Verify data quality – This is a significant task: look for missing attributes, blank fields, misspelled values, and conflicts in the data, and make a note of the overall quality of the data.

A minimal pandas sketch of these checks follows the list.
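To make these tasks concrete, here is a minimal sketch using pandas. The file name customers.csv and the churn column are assumptions made purely for illustration, not something CRISP-DM prescribes.

```python
import pandas as pd

# Hypothetical data set; replace "customers.csv" and the column names
# with whatever your project actually collects.
df = pd.read_csv("customers.csv")

# Describe the data: surface properties, formats, quantity.
print(df.shape)       # number of records and fields
print(df.dtypes)      # format of each field
print(df.describe())  # summary statistics for numeric fields

# Explore the data: first findings for the exploration report.
print(df.head())
print(df["churn"].value_counts(normalize=True))  # assumed target column

# Verify data quality: missing attributes, blanks, duplicates.
print(df.isna().sum())                           # missing values per column
print((df.select_dtypes("object") == "").sum())  # blank text fields
print(df.duplicated().sum())                     # duplicate records
```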

Data Preparation

Data Preparation is the third phase. We have acquired the data and checked its quality, so now we construct the final data set that will be fed into the modeling tools in the next phase. Some straightforward tasks we have to carry out are:

1. Select – Decide which data we are going to use.
2. Clean – Go back to the data quality findings, fix missing attributes and spelling mistakes, and end up with correct, verified data.
3. Construct – Derive new records or new attributes that we want to create.
4. Integrate – Combine multiple records and tables, integrating and aggregating the data.
5. Format – Remove any illegal characters and trim values as required by the model.

A small pandas sketch of these steps follows the list.
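Continuing the same hypothetical setting, the sketch below walks through the five tasks with pandas. The file names, column names, and the spending threshold are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical raw inputs from the Data Understanding phase.
customers = pd.read_csv("customers.csv")
orders = pd.read_csv("orders.csv")

# Select: decide which data to use.
customers = customers[["customer_id", "age", "plan", "monthly_spend", "churn"]]

# Clean: fix missing attributes and inconsistent values.
customers["age"] = customers["age"].fillna(customers["age"].median())
customers["plan"] = customers["plan"].str.strip().str.lower()

# Construct: derive a new attribute.
customers["high_value"] = customers["monthly_spend"] > 100  # assumed threshold

# Integrate: aggregate orders per customer and join them in.
order_counts = (orders.groupby("customer_id").size()
                .rename("order_count").reset_index())
final = customers.merge(order_counts, on="customer_id", how="left")

# Format: final tidy-up, then write the data set used for modeling.
final["order_count"] = final["order_count"].fillna(0).astype(int)
final.to_csv("final_dataset.csv", index=False)
```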

Modeling

Modeling is where we propose various modeling techniques, select the ones that fit, and apply them, checking which options we have. Four major tasks to focus on in Modeling:

1. Select the modeling technique
2. Design the test
3. Build the model
4. Assess the model

A minimal scikit-learn sketch of these steps follows the list.
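As one possible illustration, here is how those tasks might look with scikit-learn. The random forest and the synthetic data are arbitrary choices; CRISP-DM does not prescribe any particular technique.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the final data set from Data Preparation.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Design the test: hold out part of the data for assessment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Select a modeling technique and build the model.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Assess the model on the held-out data.
print(classification_report(y_test, model.predict(X_test)))
```

In practice you would propose several techniques, build a model with each, and compare their assessments before moving on.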

Evaluation

In Evaluation, we go back to our business objectives and check the results against them: we produce evaluation summaries, review the process, and work out what has to be decided for the next steps. In short, we summarize the whole result and judge it against the business criteria; that is what we do in Evaluation. A short sketch of checking a result against a business criterion follows this paragraph.
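For example, if the hypothetical data mining goal from Business Understanding was to reach at least 80% recall on churners, the evaluation check might look like the sketch below. The threshold and the example labels are made up for illustration.

```python
from sklearn.metrics import recall_score

# Hypothetical business criterion: catch at least 80% of churners.
REQUIRED_RECALL = 0.80

# Made-up labels and predictions standing in for real model output.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

achieved = recall_score(y_true, y_pred)
if achieved >= REQUIRED_RECALL:
    print(f"Recall {achieved:.2f} meets the business criterion; move towards deployment.")
else:
    print(f"Recall {achieved:.2f} falls short; revisit the earlier phases.")
```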

Deployment

In the final, sixth phase, we deploy the results: we present the report and decide whether to carry the project forward to the next level within the business. Some major tasks are:

1. Plan the deployment
2. Plan monitoring and maintenance
3. Produce the final report
4. Review the project

So, in this article, we saw the CRISP-DM process and how it works. We will discuss CRISP-DM further in upcoming articles.
