As more companies embrace the power of data science, integrating it into their production processes becomes an inevitable challenge. Data science is a unique discipline in that it sits between software development and research. Because of this, understanding the pros and cons of existing methodologies is crucial to the success of a product.

Nowadays, most companies treat a data science project as a two-step process: research, then engineering. That is, data scientists do the research and finalize the model, then hand it over to software engineers or IT for ETL and model deployment. This is inefficient in several ways. For one, communication between teams can be time-intensive. In addition, engineers need time to ramp up on the intricacies of a model built by someone else.

The preferable approach, when possible, is to integrate both steps of the two-step process into software development, increasing efficiency and improving the rate of success. In this post, I’ll first introduce an industry-standard data mining process, CRISP-DM, and follow with a derived approach, Pivotal’s agile data science process.

What is CRISP-DM?

CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It’s a conventional process used by data mining experts. The process has the following six phases:

Ting Shuo (Daniel) Wu

Data Scientist @ A42 Labs

Daniel is a Data Scientist with a strong mathematics background and experience in software development, machine learning, statistical modeling, and finance.

As a technology enthusiast, Daniel is passionate about solving problems and building data-driven solutions that uncover the hidden value of customer data. His professional experience includes building machine learning and deep learning models, deploying end-to-end data science pipelines, and explaining data science to non-technical business audiences, along with generating insights from data in cross-functional team environments.