
Six Steps to Successfully Execute an MI Project - Moving Beyond “Just Try Some Analysis” — A Practical Approach to Materials R&D DX

March 16, 2026

As Materials Informatics (MI) tools become more accessible, it’s time to move beyond “just running analyses.” We explore how CRISP-DM, a widely used process model for data analytics, applies to MI projects. What are the six essential steps to drive consistent results?

Introduction

As the adoption of Materials Informatics (MI) continues to grow, have you encountered challenges like these?
  • “We built a model using our data, but don’t know how to connect it to actual product development.”
  • “We obtained results, but they don’t align with practical experience, and the project stalled.”
In recent years, MI tools have become more accessible, significantly lowering the barrier to entry.
However, ease of access does not necessarily mean ease of effective use.
Now that tools are widely available, it is no longer enough to simply build models.
What matters is what to do after building them, and even more importantly, why they are built in the first place.
This requires designing the overall framework of the project—what we call Materials R&D DX (Digital Transformation).
In this article, we introduce a structured approach based on CRISP-DM (Cross-Industry Standard Process for Data Mining), a widely used process model for data analytics, adapted specifically for MI projects.
These steps will help organizations move from “just trying things out” to consistently delivering results at scale.

What is CRISP-DM?

CRISP-DM is a data analytics framework consisting of six iterative steps.
It is not a linear process, but a cycle—moving back and forth between steps to improve outcomes.
Running this cycle is what enables organizations to embed a culture of data-driven R&D; in other words, to practice Materials R&D DX.
Let’s walk through the six steps.

1. Business Understanding

Clarify:
  • What kind of materials do we want to develop?
  • What problems are we trying to solve?
Key point:
The objective does not need to be perfect from the beginning.
Even a simple goal like “Let’s see if we can predict this property” is sufficient.
Why it matters:
Clear objectives make it much easier to evaluate model performance later.
This step answers the fundamental question:
👉 “What is the purpose of DX?”

2. Data Understanding

Take inventory of your available data:
  • Experimental notebooks
  • Excel files on personal PCs
  • Historical reports in shared drives
Assess data potential:
  • Can this data be used for machine learning?
  • Is the volume sufficient?
Understanding both data quality and quantity at a high level is critical.
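As a rough first pass at these two questions, a short script can report how many records exist and how complete each column is. The sketch below is illustrative only: the column names (`resin_pct`, `cure_temp_c`, `tensile_mpa`) and the sample rows are hypothetical, and the set of strings treated as "missing" is an assumption to adapt to your own files.

```python
import csv
import io

# Hypothetical export of experiment records; in practice this would be a file
# consolidated from notebooks, spreadsheets, and reports.
SAMPLE_CSV = """resin_pct,cure_temp_c,tensile_mpa
60,120,45.2
65,,47.8
70,140,
75,150,52.1
"""

# Assumption: these strings all mean "no value recorded".
MISSING_TOKENS = {"", "na", "n/a", "nan", "-"}


def audit(csv_text):
    """Return (row_count, {column: fraction_of_rows_with_a_value})."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0, {}
    completeness = {}
    for column in rows[0]:
        filled = sum(
            1 for row in rows
            if (row[column] or "").strip().lower() not in MISSING_TOKENS
        )
        completeness[column] = filled / len(rows)
    return len(rows), completeness


n_rows, completeness = audit(SAMPLE_CSV)
print(f"{n_rows} records")
for column, fraction in completeness.items():
    print(f"  {column}: {fraction:.0%} complete")
```

A column that is mostly empty, or a dataset with only a handful of rows, is a signal to revisit the objective or collect more data before moving on.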

3. Data Preparation

This is often the most time-consuming—and most important—step.
  • Consolidate scattered data into a unified format
  • Standardize naming conventions
  • Handle missing values
  • Convert into machine-learning-ready structured data
DX perspective:
This is not just preprocessing.
It is the process of transforming fragmented, individual data into organizational assets.
Role of platforms:
MI platforms don’t magically clean data, but they help standardize formats and reduce inconsistencies, making data structuring much more efficient.
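To make the preparation tasks above concrete, here is a minimal sketch of mapping inconsistent column names onto one schema and making missing values explicit. Everything in it is an assumption for illustration: the alias table, the canonical names (`cure_temp_c`, `tensile_mpa`), and the two sample records are hypothetical, not a description of how any particular platform works.

```python
# Hypothetical alias table: every spelling seen in the source files maps to
# one canonical column name.
COLUMN_ALIASES = {
    "cure temp (c)": "cure_temp_c",
    "curing temperature": "cure_temp_c",
    "cure_temp_c": "cure_temp_c",
    "tensile strength [mpa]": "tensile_mpa",
    "ts (mpa)": "tensile_mpa",
    "tensile_mpa": "tensile_mpa",
}

# Assumption: these strings all mean "no value recorded".
MISSING_TOKENS = {"", "na", "n/a", "nan", "-"}


def to_float_or_none(value):
    """Parse a cell into a float; return None for blanks and unparseable text."""
    text = str(value).strip()
    if text.lower() in MISSING_TOKENS:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def standardize(record):
    """Map one raw record onto canonical column names with numeric values."""
    clean = {}
    for raw_name, raw_value in record.items():
        canonical = COLUMN_ALIASES.get(raw_name.strip().lower())
        if canonical is None:
            continue  # unknown column: skipped here, but worth logging in practice
        clean[canonical] = to_float_or_none(raw_value)
    return clean


# Two records describing the same kind of experiment, as two people might
# have recorded them.
raw_records = [
    {"Cure Temp (C)": "120", "Tensile Strength [MPa]": "45.2"},
    {"Curing temperature": " 140 ", "TS (MPa)": "N/A"},
]

table = [standardize(r) for r in raw_records]
print(table)
```

Keeping missing values as `None` instead of guessing a fill value leaves the decision about how to handle them (drop the row, impute, or re-measure) visible to whoever builds the model.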

4. Modeling

Build machine learning models.
Start simple:
There is no need to begin with complex algorithms.
Start by creating a baseline model.
By actually running models, you gain insights such as:
  • “We need more data here”
  • “This might be more predictable than expected”
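One way to "start simple" is to compare two baselines: always predicting the average, and a straight line through one input variable. The sketch below uses only the Python standard library and a small hypothetical dataset (cure temperature versus tensile strength); the numbers are invented for illustration. Each model is scored with leave-one-out error so that every prediction is made on a point the model has not seen.

```python
from statistics import mean

# Hypothetical data: (cure temperature in °C, tensile strength in MPa).
data = [(100, 40.1), (110, 42.3), (120, 44.8), (130, 46.9), (140, 49.2), (150, 51.0)]


def fit_mean(train):
    """Baseline 0: always predict the average of the training targets."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_line(train):
    """Baseline 1: ordinary least squares with a single input variable."""
    x_bar = mean(x for x, _ in train)
    y_bar = mean(y for _, y in train)
    sxx = sum((x - x_bar) ** 2 for x, _ in train)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def loo_mae(fit, points):
    """Leave-one-out mean absolute error: each point is predicted by a model
    trained on all the other points."""
    errors = []
    for i, (x, y) in enumerate(points):
        model = fit(points[:i] + points[i + 1:])
        errors.append(abs(model(x) - y))
    return mean(errors)


mae_mean = loo_mae(fit_mean, data)
mae_line = loo_mae(fit_line, data)
print(f"mean predictor MAE: {mae_mean:.2f} MPa")
print(f"linear model MAE:   {mae_line:.2f} MPa")
```

If a more complex model later fails to beat these numbers, that is itself useful information about the data.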

5. Evaluation

Evaluate the model not only by accuracy, but also by usability and interpretability.

Forward Prediction

Can the model reasonably predict properties under new conditions?

Inverse Design

Can the model propose compositions or process conditions to achieve target properties?
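In its simplest form, inverse design can be sketched as a search: score many candidate conditions with the forward model and keep those whose predicted property is closest to the target. The example below is a minimal illustration under stated assumptions: `predict_tensile` is a hypothetical stand-in for a trained model with invented coefficients, and an exhaustive grid is only practical for a small number of variables.

```python
from itertools import product


def predict_tensile(resin_pct, cure_temp_c):
    """Stand-in for a trained forward model (hypothetical coefficients)."""
    return 10.0 + 0.3 * resin_pct + 0.15 * cure_temp_c


def inverse_design(target, resin_grid, temp_grid, top_k=3):
    """Score every candidate on the grid with the forward model and return
    the top_k candidates whose prediction is closest to the target."""
    scored = []
    for resin, temp in product(resin_grid, temp_grid):
        predicted = predict_tensile(resin, temp)
        scored.append((abs(predicted - target), resin, temp, predicted))
    scored.sort()
    return scored[:top_k]


candidates = inverse_design(
    target=50.0,
    resin_grid=range(50, 81, 5),     # 50, 55, ..., 80 (% resin)
    temp_grid=range(100, 161, 10),   # 100, 110, ..., 160 (°C)
)
for gap, resin, temp, predicted in candidates:
    print(f"resin={resin}%  temp={temp}°C  predicted={predicted:.1f} MPa  gap={gap:.1f}")
```

The returned candidates are suggestions for the next experiments, not confirmed results; they are only as reliable as the forward model in that region of the design space.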

Interpretability (Domain Alignment)

Use techniques like SHAP or feature importance to understand:
  • Which factors the model considers important
  • Whether they align with domain knowledge
If aligned → trust increases
If not → potential opportunity:
  • Data bias?
  • Hidden correlations?
  • New scientific insights?
Investigating such mismatches can expose problems in the data and, in some cases, lead to new findings in R&D.
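SHAP requires an external package, so as a dependency-free illustration of the "feature importance" idea, the sketch below uses permutation importance instead: shuffle one input column, and measure how much the prediction error grows. The `model` function, the feature names, and the generated dataset are all hypothetical; the targets are produced by the model itself so the effect of shuffling is easy to see.

```python
import random
from statistics import mean


def model(row):
    """Stand-in for a trained model (hypothetical): depends strongly on
    cure_temp_c, weakly on resin_pct, and not at all on batch_id."""
    return 0.15 * row["cure_temp_c"] + 0.02 * row["resin_pct"]


def mae(rows, targets):
    """Mean absolute error of the model on the given rows."""
    return mean(abs(model(r) - t) for r, t in zip(rows, targets))


def permutation_importance(rows, targets, feature, n_repeats=20, seed=0):
    """Average increase in MAE when one feature's values are shuffled
    across rows, which breaks its link to the target."""
    rng = random.Random(seed)
    baseline = mae(rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
        increases.append(mae(permuted, targets) - baseline)
    return mean(increases)


# Hypothetical dataset; targets come from the model itself, so the baseline
# error is zero.
gen = random.Random(42)
rows = [
    {
        "cure_temp_c": gen.uniform(100, 160),
        "resin_pct": gen.uniform(50, 80),
        "batch_id": gen.randint(1, 5),
    }
    for _ in range(50)
]
targets = [model(r) for r in rows]

importances = {f: permutation_importance(rows, targets, f) for f in rows[0]}
for feature, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: {score:.3f}")
```

The resulting ranking is what gets compared with domain knowledge: a feature that experts consider irrelevant but that ranks highly is exactly the kind of mismatch worth investigating.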

Improving Performance

If results are insufficient, ask:
  • Should we collect more data?
  • Should we engineer new features?
  • Should we add metadata (e.g., raw-material information or SMILES strings)?

6. Deployment

Integrate the model into actual R&D workflows.
This is the true goal of Materials R&D DX.

In Practice

  • Use forward prediction to reduce experiments
  • Use inverse design to discover new formulations
  • Treat AI as a partner, not a replacement

Continuous Cycle

Feed new experimental data back into the system
→ retrain models
→ improve performance
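
The loop above can be sketched in a few lines: append each new batch of results, refit, and re-evaluate on a fixed set of reference points that is never used for training. All data below is hypothetical, and the single-variable least-squares line is only a stand-in for whatever model a project actually uses.

```python
from statistics import mean


def fit_line(points):
    """Least-squares line through (x, y) points; returns (slope, intercept)."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return slope, y_bar - slope * x_bar


def mae(params, points):
    """Mean absolute error of the fitted line on the given points."""
    slope, intercept = params
    return mean(abs(slope * x + intercept - y) for x, y in points)


# Hypothetical data: (cure temperature in °C, tensile strength in MPa).
dataset = [(100, 40.5), (105, 41.9), (110, 42.1)]        # initial experiments
check_set = [(125, 45.9), (145, 50.1), (155, 52.0)]      # fixed reference points
new_batches = [[(130, 46.9), (150, 51.2)], [(120, 44.7), (160, 53.1)]]

history = [mae(fit_line(dataset), check_set)]
for batch in new_batches:
    dataset.extend(batch)                    # feed new results back in
    params = fit_line(dataset)               # retrain
    history.append(mae(params, check_set))   # re-evaluate on the same points

print([round(e, 2) for e in history])
```

Tracking the error after every retraining shows whether new data is actually helping; the error does not have to fall at every step, but it should trend downward over several cycles.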

Conclusion

“Start small, iterate fast”

At first glance, these six steps may seem complex.
In practice, getting started is simpler than it looks.
Start by quickly running through steps 3–5 using your existing data.
This will naturally reveal:
  • What data is missing
  • How the problem should be redefined

How Platforms Accelerate the Cycle

Polymerize’s Materials R&D DX platform is designed to accelerate this cycle:
  • Data Preparation: Standardization and centralization of data
  • Modeling: Automated modeling that does not require deep machine learning expertise
  • Deployment: Built-in forward and inverse analysis
Whether you want to:
  • “Just try MI”
  • or “Build a full-scale system”
the platform enables you to move forward without losing direction.

Get Started

Why not try your first cycle using your own data?
Start with a free trial and experience the value firsthand.

Masahiro Fujita

Technical Customer Success
Community Engagement
