Definitions


# CRISP-DM

Cross-Industry Standard Process for Data Mining

CRISP-DM is a standardized approach to data mining projects. It breaks a data mining project down into 6 steps.


# Steps

  1. Business Understanding
    1. understand business requirements & goals
    2. clarify how the data mining process can bring value
  2. Data Understanding
    1. understand quality & relevance of data
  3. Data Preparation
    1. prepare data for analysis (choose & merge)
    2. data cleaning, transformation, integration, feature engineering
  4. Modeling
    1. select modeling techniques & build models
    2. identify patterns & trends in data
  5. Evaluation
    1. verify models meet business objectives
  6. Deployment
    1. integrate models into business processes
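
The six steps above can be sketched as a tiny end-to-end pipeline. This is only an illustration under stated assumptions: the toy data, the error threshold `MAX_ACCEPTABLE_ERROR`, and all function names are hypothetical, and a simple least-squares line stands in for whatever model a real project would use.

```python
from statistics import mean

# 1. Business Understanding: a hypothetical goal, expressed as a number.
MAX_ACCEPTABLE_ERROR = 0.5  # assumed threshold for mean absolute error

# Hypothetical toy data: (ad_spend, revenue); None marks a missing value.
raw_rows = [(1.0, 2.1), (2.0, 3.9), (None, 5.0), (3.0, 6.2), (4.0, 7.8)]


def count_missing(rows):
    """2. Data Understanding: check quality by counting incomplete rows."""
    return sum(1 for x, y in rows if x is None or y is None)


def prepare(rows):
    """3. Data Preparation: clean the data by dropping incomplete rows."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_line(rows):
    """4. Modeling: least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_mean, y_mean = mean(xs), mean(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in rows)
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return slope, intercept


def evaluate(model, rows):
    """5. Evaluation: mean absolute error of the model on the given rows."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in rows)


def predict(model, x):
    """6. Deployment: the function a business process would call."""
    slope, intercept = model
    return slope * x + intercept


missing = count_missing(raw_rows)
clean_rows = prepare(raw_rows)
model = fit_line(clean_rows)
mae = evaluate(model, clean_rows)
meets_goal = mae <= MAX_ACCEPTABLE_ERROR
```

For brevity the sketch evaluates on the same rows it was fitted on; a real project would hold out separate data for the Evaluation step.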

# Roles & Definitions

  1. Data Mining
    1. discover patterns & correlations in large datasets
  2. Data Science
    1. applies scientific methods, algorithms, systems to extract knowledge & insights from data
    2. uses statistics, ML, DM, visualization to analyze complex problems
    3. works across all 6 steps of CRISP-DM
  3. Exploratory Data Analysis (EDA)
    1. analyze datasets to summarize their main characteristics
    2. understand the data’s structure, patterns, relationships
  4. Data Engineering
  5. Data Analyst
  6. Data Engineer
  7. Data Scientist
  8. DevOps/MLOps/DataOps
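
As a rough illustration of the EDA entry above (summarizing main characteristics and looking at relationships), the following sketch uses only the Python standard library. The dataset and the helper names `summarize` and `pearson` are hypothetical; in practice a library such as pandas would usually take over this work.

```python
from statistics import mean, median, pstdev

# Hypothetical toy dataset: one list of values per column.
dataset = {
    "age": [23, 35, 31, 46, 52],
    "income": [28, 42, 39, 61, 70],
}


def summarize(values):
    """Summarize the main characteristics of a single numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),  # population standard deviation
    }


def pearson(xs, ys):
    """Pearson correlation as a simple measure of a linear relationship."""
    x_mean, y_mean = mean(xs), mean(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_var = sum((x - x_mean) ** 2 for x in xs)
    y_var = sum((y - y_mean) ** 2 for y in ys)
    return cov / (x_var * y_var) ** 0.5


summary = {name: summarize(values) for name, values in dataset.items()}
correlation = pearson(dataset["age"], dataset["income"])

for name, stats in summary.items():
    print(name, stats)
print("age/income correlation:", round(correlation, 3))
```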