
Data Migration Factory

$79

Industrialized migration framework with extractors for six source types (SQL Server, PostgreSQL, Oracle, MySQL, flat files, REST APIs), automated validation, and cutover runbooks.

📁 16 files · 🏷 v1.0.0

Python · SQL · Markdown · JSON · Databricks · Redis · PostgreSQL

📁 File Structure · 16 files

```
data-migration-factory/
├── README.md
├── assessment/
│   ├── migration_wave_planner.py
│   └── source_assessment.py
├── dashboards/
│   └── migration_progress.sql
├── extractors/
│   ├── file_bulk_extractor.py
│   ├── mysql_extractor.py
│   ├── oracle_extractor.py
│   ├── postgresql_extractor.py
│   ├── rest_api_extractor.py
│   └── sql_server_extractor.py
├── guides/
│   └── migration_methodology.md
├── runbooks/
│   ├── cutover_runbook.md
│   └── decommissioning_checklist.md
└── validation/
    ├── data_sampling.py
    └── reconciliation.py
```

📖 Documentation Preview · README excerpt

Data Migration Factory

Product 15 | Datanest Digital

[https://datanest.dev](https://datanest.dev)

Price: $79 | Version: 1.0.0 | Category: Data Engineering

---

Overview

Data Migration Factory is a production-ready framework for migrating data from heterogeneous source systems into the Databricks Lakehouse. It provides structured assessment tooling, battle-tested extractors for the most common source platforms, automated reconciliation, and operational runbooks that reduce migration risk and compress project timelines.

Whether you are consolidating on-premise databases, moving off legacy warehouses, or ingesting from SaaS APIs, this toolkit gives you repeatable, auditable migration pipelines from day one.

What's Included

Assessment

| File | Description |
|------|-------------|
| assessment/source_assessment.py | Databricks notebook that catalogs source tables, columns, volumes, row counts, and dependency graphs. |
| assessment/migration_wave_planner.py | CLI tool that scores tables by complexity and groups them into prioritized migration waves. |
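The core idea behind wave planning is to layer tables by dependency depth so that no table migrates before its parents. A minimal, hypothetical sketch of that layering using Kahn-style topological levels (the `plan_waves` name and signature are illustrative, not the tool's actual interface):

```python
from collections import defaultdict, deque

def plan_waves(tables, deps, max_per_wave=3):
    """Group tables into dependency-ordered migration waves.

    ``deps`` maps a table to the set of tables it depends on; a table
    may only appear in a wave after all of its dependencies have
    migrated in earlier waves (Kahn-style topological layering).
    """
    indegree = {t: 0 for t in tables}
    dependents = defaultdict(list)
    for table, parents in deps.items():
        for parent in parents:
            indegree[table] += 1
            dependents[parent].append(table)

    # Tables with no unmet dependencies are ready to migrate first.
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    waves = []
    while ready:
        wave, next_ready = [], []
        while ready and len(wave) < max_per_wave:
            wave.append(ready.popleft())
        # Release tables whose last dependency completed in this wave.
        for done in wave:
            for child in dependents[done]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        waves.append(wave)
        ready.extend(sorted(next_ready))
    return waves
```

In practice the real planner also weighs volume and complexity scores when packing waves; this sketch only captures the dependency-ordering constraint.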

Extractors

| File | Description |
|------|-------------|
| extractors/sql_server_extractor.py | SQL Server full + incremental extraction via Change Tracking / CDC. |
| extractors/postgresql_extractor.py | PostgreSQL logical replication extractor with slot management. |
| extractors/mysql_extractor.py | MySQL binlog-based CDC extractor. |
| extractors/oracle_extractor.py | Oracle LogMiner-based CDC extractor. |
| extractors/file_bulk_extractor.py | Flat file (CSV / JSON / Parquet) bulk and incremental loader. |
| extractors/rest_api_extractor.py | REST API pagination framework with retry, backoff, and schema inference. |
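Incremental extractors of this kind typically share one pattern: track a high-water mark (a timestamp, LSN, or SCN) and pull only rows changed since the previous run. A minimal sketch of that watermark logic, with a hypothetical `Row` model and `extract_incremental` helper (illustrative only, not the product's actual API):

```python
from dataclasses import dataclass

@dataclass
class Row:
    id: int
    updated_at: str  # ISO-8601 timestamp, lexicographically sortable

def extract_incremental(rows, watermark):
    """Return rows changed since ``watermark`` plus the new high-water mark.

    The caller persists the returned watermark and passes it back on the
    next run, so each extraction picks up exactly where the last ended.
    """
    new_watermark = watermark
    batch = []
    for row in sorted(rows, key=lambda r: r.updated_at):
        if row.updated_at > watermark:
            batch.append(row)
            new_watermark = max(new_watermark, row.updated_at)
    return batch, new_watermark
```

The real extractors push this logic down to the source engine (Change Tracking, binlog, LogMiner) instead of scanning rows client-side, but the checkpointing contract is the same.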

Validation

| File | Description |
|------|-------------|
| validation/reconciliation.py | Source-to-target row count, hash, and aggregate reconciliation. |
| validation/data_sampling.py | Statistical sampling comparison with configurable thresholds. |
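Hash-based reconciliation boils down to fingerprinting each row and comparing the fingerprint sets on both sides. A hedged sketch assuming simple tuple rows (`row_fingerprint` and `reconcile` are hypothetical helpers, not the toolkit's API):

```python
import hashlib

def row_fingerprint(row):
    """Stable SHA-256 hash of a row's values, for content comparison."""
    canonical = "|".join(str(v) for v in row)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def reconcile(source_rows, target_rows):
    """Compare row counts and content fingerprints across two datasets."""
    src_hashes = {row_fingerprint(r) for r in source_rows}
    tgt_hashes = {row_fingerprint(r) for r in target_rows}
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "count_match": len(source_rows) == len(target_rows),
        # Rows present on one side but not the other.
        "missing_in_target": len(src_hashes - tgt_hashes),
        "extra_in_target": len(tgt_hashes - src_hashes),
        "content_match": src_hashes == tgt_hashes,
    }
```

At warehouse scale the same comparison is usually done with aggregate hashes per table or partition rather than per row, which keeps the check cheap enough to run on every wave.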

Dashboards

| File | Description |
|------|-------------|
| dashboards/migration_progress.sql | SQL-based dashboard queries for tracking migration waves, throughput, and quality. |

Runbooks & Guides

| File | Description |
|------|-------------|
| runbooks/cutover_runbook.md | Step-by-step cutover procedures: parallel run, validation gates, switchover, rollback. |
| runbooks/decommissioning_checklist.md | Post-migration source decommissioning checklist. |
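A validation gate of the kind the cutover runbook describes can be expressed as a simple pass/fail check over a reconciliation summary. A hypothetical sketch (the dict keys and thresholds are illustrative assumptions, not the runbook's actual criteria):

```python
def cutover_gate(report, max_missing=0, max_extra=0):
    """Decide whether a migration wave may proceed to switchover.

    ``report`` is a reconciliation summary dict with row-count and
    row-diff fields; any failure blocks cutover and is reported so
    operators can fall back to the rollback procedure.
    """
    failures = []
    if not report.get("count_match", False):
        failures.append("row counts differ between source and target")
    if report.get("missing_in_target", 0) > max_missing:
        failures.append("rows missing in target exceed threshold")
    if report.get("extra_in_target", 0) > max_extra:
        failures.append("unexpected extra rows in target")
    return len(failures) == 0, failures
```

Encoding the gate as code rather than a manual checklist makes the go/no-go decision auditable and repeatable across waves.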

... continues with setup instructions, usage examples, and more.

📄 Code Sample · .py preview

assessment/migration_wave_planner.py

```python
#!/usr/bin/env python3
"""
Migration Wave Planner
======================

Data Migration Factory | Datanest Digital | https://datanest.dev

CLI tool that reads the source assessment catalog (from
``source_assessment.py``) and produces prioritised migration waves.
Tables are scored by complexity, volume, dependency depth, and
incremental-readiness, then grouped into waves respecting dependency
order.

Usage
-----
    # Against a Unity Catalog assessment table
    python migration_wave_planner.py \\
        --catalog migration --schema assessment \\
        --max-tables-per-wave 20 \\
        --max-rows-per-wave 500000000 \\
        --output waves.json

    # Against a local JSON export of the assessment table
    python migration_wave_planner.py \\
        --input-file assessment_export.json \\
        --max-tables-per-wave 15 \\
        --output waves.json
"""
from __future__ import annotations

import argparse
import json
import logging
import sys
from collections import defaultdict, deque
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import Optional

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s - %(message)s",
)
logger = logging.getLogger("datanest.wave_planner")

# ---------------------------------------------------------------------------
# Data models
# ---------------------------------------------------------------------------

@dataclass
# ... 351 more lines ...
```