Manager - ML Delivery & Data Operations

Mechademy

Software Engineering, Operations, Data Science
Gurugram, Haryana, India
Posted on Feb 17, 2026
About the job

Mechademy's ML operations are at an inflection point. We've built internal tools that can create production-grade machine learning models in under 30 minutes — even for years of industrial sensor data. The delivery engine exists. Now we need someone to run the factory.

We're looking for a Manager - ML Delivery & Data Operations with 5-7+ years of experience to own our ML model lifecycle, client data onboarding, and operational excellence. You'll work directly with the Director of Data Science to free him from 20-25 hours/week of operational work while scaling ML model production to 30+ models daily by 2027.

Our clients include Berkshire Hathaway, Chevron, SM Energy, and Freeport LNG. When we say "zero-defect execution," we mean it — mistakes aren't an option when you're monitoring billion-dollar industrial assets.

This role is 50% operations management and 50% hands-on execution initially, shifting to 70% management as the team scales.

Key Responsibilities

Operations Management & Process Excellence (30%)

  • Triage incoming requests (model creation, data onboarding, ad-hoc analyses) and distribute work across the team
  • Establish SLAs for ML operations and data operations — define what "good" looks like and hold the team to it
  • Build processes, SOPs, and automation to cut the manual operational burden (currently about 80% of operations work) by 40%+
  • Capacity planning: scale from current pace to 30+ models daily by 2027
  • Identify operational bottlenecks and implement systematic solutions
  • Free the Director from operational firefighting, enabling 90% strategic focus

ML Model Lifecycle Management (25%)

  • Use our internal AutoML tools to create regression models for clients (training takes <30 min per model)
  • Validate model quality — you need to understand feature engineering, feature selection, evaluation metrics (not just accuracy — residuals, drift, business-relevant metrics), and know whether a model is good enough to ship to Chevron
  • Deploy models to production environments and monitor for drift and degradation
  • Manage model retraining schedules and lifecycle
  • Build automation for model monitoring (currently manual scripts)
  • Transition from 80% manual ML ops to automated, scalable processes
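
As an illustration of the validation and monitoring work described above, here is a minimal Python sketch. It is not our internal tooling: the function names, the 3-standard-error threshold, and the sample numbers are all hypothetical, chosen only to show what looking beyond accuracy at residuals and drift can mean in practice.

```python
import math
import statistics


def residual_report(actual, predicted):
    """Summarize residuals (actual - predicted) for a regression model."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    return {
        "mae": sum(abs(r) for r in residuals) / n,             # mean absolute error
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),  # root mean squared error
        "bias": sum(residuals) / n,                            # mean residual; non-zero means systematic over/under-prediction
    }


def mean_shift_drift(baseline, recent, threshold=3.0):
    """Flag drift when the mean of `recent` sits more than `threshold`
    standard errors away from the mean of `baseline`.

    The threshold is an illustrative assumption, not a recommended setting.
    Returns (drifted, z_score).
    """
    base_mean = statistics.fmean(baseline)
    std_err = statistics.stdev(baseline) / math.sqrt(len(recent))
    shift = statistics.fmean(recent) - base_mean
    if std_err == 0:
        # Constant baseline: any shift at all counts as drift.
        z = 0.0 if shift == 0 else math.copysign(math.inf, shift)
    else:
        z = shift / std_err
    return abs(z) > threshold, z


if __name__ == "__main__":
    # Hypothetical sensor readings vs. model predictions.
    print(residual_report([10, 12, 14], [9, 12, 16]))
    # Hypothetical residuals at training time vs. in the most recent window.
    print(mean_shift_drift([0.1, -0.2, 0.0, 0.2, -0.1], [1.0, 1.1, 0.9]))
```

A check like this only catches a shift in the mean. Real monitoring would likely also track variance, feature distributions, and business-relevant metrics.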

Client Data Onboarding & Quality Assurance (25%)

  • Lead client dataset onboarding from raw IoT sensor data to ML-ready state
  • Prepare data for ML model training using our AutoML platform
  • Write and optimize SQL queries to inspect, transform, and validate client data
  • Implement rigorous DQA workflows: type checks, missingness detection, outlier flagging, reconciliation
  • Partner with Customer Success, Product, and Engineering to resolve data blockers
  • Ensure zero defects in client data entering ML pipelines
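
To make the DQA bullet above concrete, here is a minimal Python sketch of the four checks it names: type checks, missingness detection, outlier flagging, and reconciliation. Everything in it is an assumption for illustration only: the `dqa_report` function, the column names, the 1.5 × IQR outlier rule, and the idea of reconciling against a row count supplied by the client are hypothetical, not a description of our actual pipeline.

```python
import statistics


def dqa_report(rows, schema, expected_row_count):
    """Run basic data-quality checks on a list of row dicts.

    rows: list of dicts, one per record (None marks a missing value)
    schema: dict mapping column name -> expected Python type
    expected_row_count: row count reported by the data source (hypothetical
        reconciliation target)
    """
    report = {
        "row_count_ok": len(rows) == expected_row_count,  # reconciliation
        "missing": {},       # column -> number of missing values
        "type_errors": [],   # (row index, column) pairs with the wrong type
        "outliers": {},      # numeric column -> values outside the IQR fences
    }
    for col, expected_type in schema.items():
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        report["missing"][col] = len(values) - len(present)

        for i, v in enumerate(values):
            if v is not None and not isinstance(v, expected_type):
                report["type_errors"].append((i, col))

        # Outlier flagging for numeric columns, using the 1.5 * IQR rule.
        if expected_type in (int, float):
            numeric = [v for v in present if isinstance(v, expected_type)]
            if len(numeric) >= 4:
                q1, _, q3 = statistics.quantiles(numeric, n=4)
                iqr = q3 - q1
                low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
                report["outliers"][col] = [v for v in numeric if v < low or v > high]
    return report


if __name__ == "__main__":
    # Hypothetical pressure readings with one missing value, one bad type,
    # and one suspicious spike.
    readings = [100.0, 101.0, 99.0, 100.5, 100.2, 500.0, None, "bad"]
    rows = [{"sensor_id": "s1", "pressure": v} for v in readings]
    print(dqa_report(rows, {"sensor_id": str, "pressure": float}, expected_row_count=8))
```

In practice, checks like these would more likely run in SQL or a dataframe library against the full dataset. The sketch uses only the standard library to stay self-contained.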

Team Leadership & Hiring (20%)

  • Directly manage 2-3 people initially, grow team to 6-7 over 12-18 months
  • Conduct weekly 1:1s, performance reviews, career development planning
  • Hire and onboard ML/Data Ops Specialists with Director approval
  • Create SOPs, training materials, and knowledge transfer processes
  • Foster a culture of rigor, craftsmanship, and zero-defect execution

Required Qualifications

  • 5-7+ years in data operations, analytics delivery, ML operations, analytics engineering, or similar operational roles
  • 2+ years with direct team management responsibility (not just tech lead)
  • Strong proficiency in Python (Pandas, NumPy, Polars); production-quality code, not just notebooks
  • Ability to write optimized SQL queries for large datasets, including query tuning, window functions, and CTEs
  • Solid understanding of ML concepts — you should know what feature engineering and feature selection are, why models are created, how to evaluate whether a model is performing well, and what deployment means in practice. You don't need to design algorithms, but you need to look at a model's output and know whether it's good enough to ship.
  • Data validation, cleaning, anomaly detection, automated DQ workflows
  • Scripting for process automation, scheduling, orchestration
  • Demonstrated track record of building processes from scratch: SOPs, automation, SLAs, capacity planning
  • Process-driven mindset: you see a manual process and instinctively ask "how do I automate this?"
  • Comfortable starting hands-on and evolving to management as the team scales

Preferred Qualifications

  • Experience with AutoML platforms or ML lifecycle management tools (MLflow, Ray, Kubeflow)
  • Experience with orchestration tools: Airflow, Prefect, or Dagster
  • Track record of reducing manual operational burden by 40%+ through automation
  • Experience scaling operations from low volume to high volume (10x+ growth)
  • Client data onboarding experience — working with messy, real-world external data
  • ML frameworks awareness (scikit-learn, XGBoost)
  • Statistical methods for outlier detection
  • Startup or high-growth environment experience

Technologies You'll Work With

  • Languages: Python, SQL
  • ML Operations: AutoML platforms, model deployment, monitoring, drift detection
  • Data Tools: Pandas, NumPy, Polars, SQL databases
  • Automation: Scripting, scheduling, orchestration workflows
  • Process Tools: Git, Jupyter, SOPs, documentation
  • Cloud Platforms: AWS (S3, data storage)
  • Nice-to-Have: MLflow, Ray, Dagster, Airflow, Apache Iceberg

Qualifications

  • Bachelor's degree in Engineering, Computer Science, Mathematics, Statistics, Data Science, or equivalent

Bonus Points

  • Experience scaling ML production from low volume to high volume (10x+ growth)
  • Familiarity with industrial IoT, sensor data, or time-series data
  • Experience managing both data engineering and ML operations teams
  • Client data onboarding from external/enterprise sources (not just internal datasets)
  • Track record building operational automation that reduces manual work 40%+
  • Hands-on experience with distributed ML systems (Ray, Spark)

What Success Looks Like

First 30 Days: Shadow current workflows, map every operational task the Director handles, begin handling daily triage independently.

First 90 Days: Fully own the operational workload. Director's operational time drops from 40% to <15%. Establish SLAs and tracking for all requests.

First 6 Months: Manual operational burden reduced by 40%+. Team scaled to 4-5 with clear SOPs for every core workflow. ML model production visibly on trajectory toward 30+ daily.