Mechademy's ML operations are at an inflection point. We've built internal tools that can create production-grade machine learning models in under 30 minutes — even for years of industrial sensor data. The delivery engine exists. Now we need someone to run the factory.

We're looking for a Manager, ML Delivery & Data Operations, with 5-7+ years of experience to own our ML model lifecycle, client data onboarding, and operational excellence. You'll work directly with the Director of Data Science to free him from 20-25 hours/week of operational work while scaling ML model production to 30+ models daily by 2027.

Our clients include Berkshire Hathaway, Chevron, SM Energy, and Freeport LNG. When we say "zero-defect execution," we mean it — mistakes aren't an option when you're monitoring billion-dollar industrial assets.

This role is 50% operations management and 50% hands-on execution initially, shifting to 70% management as the team scales.

Key Responsibilities

Operations Management & Process Excellence (30%)

Triage incoming requests (model creation, data onboarding, ad-hoc analyses) and distribute work across the team
Establish SLAs for ML operations and data operations — define what "good" looks like and hold the team to it
Build processes, SOPs, and automation to reduce the current 80% manual operational burden by 40%+
Capacity planning: scale from current pace to 30+ models daily by 2027
Identify operational bottlenecks and implement systematic solutions
Free the Director from operational firefighting, enabling 90% strategic focus

ML Model Lifecycle Management (25%)

Use our internal AutoML tools to create regression models for clients (training takes <30 min per model)
Validate model quality — you need to understand feature engineering, feature selection, evaluation metrics (not just accuracy — residuals, drift, business-relevant metrics), and know whether a model is good enough to ship to Chevron
Deploy models to production environments and monitor for drift and degradation
Manage model retraining schedules and lifecycle
Build automation for model monitoring (currently manual scripts)
Transition from 80% manual ML ops to automated, scalable processes

Client Data Onboarding & Quality Assurance (25%)

Lead client dataset onboarding from raw IoT sensor data to ML-ready state
Prepare data for ML model training using our AutoML platform
Write and optimize SQL queries to inspect, transform, and validate client data
Implement rigorous DQA workflows: type checks, missingness detection, outlier flagging, reconciliation
Partner with Customer Success, Product, and Engineering to resolve data blockers
Ensure zero defects in client data entering ML pipelines

Team Leadership & Hiring (20%)

Directly manage 2-3 people initially, grow team to 6-7 over 12-18 months
Conduct weekly 1:1s, performance reviews, career development planning
Hire and onboard ML/Data Ops Specialists with Director approval
Create SOPs, training materials, and knowledge transfer processes
Foster culture of rigor, craftsmanship, and zero-defect execution

Required Qualifications

5-7+ years in data operations, analytics delivery, ML operations, analytics engineering, or similar operational roles
2+ years with direct team management responsibility (not just tech lead)
Strong proficiency in Python (Pandas, NumPy, Polars); production-quality code, not just notebooks
Ability to write optimized SQL queries for large datasets, including query tuning, window functions, and CTEs
Solid understanding of ML concepts — you should know what feature engineering and feature selection are, why models are created, how to evaluate whether a model is performing well, and what deployment means in practice. You don't need to design algorithms, but you need to look at a model's output and know whether it's good enough to ship.
Experience with data validation, cleaning, anomaly detection, and automated data quality (DQ) workflows
Experience scripting for process automation, scheduling, and orchestration
Demonstrated track record of building processes from scratch: SOPs, automation, SLAs, capacity planning
Process-driven mindset: you see a manual process and instinctively ask "how do I automate this?"
Comfortable starting hands-on and evolving to management as the team scales

Preferred Qualifications

Experience with AutoML platforms or ML lifecycle management tools (MLflow, Ray, Kubeflow)
Experience with orchestration tools: Airflow, Prefect, or Dagster
Track record of reducing manual operational burden by 40%+ through automation
Experience scaling operations from low volume to high volume (10x+ growth)
Client data onboarding experience — working with messy, real-world external data
Familiarity with ML frameworks (scikit-learn, XGBoost)
Statistical methods for outlier detection
Startup or high-growth environment experience

Technologies You'll Work With

Languages: Python, SQL
ML Operations: AutoML platforms, model deployment, monitoring, drift detection
Data Tools: Pandas, NumPy, Polars, SQL databases
Automation: Scripting, scheduling, orchestration workflows
Process Tools: Git, Jupyter, SOPs, documentation
Cloud Platforms: AWS (S3, data storage)
Nice-to-Have: MLflow, Ray, Dagster, Airflow, Apache Iceberg

Education

Bachelor's degree in Engineering, Computer Science, Mathematics, Statistics, Data Science, or equivalent

Bonus Points

Experience scaling ML production from low volume to high volume (10x+ growth)
Familiarity with industrial IoT, sensor data, or time-series data
Experience managing both data engineering and ML operations teams
Client data onboarding from external/enterprise sources (not just internal datasets)
Track record building operational automation that reduces manual work 40%+
Hands-on experience with distributed ML systems (Ray, Spark)

What Success Looks Like

First 30 Days: Shadow current workflows, map every operational task the Director handles, begin handling daily triage independently.

First 90 Days: Fully own 100% of operational workload. Director's operational time drops from 40% to <15%. Establish SLAs and tracking for all requests.

First 6 Months: Manual operational burden reduced by 40%+. Team scaled to 4-5 people, with clear SOPs for every core workflow. ML model production visibly on track toward 30+ models daily.
