MLOps Engineer

Full-time
Hyderabad
Posted 1 month ago

A Brief about the company:

TransGraph Consulting Pvt. Ltd. provides price forecasting and risk management solutions to companies in the manufacturing, trading, refining, and FMCG industry segments. TransGraph focuses on serving players who deal with commodities physically and have exposure to derivatives as part of their hedging activities. TransGraph's deliverables range from research reports, hedge modeling, drafting of risk management policies, and customized procurement/trading/hedging strategies, delivered with support from focused and knowledgeable engagement managers, through to its powerful risk management software product, TransRisk.

TransRisk is TransGraph's risk management software product. It is a VaR-based risk management system that combines exposure, profitability, and risk analytics to support prudent trade, procurement, pricing, and hedging decisions for clients. It brings together position and price data to give decision makers a clear view of their risk.

Current Requirement:

We are looking for MLOps Engineers who will own, develop, deploy, and operate tooling and services around MLOps. The candidate should be able to manage performant model training and tracking, and ensure safe, stable, and scalable machine learning model deployments in both real-time and batch flows, with careful consideration of latency and reliability. The role involves experiment tracking, validation, and hyperparameter optimization runs, along with model monitoring for downtime, latency, and drift detection. The candidate will be responsible for ensuring the scalability of the MLOps infrastructure and advancing its maturity to the next level.

Additionally, they must build tools to democratize machine learning practices within TransGraph and work closely with product machine learning teams to understand their workflows, identify pain points, and provide effective solutions.

Job Specifications:

  • Department / Team: Information Technology
  • Work Location: Hyderabad (Work from Office)
  • Education: Bachelor’s or Master’s degree in computer science or a related field; specialization in AI/ML will be an added advantage.

Required Skills

Technical Skills

  • Programming & Software Engineering: Python (Advanced), Git/GitLab, CI/CD, clean coding, testing, debugging, API development (FastAPI, Flask, Django, REST, gRPC).
  • Machine Learning & AI: End-to-end ML lifecycle, model training & deployment, ML algorithms, AI Agents & Chatbots (LangChain, Rasa), model diagnostics & optimization.
  • MLOps & Data Engineering: MLflow, Kubeflow, DVC, W&B, Airflow, Prefect, PySpark/Spark, Pandas, Jenkins/GitLab CI/CD, Docker, Kubernetes, TensorFlow Serving.
  • Cloud & Infrastructure: AWS/GCP/Azure, Docker, Kubernetes, Helm, Terraform, Ansible, monitoring & logging (Prometheus, Grafana, ELK/EFK).
  • Area of Exposure:

    1. Candidate with 3+ years of experience across DevOps, MLOps, Machine Learning, and Data Engineering roles, with strong exposure to production-grade systems.
    2. Skilled in Python, Git/GitLab, debugging, testing (unit/integration), and applying clean coding practices to build scalable solutions.
    3. Hands-on with the complete Machine Learning lifecycle (data preparation, model development, deployment, monitoring), with a solid understanding of ML algorithms and their dependencies.
    4. Experienced in designing and implementing MLOps architectures using MLflow, Kubeflow, Airflow, PySpark, and related frameworks for reproducibility and automation.
    5. Built and deployed AI Agents and conversational Chatbots leveraging modern frameworks and architectures.
    6. Proficient in diagnosing, optimizing, and resolving issues around model performance, scalability, and deployment in production environments.
    7. Familiar with AWS/GCP/Azure, containerization (Docker, Kubernetes), CI/CD, and infrastructure automation.
    8. Strong communicator and team player with an experimental, iterative approach, driving best practices and cross-functional collaboration.
  • Additional Skills:

    1. Proficiency with observability and monitoring tools: Prometheus, Grafana, Kibana, and Elasticsearch (ELK/EFK stack).
    2. Experience with platform engineering and Kubernetes (K8s), Helm, and ArgoCD.
    3. Knowledge of Apache Kafka or other streaming frameworks.
    4. Exposure to frontend technologies and Java development.
    5. Experience with feature stores for ML workflows.
    6. Familiarity with GPU setup and management for accelerated deep learning workloads.

Job Features

Job Category: IT
