Senior ML Engineer - Machine Learning Platform

Full-time

Senior-level

Cairo - Egypt

Job Description

A leading US-based provider of construction management software is looking for a Senior ML Engineer to join their team. They provide cloud-based construction management software that helps clients efficiently build skyscrapers, hospitals, retail centers, airports, housing complexes, and more.

What if you could use your technology skills to develop a product that impacts the way communities’ hospitals, homes, sports stadiums, and schools are built across the world? Construction impacts the lives of nearly everyone in the world, and yet it’s also one of the world’s least digitized industries, not to mention one of the most dangerous. That’s why we’re looking for a talented Senior ML Engineer to join our journey to revolutionize a historically underserved industry.

As a Senior ML Engineer on our ML Platform team, you will help evolve our Machine Learning platform to support hundreds of models. In this role, you will implement the set of services needed to release AI and data science models capable of working with terabytes of data. This includes model-related features such as one-time and ongoing automatic model training, deployment, and monitoring, as well as platform-related features such as a model repository, feature stores, and a data access layer.

This position will report to the Engineering Manager, ML Platform.

Responsibilities

  • Check deployment pipelines for ML models.
  • Review code changes and pull requests from the data science team.
  • Trigger CI/CD pipelines after code approvals.
  • Monitor pipelines and ensure all tests pass and model artifacts are generated and stored correctly.
  • Deploy updated models to production after pipeline completion.
  • Work closely with the software engineering and DevOps teams to ensure smooth integration.
  • Containerize models using Docker and deploy on cloud platforms (like AWS/GCP/Azure).
  • Set up monitoring tools to track various metrics like response time, error rates, and resource utilization.
  • Establish alerts and notifications to quickly detect anomalies or deviations from expected behavior.
  • Collaborate with the data science team to update pipelines to address any faults, and analyze monitoring data, logs, files, and system metrics.
  • Document troubleshooting, changes, and optimizations.
  • Work alongside our Product, UX, and Prototype Engineering teams, leveraging your experience and expertise in the AI space to influence our product roadmap and develop innovative solutions that add capabilities to our product suite.

Requirements/Qualifications

  • You must be proficient in programming languages such as Python, Java, and C++.
  • You must have experience with machine learning frameworks such as TensorFlow and PyTorch.
  • You must have hands-on experience developing systems for the machine learning lifecycle: data preprocessing and feature extraction, model training and evaluation, and deployment and monitoring.
  • Familiarity with the associated open-source ecosystem (TensorFlow, PyTorch, MLflow, Ray, Kubeflow, TFX) is a plus.
  • You must have hands-on experience developing large-scale distributed, fault-tolerant, and scalable data processing systems capable of processing terabytes of structured and unstructured data via batch with Spark or streaming with Flink or Kafka Streams.
  • You must have worked with data scientists, be able to speak knowledgeably about the major machine learning paradigms, algorithms, and software tools, and be able to translate data science problem statements into corresponding data, infrastructure, or workflow needs.
  • You must have a good grasp of CI/CD pipelines using Jenkins and of IaC (infrastructure-as-code) tools such as Terraform or CloudFormation.
  • You are familiar with security concepts like firewalls, encryption, VPNs, and secure data transfer.
  • You are familiar with cloud infrastructure services and container systems such as Docker or Kubernetes.
  • You must be familiar with the Python ML toolchain (PySpark; Python libraries such as setuptools; pytest and pytest mocking for unit testing; mypy, pylint, and SonarQube for code quality) and at least one high-concurrency language such as Java, Elixir, Python, or Golang.
