Lead Test Orchestrator and Management Engineer

Date: Jul 28, 2025

Location: Richardson, TX, US

Company: Celestica International Inc.

Req ID: 122338 
Remote Position: No
Region: Americas 
Country: United States 
State/Province: Texas 
City: Richardson

Summary

The Lead Test Orchestrator and Management Engineer will play a pivotal role in defining, building, and operating the automated test infrastructure and methodologies for our rack-based AI data center products. This involves leading the strategy for orchestrating complex test campaigns across tightly integrated hardware (servers, GPUs, networking, storage, power) and software stacks. You will be responsible for validating the entire system's performance, stability, and adherence to design specifications under real-world AI workloads, ensuring our products are robust and ready for deployment at scale.

Knowledge / Skills / Competencies

  • Define, develop, and lead the execution of a comprehensive test strategy for our Data Center Orchestration and Management Product. This includes functional, performance, scalability, reliability, security, and stress testing of the control plane and data plane interactions.

  • Architect and implement sophisticated test methodologies to validate the product's ability to orchestrate, provision, monitor, and manage complex, multi-node, rack-based AI systems.

  • Drive the design of test environments that accurately simulate large-scale data center operations to rigorously test the management product's resilience and behavior under various load conditions and failure scenarios.

  • Design, develop, and maintain robust, scalable test automation frameworks primarily using Python to automate the validation of the Orchestration and Management Product.

  • Build automated test suites that interact with the product's APIs (REST, gNMI/gRPC), CLI, and UI to ensure comprehensive coverage of its functionality and performance.

  • Integrate automated tests into continuous integration/continuous deployment (CI/CD) pipelines, enabling rapid feedback on code changes and ensuring high quality for every release of the orchestration product.

  • Develop custom tools and harnesses to simulate managed devices, generate test data, and orchestrate complex test scenarios against the management system.

  • Conduct deep-dive performance analysis, benchmarking, and bottleneck identification across all layers of the rack (CPU, GPU, memory, PCIe, network fabric, storage I/O, power delivery).

  •  and AI/ML teams to ensure test coverage aligns with product requirements and critical AI workload behaviors.

  • Analyze test data to identify trends, predict failures, and provide actionable insights for product improvements.

 

  • Provide expert-level troubleshooting and root cause analysis for complex issues identified within the Orchestration and Management Product, collaborating closely with software development, SRE, and data center operations teams.

  • Participate actively in product design reviews and architectural discussions, offering critical insights to ensure the testability, reliability, and security of the management platform from its inception.

  • Analyze complex test data to identify performance bottlenecks, functional regressions, and systemic issues within the orchestration product, providing actionable insights for development teams.

Required Qualifications

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field.

  • 8+ years of progressive experience in system-level testing, validation, or QA for complex hardware/software integrated products, with a strong focus on data center or high-performance computing (HPC) environments.

  • Proven experience in a lead or senior technical role, including mentoring junior engineers and defining test strategies.

  • Expertise in developing robust test automation frameworks and scripts using Python.

  • Deep understanding of rack-level system architectures, including servers (x86/ARM), GPUs, high-speed networking (Ethernet, InfiniBand), and enterprise storage (NVMe).

  • Experience with performance benchmarking and tuning tools for CPU, GPU, network, and storage.

  • Proficiency in Linux operating systems, including system administration and debugging.

  • Strong analytical, problem-solving, and debugging skills for complex, distributed systems.

  • Excellent communication and collaboration skills to work across multi-disciplinary teams.

Preferred Qualifications

  • Direct experience with AI/ML hardware platforms and their unique testing challenges.

  • Familiarity with AI frameworks (e.g., TensorFlow, PyTorch, JAX) and their resource utilization patterns.

  • Experience with orchestration tools (e.g., Kubernetes, Slurm, OpenStack, Ansible, Chef, Puppet) for managing compute resources.

  • Knowledge of data center power and thermal management principles.

Notes

This job description is not intended to be an exhaustive list of all duties and responsibilities of the position. Employees are held accountable for all duties of the job. Job duties and the % of time identified for any function are subject to change at any time.

Celestica is an equal opportunity employer. All qualified applicants will receive consideration for employment and will not be discriminated against on the basis of any protected status (including race, religion, national origin, gender, sexual orientation, age, marital status, veteran or disability status, or other characteristics protected by law).
At Celestica we are committed to fostering an inclusive, accessible environment where all employees and customers feel valued, respected, and supported. Special arrangements can be made for candidates who need them throughout the hiring process. Please indicate your needs and we will work with you to meet them.

 

COMPANY OVERVIEW:
Celestica (NYSE, TSX: CLS) enables the world’s best brands. Through our recognized customer-centric approach, we partner with leading companies in Aerospace and Defense, Communications, Enterprise, HealthTech, Industrial, Capital Equipment and Energy to deliver solutions for their most complex challenges. As a leader in design, manufacturing, hardware platform and supply chain solutions, Celestica brings global expertise and insight at every stage of product development – from drawing board to full-scale production and after-market services for products from advanced medical devices, to highly engineered aviation systems, to next-generation hardware platform solutions for the Cloud. Headquartered in Toronto, with talented teams spanning 40+ locations in 13 countries across the Americas, Europe and Asia, we imagine, develop and deliver a better future with our customers.

 

Celestica would like to thank all applicants; however, only qualified applicants will be contacted.
Celestica does not accept unsolicited resumes from recruitment agencies or fee-based recruitment services.

 


Nearest Major Market: Dallas
Nearest Secondary Market: Fort Worth
