AI Performance Engineer
Company: Cornelis Networks, Inc.
Location: Austin
Posted on: April 1, 2026
Job Description:
Cornelis Networks delivers the world’s highest performance
scale-out networking solutions for AI and HPC datacenters. Our
differentiated architecture seamlessly integrates hardware, software, and system-level technologies to maximize the efficiency
of GPU, CPU and accelerator-based compute clusters at any scale.
Our solutions drive breakthroughs in AI & HPC workloads, empowering
our customers to push the boundaries of innovation. Backed by
top-tier venture capital and strategic investors, we are committed
to innovation, performance, and scalability, solving the world’s
most demanding computational challenges with our next-generation
networking solutions. We are a fast-growing, forward-thinking team
of architects, engineers, and business professionals with a proven
track record of building successful products and companies. As a
global organization, our team spans multiple U.S. states and six
countries, and we continue to expand with exceptional talent in
onsite, hybrid, and fully remote roles.

We’re seeking an AI Performance Engineer who will optimize training and multi-node inference across next-gen networking silicon and systems: adapters, switches, and the software stack that ties it all together. You’ll partner with architecture, firmware, software, and lighthouse customers to turn lab results into field-proven wins, with an emphasis on distributed serving architectures and P99-aware optimizations.

Key Responsibilities:
- Own end-to-end performance for distributed AI workloads (training and multi-node inference) across multi-node clusters and diverse fabrics (Omni-Path, Ethernet, InfiniBand).
- Benchmark, characterize, and tune open-source and industry workloads (e.g., Llama, Mixtral, diffusion, BERT/T5, MLPerf) on current and future compute, storage, and network hardware, including vLLM/TensorRT-LLM/Triton serving paths.
- Design and optimize distributed serving topologies (sharded/replicated, tensor/pipeline parallel, MoE expert placement), continuous/adaptive batching, KV-cache sharding/offload (CPU/NVMe) and prefix caching, and token streaming with tight p99/p999 SLOs.
- Optimize inference: validate RDMA/GPUDirect RDMA, congestion control, and collective/point-to-point tradeoffs during inference.
- Design experiment plans to isolate scaling bottlenecks (collectives, kernel hot spots, I/O, memory, topology) and deliver clear, actionable deltas with latency-SLO dashboards and queuing analysis.
- Build crisp proof points that compare Cornelis Omni-Path to competing interconnects; translate data into narratives for sales/marketing and lighthouse customers, including cost-per-token and tokens/sec-per-watt for serving.
- Instrument and visualize performance (Nsight Systems, ROCm/Omnitrace, VTune, perf, eBPF, RCCL/NCCL tracing, app timers) plus serving telemetry (Prometheus/Grafana, OpenTelemetry traces, concurrency/queue depth).
- Evangelize best practices through briefs, READMEs, and conference-level presentations on distributed inference patterns and anti-patterns.

Minimum Qualifications:
- B.S. in CS/EE/CE/Math or a related field.
- 5–7 years running AI/ML workloads at cluster scale.
- Proven ability to set up, run, and analyze AI benchmarks; deep intuition for message passing, collectives, scaling efficiency, and bottleneck hunting for both training and low-latency serving.
- Hands-on experience with distributed training beyond single-GPU (DP/TP/PP, ZeRO, FSDP, sharded optimizers) and distributed inference architectures (replicated vs. sharded, tensor/KV parallel, MoE).
- Practical experience across AI stacks and communication libraries: PyTorch, DeepSpeed, Megatron-LM, PyTorch Lightning; RCCL/NCCL, MPI/Horovod; Triton Inference Server, vLLM, TensorRT-LLM, Ray Serve, KServe.
- Comfortable with compilers (GCC/LLVM/Intel/oneAPI) and MPI stacks; Python and shell power user.
- Familiarity with network architectures (Omni-Path/OPA, InfiniBand, Ethernet/RDMA/RoCE) and Linux systems at the performance-tuning level, including NIC offloads, CQ moderation, pacing, and ECN/RED.
- Excellent written and verbal communication; able to turn measurements into persuasion with SLO-driven narratives for inference.

Preferred Qualifications:
- M.S. in CS/EE/CE/Math or a related field.
- Scheduler expertise (SLURM, PBS) and multi-tenant cluster operations.
- Hands-on profiling and tracing of GPU/communication paths (Nsight Systems, Nsight Compute, ROCm tools/rocprof/roctracer/omnitrace, VTune, perf, PCP, eBPF).
- Experience with NeMo, DeepSpeed, Megatron-LM, FSDP, and collective ops analysis (AllReduce/AllGather/ReduceScatter/Broadcast).
- Background in HPC performance engineering or storage (BeeGFS, Lustre, NVMe-oF) for data and checkpoint pipelines.

Location: This is a remote position for employees residing within the United States.
We offer a competitive compensation package that includes equity,
cash, and incentives, along with health and retirement benefits.
Our dynamic, flexible work environment provides the opportunity to
collaborate with some of the most influential names in the
semiconductor industry. At Cornelis Networks, your base salary is
only one component of your comprehensive total rewards package.
Your base pay will be determined by factors such as your skills,
qualifications, experience, and location relative to the hiring
range for the position. Depending on your role, you may also be
eligible for performance-based incentives, including an annual
bonus or sales incentives. In addition to your base pay, you’ll
have access to a broad range of benefits, including medical,
dental, and vision coverage, as well as disability and life
insurance, a dependent care flexible spending account, accidental
injury insurance, and pet insurance. We also offer generous paid
holidays, 401(k) with company match, and Open Time Off (OTO) for
regular full-time exempt employees. Other paid time off benefits
include sick time, bonding leave, and pregnancy disability leave.
Cornelis Networks does not accept unsolicited resumes from
headhunters, recruitment agencies, or fee-based recruitment
services. Cornelis Networks is an equal opportunity employer, and
all qualified applicants will receive consideration for employment
without regard to race, color, religion, sex, sexual orientation,
gender identity or expression, pregnancy, age, national origin,
disability status, genetic information, protected veteran status,
or any other characteristic protected by law. We encourage
applications from all qualified candidates and will accommodate
applicants’ needs under the respective laws throughout all stages
of the recruitment and selection process.