Senior Site Reliability Engineer @ Tilt
Tilt is looking for a Senior Site Reliability Engineer (SRE) to help us scale our business by leading initiatives that ensure the reliability, scalability, security, and performance of our systems and services. In addition to core SRE responsibilities, you will drive the adoption of AI and automation tools to make our infrastructure more self-healing, proactive, and efficient.
Responsibilities:
- Architect, evolve, and own all cloud infrastructure in AWS including networking, databases, compute, secrets management, and other resources to ensure high availability, robust security, and optimal performance for Tilt's applications and services
- Design and maintain Kubernetes clusters for container orchestration, ensuring scalability, reliability, and efficient resource utilization
- Develop and enforce infrastructure-as-code standards and tooling, including Terraform, for managing and provisioning cloud resources
- Implement and optimize CI/CD pipelines using GitHub Actions to enable rapid, reliable, and secure delivery of applications and infrastructure changes
- Lead adoption and management of GitOps tooling such as FluxCD or ArgoCD for automated Kubernetes resource management
- Proactively detect anomalies, forecast capacity issues, and recommend optimizations using tools such as AWS CloudWatch, DataDog, or similar observability platforms
- Automate incident detection, triage, and remediation workflows to reduce mean time to resolution (MTTR) and improve service availability
- Serve as a technical lead in incident management, driving root cause analysis, post-incident reviews, and long-term improvements
- Champion security best practices and privacy by design across infrastructure and operational processes to protect customer data
- Mentor and coach other engineers on SRE principles, AI/ML for infrastructure, and automation best practices
- Participate in the on-call rotation and lead high-priority incident response
You’re a great fit if:
- You have deep expertise with networking, cloud architecture, and operational practices in AWS environments
- You have 5+ years in SRE, DevOps, or Cloud Engineering roles, with increasing leadership responsibilities
- You have experience integrating AI/ML into operational workflows such as monitoring, capacity planning, or incident response
- You excel at troubleshooting and diagnosing complex distributed systems, and designing scalable, automated solutions
- You proactively identify system risks, capacity concerns, and performance bottlenecks before they become customer-impacting issues
- You have high levels of empathy and can connect deeply with Tilt’s mission
- You are comfortable working in ambiguous environments and know that we need your help to figure things out
- You are comfortable using a lot of systems at once, and have the ability to learn software quickly
- You are fearlessly flexible and curious: you thrive in an environment where we don't have all the answers, and you are willing to help us figure them out
- You have experience working with a startup and/or with a B2B SaaS business
Virtues/Competencies:
Health & Family First
- You balance work and personal life effectively
- You get things done at a pace consistent with the business needs
- You show up and are reliable
- You lead by example when setting healthy work expectations for your direct reports
Autonomy + Team. Always
- You are highly organized and can manage multiple priorities and targeted releases at once
- You are focused on scale and building; you understand that pace is as important as quality
- You drive and facilitate decision making
Be Curious
- When you don’t know, you ask for help
- You keep up-to-date with the latest trends in the technologies your team supports
- You enjoy helping others grow and building lasting relationships
Love Our Customers
- You show empathy and compassion; you strive to meet people where they are to offer maximum support
- You understand that customers can be both internal and external to the company
Fearlessly Flexible
- You go with the flow and deal with lots of ambiguity
- You’re not afraid to work without clear direction
- You like working together with others to help define the path forward
Total Compensation
The projected annual salary range is $165,000 - $175,000 USD plus stock options (ISOs), because we believe everyone should have some stake in our business.
You must be authorized to work in the US.
So what do you say? Do you want to join our team?