Platform Engineer - Site Reliability Engineer

Cazoo

Software Engineering
London, UK
Posted on Saturday, February 25, 2023

About us:

Cazoo was founded in 2018 to transform the way people buy used cars.

We believe everyone deserves to feel total confidence when buying a car. That’s why we’re building a new service which is simple, transparent and doesn’t end when you buy the car. Our vision is to deliver the UK’s best car buying experience by putting the customer first and making it no different from buying any other product online today.

Our team has an incredible track record of building successful consumer platforms, our model has been proven in other geographic locations, and the huge used car market in the UK is ready for a new approach. We want to deliver a trusted brand, frictionless service, and a convenient platform for our customers that becomes the go-to destination for all used car purchases in the UK.

Our Tech Stack:

Across our systems, we make use of Event-Driven Architecture and TypeScript. Our infrastructure is primarily serverless and we make use of various serverless technologies like Lambda, EventBridge, S3, and DynamoDB.

We use a multitude of DevOps tools and we are a very experiment-driven tech team. For IaC we primarily use Terraform and the Serverless Framework. We use GitHub Actions for CI/CD. For observability and monitoring, we use Honeycomb and Datadog. We use Cloudflare for internal access control, DNS, routing, and edge computing.

About the role:

The Platform Engineering Team currently consists of 2 teams:

  • Cloud Engineering
  • Developer Experience (consisting of 3 areas: DevX, Web Platform, and SRE)

We are looking to bolster our team with a Site Reliability Engineer, who has a real passion for cloud engineering, reliability, observability and architecture standards.

We value simplicity over complexity and aim to reduce any undifferentiated heavy lifting so that product engineering teams can easily adopt tooling and standards, giving them the autonomy to focus on innovating and delivering the best car-buying experience to our customers.

Working with the rest of Platform Engineering, we want to define and deliver a paved-road strategy for Cazoo tech. As a Site Reliability Engineer, you will play a part in improving our observability standards and will help to roll out Datadog across all our product teams. You will also be spotting gaps in our offerings and contributing to the roadmap of a team of passionate engineers.

In addition, you’ll work closely with the product engineering teams to help provide a world-class platform.

What you'll be doing:

  • Helping to establish a Site Reliability Engineering mindset, principles and practices that take the Cazoo way of working into account, and guiding your team and other engineering teams in putting them into practice.
  • Mentoring and being a senior engineering figure on a fast-paced, highly motivated team whose diverse backgrounds include cloud infrastructure, networking and software development.
  • Establishing SRE principles and best practices in a serverless and event-driven tech stack that will require you to be creative, explore, learn and build tools to support a paved road approach for building services.
  • Working closely with the software engineering teams to ensure that the platform, infrastructure and services are designed and optimised for availability, latency and performance.
  • Participating in incident response, resolution, root cause investigation, retrospective write-ups and follow-up actions so we can take every opportunity to learn, improve and make our services more resilient.
  • Driving the onboarding and adoption of our core tools, services and third-party apps.

Experience required:

  • Previous experience working on an SRE, Platform or DevOps team.
  • Experience with the core AWS services and AWS serverless services.
  • An engineering background, with familiarity with any modern programming language (e.g. Python or Node.js) and an awareness of TDD.
  • Understanding of the DevSecOps culture and experience in delivering technical outcomes within this culture.
  • Knowledge of, and comfort with, agile development practices.
  • Strong experience and knowledge of observability, including best practices, implementations and experience with observability vendors (preferably Datadog).
  • Strong communication and stakeholder management skills, with an ability to communicate complex technical topics to non-technical stakeholders.
  • Nice to have: Experience in scaling a ‘paved road’ type strategy through evangelism and advocacy, e.g. communities of practice and platform champions.

Interview Process:

We understand that your time is in demand, so we keep our interview process as quick and painless as possible and outline all timeframes in advance so you can plan around them.

Following an initial screening with our Talent Acquisition team, the interview will be conducted over 3 stages, each being a remote video call using Google Meet:

  • A pair programming exercise to demonstrate your pair programming abilities (90 mins)
  • Technical Knowledge, Q&A (60 mins)
  • Culture and Values interview (60 mins)

Following this, we will give you a decision – and hopefully an offer – normally within 24 hours.