Our client, a next-generation software provider, is looking for a Site Reliability Engineer to join their incredible team of experienced, talented and enthusiastic multi-platform engineers.

This role is for a Site Reliability Engineer to work on a next-generation, cloud-agnostic, microservice-based network management platform.

Remote Working - Based in the UK ideally, or within 2 hours of the UK time zone.

Objective

Assist with the development and maturation of SRE practices and processes.

Service Objective

  • As part of the SRE team, help build a Site Reliability Engineering culture across the team by applying best practices, approaches and code
  • Apply automation, and propose and implement software, for any tasks or parts of the system where this would deliver benefit
  • Monitor application performance – identifying, and implementing, improvements to application performance and stability
  • Collaborate on the design and implementation of the desired pipelines and processes for deployment to the production environment
  • Take part in weekly planning, retro and other sprint-related meetings, and in daily team meetings
  • Reporting:
      • Report on the health, security and maturity of deployed Kubernetes clusters using Polaris reporting
      • Security-check deployed clusters and report on the results

Twelve-Week Delivery Milestones:

  • Apply ArgoCD deployments of Delivery mechanism and Service mechanism components to Dev, Test and Staging
  • Prepare trigger pipelines to check the health of clusters after deployment
  • Maintaining uptime and availability of the Software Platform. Minimising outages by:
      • Continuous improvement
      • Analysis of monitoring and logging output to identify potential issues and opportunities
      • Monitoring and alerting of all components and elements
      • Identifying performance issues
  • Define, monitor and report on:
      • Service-Level Objective (SLO)
      • Service-Level Agreement (SLA)
      • Service-Level Indicator (SLI)
      • Error Budget
  • Perform root cause analysis of all issues to prevent recurrence
  • Provide technical leadership to teammates through coaching and mentorship
  • Implement, and adhere to, DevOps best practices
  • Develop and maintain CI/CD pipelines
  • Collaborate with other technical teams, business analysts and design authority to improve performance, resilience and reliability
  • Prepare reports, manuals and other documentation on the status, operation and maintenance of the CI/CD pipelines
  • Design, develop and unit test applications in accordance with established standards
  • Participate in peer reviews of solution designs and related code
  • Oversee and design automated deployment of releases
  • Develop, refine and tune integrations between application elements
  • Analyse and resolve technical and application problems
  • Assess opportunities for application and process improvement, and prepare documentation of the rationale to share with team members and other affected parties
  • Research and evaluate emerging developments and best practice within the SRE space
  • Undertake ad-hoc projects and other activities as required