Site Reliability Engineer

Hyderabad, Telangana, India | Hexagon PPM | Full-time

Overview

Hexagon’s PPM Development organization is looking for an experienced Site Reliability Engineer to join a global team charged with running our production cloud systems. You will perform typical operations work alongside development teams as an engineer focused on eliminating toil and inefficiency. The ideal candidate has strong experience and expertise in running best-in-class, modern cloud infrastructure, operations, and observability. As part of a brand-new SRE team, you will have the opportunity to help decide what we focus on and which paths we take. You will work among experienced peers who want to help you grow in your career. This team promotes positivity, shared ownership, accountability, and self-initiative.

Responsibilities

Commit to continually redefining reliability goals and service-level objectives, measuring against those goals, and improving our services as needed
Become a master of hands-off administration of Kubernetes running on Azure Kubernetes Service (AKS)
Participate in the on-call rotation to respond to availability incidents and support service engineers with customer incidents
Use your on-call shift to prevent incidents from happening again
Follow an “automate all things” approach to service delivery and management
Efficiently code and deploy infrastructure using Terraform, Terraform Cloud, and Azure DevOps (AzDo)
Make monitoring and alerting trigger on symptoms rather than on outages
Complete Root Cause Analysis (RCA) investigations and blameless post-mortems
Perform readiness reviews with internal service teams
Plan the growth and control the costs of our infrastructure
Create scalable and extendable patterns to apply across multiple teams

Educational Qualifications

Bachelor’s degree in CS, engineering, software engineering, or related field.
5-10 years of combined operations and software development/engineering experience, with a preference for DevOps or SRE roles

Skills Required

Experience with at least one programming or scripting language (preferred: PowerShell, C#, Python, Go)
Solid understanding of the challenges and trade-offs involved in building and deploying systems to production
Kubernetes certifications, or an interest in obtaining them, are a big plus: Certified Kubernetes Administrator (CKA) and Certified Kubernetes Security Specialist (CKS)
Experience with large-scale distributed cloud service development, infrastructure, traffic management, and architecture
Good self-awareness, accountability, and conflict resolution skills, and the ability to receive feedback well
Kubernetes, Terraform, Azure DevOps, Microsoft Azure, PowerShell, C#, PagerDuty, GitOps, SRE, DevOps, Infrastructure as Code (IaC), Operations, Cloud, Docker, Helm, Flux

Soft skills required include:

Excellent communication skills (verbal and written).
Effective both in a team environment and when working independently.
Excellent problem-solving skills.
Able to work in a fast-paced, milestone-driven environment.
Ability to document designs and communicate status clearly in writing, including by email.
