Senior Site Reliability Engineer
Irving, TX 
Posted 14 days ago
Job Description
Overview
Come build and maintain the world's computer as a member of the Microsoft Capacity Infrastructure Services team in Azure Core. The team ensures new servers are brought online (capacity buildout) to enable Azure customers to leverage the latest offerings, see the illusion of infinite capacity, and grow the Azure business efficiently at hyperscale. As a Senior Site Reliability Engineer, you'll work with a breadth of partners across Microsoft, including developers in service teams, hardware engineers, network engineers, datacenter technicians, supply chain managers, and business leaders, to rapidly debug and resolve issues delaying this carefully orchestrated buildout sequence. You'll drive continuous improvements with these teams to prevent repeats and address common classes of issues across the Azure software stack through design reviews and problem management.

This opportunity will enable you to gain unparalleled system-wide knowledge of how the Azure cloud is built and maintained. The contacts you make with experts will enable you to deep dive on services and new technologies and partner for improvements. You'll be stretched to automate mitigations tactically and to analyze data strategically, identifying problem areas to drive prioritization. This role requires flexibility to hold virtual meetings and collaborate with partners worldwide.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Responsibilities
Participate in onboarding, code/design reviews, and regular meetings with the engineering teams that develop and manage products and services.
Independently develop code or scripts that automate the performance of repetitive and easily scalable operations processes.
Design, develop, and maintain telemetry pipelines and monitoring tools that detail operations metrics.
Analyze data and drive improvements with engineering teams.
Respond to incidents during regular on-call rotations.
Share details related to incidents and their resolution through post-mortem reports and regular review meetings with development teams to drive for durable improvements.
Embody our Culture and Values.

Job Summary
Company
Start Date: As soon as possible
Employment Term and Type: Regular, Full Time
Required Experience: Open