Director - Site Reliability Engineer
Sugar Land, TX 
Posted 9 days ago
Job Description

Position Title: Director - Site Reliability Engineer

Position Summary:

The Director - Site Reliability Engineer (SRE) is responsible for critical contributions to the technology strategy, architecture, standards, and IT operations in support of TDECU's business objectives, and drives the infrastructure and cloud operations strategy for on-premises and cloud environments, enabling a 24x7 operation. The Director - SRE is responsible for leading and managing a team of engineers and architects to ensure the high availability, performance, and scalability of TDECU's on-prem and digital infrastructure and services. The primary focus will be on building and maintaining a reliable and robust technology ecosystem, guaranteeing exceptional service uptime and customer satisfaction. The incumbent engages regularly in thought leadership and must understand the complexity of working in an enterprise that spans multiple, dispersed locations. The Director delivers impactful contributions, KPIs, progress reports, and key metrics to senior leaders for enterprise technology initiatives.

Essential Duties and Responsibilities:

Formulates and executes a comprehensive strategy to ensure the reliability, scalability, and performance of the on-prem and digital infrastructure. Identifies and prioritizes areas for improvement, creates roadmaps, and establishes measurable goals to track progress and drive continual enhancement.

Drives the adoption of automation, monitoring, and alerting tools to enhance the reliability and availability of our systems. Collaborates with engineering teams to design and implement self-healing systems, proactively identifying potential issues and automating recovery processes.

Evaluates, selects, develops, and maintains relationships with TDECU and strategic vendors and business partners to ensure quality, reliability, flexibility, and cost control; ensures that all developing proposals and planning work are consistent with both strategic plans & business objectives.

Manages Data Center Operations, Server and Storage, Network/Voice, and Cloud operations teams to ensure adequate staffing and efficiency through training, counseling, supervising, and reviewing the performance of employees.

Establishes and utilizes service level agreements, key performance indicators (KPIs), and metrics to ensure the availability, quality, and performance of infrastructure systems meet and exceed business requirements.

Develops short and long-term objectives and plans, as well as provides guidance for the career growth and development of IT teams. Prioritizes work efforts to balance operational tasks with long-term strategic initiatives.

Implements and maintains all cloud, server and storage, and network services infrastructure with special emphasis on the security and stability of TDECU systems and data confidentiality.

Develops and executes plans to maintain and support the legacy infrastructure while also transitioning services to next-generation services and capabilities including cloud, compute, data services, etc.

Drives the adoption and optimization of cloud services, ensuring efficient and cost-effective utilization of cloud resources. Collaborates with cross-functional teams to migrate legacy systems to cloud platforms and implement cloud-native solutions.

Advises and provides recommendations to senior leaders for technology trends related to IT, Credit Union, and general Financial Services industries that are relevant and significant to the company.

Provides leadership, motivation, coaching, professional development, and day-to-day support to foster an engaging work environment with a focus on customer service.

Develops, schedules, tests, and communicates IT Business Continuity and Disaster Recovery plans and procedures.

Collaborates with TDECU leaders and staff to identify IT infrastructure incidents and operations issues, and serves as the top escalation level for addressing them.

Ensures architectural conformance to IT processes and controls, confirms information security is applied at the appropriate level to protect information and technology assets, and establishes IT standards that support audit and regulatory compliance requirements.

Partners with internal and external auditors in IT architecture risk assessments, audits, and security incident investigations to ensure compliance with adopted IT policies, procedures, and legislation related to data privacy and safeguarding sensitive information.

Facilitates migration of non-compliant environments to compliant environments based on legal requirements, audit findings, and risk assessment recommendations. Ensures compliance with relevant industry standards and regulations.

Minimum Qualifications:

Education:

  • Bachelor's or four-year degree in Management Information Systems, Computer Science, Engineering, or similar discipline, or an equivalent mix of education and experience. ITIL and other similar systems-related certifications in areas such as Network, Cloud, Storage Area Network, and System Administration are preferred.

Experience:

  • 5-7 years of IT experience with significant depth in cloud, infrastructure, and systems operations management including but not limited to demonstrated experience in data centers, network, security, disaster recovery, service desk, and systems operations.
  • 5-7 years of supervisory experience required. Experience in the credit union and/or financial services industry is preferred.
  • In-depth knowledge of cloud platforms (e.g., AWS, Azure, GCP) and experience with cloud migration projects.
  • Previous experience in project leadership, planning, cost management, strategy development, and/or business operations of Infrastructure Cloud Operations efforts required.

Knowledge, Skills, and Abilities:

  • Strong knowledge of deploying and supporting SAN, Server Virtualization, Clustering, Microsoft 365, Active Directory, wide area networks, and other systems and infrastructure used in the financial services industry to support continuous availability capabilities.
  • Strong experience in deploying and supporting Cloud resources including Application Gateway, API Gateway, Web Application Firewall, Cloud Firewall, Microservices, Containerized Workloads, Event Streaming, Cloud Storage, etc. in Cloud Native and Hybrid environments.
  • In-depth knowledge and hands-on experience with one or more major cloud platforms (AWS, Azure, or Google Cloud Platform).
  • Experience in implementing cloud security best practices and a proven track record of implementing robust security processes.
  • Familiarity with Identity and Access Management (IAM) frameworks.
  • Demonstrable ability to optimize Cloud Resources to ensure cost-effective deployments without compromising performance and reliability.
  • Familiarity with Agile and DevOps principles and processes.
  • Experience with Infrastructure as Code (IaC) and infrastructure deployment and configuration using automated tools such as Terraform, Ansible, or CloudFormation.
  • Strong knowledge of routine core financial systems, financial processes/reporting, optical systems, data management, and Information Systems Operations.
  • Strong financial management skills in budgeting, forecasting, and cost control.
  • Extensive ability in the management of computer technology areas with an emphasis in the areas of people management, verbal/written communication, and innovative thinking.
  • Extensive knowledge and hands-on experience with system/security patching on one or more patch management platforms.
  • Familiarity with ITIL v4.
  • Experience managing DaaS (Desktop as a Service) platforms.
  • Demonstrated ability to communicate and implement effective and relevant best practices.
  • Ability to interact with business units and to understand multiple business disciplines and the impact of IT on their functions.
  • Demonstrated ability to develop, maintain, and enforce written policies and procedures regarding all data, computing, and voice operations. Identify areas of process improvement to streamline and enhance the service level of the IT team.
  • Skilled in managing medium and large infrastructure & cloud projects such as data center relocations, hardware platform upgrades, and disaster recovery plan implementations.
  • Proven ability in developing and implementing performance metrics in the areas of service level agreements, system and network uptime measurements, and performance monitoring.
  • Ability to prioritize and organize tasks and projects at different stages of completion, identify targets and milestones, and to negotiate as appropriate.
  • Demonstrated ability to create viable business cases for enterprise enhancements.
  • Ability to communicate efficiently and effectively both verbally and in writing, upward, downward, and across the organization, and with third parties.
  • Demonstrated ability to develop and foster the professional growth of team members.

Physical Demands and Work Environment:

(The physical demands and work environment characteristics described herein are representative of those that must be met by an employee to successfully perform essential functions of this position and/or may be encountered while performing essential functions. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.)

  • While performing the essential duties of this position, an employee would frequently be required to stand, walk, and sit.
  • Specific vision abilities required by this position include close vision, distance vision, and the ability to adjust focus.
  • The noise level in the work environment is usually moderate.

Our company offers a dynamic hybrid work arrangement, which requires three days on-site, in the Sugar Land, TX office. Our retail roles are required to be onsite at the branch locations.

Disclaimer:

The above statements are intended to describe the general nature and level of work being performed by people assigned to this job. They are not intended to be an exhaustive list of all responsibilities, duties, and skills required of personnel so classified.

Texas Dow Employees Credit Union is an equal opportunity employer, dedicated to a policy of non-discrimination in employment on any basis including race, color, age, protected veteran status, sex, religion, disability, genetic information, national origin, or other status protected by federal, state, or local law. Consistent with the Americans with Disabilities Act, applicants may request accommodations needed to participate in the application process.

 

Job Summary
Company
Start Date
As soon as possible
Employment Term and Type
Regular, Full Time
Required Education
Bachelor's Degree
Required Experience
5 to 7 years