Lead Site Reliability Engineer

Iselin, NJ · Information Technology
Our client, a major bank in Central New Jersey, is looking for a talented Site Reliability Engineer.
Permanent position with competitive compensation package (140K-175K), excellent benefits, and target bonus.

This is a hybrid position with 2-3 days per week in the office.

** Must be authorized to work for ANY employer in the US
- NO H1 Visa support for this role


10+ years of Software Engineering and Architecture experience, including 5+ years of SRE-focused experience in Production Support, Application Support, and DevOps implementation.

•        Demonstrated experience enabling SRE principles and practices with technical and operations teams at different SRE maturity levels across the Engineering and Operations space.
•        Demonstrated experience influencing design committees and process teams to establish standards that improve approaches and maturity across IT teams.
•        Work closely with Infrastructure services and product teams to develop reliable solutions that improve availability, scalability, and performance.
•        Experience across the SDLC, from architecture and software design, SLA/SLO definitions, tech-debt reviews, and CI/CD releases to monitoring KPIs and DevOps principles.
•        Experience with production systems: analyzing performance and error metrics, leading triage and troubleshooting exercises, and tracking incident management targets (MTTx).
•        Strong experience with infrastructure and application technology components and designs, with the ability to assess problem areas (logs/events), support analysis (metrics/traces), and recommend solutions.
•        Hands-on experience coding and developing automation solutions using API-based integrations and configuration with Ansible and Terraform for IaaS solutions.
•        Experience working with microservices and containerized platforms, supporting them through monitoring, alerting, and troubleshooting as part of service operations.
•        Technical knowledge of and experience with cloud architectures, hybrid cloud, and cloud-native solutions, applying reliable cloud designs to improve operational efficiency.
•        Experience in incident management, leveraging postmortem analysis and developing reliable solutions as part of driving multiple incident management initiatives.
•        Experience with observability tools and frameworks, the golden signals concept, and MELT (metrics, events, logs, traces) data integration and analysis using market solutions to improve operational efficiency.
•        Experience managing and growing teams to achieve short-term and long-term goals as part of the SRE roadmap, aligned with SRE strategic goals.
•        Experience partnering with multiple peers and stakeholders, and the ability to interact with leadership and technical teams at different levels.
•        Ability to adapt and to support multiple application and infrastructure groups with their SRE needs in a fast-paced, dynamic, and growing organization.
Must have:
•        10+ years of overall IT experience focusing on Software Engineering, Architecture and/or supporting Production technologies.
•        5-7+ years of monitoring analysis experience using ANY observability solution, such as Splunk, Dynatrace, New Relic, Grafana, or Datadog.
•        5+ years of development/coding experience building engineering solutions for large-scale, mission-critical applications.
•        5+ years of hands-on experience as an SRE lead or individual contributor delivering on SRE goals and objectives across IT groups.
•        5+ years of experience working with Kubernetes platforms and public cloud (AWS, GCP, Azure) to support implementation or operational needs.

Please email your resume to: igork@brainsworkgroup.com
Check ALL our Jobs: http://brainsworkgroup.catsone.com/careers

Keywords: SRE site reliability splunk dynatrace new relic grafana datadog kubernetes aws azure gcp cloud

