Senior Site Reliability Engineer - Observability
Company: Careerbuilder-US
Location: Round Rock
Posted on: January 19, 2023
Job Description:
Company Summary:
Join a team that puts its People First! Since
1889, First American (NYSE: FAF) has held an unwavering belief in
its people. They are passionate about what they do, and we are
equally passionate about fostering an environment where all feel
welcome, supported, and empowered to be innovative and reach their
full potential. Our inclusive, people-first culture has earned our
company numerous accolades, including being named to the Fortune
100 Best Companies to Work For list for seven consecutive years.
We have also earned awards as a best place to work for women,
diversity and LGBTQ+ employees, and have been included on more than
50 regional best places to work lists. First American will always
strive to be a great place to work, for all. For more information,
please visit www.careers.firstam.com.
Job Summary:
The team is on an
exciting transformation journey and moving from the classic support
model to the site reliability engineering model. We are looking for
a person with prior SRE experience across on-premises and cloud
technologies. We are in search of an engineer who has lived and
thrived through such a transformation, not just by following
directed work, but by thinking outside the box, developing
monitoring applications, and taking deep dives into issues.
Essential Functions:
- Efficiently handle live production incidents,
debug/troubleshoot application and infrastructure issues, follow
and implement SRE best practices.
- Monitor application performance, take steps to improve overall
application performance and stability, and follow through with
implementation.
- Build end-to-end monitoring infrastructure (Logging, Metrics,
Tracing) and work closely with the other Production Engineers to
provide the right tooling to measure the reliability of our
systems.
- Establish SLIs, SLOs, Error Budgets, and other SRE metrics to
improve reliability.
- Collaborate with development and operations teams to ensure
availability and reliability of the application and
infrastructure.
- Serve as an escalation point for other Systems Administrators,
Engineers, and other technology teams in the resolution of server
and system problems.
- Maintain an effective knowledge base and runbooks to bring faster
resolution to production issues.
- Participate in the weekend on-call rotation for production support.
- Communicate with stakeholders using strong written and verbal
communication.
- Constantly update personal technical and business knowledge and
skills and mentor others to increase the knowledge and skills of
the team.
- Provide stellar organizational support, customer support, and
self-manage project initiatives.
Technical Skills:
- Bachelor's degree in computer science or equivalent combination
of education and experience.
- 9+ years of hands-on experience in an application and technical
support role in a live production environment following Development,
DevOps, and SRE best practices.
- 6+ years of hands-on experience with configuring and monitoring
via tools such as Splunk, AppDynamics, ELK, Microsoft SCOM, Windows
Processes, JavaScript Framework, etc.
- 4+ years of experience with monitoring web-based applications,
web services, and database-driven applications using Microsoft
technologies (C#, .NET 4.5, Azure DevOps, and SQL Server 2016).
- 2+ years of experience monitoring AWS workloads using AWS
CloudWatch, AWS X-Ray, etc.
- Experience in monitoring and analyzing infrastructure
performance using standard performance monitoring tools (Perfmon,
PerfView, ProcDump, DebugDiag, etc.) is a plus.
- Experience with automation using PowerShell, Python scripting
or similar tech preferred.
First
American invests in its employees' development and well-being,
empowers them to provide superior customer service and encourages
them to serve the communities where they live and work. First
American is committed to diversity and inclusion. We are an equal
opportunity employer. Based on eligibility, First American offers a
comprehensive benefits package including medical, dental, vision,
401(k), PTO/paid sick leave and other great benefits like an employee
stock purchase plan.
Keywords: Careerbuilder-US, Round Rock, Senior Site Reliability Engineer - Observability, Engineering, Round Rock, Texas