
Nithish S
Site Reliability Engineer
Skills



Professional Experience
Site Reliability Engineer
Tata Consultancy Services • Full-time
Mar 2022 - Present • 4 yrs 2 mos
Platform SRE supporting Tier-1 banking applications running on Kubernetes (OpenShift/GCP).
• Owned platform reliability and availability for production-grade, microservices-based systems.
• Acted as Incident Commander for SEV-1/SEV-2 incidents, leading end-to-end incident response and cross-team coordination.
• Reduced MTTR by ~25% through structured incident response, improved alerting, runbooks, and deep observability.
• Defined and operationalized SLIs/SLOs and built proactive monitoring and alerting with Splunk, New Relic, and ITRS Geneos, sustaining ~99% availability.
• Conducted blameless postmortems and root cause analysis as part of ITIL Problem Management.
• Established platform-level change and release workflows using GitOps and CI/CD pipelines, aligning DevOps automation with ITIL Change Management.
• Improved deployment success rate by ~15% through controlled rollouts and rapid rollback strategies.
• Automated infrastructure provisioning using Terraform (IaC).
• Reduced operational toil by ~30% through automation and self-service workflows.
• Owned 24/7 on-call for mission-critical systems.
• Investigated backend failures and Oracle SQL transaction issues, restoring data integrity and service functionality.