About the Role

Join us as a Production Analyst

This is an opportunity to make a real impact and be pivotal in the success of our business, while benefiting from great variety and stakeholder exposure
We’ll look to you to deliver a complex and critical production management, infrastructure and application support service for relevant platforms, activities and processes across the domain
Hone your existing analytical skills and advance your career in this exciting, fast-paced role
We're offering this role at associate vice president level

What you'll do

As a Production Analyst, you’ll be responsible for system performance and uptime, IT digital operations, and maintaining and enhancing systems’ operational efficiency, with a focus on deployment automation and system optimization to ensure consistent performance and reliability. You'll need robust, hands-on technical problem-solving skills and a strong desire to implement scalable and sustainable technological solutions.

Anchor & provide strategic direction regarding technologies & solutions in Digital operations. Lead infrastructure & application builds & technical maintenance along with the core engineering & delivery teams.
Act as custodian of SRE SLOs, SLIs and error budgets. Application scalability and optimization: assist in designing and implementing scalable, highly available system architectures to handle increasing loads and user demands without compromising performance.
Creating and optimizing CI/CD pipelines to automate testing and deployment processes, reducing the time from development to production and ensuring consistent quality control.
Designing, monitoring and responding to system alerts; monitoring system performance, identifying bottlenecks, and executing optimizations and permanent fixes.
Managing incident response protocols, including on-call rotations. Conducting post-incident reviews to prevent recurrence and refine the system reliability framework.
Provide primary operational support and engineering for multiple large-scale distributed software applications. Collaborate with development operations staff to create, monitor, and troubleshoot the system infrastructure.
Increase system resilience and serve larger customer volumes with expert-level coding and bulletproof release and change management skills. Improve automation and increase the system’s self-healing capability.
Collect operating system data and report performance metrics to stakeholders. Manage cloud and database system maintenance, debugging production issues as they arise.
Ensuring the effective and seamless integration of security policies and practices into DevOps workflows to reduce overall risks and deliver products and services on time.
Implement E2E automated VAPT for every new or existing application. Reduce planned deployment downtime by 50% by ensuring a robust CI/CD setup.
Reduce MTTR (mean time to recovery) to less than 2 hours for any major issue, and MTTD (mean time to detect) to less than 5 minutes with the help of automated tools and methods.

Your Role Will Also Involve

Collaborating with product development and feature teams to understand the upcoming product, enabling continuous integration and continuous deployment to occur
Regularly attending the feature teams’ refinement and planning sessions
Identifying areas for service improvement by analysing and diagnosing recurring platform and service incidents, as well as customer and stakeholder feedback
Building a culture of continuous improvement to reinforce the robustness of the domain, with a focus on automation, scalability, continuous integration and continuous delivery

The skills you'll need

We’re looking for someone with technical knowledge and experience including platform, technology, products and domains.

You'll have 12+ years of strong DevSecOps and SRE experience in production support.
Proven experience in managing large-scale distributed systems and understanding the principles of scalability and reliability.
Ownership of DevOps DORA metrics and SRE toil reduction through automation.
Experience with security tools such as SAST, DAST and container security; an understanding of Node.js, React.js, Java, Oracle and IDMC; and experience with infrastructure as code tools such as Terraform and CloudFormation.
Experience with container technologies such as Docker, Kubernetes and OpenShift. Must have knowledge of DevSecOps tools such as Git, Maven, Selenium, Jenkins, Ansible and security tools.
Knowledge of any one of the following monitoring tools: Geneos, Nagios, Prometheus, Dynatrace, AppDynamics, DX-APM, Splunk.
Scripting knowledge: UNIX shell; Python, Groovy and YAML are good to have.
Experience with and understanding of at least one cloud provider such as AWS or Azure, including on-demand infrastructure provisioning, environment spin-offs, environment cloning, EKS and IaC.
Hands-on working knowledge of configuring SLAs, SLOs, SLIs, and infrastructure and business rules/logic in AppDynamics, AWS CloudWatch, Pingdom, Datadog, Tivoli etc. (preferably APM).
Understanding network protocols, load balancing, and firewall management for secure and efficient network operations.

Site Reliability Engineer, AVP