Incident & Problem Manager
Company Name: Chalhoub Group
Employment Type: Full Time
No. of Vacancies: 1
We are looking for a proactive, self-motivated and personable individual who will be responsible for the consistent and timely delivery of Incident and Problem Management in line with best practices. You will work with internal stakeholders, customers, third-party vendors and internal business units, and you will own the end-to-end process to ensure that all agreed SLAs are met.
In this exciting role you will be the central point for escalating conflicts and issues to senior management, and you will compile incident and problem reports. You will analyse and report on patterns and trends to improve future service delivery and reduce major incidents. You will then take this a step further, ensuring that appropriate action is taken to anticipate, investigate and resolve problems in systems and services, and that this work is fully documented. This can be achieved through regular audits, reviews and assessments.
The role will be responsible for managing production incidents and outage events, as well as managing problems within the Group Technology division. It will provide leadership and coordination across infrastructure, application and partner teams to remediate production issues quickly and reduce mean time to resolution. It will also push for active problem records to be addressed and managed effectively, so that root causes are identified quickly and a clear plan to eliminate them is defined as part of the problem management process with the Technology Operation and Product teams. The role establishes and maintains the managerial relationships needed to build and strengthen trust in end-to-end enterprise incident and problem management, serves as the focal point for escalating issues to be resolved and problems to be addressed, and promotes adherence to ITIL standards.
What you’ll be doing
Manage incidents and outages
· Manage the review, assignment and classification of incidents, outages and problem cases
· Actively engage with operations teams and engineers, and manage the involvement of application development and other areas in the change and problem management process
· Create and review incident and problem management reports and identify action plans to improve key performance indicators as necessary
· Introduce key ITIL disciplines and practical project management techniques to ensure effective end-to-end problem management
· Ensure proper usage of incident, outage, problem and change management systems and processes
· Perform quality assurance on completed incident, outage and problem investigations and on change management records
· Conduct Root Cause Analysis (RCA), Post-Mortem and Problem Management meetings
· Ensure that the root cause is established for all major incidents and that a formal RCA is published within the agreed SLAs
· Define reporting requirements needed in the management of the incident, outage and problem management processes
· Review incident, outage and problem processes, identify trends and recommend improvements
· Make recommendations for resolution and improvements to mitigate risk and prevent the replication of problems across systems
· Identify and resolve Service Desk incident assignment issues
· Manage exceptions for rejected incident records at the Service Delivery level
· Resolve day-to-day incident coordination actions for Service Delivery
· Act as the Service Delivery escalation point for day-to-day Incident Management process issues
· Monitor unassigned and reassigned incidents and take action where appropriate
· Handle day-to-day incident issues and escalate to the Incident Resolver Groups as required to bring incident resolution back on schedule
· Assist in the reassignment of misdirected incidents
· Provide incident resolution status as requested
· Validate incident severity where required, or assist in correcting an invalid incident severity
· Ensure the quality and accuracy of incident information, as appropriate
· Review the Incident and Problem Management processes, implement enhancements and document the processes
· Perform other related duties as required and assigned
What you’ll need to succeed
· ITIL framework certification (ITIL v3 Foundation)
· Ability to manage an incident/outage bridge with 50+ technical and business stakeholders
· Ability to manage competing priorities and operate under pressure
· Ability to adjust schedule based on business need
· Ability to be proactive, take action and anticipate opportunities
· Ability to guide and assist in technical troubleshooting during an incident/outage
· Excellent management, interpersonal, communication, presentation, and organizational skills
· Ability to lead cross-functional teams effectively at all levels of the organization
· Coordination skills for managing complex technical IT investigations
· Competent in defining, documenting and managing procedures and processes
· Advanced knowledge of incident, outage, problem and change management
· Experience managing 24/7 Application, Infrastructure and/or Operation teams preferred
· Experience supporting applications and infrastructure in AWS preferred
· Strong business acumen and ability to interface with executive management
· Ability to work in a fast-paced environment
· Adaptability to demanding circumstances that require timely and accurate responses
· Strong analytical, multitasking and prioritization skills
· Strong collaboration and partnering skills
· Excellent verbal and written communication skills with the ability to articulate complex ideas in easy-to-understand business terms to senior leaders