Operations Engineer II (Service & Network Operations Center) (Contract)
At Quickplay we believe in transparency, fairness, and collaboration while we passionately work on some of the toughest use cases in OTT video, focused on massive scale and resilience. If you aspire to be part of a high-performing, learning-oriented, and caring culture--you have landed at the right company. Reporting to our Network Operations Leader in San Diego, the operations team is looking for someone with attention to detail who is eager to expand their knowledge in a fast-paced environment where change is constant and communication is key. The mission of this group is to provide 24x7x365 operations support, where information and updates must be concise at handoff.
About The Role--
Primarily focused on:
- Responding to incidents submitted by customers and internal staff, using multiple mediums and methods based on time criticality and status changes.
- Following defined processes to proactively check the health and welfare of applications and infrastructure.
- Maintaining constant eyes-on-glass monitoring: visually and spatially correlating multiple monitoring systems and application components, and responding to alarms and traps.
- Determining the severity and urgency of an incident based on service level agreements and taking immediate action to restore service or escalate as appropriate, while prioritizing multiple simultaneous tasks.
- Initiating multiple simultaneous internal and external escalation paths for 2nd and 3rd tier support when required.
- Participating in the centralized communications structure for customer notifications and updates, requesting involvement of the customer and of second- and third-tier support organizations when appropriate.
- Serving as the focal point for status information on the progress of the resolution effort, with a focus on precise, detail-oriented, and timely communications.
- Directing multi-resource communications bridges for coordination when severity or impact warrants.
- Compiling all event history and preserving it for conducting a thorough root cause analysis or post-incident reporting.
- Protecting our end users' experience by using initiative and sound judgement in the application of several ITIL-based processes and procedures.
- Quickplay staff report into offices in a hybrid capacity (i.e., partially at home and partially at the office, based on role/team needs), leveraging safety protocols aligned with local public health guidelines as they relate to COVID-19.
Success in the role requires:
- Coordination in a multi-geography 24x7 Operations Team.
- Ability to adapt and learn newer technologies and processes.
- Identifying potential weaknesses in active systems and their monitoring architectures.
- Listening to, understanding, assimilating, and clearly documenting all incidents, lessons learned, and knowledge-sharing sessions.
- Contributing toward goals set at both the team level and the individual level.
- Excellent communication skills for coordinating between multiple vendors, customers, peers, and other team members in the organization.
You continually desire to stay curious, speak up, focus on impact, and be supportive. These four specific core principles are critical to your success here at Quickplay, and we understand that when you succeed--Quickplay succeeds.
- Bachelor of Science degree in Computer Engineering, Computer Science, Applied Science, Electrical Engineering, or Math preferred; Developer nanodegree; or equivalent experience.
- 1 year of similar job-related experience.
- Basic understanding of Linux OS, networking, network storage, Apache web servers, and network protocols such as DNS, FTPS, FTP, RSS, and HTTP.
- Excellent analytical, communication, and documentation skills.
- Self-starter able to work under pressure in a fast-paced environment; capable of reacting quickly to problems and implementing solutions.
- Experience with monitoring systems such as Zabbix, New Relic, Grafana, IneoQuest.
- Experience with Kubernetes, Azure, Cloudera, Google Cloud Computing.
- Candidates must be well versed in Microsoft Office or Google Suite Tools, ServiceNow, Jira, Confluence.
- High emotional intelligence and low ego.
- Strong ability to prioritize dynamic tasks received during the workday.
- Flexibility in work schedule.