Senior Software Engineer - Observability & SDX Maintenance
Verta
Business Area: Engineering
Seniority Level: Mid-Senior level
Job Description:
At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world’s largest enterprises.
Cloudera is seeking a Senior Software Engineer to join our Bangalore-based SDX Observability Infrastructure team. This is a growth opportunity for a technically proficient engineer who enjoys the challenge of building next-generation telemetry systems while ensuring the continued reliability of mission-critical data lake services.
You will be a key individual contributor in a high-performing team. Your mission is twofold: leading the technical expansion of our cross-service Observability & Metrics infrastructure (75% focus) and acting as the technical steward for the SDX Backup and Restore (BDR) framework through maintenance and customer escalation support (25% focus).
This role offers the opportunity to work on the "nervous system" of the Cloudera Data Platform. You will gain deep experience in how large-scale enterprise data clouds are monitored and protected, working on a team that values technical excellence and collaborative problem-solving.
As a Senior Software Engineer, you will:
Work on large-scale, distributed clusters to build and extend the core infrastructure required to collect and aggregate high-cardinality metrics across all Cloudera services.
Implement and refine OpenTelemetry (OTel) instrumentation libraries and collector configurations to standardize telemetry data across the CDP stack.
Research and integrate AI/ML tools to automate management tasks, such as intelligent metric collection tuning, anomaly detection, and predictive scaling of telemetry pipelines.
Write design documentation for key features and capabilities
Improve code quality through writing tests, automation, and code reviews
Own small projects that maintain the existing Java codebase for Data Lake Backup and Restore, providing essential bug fixes, security patches, and minor enhancements to keep the BDR framework robust and enterprise-ready.
We are excited about you if you have:
Bachelor’s or Master’s degree in Computer Science or equivalent, and 5+ years of software development experience
Expert proficiency in Java and/or Go
Experience working on observability, metrics, or streaming infrastructure
Familiarity with cloud storage primitives (AWS S3, Azure ABFS, or GCS).
Experience navigating complex distributed systems to resolve high-pressure customer situations.
Experience with Kubernetes and containers
Strong oral and written communication skills in English
A team-first mindset with the ability to take a high-level design and run with the implementation to completion.
You may also have:
Practical experience working with Prometheus, Grafana, and the OpenTelemetry (OTel) ecosystem.
Experience with Python, Bash, SQL, PromQL
Knowledge and experience with AI/ML
Recognized contribution to open source projects
Experience or a strong interest in applying Machine Learning to operational data (e.g., using AIOps for log pattern recognition or metric threshold tuning).
What you can expect from us:
Generous PTO Policy
Support for work-life balance with Unplugged Days
Flexible WFH Policy
Mental & Physical Wellness programs
Phone and Internet Reimbursement program
Access to Continued Career Development
Comprehensive Benefits and Competitive Packages
Employee Resource Groups
EEO/VEVRAA