Data Engineer - Databricks (gn)
On behalf of our client, a globally active asset management firm with a strong presence across institutional and private client segments, we are looking for a Data Engineer - Databricks (gn) to strengthen its team.
Purpose of the role
The role is responsible for designing, building and operating data products and reusable data preparation components on a Databricks-based Data and AI Platform, while acting as a technical enabler for internal platform users through expert guidance and advanced-level support. The position ensures adherence to security, privacy and regulatory requirements via a compliance-by-design approach, maintains alignment with established best practices for data pipelines and data quality, and drives continuous platform improvement through the integration of new features and standardized, reusable pipelines.
Tasks
* Develop and maintain a library of modular, reusable pipeline components — covering data intake, processing, verification and enrichment — to produce consistently structured, AI/ML-ready datasets from both structured and unstructured sources.
* Architect and run dependable data workflows on Databricks, pulling from a wide range of internal and external origins to produce clean, validated data assets available for AI, ML and reporting purposes.
* Govern the lifecycle of layered data assets across maturity tiers, upholding quality and timeliness standards while maintaining purpose-specific, analytics- and model-ready output datasets.
* Build and maintain transformation workflows that derive meaningful predictive attributes from raw data, alongside a centrally managed attribute repository with clear versioning, ownership and service-level commitments.
* Work alongside AI and ML engineers to establish and maintain data supply chains for retrieval-augmented generation systems, covering content segmentation, vector representation updates and index synchronization.
* Embed governance controls — covering permissions, traceability, data lifecycle management, encryption and audit trails — into platform design to meet both internal policies and external regulatory requirements.
Required:
* At least 3 years of hands-on experience designing and running large-scale data pipelines and data products on Databricks (batch and/or streaming), preferably in regulated or governance-heavy environments, ideally in the financial services industry.
* Advanced proficiency in Databricks, Spark, Python and SQL, complemented by sound software engineering practices such as CI/CD workflows and infrastructure-as-code tooling (e.g., Terraform) on Azure.
* Solid grasp of data engineering principles relevant to AI/ML contexts, including data modelling, quality assurance, feature collaboration and reproducibility, as well as an understanding of how data characteristics influence model behaviour.
* Strong command of modern data architecture patterns — including Lakehouse principles, layered data organisation and data product thinking — and the ability to put them into practice within a governed enterprise setting.
* Degree in computer science or a comparable discipline.
* Strong analytical mindset paired with structured planning, clear documentation and the ability to effectively transfer knowledge to colleagues.
Preferred:
* Professional background in the financial sector, ideally within asset management, combined with international work experience and relevant Databricks certifications.