We are seeking an experienced Data Engineer to join our organization. In this role, you will design and implement robust data pipelines and storage solutions in big data environments.
Key Responsibilities:
* Design, develop, and optimize data pipelines using Apache Spark (Python and/or Scala)
* Process large-scale batch and streaming datasets with high efficiency
* Collaborate with data scientists and engineers to ensure seamless integration of data sources
* Implement data quality checks, testing, and monitoring to keep data accurate and reliable
* Contribute to the development of CI/CD and automation best practices to improve data processing workflows
* Manage and organize data in on-prem object storage solutions for efficient data retrieval and integration
* Promote data governance awareness and best practices throughout the organization
Required Skills and Qualifications:
* Bachelor's or Master's degree in Computer Science, Engineering, or a related field
* 2-5 years of experience as a data engineer in big data environments
* Strong skills in Apache Spark (Python and/or Scala), SQL, and data integration tools
* Proficiency in Git, Airflow, and CI/CD pipeline management
* Experience with REST APIs and object storage solutions (S3/MinIO)
* Awareness of data governance topics, including data lineage, metadata, PII, and data contracts
* Fluent communication skills in English and French (minimum B2 level)