Data Engineer
MISSION
To design, build, and operate robust, scalable data pipelines that transform raw OCR and AI-generated outputs into clean, validated, and structured datasets that power the platform’s core features. This role ensures that high-volume data flows (from ingestion through transformation to delivery) are reliable, performant, and seamlessly integrated into downstream applications.
KEY CRITERIA/REQUIREMENTS
- Bachelor’s degree in Computer Science, Data Engineering, Information Systems, Software Engineering, or a related technical field
- Master’s degree in Data Science, Computer Science, or a related discipline is a plus
- Strong experience with Python for data processing and pipeline development
- Hands-on experience with ETL/ELT pipelines and/or streaming data systems
- Familiarity with Kafka, Spark, Flink, or similar distributed processing frameworks
- Understanding of cloud-based data services and modern data storage patterns
- Experience working with SQL and NoSQL databases (e.g., PostgreSQL, MongoDB, Cassandra)
- Integrity and confidentiality in handling sensitive data
- A verifiably clean background and a strong reputation for good character and integrity
DUTIES
- Design, build, and maintain robust data pipelines for batch and streaming workloads
- Develop Python-based data processing components for ingestion, transformation, and validation
- Integrate OCR and AI model outputs into structured, high-quality datasets
- Implement data validation logic, transformation rules, and data models aligned with business requirements (a brief illustrative sketch follows this list)
- Build and expose APIs or data services to deliver processed data to application layers
- Work with streaming and distributed data frameworks (e.g., Kafka, Spark, Flink)
- Collaborate with architects and full-stack engineers to ensure smooth data-to-application interfaces
- Monitor, optimize, and troubleshoot data workflows for performance, scalability, and reliability
- Support cloud-based data storage and processing patterns across environments
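To give candidates a concrete feel for the ingestion/validation duties above, here is a minimal Python sketch of the kind of component this role builds. All names (OcrRecord, clean_record, the field names, and the confidence threshold) are hypothetical illustrations, not the platform's actual schema or API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OcrRecord:
    """Structured form of one validated OCR result (illustrative schema)."""
    document_id: str
    text: str
    confidence: float  # engine-reported confidence in [0.0, 1.0]


def clean_record(raw: dict, min_confidence: float = 0.8) -> Optional[OcrRecord]:
    """Validate a raw OCR payload and return a structured record,
    or None if the payload fails basic quality checks."""
    text = (raw.get("text") or "").strip()
    confidence = float(raw.get("confidence", 0.0))
    if not text or confidence < min_confidence:
        # A production pipeline would route rejects to a dead-letter store
        # for inspection rather than silently dropping them.
        return None
    return OcrRecord(
        document_id=str(raw["document_id"]),
        text=" ".join(text.split()),  # collapse runs of whitespace
        confidence=confidence,
    )


if __name__ == "__main__":
    sample = {"document_id": "42", "text": "  Invoice   #1001 ", "confidence": 0.93}
    print(clean_record(sample))
```

In a real deployment a step like this would sit inside a batch or streaming job (e.g., a Kafka consumer or Spark transformation) rather than run standalone; the sketch only shows the validation-and-structuring pattern itself.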