Data Engineer
MISSION
To design, build, and operate robust, scalable data pipelines that transform raw OCR and AI-generated outputs into clean, validated, and structured datasets that power the platform’s core features. This role ensures that high-volume data flows (from ingestion through transformation to delivery) are reliable, performant, and seamlessly integrated into downstream applications.
KEY CRITERIA/REQUIREMENTS
- Bachelor’s degree in Computer Science, Data Engineering, Information Systems, Software Engineering, or a related technical field
- Master’s degree in Data Science, Computer Science, or a related discipline is a plus
- Strong experience with Python for data processing and pipeline development
- Hands-on experience with ETL/ELT pipelines and/or streaming data systems
- Familiarity with Kafka, Spark, Flink, or similar distributed processing frameworks
- Understanding of cloud-based data services and modern data storage patterns
- Experience working with SQL and NoSQL databases (e.g., PostgreSQL, MongoDB, Cassandra)
- Integrity and confidentiality in handling sensitive data
- A verifiably clean background and a strong reputation for good character and integrity
DUTIES
- Design, build, and maintain robust data pipelines for batch and streaming workloads
- Develop Python-based data processing components for ingestion, transformation, and validation
- Integrate OCR and AI model outputs into structured, high-quality datasets
- Implement data validation logic, transformation rules, and data models aligned with business requirements (a brief illustrative sketch follows this list)
- Build and expose APIs or data services to deliver processed data to application layers
- Work with streaming and distributed data frameworks (e.g., Kafka, Spark, Flink)
- Collaborate with architects and full-stack engineers to ensure smooth data-to-application interfaces
- Monitor, optimize, and troubleshoot data workflows for performance, scalability, and reliability
- Support cloud-based data storage and processing patterns across environments
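To give candidates a concrete feel for the ingestion/validation duties above, here is a minimal Python sketch of the kind of component this role builds. All names (OcrRecord, clean_record, the field names, and the confidence threshold) are hypothetical illustrations, not the platform's actual schema or API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OcrRecord:
    """Structured form of one validated OCR result (illustrative schema)."""
    document_id: str
    text: str
    confidence: float  # engine-reported confidence in [0.0, 1.0]


def clean_record(raw: dict, min_confidence: float = 0.8) -> Optional[OcrRecord]:
    """Validate a raw OCR payload and return a structured record,
    or None if the payload fails basic quality checks."""
    text = (raw.get("text") or "").strip()
    confidence = float(raw.get("confidence", 0.0))
    if not text or confidence < min_confidence:
        # A production pipeline would route rejects to a dead-letter store
        # for inspection rather than silently dropping them.
        return None
    return OcrRecord(
        document_id=str(raw["document_id"]),
        text=" ".join(text.split()),  # collapse runs of whitespace
        confidence=confidence,
    )


if __name__ == "__main__":
    sample = {"document_id": "42", "text": "  Invoice   #1001 ", "confidence": 0.93}
    print(clean_record(sample))
```

In a real deployment a step like this would sit inside a batch or streaming job (e.g., a Kafka consumer or Spark transformation) rather than run standalone; the sketch only shows the validation-and-structuring pattern itself.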