Black Swan creates better outcomes through technology, prediction, and data science, transforming how brands engage with their consumers.
Our products transform the way brands generate value from data, finding insights and solutions that create a competitive advantage for their business through accurate prediction of trends in consumer behaviour. We analyse...
The Principal Data Engineer will play a critical role in designing, developing, and optimizing our data infrastructure, pipelines, and processing systems. This role requires strong technical expertise in Scala, extensive experience building and maintaining data pipelines, and a deep understanding of NLP techniques. The successful candidate will work closely with data scientists, analysts, and other stakeholders to ensure the seamless integration and accessibility of data across the organization.
Key Responsibilities
Data Infrastructure and Pipeline Development
Design, build, and maintain scalable and reliable data pipelines to ingest, process, and store large volumes of structured and unstructured data.
Develop and optimize data processing systems using Scala and other programming languages, as required.
Implement and manage data storage solutions, ensuring data integrity, security, and accessibility.
NLP and Advanced Analytics
Collaborate with data scientists to develop and implement NLP models and algorithms for text analysis, sentiment analysis, and other advanced analytics use cases.
Optimize NLP pipelines for performance, scalability, and maintainability.
Stay current with the latest research and developments in NLP and incorporate new techniques as appropriate.
Data Quality and Governance
Establish and enforce data quality standards, data governance policies, and best practices.
Implement data validation, monitoring, and anomaly detection mechanisms to ensure data accuracy and consistency.
Work closely with cross-functional teams to address data quality issues and drive continuous improvement.
Technical Leadership and Collaboration
Serve as a subject matter expert on Scala, data pipelines, and NLP within the organization.
Mentor and guide junior data engineers in best practices, technical skills, and career development.
Collaborate with data scientists, analysts, and other stakeholders to define requirements, provide technical guidance, and support data-driven decision-making.
Requirements
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience).
Extensive experience in data engineering, with a strong focus on Scala, data pipelines, and NLP.
Deep understanding of distributed data processing frameworks (e.g., Apache Spark, Flink, Hadoop) and database technologies (e.g., SQL, NoSQL, columnar storage).
Experience with cloud platforms and services (e.g., AWS, GCP, Azure) and containerization (e.g., Docker, Kubernetes).
Knowledge of NLP libraries and tools (e.g., spaCy, NLTK, Gensim) and machine learning frameworks (e.g., TensorFlow, PyTorch).
Strong problem-solving, analytical, and communication skills.
Preferred Qualifications
Master's or Ph.D. in a related field.
Experience with other programming languages (e.g., Python, Java) and big data technologies (e.g., Kafka, HBase, Cassandra).