
  • Posted: Oct 22, 2025
    Deadline: Not specified

    We help our customers uncover and manage valuable pieces of information so that people at every level of the organization can make decisions based on proven facts rather than just gut feel and emotion. We do this by using our extensive experience in the business and data fields, supported by leading software, methodologies and tools. We help y...

    Data Engineer (Hadoop Ecosystem)

    Job Description

    • We are seeking a skilled Data Engineer to design and develop scalable data pipelines that ingest raw, unstructured JSON data from source systems and transform it into clean, structured datasets within the Hadoop-based data platform. The ideal candidate will play a critical role in enabling data availability, quality, and usability by engineering the movement of data from the Raw Layer to the Published and Functional Layers.

    Key Responsibilities:

    • Design, build, and maintain robust data pipelines to ingest raw JSON data from source systems into the Hadoop Distributed File System (HDFS).
    • Transform and enrich unstructured data into structured formats (e.g., Parquet, ORC) for the Published Layer using tools like PySpark, Hive, or Spark SQL (a minimal PySpark sketch follows this list).
    • Develop workflows to further process and organize data into Functional Layers optimized for business reporting and analytics.
    • Implement data validation, cleansing, schema enforcement, and deduplication as part of the transformation process.
    • Collaborate with Data Analysts, BI Developers and Business Users to understand data requirements and ensure datasets are production-ready.
    • Optimize ETL/ELT processes for performance and reliability in a large-scale distributed environment.
    • Maintain metadata, lineage and documentation for transparency and governance.
    • Monitor pipeline performance and implement error handling and alerting mechanisms.
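
    For illustration only, and not part of the posting itself: a minimal PySpark sketch of the Raw-to-Published step described above. The HDFS paths, schema, and column names are all hypothetical; schema enforcement on read, a null check, deduplication, and date partitioning stand in for the validation and transformation work listed here.

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F
        from pyspark.sql.types import StructType, StructField, StringType, TimestampType

        spark = SparkSession.builder.appName("raw-to-published").getOrCreate()

        # Hypothetical schema, enforced when the raw JSON is read.
        event_schema = StructType([
            StructField("event_id", StringType()),
            StructField("event_ts", TimestampType()),
            StructField("payload", StringType()),
        ])

        # Ingest raw JSON from a hypothetical Raw Layer path on HDFS.
        raw = spark.read.schema(event_schema).json("hdfs:///data/raw/events/")

        # Validate, deduplicate, and derive a partition column.
        published = (
            raw.filter(F.col("event_id").isNotNull())
               .dropDuplicates(["event_id"])
               .withColumn("event_date", F.to_date("event_ts"))
        )

        # Write structured Parquet to a hypothetical Published Layer path.
        (published.write
            .mode("overwrite")
            .partitionBy("event_date")
            .parquet("hdfs:///data/published/events/"))

    A production pipeline would add the error handling, alerting, and metadata/lineage updates called out above; the point here is only the shape of the ingest-validate-publish flow.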

    Technical Skills & Experience:

    • 3+ years of experience in data engineering or ETL development within a big data environment.
    • Strong experience with Hadoop ecosystem tools: HDFS, Hive, Spark, YARN and Sqoop.
    • Proficiency in PySpark, Spark SQL, and HQL (Hive Query Language).
    • Experience working with unstructured JSON data and transforming it into structured formats.
    • Solid understanding of data lake architectures: Raw, Published, and Functional layers.
    • Familiarity with workflow orchestration tools like Airflow, Oozie, or NiFi (an example DAG is sketched after this list).
    • Experience with schema design, data modeling, and partitioning strategies.
    • Comfortable with version control tools (e.g., Git) and CI/CD processes.
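
    Again for illustration only: one way the orchestration requirement might look in practice is a small Airflow DAG that runs the ingest and publish steps in order. The DAG id, schedule, and shell commands are hypothetical, and the example assumes Airflow 2.4+ (which accepts the schedule argument).

        from datetime import datetime

        from airflow import DAG
        from airflow.operators.bash import BashOperator

        # Hypothetical daily DAG chaining the ingest and publish steps.
        with DAG(
            dag_id="raw_to_published_events",
            start_date=datetime(2025, 1, 1),
            schedule="@daily",
            catchup=False,
        ) as dag:
            ingest = BashOperator(
                task_id="ingest_raw_json",
                bash_command="hdfs dfs -put /landing/events/*.json /data/raw/events/",
            )
            publish = BashOperator(
                task_id="publish_parquet",
                bash_command="spark-submit raw_to_published.py",
            )
            ingest >> publish  # publish runs only after ingest succeeds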

    Nice to Have:

    • Experience with data cataloging and governance tools (e.g., Apache Atlas, Alation).
    • Exposure to cloud-based Hadoop platforms like AWS EMR, Azure HDInsight, or GCP Dataproc.
    • Experience with containerization (e.g., Docker) and/or Kubernetes for pipeline deployment.
    • Familiarity with data quality frameworks (e.g., Deequ, Great Expectations); the sketch below shows the same idea in plain PySpark.
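
    The posting names Deequ and Great Expectations only as examples; to stay self-contained, this sketch hand-rolls two checks of the same kind (completeness and uniqueness) in plain PySpark rather than using either framework's API. The dataset path and column name are hypothetical.

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("dq-checks").getOrCreate()

        # Hypothetical Published Layer dataset to validate.
        df = spark.read.parquet("hdfs:///data/published/events/")

        total = df.count()
        missing_ids = df.filter(F.col("event_id").isNull()).count()
        duplicate_ids = total - df.dropDuplicates(["event_id"]).count()

        # Fail the run (and let the scheduler's alerting fire) if a check breaks.
        assert missing_ids == 0, f"{missing_ids} rows missing event_id"
        assert duplicate_ids == 0, f"{duplicate_ids} duplicate event_id values"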


    Method of Application

    Interested and qualified? Go to Praesignis on praesignisinternal.simplify.hr to apply.
