At DotModus, we offer the best in industry-specific, bespoke big data solutions. View our Services and Products below or contact us today to find out how big data analytics technology is transforming your industry.
DotModus is a Google Cloud Partner in EMEA and specialises in helping our customers analyse their customers' data using Google Cloud.
We value a Hadoop Infrastructure (Admin) Engineer as someone who works behind the scenes to deploy, manage, and maintain the various technologies of the Hadoop ecosystem for a variety of consumers, in ways that make sense and add value. This definition is deliberately broad: the role falls into the data engineering field, which is just as broad.
You must be the type of individual who lives and breathes the admin / infrastructure side of the Hadoop ecosystem: someone who can make fixes, enhancements, and changes, deploy them through multiple environments, and who therefore has skills and experience in configuration, deployment, and administration.
You have the following technical competencies
- Hadoop 3 or Hive 3
- Responsibility for supporting, configuring, upgrading, and maintaining multiple Hadoop clusters.
- Configure and deploy Apache Hadoop and other Apache components from scratch on VMs, Docker, and/or Kubernetes.
- Configure and deploy HiveServer2 LLAP on bare metal or Docker/Kubernetes
- Installation / Setup
- YARN using the Capacity and Fair Schedulers
- Performance tuning and scaling of HiveServer2 LLAP using Tez in a production environment
- HDFS HA
- Spark 2.4
- Spark Warehouse Connector
- Ranger 2.0
- Atlas 2.0
- Installation / Setup of Hadoop in Linux environment
- Deployment in a Hadoop cluster and its maintenance
- Health checks of a Hadoop cluster, with continuous monitoring to ensure it is up and running
- Analysing storage volumes and allocating space in HDFS
- Resource management in a cluster environment
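The YARN scheduling competency above can be sketched as a minimal capacity-scheduler.xml. The queue names and capacity percentages here are illustrative assumptions, not values from this posting:

```xml
<!-- capacity-scheduler.xml: a minimal two-queue Capacity Scheduler sketch.
     Queue names ("default", "analytics") and percentages are assumptions. -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,analytics</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.analytics.capacity</name>
    <value>40</value>
  </property>
  <property>
    <!-- Ceiling up to which this queue may borrow idle capacity. -->
    <name>yarn.scheduler.capacity.root.analytics.maximum-capacity</name>
    <value>80</value>
  </property>
</configuration>
```

Queue capacities under one parent must sum to 100; the maximum-capacity ceiling is what lets a queue use idle resources beyond its guaranteed share.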
You have knowledge and/or experience with the following concepts
- Hadoop Ecosystem:
- Apache Hadoop
- Apache Spark
- Apache Hive
- Apache Zookeeper
- Apache Solr
- Understanding of integration features:
- Apache Atlas
- Apache Ranger
- Apache Zeppelin
- Writing high-performance, reliable and maintainable modular code
- Data pipelining knowledge - data extraction and transformation
- Knowledge of the MapReduce and related data processing paradigms
- Hands-on experience in HiveQL
- Hadoop development and implementation.
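The MapReduce paradigm mentioned above can be sketched in plain Python: a word count with explicit map, shuffle, and reduce phases, standing in for what the Hadoop framework performs across a cluster. The function names are illustrative assumptions, not part of any Hadoop API:

```python
from collections import defaultdict

# A minimal sketch of the MapReduce paradigm (word count), using plain
# Python stand-ins for the map, shuffle, and reduce phases of a Hadoop job.

def map_phase(line):
    """Mapper: emit a (word, 1) pair for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)

lines = ["big data big clusters", "data pipelines"]
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
# counts == {'big': 2, 'data': 2, 'clusters': 1, 'pipelines': 1}
```

On a real cluster the mapper and reducer run on different nodes and the shuffle moves data over the network; the structure of the computation is the same.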
You have the following personal competencies
- The ability to solve problems
- The ability to look at a problem from different angles, to see if a solution can be reached in different ways
- The ability to work in an ever-changing, unstructured environment
- The ability to work as part of a team, with vastly differing skill sets and opinions.
- The ability to contribute ideas to the group
- The ability to mentor and provide guidance for other team members
- A systems approach to thinking, as opposed to a siloed approach. The candidate needs to understand how their work affects the greater system
- The ability to work without supervision, and take accountability for the work they deliver
- The ability to liaise with a client, sifting through the fluff and extracting the actual requirements.