Wits is strategically located in Johannesburg, a world-class city, with countless opportunities for students and staff to engage with and present solutions that will contribute to our country's knowledge base and build our future. With more than 130 000 graduates in its 91-year history, Wits has made and will continue to make its mark nationally a...
Position A: Geoscience Associate Researcher (Bushveld Knowledge Curation & Geological QA)
- Supervisor: Prof. Rais Latypov
- Primary focus: Geological validation, controlled terminology/stratigraphic context, geochemical dataset curation, and quality assurance of extracted knowledge.
Position B: Computational Associate Researcher (OCR, Corpus Engineering, Data Pipelines, RAG Workflow and LLM for knowledge extraction)
- Supervisor: Prof. Glen T. Nwaila
- Primary focus: Corpus ingestion, OCR/text normalization, metadata standards, indexing/retrieval workflow, benchmarking, reproducible pipeline implementation, and a front-end retrieval API.
Key Tasks and Responsibilities
Shared responsibilities (both positions):
- Contribute to building and maintaining the Bushveld corpus (peer-reviewed papers, theses, technical reports, maps, datasets) with clear provenance.
- Implement quality-control routines ensuring accuracy, consistency, and citation fidelity.
- Participate in regular review meetings and staged deliverables during the 12-month project.
Position A: Geoscience AR (Bushveld curation & geological QA):
- Curate and validate geological content (stratigraphic units, lithologies, marker horizons, terminology).
- Curate and structure geochemical datasets including sample metadata and stratigraphic attribution.
- Develop and maintain a controlled vocabulary (“Bushveld ontology”).
- Evaluate RAG outputs and verify that statements are supported by retrieved sources.
Position B: Computational AR (corpus engineering, pipelines & RAG workflow):
- Design document processing workflows including OCR, text normalization, segmentation, and context preservation.
- Build and maintain metadata standards and reference-management structures.
- Implement hybrid retrieval systems (keyword + vector search) with source/page traceability.
- Establish benchmarking and QA protocols for retrieval accuracy.
- Maintain a modular, reproducible codebase using version control.
- Apply LLMs to language processing and knowledge-graph construction.
Required Qualifications
Position A: Geoscience AR
- Degree in Geoscience/Geology (BSc Honours, MSc, or PhD in progress/completed).
- Training or interest in igneous petrology, layered intrusions, or economic geology.
- Strong attention to detail and ability to work independently.
Position B: Computational AR
- Degree in Computer Science, Computer Engineering, GeoData Science, Information Science, Geoinformatics/GIS, or related field.
- Ability in scientific data management and coding (Python).
- Knowledge of open-source LLMs and tools for running them locally, such as Ollama.
- Strong organizational skills and reproducible workflow habits.
Desirable Skills
Position A: Geoscience AR
- Familiarity with the Bushveld Igneous Complex.
- Experience working with geochemical datasets and data validation.
- Reference management experience (Zotero, EndNote, etc.).
Position B: Computational AR
- Experience with OCR workflows and metadata schema design.
- Familiarity with SQL/databases, Git, and semantic search or RAG systems.
- Interest in geoscience problems and willingness to learn Bushveld terminology.