LexisNexis Legal & Professional is a leading global provider of legal, regulatory and business information and analytics that help customers increase productivity, improve decision-making and outcomes, and advance the rule of law around the world. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis® ...
A Data Scientist III should understand best practices and processes and be able to execute them independently, reviewing requirements and results with supervisors. Individuals in this role should work with managers to define project scope, and then apply the necessary methods to develop, test and deliver outcomes. In this role, you will design and improve NLP and retrieval systems that power legal research and AI-assisted workflows. You will contribute directly to embedding-based search, retrieval-augmented generation (RAG), and large language model (LLM) quality improvements, with a strong focus on evaluation, relevance, and measurable impact. Collaboration is central: you will work with Software Engineers and AI Platform teams to ensure scalable, reliable, and high-quality solutions.
Responsibilities:
Developing NLP pipelines for legal text ingestion and processing
Improving embedding-based retrieval and search relevance
Improving response quality in RAG/LLM systems
Evaluating models and measuring quality
Collaborating closely with Software Engineers and AI Platform teams
Primarily providing support to more senior Data Scientists
Pulling and cleaning data; building models and maintaining existing ones
Participating in presenting results to internal stakeholders
Requirements:
An undergraduate degree in a relevant field plus 2+ years of relevant work experience, or a master's degree in a relevant field
2+ years of experience in at least one of the following roles: data engineer, data analyst, or machine learning engineer
At least 2 years of experience with Python
Feature engineering, model training, evaluation, and error analysis
Formal training in machine learning: dimensionality reduction, clustering, embeddings, and sequence classification algorithms
Practical experience with Natural Language Processing methods, models, and libraries such as spaCy, word2vec, TensorFlow, Keras, PyTorch, Flair, and BERT, as well as large language models and prompt engineering
Strong background in Python, Scala, or Java
Knowledge of relational and NoSQL databases (e.g., Postgres, Elasticsearch/OpenSearch, AWS Neptune)