LexisNexis is a leading global provider of legal, regulatory and business information and analytics that help professional customers make more informed decisions, increase productivity...
As a senior data engineer on our team, you will work on new product development in a small-team environment, writing production code for both run-time and build-time systems. You will help propose and build data-driven solutions to high-value customer problems by discovering, extracting, and modeling knowledge from large-scale natural language datasets. You will prototype new ideas in collaboration with data scientists, product designers, other data engineers, front-end developers, and a team of expert legal data annotators. You will get the experience of working in a start-up culture, backed by the large datasets and many other resources of an established company.
You will also:
Build and scale data infrastructure that powers real-time data processing of billions of records in a streaming architecture (see the sketch after this list)
Build scalable data ingestion and machine learning inference pipelines
Build general-purpose APIs to deliver data science outputs to multiple business units
Scale up production systems to handle increased demand from new products, features, and users
Provide visibility into the health of our data platform (a comprehensive view of data flow, resource usage, data lineage, etc.) and optimize cloud costs
Automate and manage the lifecycle of the systems and platforms that process our data
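To make these responsibilities concrete, the following is a minimal sketch of the kind of streaming ingestion job the role involves, written in Scala with Spark Structured Streaming reading from Kafka (technologies named in the requirements below). The broker address, topic name, and record schema are illustrative placeholders, not details of our platform.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, window}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

// Minimal streaming ingestion sketch: consume JSON records from Kafka,
// parse them against a schema, and maintain per-minute counts.
// Requires the spark-sql-kafka-0-10 connector on the classpath.
object RecordIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("record-ingest-sketch")
      .master("local[*]") // local mode for illustration only
      .getOrCreate()

    // Hypothetical record schema; a real pipeline would derive this
    // from the actual upstream data contract.
    val schema = new StructType()
      .add("id", StringType)
      .add("body", StringType)
      .add("ts", TimestampType)

    val records = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "legal-records")                // hypothetical topic
      .load()
      .select(from_json(col("value").cast("string"), schema).as("r"))
      .select("r.*")

    // Windowed aggregation as a stand-in for real enrichment or
    // model-inference stages; late data is bounded by a watermark.
    val counts = records
      .withWatermark("ts", "10 minutes")
      .groupBy(window(col("ts"), "1 minute"))
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console") // a production sink would be a database or object store
      .option("checkpointLocation", "/tmp/ingest-checkpoint")
      .start()
      .awaitTermination()
  }
}
```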
Requirements
Master's degree in Software Engineering, Data Engineering, Computer Science, or a related field
2-3 years of relevant work experience
Strong Scala or Java background
Knowledge of AWS, GCP, Azure, or another cloud platform
Understanding of data modeling principles and the ability to work with complex data models
Experience with relational and NoSQL databases (e.g., Postgres, Elasticsearch/OpenSearch, graph databases such as Neptune or Neo4j)
Experience with technologies that power analytics (Spark, Hadoop, Kafka, Docker, Kubernetes) or other distributed computing systems
Knowledge of API development and machine learning deployment