Karl Heyer

Machine Learning Engineer with a focus on drug discovery and bioscience

About

As a Machine Learning Engineer, I have successfully deployed high-impact solutions to problems in drug discovery, small molecule design, and more. I leverage my background as a chemical engineer and lab scientist to work effectively between developer teams and scientist teams, ensuring critical domain expertise is incorporated into ML solutions. I currently work mainly in Python, PyTorch, and the AWS stack.

Work Experience

Darkmatter AI
Remote

2022 - Present

Founder

Developed open-source Python libraries applying machine learning and generative AI to problems in cheminformatics and drug design. Technologies: PyTorch, Generative AI, LLMs, RDKit, FastAPI, Docker

Machine Learning Engineer (Independent Contractor)
Remote

2022 - 2023

Machine Learning Engineer

Contract machine learning engineer helping clients deploy robust AI/ML solutions to their business problems. Technologies: PyTorch, Python, RDKit, RabbitMQ, Docker, Kubernetes, SQL, MongoDB, AWS (EC2, Lambda, SageMaker, S3), Modal

Neumora
Remote

2021 - 2021

Data Scientist

Analyzed RNA-Seq data to derive insights for patient subtyping, drug selection and improved treatment outcomes for mental health therapeutics. Technologies: R, Bioconductor

BlackThorn Therapeutics (Acquired by Neumora)
Remote

2020 - 2021

Data Scientist

Led AI/ML initiatives for BlackThorn’s drug discovery team, developing end-to-end systems for ADMET prediction, molecular docking, and generative modeling using deep learning and reinforcement learning. Architected and supervised implementation of cloud infrastructure, managing two data engineers to build ETL pipelines integrating lab data with AWS Redshift. Technologies: Python, PyTorch, RDKit, Docker, Kubernetes, AWS (EC2, SageMaker, S3, Redshift)

Zymergen (Acquired by Ginkgo Bioworks)

2017 - 2019

Research Associate

As a member of the R&D team, I helped develop, scale, and productionize novel molecular biology methods for large-scale pooled DNA assembly and high-throughput bioinformatics analysis pipelines. I worked across the full range of scales, from microliter benchtop work to factory production. Responsibilities included working across departments to transfer complex protocols and overseeing pilot runs of novel R&D methods at industrial scale.

Education

University of Southern California

2016 - 2017

Master's Degree in Chemical Engineering and Materials Science (GPA 3.85)

University of Southern California

2012 - 2016

Bachelor's Degree in Chemical Engineering (GPA 3.9)

Skills

Python

PyTorch

NumPy

scikit-learn

HuggingFace

LLMs

Generative AI

Vector Databases

Diffusion Models

PostgreSQL

MongoDB

Docker

Kubernetes

Redis

FastAPI

AWS

RDKit

Schrödinger

CCDC

AutoDock Vina

Projects

Vector Virtual Screen

A platform for using vector databases to accelerate virtual screening for drug discovery

PyTorch

Qdrant

MongoDB

Redis

Docker

Python

FastAPI

LLM RAG Pipeline

Built a retrieval-augmented generation (RAG) pipeline for search and Q&A over a client’s internal documents. Built on Llama3-70B-instruct and the Qdrant vector database, deployed to Modal.

Contract

Python

PyTorch

vLLM

Modal

RAG

ML Inference Endpoint Templates

Created a custom template library to help scientists deploy ML models as a FastAPI app in a Docker container served using ECS and SageMaker

Contract

Python

FastAPI

Docker

LLM for Document Summarization

Fine-tuned an open-source LLM (Mixtral 8x7B) using LoRA on a client's proprietary document summarization corpus. Deployed an inference endpoint on Modal using vLLM.

Contract

Python

PyTorch

vLLM

LoRA

Modal

Virtual Screening Active Learning Server

Built an active learning server managing online learning for multiple proxy functions based on data from expensive docking and FEP simulations, reducing cost and accelerating virtual screening of large molecular libraries

Contract

Python

PyTorch

Docker

FastAPI

Molecule Vector Database

Built and deployed a large-scale molecular search system handling 10^9 molecules, incorporating a vector database backend, GPU-accelerated batch embedding computation, and a RESTful API query server. Developed embedding compression methods to significantly reduce cost.

Contract

PyTorch

Qdrant

FastAPI

In-House ML Library

Created a custom AutoML framework enabling non-ML-expert scientists to build and deploy machine learning models for domain specific applications

Contract

Python

PyTorch

scikit-learn

GB-GA Molecular Design

A production system for molecular design using graph-based genetic algorithms (GB-GA), compatible with an arbitrary reward function

Contract

Python

RDKit

Docker

CPU Embedding Server

A containerized FastAPI server for computing embeddings on CPU, optimized for concurrent requests

Python

FastAPI

PyTorch

HuggingFace

Docker

Roberta Zinc 480m

A model trained on 480m SMILES strings to compute meaningful molecule embeddings

PyTorch

HuggingFace

GPT Zinc 87m

A GPT-2 style generative model trained on 480m SMILES strings to generate molecular compounds

PyTorch

HuggingFace

Emb Opt

A Python library for running hill climbing algorithms in embedding spaces, with a focus on searching vector databases

Python

NumPy

Emb Opt Server

A multi-container service for running search with the Emb Opt library, including a RESTful API server, backend database, job queue, and worker container

FastAPI

Docker

MongoDB

Redis

Chem Templates

A flexible and extensible Python library for filtering large molecular libraries and defining chemical spaces

Python

RDKit

Chem Templates Server

A FastAPI inference server for the Chem Templates library with a focus on scalability and parallel processing

FastAPI

Docker

MongoDB

Molecular Reinforcement Learning

A Python library for designing molecular compounds using a combination of machine learning and generative AI

Python

PyTorch

RDKit

RNA-Seq Patient Subtyping

Used RNA-Seq analysis to identify patient subtypes in patient populations with mental health indicators

Neumora

Bioconductor

Autoscaling Docking Service

Built an autoscaling Kubernetes service running CCDC molecular docking

BTRX

Python

Docker

Kubernetes

RabbitMQ

CCDC

Reinforcement Learning Molecular Design

Developed a system combining generative AI and reinforcement learning to design optimized molecules for multiple drug programs

BTRX

PyTorch

RDKit

ADMET Prediction Pipeline

Deployed a production system to predict ADMET properties of molecules using in-house laboratory data

BTRX

PyTorch

RDKit

Pooled DNA Assembly

A molecular biology technique for assembling DNA plasmids in a deterministic multiplexed fashion

Zymergen
