Karl Heyer
Machine Learning Engineer with a focus on drug discovery and bioscience
About
As a Machine Learning Engineer, I have successfully deployed high-impact solutions to problems in drug discovery, small molecule design, and more. I leverage my background as a chemical engineer and lab scientist to work effectively between developer teams and scientist teams, ensuring critical domain expertise is incorporated into ML solutions. Currently I work mainly in Python, PyTorch, and the AWS stack.
Work Experience
Darkmatter AI, Remote
Founder
Machine Learning Engineer (Independent Contractor), Remote
Machine Learning Engineer
Neumora, Remote
Data Scientist
BlackThorn Therapeutics (Acquired by Neumora), Remote
Data Scientist
Zymergen (Acquired by Ginkgo Bioworks)
Research Associate
Education
University of Southern California
University of Southern California
Skills
Projects
Vector Virtual Screen
A platform for using vector databases to accelerate virtual screening for drug discovery
LLM RAG Pipeline
Built a retrieval-augmented generation pipeline for search and Q&A over a client’s internal documents. Built on Llama3-70B-instruct and the Qdrant vector database, deployed to Modal.
ML Inference Endpoint Templates
Created a custom template library to help scientists deploy ML models as FastAPI apps in Docker containers served using ECS and SageMaker
LLM for Document Summarization
Fine-tuned an open-source LLM (Mixtral 8x7B) using LoRA on a client's proprietary document summarization corpus. Deployed an inference endpoint on Modal using vLLM.
Virtual Screening Active Learning Server
Built an active learning server that manages online learning for multiple proxy functions trained on data from expensive docking and FEP simulations, reducing cost and accelerating virtual screening of large molecular libraries
Molecule Vector Database
Built and deployed a large-scale molecular search system for chemical libraries of 10^9 molecules, incorporating a vector database backend, a RESTful API query server, and GPU-accelerated batch embedding computation. Developed embedding compression methods to significantly reduce cost.
In-House ML Library
Created a custom AutoML framework enabling non-ML-expert scientists to build and deploy machine learning models for domain-specific applications
GB-GA Molecular Design
A production system for molecular design using graph-based genetic algorithms (GB-GA), compatible with an arbitrary reward function
CPU Embedding Server
A containerized FastAPI server for computing embeddings on CPU, optimized for concurrent requests
Roberta Zinc 480m
A model trained on 480m SMILES strings to compute meaningful molecule embeddings
GPT Zinc 87m
A GPT-2 style generative model trained on 480m SMILES strings to generate molecular compounds
Emb Opt
A Python library for running hill-climbing algorithms in embedding spaces, with a focus on searching vector databases
Emb Opt Server
A multi-container service for running search with the Emb Opt library, including a RESTful API server, backend database, job queue, and worker container
Chem Templates
A flexible and extensible Python library for filtering large molecular libraries and defining chemical spaces
Chem Templates Server
A FastAPI inference server for the Chem Templates library with a focus on scalability and parallel processing
Molecular Reinforcement Learning
A Python library for designing molecular compounds using a combination of machine learning and generative AI
RNA-Seq Patient Subtyping
Used RNA-Seq analysis to identify patient subtypes in patient populations with mental health indicators
Autoscaling Docking Service
Built an autoscaling Kubernetes service running CCDC molecular docking
Reinforcement Learning Molecular Design
Developed a system combining generative AI and reinforcement learning to design optimized molecules for multiple drug programs
ADMET Prediction Pipeline
Deployed a production system to predict ADMET properties of molecules using in-house laboratory data
Pooled DNA Assembly
A molecular biology technique for assembling DNA plasmids in a deterministic multiplexed fashion