Karl Heyer
Machine Learning Engineer with a focus on drug discovery and bioscience
About
As a Machine Learning Engineer, I have deployed high-impact solutions to problems in drug discovery, RNA-Seq, small molecule design, and more. I draw on my background as a wet lab scientist to work effectively between developer and scientist teams, ensuring critical domain expertise is incorporated into ML solutions. I currently work mainly in Python, PyTorch, and the AWS stack.
Work Experience
Darkmatter AI, Remote
Founder
Independent Contractor, Remote
Machine Learning Engineer
Neumora, Remote
Data Scientist
BlackThorn Therapeutics (Acquired by Neumora), Remote
Data Scientist
Zymergen (Acquired by Ginkgo Bioworks)
Research Associate
Education
University of Southern California
University of Southern California
Skills
Projects
Vector Virtual Screen
A platform for using vector databases to accelerate virtual screening for drug discovery
CPU Embedding Server
A containerized FastAPI server for computing embeddings on CPU, optimized for concurrent requests
Roberta Zinc 480m
A model trained on 480m SMILES strings to compute meaningful molecule embeddings
GPT Zinc 87m
A GPT-2 style generative model trained on 480m SMILES strings to generate molecular compounds
Emb Opt
A Python library for running hill-climbing algorithms in embedding spaces, with a focus on searching vector databases
Emb Opt Server
A multi-container service for running search with the Emb Opt library, including a RESTful API server, backend database, job queue, and worker container
Chem Templates
A flexible and extensible Python library for filtering large molecular libraries and defining chemical spaces
Chem Templates Server
A FastAPI inference server for the Chem Templates library with a focus on scalability and parallel processing
Molecular Reinforcement Learning
A Python library for designing molecular compounds using a combination of machine learning and generative AI
In-House ML Library
A custom machine learning library for a scientist team with a focus on auto-ML and usability by lab scientists with little programming background
GB-GA Molecular Design
A production system for molecular design using graph-based genetic algorithms (GB-GA), compatible with an arbitrary reward function
ML Inference Endpoint Templates
A template library to help scientists deploy custom ML models as Docker containers using ECS and SageMaker
Molecule Vector Database
Built a vector database system for large chemical libraries, including a vector database backend, a RESTful API query server, and batch embedding computation
RNA-Seq Patient Subtyping
Used RNA-Seq analysis to identify patient subtypes in patient populations with mental health indicators
Autoscaling Docking Service
Built an autoscaling Kubernetes service running CCDC molecular docking
Reinforcement Learning Molecular Design
Developed a system combining generative AI and reinforcement learning to design optimized molecules for multiple drug programs
ADMET Prediction Pipeline
Deployed a production system to predict ADMET properties of molecules using in-house laboratory data
Pooled DNA Assembly
A molecular biology technique for assembling DNA plasmids in a deterministic multiplexed fashion