Brian Lu

High Availability

I'm a Computer Engineering student at Purdue University Main Campus.

In the past, I wrote and maintained navigation and scheduling software for competitive robotics, wrote parsers to ingest structured data from thousands of chemical Safety Data Sheet (SDS) PDFs for Merck, and optimized DevOps workflows for Lilly.

Now, I'm working with Merck to analyze laboratory data by combining machine learning with graph knowledge bases.

I'm passionate about homelabbing, and I own and operate the Shamrock Cluster.

The cluster aims to be as production-ready as possible: it uses high-availability technologies and filesystem encryption, and it enforces mutual TLS authentication at every step.

Experience

  1. Merck Sharp & Dohme

    Undergraduate Researcher

    • Designing a Visual Document Understanding (VDU) solution that converts scanned lab documents to JSON
    • Performing transfer learning on Naver Clova AI's Donut model (Swin Transformer encoder, BART decoder) using synthetic documents
    • Generating synthetic training documents that model real-world characteristics, using Python, ReportLab, and Dask
    • Restored tools for the 1994 UniPen online handwriting dataset, porting them from SunOS 5 to Linux as a Flatpak
    • Writing tools to generate commercially viable synthetic handwriting for use in synthetic document generation
    • Designing an RDF graph storage system in Rust, based on Oxigraph, to store ingested documents

  2. The Data Mine @ Purdue University

    Software Engineering Contractor - Eli Lilly & Company

    • Developed automated documentation generation that reconciles with Confluence, replacing a legacy system
    • Optimized Apache Airflow image builds across both the Python Poetry and Docker build steps
    • Replicated the production deployment on k3d to reproduce and resolve Helm deployment errors

  3. Merck Sharp & Dohme

    Student Researcher

    • Architected Safety Data Sheet (SDS) parsing and insights software as part of a team
    • The software saves hours of manual SDS reading per day and is currently being integrated
    • Wrote a low-level PDF parsing engine to extract document hierarchies, fields, and images
    • Created an OpenCV-based classifier to accurately detect GHS pictograms
    • Developed an automated PDF templating pipeline to generate reports of the resulting insights

  4. High Oak Robotics

    Robotics Coach

    • Wrote and maintained the Sequoia asynchronous scheduling library
    • Taught robotics control system concepts to new team members
    • Bootstrapped the sensor fusion and navigation technology stack