My Portfolio

Take a look at my recent work! Click on any project to learn more.

February 20, 2024

Dog Breed Prediction Application (TensorFlow, Kubernetes, AWS)

End-to-end deep learning project for dog breed image classification. Covers exploratory data analysis (EDA), transfer learning with Keras, fine-tuning, and model deployment with Flask and Docker to Amazon Web Services (AWS) Lambda and Elastic Kubernetes Service (EKS), with the application deployed on Streamlit.

December 20, 2023

Credit Prediction Classifier (Sklearn, Docker, Flask)

Credit prediction modeling project using scikit-learn decision tree, KNN, SVM, and neural network models to forecast credit default likelihood. Covers data preparation, EDA, model training, and fine-tuning. The SVM model is deployed through a Flask service, with the environment containerized using Docker.

July 31, 2023

Graph Neural Network for Protein-Protein Interactions (PyTorch)

Built a GNN architecture using PyTorch Geometric for node classification on the protein-protein interaction (PPI) dataset. Used a combination of Molecular Fingerprint Convolution (MFConv) and Graph Isomorphism Network (GIN) layers, resulting in an F1 score within 5 points of the benchmark while reducing runtime by 67%.

April 21, 2023

Yelp Fake Review Analysis (Sklearn, D3.js)

Tested SVM, KNN, naive Bayes, logistic regression, and random forest classifiers to predict the presence of fake reviews in the Kaggle Yelp dataset. Used TF-IDF for sentiment analysis and SMOTE for oversampling, noting a 12% increase in recall. Developed an interactive visualization with D3 as an analytical tool to uncover sectors negatively impacted by fake reviews in suburban populations.

March 8, 2023

Comparing the Effects of Dataset Features on Machine Learning Classifiers (Sklearn)

Researched the comparative strengths of decision trees, KNN, SVM, and neural networks using validation curves, learning curves, wall-clock time, and loss curves. Compared tuned performance on the bioresponse dataset, a wide, balanced binary classification problem, with the letter dataset, a long and unbalanced multiclass problem.

ABOUT



Since you're here, let's start with a few interesting things about me. I love listening to music and sharing good tunes with friends. You can usually catch me jamming out to electronic, R&B, post-hardcore, or J-pop. In my free time, I'm an avid gamer and anime enthusiast (anything shonen is my favorite!). And last but not least, I'm genuinely obsessed with learning. Every day, I challenge myself to explore new horizons, whether it's immersing myself in the latest cloud technologies, diving into the ever-evolving world of machine learning, or even just geeking out over facts about how our planet works.

I'm a motivated graduate student with a passion for data and a proven track record of delivering impactful solutions. You'll often find me captivated by the smallest details, building meaningful visualizations, and optimizing analytics to empower research. My interest in data began in my undergraduate career, where I built a strong foundation in analytics through my studies in chemistry, geology, and ecology. Towards the end of my degree, I took a computer science course on a whim and was immediately hooked. Ever since, I've been studying it in my spare time.

What started as a hobby led to positions as an analyst in soil logistics, then as a scientist in the data solutions department of an environmental consulting firm. From there, I decided to pursue a master's degree in computer science at Georgia Tech to quench my thirst for knowledge and gain a formal education in the field. The rest is history!