Zillow-House-Research — Exploratory study of the real estate market in major silicon valley cities. Scraped Zillow HTML, processed results by Apache Hadoop/MapReduce, crated D3.js-style visualization by Bokeh, and built app in Flask.
Click-Through Rate Prediction — Used feature hashing to process 45 million rows of data. Reduced memory use by 50% and improved processing speed by 200% as compared with one-hot encoding. Implemented a modified version of online learning method (Per-Coordinate FTRL-Proximal) proposed by a paper through python. Improved training speed by 300% and increased prediction accuracy by 30% as compared with all existing traditional ML models (Boosting, SVM, Logistic, etc ).
[FlyModeller] — Interactive machine learning platform for automatic fly behavior annotation. Built visualization tools by Qt5 and OpenCV, optimized performance by Cython, implemented machine learning features by scikit-learn and autoWeka.
GxEScanUtilities — crated utilities (SNP extracter and dosage file splitter) for GxEscan, a GWAS scan tool to detect GxE interaction.