Forest Cover Type

August 2018 (GitHub)

A classification challenge from Kaggle that aims to predict the predominant kind of tree cover in a section of a forest. Exploratory analysis is performed on the dataset using the visualization libraries Altair and Plotly. In addition, an ensemble classifier is used to predict the appropriate cover type.

PBS Newshour - Data Collection and Analysis

July 2018 (GitHub)

Collects all transcripts available on PBS Newshour's website. Topics include:

  • Data is retrieved through the Python web-scraping library, Beautiful Soup.
  • The data fetching is expedited with Python's parallel-processing library, multiprocessing.
    • This is a much-needed step, as there are over 17 thousand unique URLs to scrape.
  • Preprocessing cleans the text and determines each text's speaker. In addition:
    • Basic typos/errors like "PRESDIENT" or "\xa0" are fixed
    • Qualifiers like "MAYOR" or (D-CA) are removed
  • The data is analysed using time series and word processing techniques
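
The speaker clean-up described above could look roughly like the sketch below. It is a simplified illustration, not the project's actual code: `clean_speaker`, `TYPO_FIXES`, `TITLES`, and the party-tag pattern are hypothetical names, and only "PRESDIENT", "\xa0", "MAYOR", and "(D-CA)" come from the description. Treating "PRESIDENT" as a removable qualifier is also an assumption.

```python
import re

# Hypothetical correction table; only "PRESDIENT" comes from the description.
TYPO_FIXES = {"PRESDIENT": "PRESIDENT"}

# Hypothetical set of titles to strip; only "MAYOR" comes from the description.
TITLES = {"MAYOR", "PRESIDENT"}

# Party/state tags such as "(D-CA)" or "(R-TX)" (assumed format).
PARTY_TAG = re.compile(r"\s*\([DRI]-[A-Z]{2}\)")


def clean_speaker(raw: str) -> str:
    """Normalize a raw speaker label taken from a transcript line."""
    text = raw.replace("\xa0", " ")           # non-breaking space -> plain space
    for wrong, right in TYPO_FIXES.items():   # fix known misspellings
        text = text.replace(wrong, right)
    text = PARTY_TAG.sub("", text)            # drop tags like "(D-CA)"
    words = [w for w in text.split() if w not in TITLES]
    return " ".join(words).rstrip(":").strip()
```

For example, `clean_speaker("MAYOR\xa0JANE DOE (D-CA):")` returns `"JANE DOE"`, where Jane Doe is a made-up speaker.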

Mussel Watch - Exploratory Data Analysis

June 2018 (Kernel)

Looks through pollution levels in our aquatic ecosystems. Topics include:

  • Grouping pollutants and organisms into major categories (e.g. mussels and clams become bivalves).
  • Looking at pollution levels for each substance over time.
  • Viewing unique Mussel Watch sites across the United States.
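
As a rough illustration of the grouping step, the sketch below maps organism names onto broader categories with a keyword lookup. The table `ORGANISM_GROUPS`, the function `group_organism`, and the sample names are hypothetical. Only the rule that mussels and clams become bivalves comes from the description, and the kernel itself may group things differently.

```python
# Hypothetical lookup table: keyword found in the organism name -> major category.
ORGANISM_GROUPS = {
    "mussel": "bivalve",
    "clam": "bivalve",
}


def group_organism(name: str) -> str:
    """Return the major category for an organism name, or "other" if unknown."""
    key = name.strip().lower()
    for keyword, group in ORGANISM_GROUPS.items():
        if keyword in key:
            return group
    return "other"
```

With this table, `group_organism("Blue Mussel")` and `group_organism("Hard Clam")` both return `"bivalve"`.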

2015 US Census Estimate - Exploratory Data Analysis

April 2018 (Kernel)

An exploration of the 2015 American Community Survey. Topics include:

  • The unfortunate state of Puerto Rico in terms of poverty.
  • Transportation differences across counties and states.
  • Extremes in racial demographic populations.
  • The strange non-binomial behavior of gender in certain counties.

Music to sheet music converter

March 2018 (GitHub)

Takes a .wav music file and converts it into sheet music through the following steps:

  1. Segmentation: determines when each note starts in the file
  2. Filtering: cutting off low/high frequencies, determining whether two notes are a chord or quick sequential notes
  3. Pitch Detection: the pitch can be detected through Fast Fourier Transform and mapped onto a certain note
  4. Filtering: determining whether a frequency hit is a partial or a note, determining whether a frequency hit is an independent note or residual from a few seconds earlier
  5. Duration detection: finally determining how long a note lasts and its location on a measure
  6. Saving to MusicXML file: the information gathered is organized, formatted, and saved into a MusicXML file. NOTE: additional software is needed to view the sheet music XML. One good (free) choice is MuseScore
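
The pitch detection step (step 3) can be sketched as below. This is a minimal illustration, not the converter's actual code. To stay dependency-free it uses a naive discrete Fourier transform rather than a Fast Fourier Transform, which gives the same spectrum but runs more slowly. The function names, the A4 = 440 Hz reference, and the synthetic test tone are all assumptions.

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):                # skip the DC bin, stop below Nyquist
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n         # convert bin index to Hz


def frequency_to_note(freq, a4=440.0):
    """Map a frequency to the nearest equal-tempered note name, e.g. "A4"."""
    midi = round(69 + 12 * math.log2(freq / a4))   # MIDI note 69 is A4
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


# Synthetic stand-in for one segment of a .wav file: a pure 440 Hz tone.
rate, n = 4096, 512
tone = [math.sin(2 * math.pi * 440 * i / rate) for i in range(n)]
note = frequency_to_note(dominant_frequency(tone, rate))
```

Here the frequency resolution is `rate / n` = 8 Hz per bin, so a real recording would need longer frames or peak interpolation to separate neighbouring low notes. A real signal also contains partials, which is why the second filtering step exists.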

Introduction to Data Science

I'm not seasoned enough to teach Data Science, but I can curate what I think are the most important resources for learning Data Science. Hopefully this page keeps you and me organized as we learn more!

Kaggle Mini Projects

Shelter Animal Outcomes

Digit Recognizer (MNIST)

Ghouls, Goblins, and Ghosts... Boo!

Titanic: Machine Learning from Disaster

House Prices: Advanced Regression Techniques

Research Presentations

These are files of presentations I gave during my undergraduate research.

Listed with the most recent first:

  • (Poster) Design and Development of Temperature Controller for Flux Growth. Download
  • (Presentation) Carbon Nanotubes and Desalination. Download
  • (Presentation) Flux Growth Development, NbSe3 in BNNT. Download

Publications

T. Pham, S. Oh, P. Stetz, S. Onishi, C. Kisielowski, M. Cohen, and A. Zettl, “Torsional Instability in the Single-Chain Limit of a Transition Metal Trichalcogenide.” Science, vol. 361, no. 6399, July 2018, pp. 263–266, science.sciencemag.org/content/361/6399/263.