San Diego Supercomputer Center, UC San Diego, San Diego, CA
Thanks to the Alumni Sponsored Internship Program, I was able to conduct research with Dr. Igor Tsigelny at UC San Diego. This bioinformatics internship was rewarding and exciting: it let me experience an unconventional work environment, improved my communication skills, and gave me valuable insight into bioinformatics research.
In Dr. Tsigelny’s lab at the San Diego Supercomputer Center, I worked on a research project on the inflammatory bowel diseases, Crohn’s disease (CD) and ulcerative colitis (UC). We first collected hundreds of candidate metabolites that had been found to be related to these diseases, then filtered them by their attributes (primarily p-value and fold change) to build a focused dataset containing only the metabolites suited to our research.
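To illustrate what filtering by p-value and fold change means, here is a minimal Python sketch. It is not the code used in the project; the thresholds (p < 0.05 and at least a two-fold change, i.e. |log2 fold change| >= 1), the `Metabolite` record, and the `filter_metabolites` helper are hypothetical choices for illustration, and the metabolite names and values are placeholders rather than project data.

```python
import math
from dataclasses import dataclass


@dataclass
class Metabolite:
    name: str
    p_value: float      # significance of the difference between patient groups
    fold_change: float  # ratio of mean abundance, disease group / control group


def filter_metabolites(metabolites, p_cutoff=0.05, log2_fc_cutoff=1.0):
    """Keep metabolites that are both statistically significant and strongly changed.

    Assumed criteria (hypothetical defaults): p_value < p_cutoff and
    |log2(fold_change)| >= log2_fc_cutoff, i.e. at least a two-fold
    increase or decrease when log2_fc_cutoff is 1.0.
    """
    kept = []
    for m in metabolites:
        if m.fold_change <= 0:
            continue  # a ratio of abundances must be positive to take its log
        if m.p_value < p_cutoff and abs(math.log2(m.fold_change)) >= log2_fc_cutoff:
            kept.append(m)
    return kept


if __name__ == "__main__":
    # Hypothetical placeholder values, not data from the project.
    candidates = [
        Metabolite("metabolite_A", 0.001, 2.5),  # significant, ~2.5x increase: kept
        Metabolite("metabolite_B", 0.20, 3.0),   # fails the p-value cutoff
        Metabolite("metabolite_C", 0.01, 1.2),   # fails the fold-change cutoff
        Metabolite("metabolite_D", 0.03, 0.4),   # significant, 2.5x decrease: kept
    ]
    for m in filter_metabolites(candidates):
        print(m.name)
```

Using the absolute value of the log2 fold change treats increases and decreases symmetrically, so a metabolite that drops to 0.4 times the control level passes just as one that rises 2.5 times does.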
I learned how to use many data analysis programs that are prominent in the bioinformatics field: MetaboAnalyst, Ingenuity Pathway Analysis, PaDEL, eDRAGON, VisANT, and the Waikato Environment for Knowledge Analysis (WEKA). With MetaboAnalyst, I generated three figures, one for each of the three disease categories, depicting their metabolic pathways, and determined which pathways played the most significant roles in the pathogenesis of the diseases. With Ingenuity Pathway Analysis, I created ten extensive networks showing the links between metabolites, proteins, and genes. By searching additional public sources, I gathered a large amount of data on previous findings about these connections and related them to our project. With PaDEL, eDRAGON, and VisANT, I retrieved hundreds of descriptors, or attributes, for each metabolite. The result was a large dataset in which each of the hundreds of metabolites had its own set of “defining” features (attributes), which allowed us to submit the data to WEKA.
I spent most of my time working with WEKA: I learned from scratch the concepts behind many machine learning classifiers and then applied them to our data. My goal, and the main purpose of our research project, was to find the machine learning classifier that worked best with our datasets, one that could recognize patterns in the data and build a model that effectively differentiates uIBD, CD, and UC metabolic data from one another. After a month of learning and experimenting in WEKA, I narrowed down the combinations of datasets and classifiers that worked best together; the multilayer perceptron appeared to be the most thorough classifier and also yielded the highest accuracy. I summarized all of our research in a white paper, which was submitted to Springer’s journal Metabolomics in August.
A big, big thank you to the Class of 1966 and the ’68 Center for Career Exploration for providing me with this opportunity.