Like many wet lab biologists, I was worried that my productivity would decrease and that I’d run out of work to do while working from home during the COVID-19 pandemic. I’m a second-year PhD student in an epigenetics lab, where we study how and when genes are switched on and off during embryonic development, and how to treat diseases that are caused when this process is disrupted. Many of the experiments we do to answer these questions involve genomics and next-generation sequencing. Some of the people in our lab work in the wet lab, for example culturing cells and extracting DNA or RNA for sequencing. Others work predominantly or exclusively in the dry lab as bioinformaticians, analysing the genomic data generated by the people working in the wet lab.
Although my degrees, my past experience in other labs and my PhD project have all been heavily focused on wet lab experiments, I wanted to learn the basics of genomic data analysis so that I could visualise and understand the sequencing data I’ve been generating. Luckily, I managed to finish some sequencing experiments in the lab before we started working from home, so I decided to take this opportunity to develop my genomic data analysis skills.
The thing I found hardest at first was learning to program. I had no coding experience before my PhD, and this was a huge hurdle to overcome because you need a threshold level of coding knowledge just to start looking at your data. The first step in learning to program is to identify which language is best suited to running the packages your analysis requires, and then to learn the basics of that language. For my data, that language was Bash; other languages often used in bioinformatics are R and Python, and there are many online resources that take you through their basics without being too overwhelming.
Fortunately, some programs have been developed that let you visualise your data with only a small amount of programming. One program I have found really useful and intuitive is SeqMonk, which was developed for wet lab biologists to visualise and analyse mapped sequencing data. Mapping means comparing each sequenced DNA read from a sample to the reference genome (the known sequence of the genes and chromosomes) of the species the sample came from, to work out where in the genome each read originated. This step can be done in Bash, and the mapped data can then be read into SeqMonk. You can then visualise where the reads map on the reference genome and run statistical tests on the data. There are some easy-to-follow YouTube tutorials for SeqMonk, which I would highly recommend. Learning directly from bioinformaticians and being able to ask them questions has also been extremely helpful: I’ve been lucky to learn a lot from the skilled bioinformaticians in my lab, as well as from the wet lab scientists there who have learned bioinformatics themselves.
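To make the idea of mapping a little more concrete, here is a deliberately simplified toy sketch in Python. It assumes a tiny made-up reference sequence and made-up reads, and the function name `map_reads` is hypothetical. It only looks for exact matches on one strand, whereas real mapping tools (the kind run from Bash before loading data into SeqMonk) use indexed references, check both DNA strands and tolerate sequencing errors, so treat this as an illustration of the concept rather than how any real aligner works.

```python
# Toy illustration of read mapping: find where each read exactly matches a reference.
# Assumptions: made-up sequences, forward strand only, no mismatches allowed.
# Real aligners use genome indexes, check the reverse complement and allow errors.

def map_reads(reference: str, reads: list[str]) -> dict[str, list[int]]:
    """Return every 0-based position where each read exactly matches the reference."""
    hits: dict[str, list[int]] = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:  # keep searching until no further match is found
            positions.append(start)
            start = reference.find(read, start + 1)
        hits[read] = positions  # an empty list means the read did not map
    return hits


reference = "ACGTACGTTAGCCGATACGT"  # hypothetical 20-base "genome"
reads = ["ACGT", "TAGCC", "GGGG"]   # hypothetical sequenced reads

for read, positions in map_reads(reference, reads).items():
    print(read, positions if positions else "unmapped")
```

Running it prints `ACGT [0, 4, 16]`, `TAGCC [8]` and `GGGG unmapped`: one read maps to several places, one maps uniquely and one does not map at all, which are the kinds of outcomes a visualisation tool lets you explore across millions of real reads.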
Understandably, bioinformatics can be daunting to wet lab biologists as it requires an understanding of programming and in-depth statistics which we don’t typically learn in biology degrees. However, just like extracting RNA or learning tissue culture in the lab, I’ve found that starting from the basics and learning it step by step hasn’t been as overwhelming as I expected.
The Biochemical Society is running online introductory courses for Python and R. Visit our website to find out more and to register.
About the author:
Natalia Benetti is a second-year PhD student at the Walter and Eliza Hall Institute of Medical Research and the University of Melbourne in Melbourne, Australia. She is fascinated by how a single-celled zygote is able to grow into a complex adult organism, and her PhD project investigates epigenetic mechanisms of gene silencing during mouse embryonic development. In her spare time, Natalia loves to play basketball and go hiking.