InDel paper and musings on alignments

After a very winding 5-year+ ordeal*, I’m happy to say that we have finally published the paper covering the most successful part of my PhD work! Four years since the original preprint is a very long time, and it may seem hard to understand how it could have taken this long, so I’ll rattle off a few numbers to give some context on the personal side of academic publishing that rarely gets mentioned.

Identifying bird species by their calls with deep learning - Part 1, Dataset Construction

As promised in the last post, I’m getting back to talking about some data and the fun things one can do with it. While we already had some fun with the classifier I built for the many Chrises of Hollywood, that was just a little placeholder until I had time to prepare a more interesting* dataset that I’ve been planning for a while! Since there is a lot to talk about, I’ve decided to split this topic into at least two posts: today’s covers dataset construction, and the next one will cover the actual model training using the data.

The Road Ahead

Note: This post is going to be a bit more personal, so if you don’t care to read my ramblings on what I’m currently doing and what I’m planning for the next few months, just give this one a skip and I promise to have more data-related goodness for the next post.*

Image classification with FastAI

Last weekend I started the Practical Deep Learning for Coders online course from the creator of the fastai Python library, Jeremy Howard. I’d been wanting to start it for a few weeks, but I saw a tweet by Jeremy saying that new versions of the library and the course would be released, and now that they are out I think it was worth the wait! I had seen the first lecture of the previous version of the course and even started the fun process of setting up a Google compute account to run the course code using their free sign-up credits, as the course recommended at the time. For this version, they simplified things and just recommend using the free options in Google Colab or Paperspace, which has been super easy to set up!

Quick exploration of a long-term insect collection dataset from Copenhagen

Earlier this week, I was scrolling through Kaggle to see what types of biology datasets were available there and ran across one that seemed interesting: Insect Light Trap, a huge 18-year insect collection effort carried out by the Zoological Museum of the University of Copenhagen between 1992 and 2009. The work was published back in 2015 and is open-access, with very detailed analyses of the diversity of groups found, correlating observations with time of year (big peak in summer, very few observations in winter, as I’d expect for northern Europe) and with temperature/climate change. Do check out the paper if you’d like a deeper look into those aspects, because I’m just going to have a quick look at the dataset itself and see what kind of fun stuff I find in there.
