New New Phytologist paper: Automated and accurate segmentation of leaf venation networks via deep learning

We have a new paper out in New Phytologist that develops methods for automatically segmenting and extracting networks from images of leaf veins. You can read the paper at the journal and download a copy of the software (LeafVeinCNN) at Zenodo.

These networks are useful for a range of ecophysiology and evolution questions, but have historically been hard to generate because of the laborious hand-tracing required.

The project has been several years in the making, and came out of a lucky lunchtime conversation about machine learning back in 2015. It was led by Hao Xu and Mark Fricker, both at Oxford – and it is wonderful to finally be able to share it.

The problem is not an easy one – images of leaf veins often have uneven contrast, or contrast from unwanted sources (like bubbles or tears introduced during the microscopy process). Simple morphological operations and thresholds do a poor job in most cases – yet a human can easily solve the problem.
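For a sense of what such a simple baseline looks like (and why it falls short), here is a minimal sketch using scikit-image; the function name and parameter values are illustrative, not taken from our software:

```python
# A minimal sketch of the kind of simple baseline described above:
# a global Otsu threshold plus morphological clean-up. Names and
# parameters are illustrative, not from LeafVeinCNN.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import closing, disk, remove_small_objects

def naive_vein_segmentation(image: np.ndarray) -> np.ndarray:
    """image: 2-D greyscale array; returns a boolean vein mask."""
    # Single global threshold; assumes veins are brighter than the
    # background (invert the comparison if they are darker).
    mask = image > threshold_otsu(image)
    mask = closing(mask, disk(2))            # bridge small gaps in veins
    mask = remove_small_objects(mask, 64)    # drop bubbles, dust, noise
    return mask
```

A single global threshold like this inevitably misses faint veins in low-contrast regions while keeping high-contrast artefacts – exactly the failure mode described above.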

We decided to train an algorithm to learn how to trace images as a human would, using ground-truth data from humans. In 2020, this sounds quite routine, but in 2015, it was (at least to me) a big leap. So I used some fellowship funds to pay for people to hand-trace a few hundred million image pixels for a few hundred species, and then we went to work on training an algorithm to do the same thing. It took several tries to get right, but Hao found a good solution. Then Mark stepped in to develop additional network extraction code to take the segmented images and convert them to spatial graphs, where each vein segment is represented as a tube of a given length and radius.
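To make that last step concrete, here is a minimal sketch (not the paper's actual code) of one common way to turn a binary segmentation into a spatial graph: skeletonize the mask, estimate a local radius from the distance transform, and link adjacent skeleton pixels. All names here are illustrative.

```python
# A minimal sketch, assuming a standard skeleton-plus-distance-transform
# approach; this is not the network extraction code from the paper.
import numpy as np
import networkx as nx
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def segmentation_to_graph(mask: np.ndarray) -> nx.Graph:
    """mask: 2-D boolean array, True where a pixel was segmented as vein."""
    skeleton = skeletonize(mask)             # one-pixel-wide centrelines
    radius = distance_transform_edt(mask)    # distance to background,
                                             # i.e. the local vein half-width
    graph = nx.Graph()
    pixels = set(zip(*np.nonzero(skeleton)))
    for y, x in pixels:
        graph.add_node((y, x), radius=float(radius[y, x]))
    for y, x in pixels:
        for dy in (-1, 0, 1):                # link 8-connected neighbours
            for dx in (-1, 0, 1):
                if (dy, dx) != (0, 0) and (y + dy, x + dx) in pixels:
                    graph.add_edge((y, x), (y + dy, x + dx),
                                   length=float(np.hypot(dy, dx)))
    return graph
```

From such a graph one can read off vein lengths, radii, and loop structure; the paper's extraction code handles further subtleties (such as merging runs of skeleton pixels into single edges), which this one-node-per-pixel sketch leaves out.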

The algorithm works well on a wide range of imagery, and should save many people from the tedium and poor scalability of hand-tracing. We hope these algorithms will open up research questions that have previously been inaccessible.

The algorithm has been validated on relatively small leaf segments, rather than on whole leaves, and on segments of relatively good quality, rather than on the lower-quality older samples that are often available in museum collections. Our tests show that in some of these harder cases we get good performance, but in others we get failures. As a result, we are now (with NSF funding) developing additional machine learning models to trace lower-quality and larger images more robustly. The new algorithms (not quite yet published, but get in touch if you want early access) are performing well, and we look forward to sharing them soon.

Critically, all of these steps are wrapped up in a simple program (LeafVeinCNN) with an easy-to-use GUI – so there is no need to delve into the command line, or to learn the details of TensorFlow, Keras, Python, and so on.

Thanks to NERC and NSF for making this work possible!