Evaluating Languages for Bioinformatics: Performance, Expressiveness and Energy
Abstract
One of the fastest growing concerns in the technology sector is the increased demand for power in the world's data centers. Global data center electricity use in 2021 was estimated as between 220 and 320 terawatt-hours (TWh), as much as 1.3% of global electricity demand. As the data center industry continues to expand, so too will power usage, and therefore the need for increased energy efficiency in software development.
This thesis introduces a methodology that evaluates a set of programming languages based on three key metrics: performance, expressiveness, and energy use, demonstrating a fair consideration of each language's strengths and weaknesses. The framework presented creates a collection of string-matching algorithms used on DNA sequences to demonstrate the capabilities of each language, and draw out their distinctiveness.
DNA sequencing was chosen due to its growing uses and applications as technology evolves and makes such sequencing faster and less expensive. This in turn has lead to a growing percentage of compute-time being spent on this field. Using the methodology presented here it will be shown that using a newer language, like Rust, has advantages that help it balance speed, ease of use, and power consumption when used for advanced scientific computing.
A key part of this work introduces a novel approximate-matching algorithm to aid in this evaluation process. This new algorithm differs from current algorithms in use, in its ability to hold the gap between nucleotides to a specific maximum while allowing other gaps to exist. It will offer an alternative technique to other current approximate-matching algorithms and hopes to offer researchers another tool to consider for sequence-matching problems.
The expectation is that this research will show how testing and evaluating via performance, expressiveness and energy use metrics allows for rating and ranking programming languages in a consistent and reproducible manner. This will enable developers to make educated choices when selecting a language for a project. The methods described here will be applicable to other languages as well, given similar data to work with. This research will benefit the programming field by providing methods and techniques that can be used in the language selection process, particularly when energy efficiency is as important as overall performance.
Collections
- OU - Theses [2188]