hictk: blazing fast toolkit to work with .hic and .cool files

This paper is a preprint and has not been certified by peer review.

Authors

Rossini, R.; Paulsen, J.

Abstract

Motivation: Hi-C is gaining prominence as a method for mapping genome organization. With declining sequencing costs and a growing demand for higher-resolution data, this trend is expected to persist, so efficient tools for processing Hi-C datasets at different resolutions are crucial. Over the past decade, the .hic and Cooler file formats have become the de facto standards for storing interaction matrices produced by Hi-C experiments in binary format. Yet interoperability between the two formats is lacking, which makes it unnecessarily difficult to convert between them or to write applications that can natively process both. There is therefore a pressing need for high-performance tools that can efficiently handle both file formats.

Results: We developed hictk, a toolkit that can transparently operate on .hic and .cool files with excellent performance. The toolkit is written in C++ and consists of a C++ library with Python bindings, as well as CLI tools to perform common operations directly from the shell, including converting between the .hic and .mcool formats. We benchmark the performance of hictk and compare it with other popular tools and libraries. We conclude that hictk significantly outperforms existing tools while providing the flexibility of working natively with both file formats without code duplication.

Availability and Implementation: The hictk library, Python bindings and CLI tools are released under the MIT license as a multi-platform application available at github.com/paulsengroup/hictk. Pre-built binaries for Linux and macOS are available on bioconda. Python bindings for hictk are available on GitHub at github.com/paulsengroup/hictkpy.
