htseq-count-barcodes
: counting reads with cell barcodes and UMIs¶
This script is similar to htseq-count
, but is designed to operate on a single SAM/BAM/CRAM file that contains reads from many cells, distinguished by a cell barcode in the read name and possibly a unique molecular identifier (UMI).
To keep the documentation simple, this page does not repeat the explanations found for htseq-count
at htseq-count: counting reads within features and focuses on the differences instead.
- Unlike
htseq-count
, only one read file is accepted. - No multicore support is available ATM. Because barcoded, position-sorted BAM files are not trivially parallelizable, this feature is a little challenging to implement, however pull requests (PRs) on Github are welcome.
- The main target for this script are BAM files produced by 10X Genomics’
cellranger
pipeline. If you have a different application and would like to usehtseq-count-barcodes
, please open an issue on Github and we’ll be happy to consider adding it.