Reference overview¶
This page offers a brief overview over all classes and functions offered by HTSeq.
Parser and record classes¶
For all supported file formats, parser classes (called Reader
) are provided. These classes all
instatiated by giving a file name or an open file or stream and the function as iterator generators,
i.e., the parser objects can be used, e.g., in a for
loop to yield a sequence of objects, each
desribing one record. The table gives the parse class and the record class yielded. For details,
see the linked documentation
For most formats, functionality for writing files of the format is provided. See the detailed documentation as these methods and classes have varying semantics.
File format | typical content | Parser class for reading | Record class yielded by parser | Method/class method for writing |
---|---|---|---|---|
FASTA | DNA sequences | FastaReader |
Sequence |
Sequence.write_to_fasta_file() |
FASTQ | sequenced reads | FastqReader |
SequenceWithQualities |
SequenceWithQuality.write_to_fastq_file() |
GFF (incl. GFF3 and GTF) | genomic annotation | GFF_Reader |
GenomicFeature |
GenomicFeature.get_gff_line() |
BED | score data or annotation | BED_Reader |
GenomicFeature |
|
Wiggle (incl. BedGraph) | score data on a genome | WiggleReader |
pair: ( GenomicInterval , float) |
GenomicArray.write_bedgraph_file() |
SAM | aligned reads | SAM_Reader |
SAM_Alignment |
SAM_Alignment.get_sam_line() |
BAM | aligned reads | BAM_Reader |
SAM_Alignment |
BAM_Writer |
VCF | variant calls | VCF_Reader |
VariantCall |
|
Bowtie (legacy format) | aligned reads | BowtieReader |
BowtieAlignment |
|
SolexaExport (legacy format) | aligned reads | SolexaExportReader |
SolexaExportAlignment |
Most parser classes are sub-classes of class FileOrSequence
, which users will, however, rarely use directly.
Specifying genomic positions and intervals¶
The class GenomicInterval
specifies an interval on a chromosome (or contig or the like). It is defined by specifying the chromosome (or contig) name,
the start and the end and the strand. Convenience methods are offered for different ways of accessing this information, and for tetsing
for overlap between intervals. A GenomicPosition
, technically a GenomicInterval of length 1, refers to a single nucleotide or base-pair
position.
Objects of these classes are used internally wherever intervals or positions are represented, especially in record classes and as index keys to genomic array.
See page Genomic intervals and genomic arrays for details.
Genomic arrays¶
The classes GenomicArray
and GenomicArrayOfSets
are container classes to store data associated with genomic positions or intervals.
See page Genomic intervals and genomic arrays for details.
Special features for SAM/BAM files¶
The class CigarOperation
offers a convenient way to handle the information encoded in the CIGAR field of SAM files.
The functions pair_SAM_alignments()
and pair_SAM_alignments_with_buffer()
help to pair up
the records
in a SAM file that describe a pair of alignments for mated reads from the same DNA fragment.
Similarly, the function bundle_multiple_alignments()
bundles multiple alignment record pertaining to the same read or read pair.
See page Read alignments for details.