The latest version of XAM is made available to install through BioJulia's package registry. By default, Julia's package manager only includes the "General" package registry.
To add the BioJulia registry from the Julia REPL, press ] to enter pkg mode, then enter the following command:
We also welcome financial contributions in full transparency on our open collective. Anyone can file an expense. If the expense makes sense for the development the core contributors and the person who filed the expense will be reimbursed.
Does your company use BioJulia? Help keep BioJulia feature rich and healthy by sponsoring the project. Your logo will show up here with a link to your website.
The latest version of XAM is made available to install through BioJulia's package registry. By default, Julia's package manager only includes the "General" package registry.
To add the BioJulia registry from the Julia REPL, press ] to enter pkg mode, then enter the following command:
We also welcome financial contributions in full transparency on our open collective. Anyone can file an expense. If the expense makes sense for the development the core contributors and the person who filed the expense will be reimbursed.
Does your company use BioJulia? Help keep BioJulia feature rich and healthy by sponsoring the project. Your logo will show up here with a link to your website.
Create a SAM record from data. This function verifies the format and indexes fields for accessors. Note that the ownership of data is transferred to a new record object.
Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.
Calling this method with checkCG set to true (default) this method will always yield the true cigar string, because this is probably what you want the vast majority of the time.
If you have a record that stores the true cigar in a CG:B,I tag, but you still want to access the pseudo-cigar that is stored in the cigar field of the BAM record, then you can set checkCG to false.
Get a run-length encoded tuple (ops, lens) of the CIGAR string in record.
Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.
Calling this method with checkCG set to true (default) this method will always yield the true cigar string, because this is probably what you want the vast majority of the time.
If you have a record that stores the true cigar in a CG:B,I tag, but you still want to access the pseudo-cigar that is stored in the cigar field of the BAM record, then you can set checkCG to false.
Return the number of operations in the CIGAR string of record.
Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.
Calling this method with checkCG set to true (default) this method will always yield the number of operations in the true cigar string, because this is probably what you want, the vast majority of the time.
If you have a record that stores the true cigar in a CG:B,I tag, but you still want to get the number of operations in the cigar field of the BAM record, then set checkCG to false.
Create a SAM record from data. This function verifies the format and indexes fields for accessors. Note that the ownership of data is transferred to a new record object.
Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.
Calling this method with checkCG set to true (default) this method will always yield the true cigar string, because this is probably what you want the vast majority of the time.
If you have a record that stores the true cigar in a CG:B,I tag, but you still want to access the pseudo-cigar that is stored in the cigar field of the BAM record, then you can set checkCG to false.
Get a run-length encoded tuple (ops, lens) of the CIGAR string in record.
Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.
Calling this method with checkCG set to true (default) this method will always yield the true cigar string, because this is probably what you want the vast majority of the time.
If you have a record that stores the true cigar in a CG:B,I tag, but you still want to access the pseudo-cigar that is stored in the cigar field of the BAM record, then you can set checkCG to false.
Return the number of operations in the CIGAR string of record.
Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.
Calling this method with checkCG set to true (default) this method will always yield the number of operations in the true cigar string, because this is probably what you want, the vast majority of the time.
If you have a record that stores the true cigar in a CG:B,I tag, but you still want to get the number of operations in the cigar field of the BAM record, then set checkCG to false.
This document was generated with Documenter.jl on Friday 17 April 2020. Using Julia version 1.3.0.
diff --git a/dev/man/hts-files/index.html b/dev/man/hts-files/index.html
index 595db43..cc11710 100644
--- a/dev/man/hts-files/index.html
+++ b/dev/man/hts-files/index.html
@@ -24,6 +24,7 @@ end
close(reader)
The size of a BAM file is often extremely large. The iterator interface demonstrated above allocates an object for each record and that may be a bottleneck of reading data from a BAM file. In-place reading reuses a pre-allocated object for every record and less memory allocation happens in reading:
reader = open(BAM.Reader, "data.bam")
record = BAM.Record()
while !eof(reader)
+ empty!(record)
read!(reader, record)
# do something
end
Both SAM.Reader and BAM.Reader implement the header function, which returns a SAM.Header object. To extract certain information out of the headers, you can use the find method on the header to extract information according to SAM/BAM tag. Again we refer you to the specification for full details of all the different tags that can occur in headers, and what they mean.
Below is an example of extracting all the info about the reference sequences from the BAM header. In SAM/BAM, any description of a reference sequence is stored in the header, under a tag denoted SQ (think reference SeQuence!).
Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.
Calling this method with checkCG set to true (default) this method will always yield the true cigar string, because this is probably what you want the vast majority of the time.
If you have a record that stores the true cigar in a CG:B,I tag, but you still want to access the pseudo-cigar that is stored in the cigar field of the BAM record, then you can set checkCG to false.
SAM and BAM records support the storing of optional data fields associated with tags.
Tagged auxiliary data follows a format of TAG:TYPE:VALUE. TAG is a two-letter string, and each tag can only appear once per record. TYPE is a single case-sensetive letter which defined the format of VALUE.
There are some tags that are reserved, predefined standard tags, for specific uses.
To access optional fields stored in tags, you use getindex indexing syntax on the record object. Note that accessing optional tag fields will result in type instability in Julia. This is because the type of the optional data is not known until run-time, as the tag is being read. This can have a significant impact on performance. To limit this, if the user knows the type of a value in advance, specifying it as a type annotation will alleviate the problem:
Below is an example of looping over records in a bam file and using indexing syntax to get the data stored in the "NM" tag. Note the UInt8 type assertion to alleviate type instability.
for record in open(BAM.Reader, "data.bam")
+
In the above we can see there were 7 sequences in the reference: 5 chromosomes, one chloroplast sequence, and one mitochondrial sequence.
Note that in the BAM specification, the field called cigar typically stores the cigar string of the record. However, this is not always true, sometimes the true cigar is very long, and due to some constraints of the BAM format, the actual cigar string is stored in an extra tag: CG:B,I, and the cigar field stores a pseudo-cigar string.
Calling this method with checkCG set to true (default) this method will always yield the true cigar string, because this is probably what you want the vast majority of the time.
If you have a record that stores the true cigar in a CG:B,I tag, but you still want to access the pseudo-cigar that is stored in the cigar field of the BAM record, then you can set checkCG to false.
SAM and BAM records support the storing of optional data fields associated with tags.
Tagged auxiliary data follows a format of TAG:TYPE:VALUE. TAG is a two-letter string, and each tag can only appear once per record. TYPE is a single case-sensetive letter which defined the format of VALUE.
There are some tags that are reserved, predefined standard tags, for specific uses.
To access optional fields stored in tags, you use getindex indexing syntax on the record object. Note that accessing optional tag fields will result in type instability in Julia. This is because the type of the optional data is not known until run-time, as the tag is being read. This can have a significant impact on performance. To limit this, if the user knows the type of a value in advance, specifying it as a type annotation will alleviate the problem:
Below is an example of looping over records in a bam file and using indexing syntax to get the data stored in the "NM" tag. Note the UInt8 type assertion to alleviate type instability.
for record in open(BAM.Reader, "data.bam")
nm = record["NM"]::UInt8
# do something
end
The XAM package supports the BAI index to fetch records in a specific range from a BAM file. Samtools provides index subcommand to create an index file (.bai) from a sorted BAM file.