The latest version of XAM is made available to install through BioJulia's package registry. By default, Julia's package manager only includes the "General" package registry.
To add the BioJulia registry from the Julia REPL, press ] to enter pkg mode, then enter the following command:
We also welcome financial contributions, made in full transparency, on our open collective. Anyone can file an expense. If the expense makes sense for the development of the community, it will be approved by the core contributors and the person who filed the expense will be reimbursed.
Does your company use BioJulia? Help keep BioJulia feature rich and healthy by sponsoring the project. Your logo will show up here with a link to your website.
Create a SAM record from data. This function verifies the format and indexes fields for accessors. Note that the ownership of data is transferred to a new record object.
In the BAM specification, the cigar field typically stores the CIGAR string of the record. This is not always the case, however: when the true CIGAR is very long, constraints of the BAM format mean that it is instead stored in an extra tag, CG:B,I, and the cigar field holds a pseudo-CIGAR string.
When this method is called with checkCG set to true (the default), it always yields the true CIGAR string, because this is probably what you want the vast majority of the time.
If a record stores the true CIGAR in a CG:B,I tag but you still want the pseudo-CIGAR held in the cigar field of the BAM record, set checkCG to false.
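As a rough illustration of the pseudo-CIGAR mentioned above: the SAM/BAM specification describes the stand-in as having the form kSmN, where k is the read length and m is the reference length of the alignment. The sketch below is plain Julia and is not XAM's implementation; `is_placeholder_cigar` and its arguments are hypothetical names chosen for this example.

```julia
# Hypothetical helper (not part of XAM): report whether a run-length encoded
# CIGAR looks like the `kSmN` placeholder, i.e. one soft clip spanning the
# whole read followed by one reference skip.
function is_placeholder_cigar(ops::Vector{Char}, lens::Vector{Int}, seqlen::Int)
    return length(ops) == 2 &&
           ops[1] == 'S' && lens[1] == seqlen &&
           ops[2] == 'N'
end

# A 100000-base read whose real CIGAR was moved to the CG tag.
placeholder = is_placeholder_cigar(['S', 'N'], [100000, 98000], 100000)

# An ordinary alignment: 100 matched bases.
ordinary = is_placeholder_cigar(['M'], [100], 100)
```

A reader that finds such a placeholder would then look up the CG tag for the real operations, which is roughly the behaviour that checkCG = true asks for.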
Get a run-length encoded tuple (ops, lens) of the CIGAR string in record.
In the BAM specification, the cigar field typically stores the CIGAR string of the record. This is not always the case, however: when the true CIGAR is very long, constraints of the BAM format mean that it is instead stored in an extra tag, CG:B,I, and the cigar field holds a pseudo-CIGAR string.
When this method is called with checkCG set to true (the default), it always yields the true CIGAR string, because this is probably what you want the vast majority of the time.
If a record stores the true CIGAR in a CG:B,I tag but you still want the pseudo-CIGAR held in the cigar field of the BAM record, set checkCG to false.
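To make the (ops, lens) shape concrete, here is a minimal sketch that decodes a textual CIGAR string into two parallel vectors. It is plain Julia and only mimics the layout of the result: `cigar_rle_sketch` is a hypothetical name, the operations are represented as `Char` values purely for illustration, and no validation of the input is attempted.

```julia
# Hypothetical helper (not part of XAM): decode a CIGAR string such as
# "10M2I5M" into parallel vectors of operations and run lengths.
# Characters that do not form a <length><op> pair are silently skipped.
function cigar_rle_sketch(cigar::AbstractString)
    ops = Char[]
    lens = Int[]
    for m in eachmatch(r"(\d+)([MIDNSHP=X])", cigar)
        push!(lens, parse(Int, m.captures[1]))  # run length
        push!(ops, m.captures[2][1])            # operation letter
    end
    return ops, lens
end

ops, lens = cigar_rle_sketch("10M2I5M")
```

The two vectors always have the same length, so `zip(ops, lens)` walks the alignment one run at a time.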
Return the number of operations in the CIGAR string of record.
In the BAM specification, the cigar field typically stores the CIGAR string of the record. This is not always the case, however: when the true CIGAR is very long, constraints of the BAM format mean that it is instead stored in an extra tag, CG:B,I, and the cigar field holds a pseudo-CIGAR string.
When this method is called with checkCG set to true (the default), it always yields the number of operations in the true CIGAR string, because this is probably what you want the vast majority of the time.
If a record stores the true CIGAR in a CG:B,I tag but you still want the number of operations in the cigar field of the BAM record, set checkCG to false.
In the above we can see there were 7 sequences in the reference: 5 chromosomes, one chloroplast sequence, and one mitochondrial sequence.
SAM and BAM records can store optional data fields associated with tags.
Tagged auxiliary data follows the format TAG:TYPE:VALUE. TAG is a two-letter string, and each tag can appear only once per record. TYPE is a single case-sensitive letter that defines the format of VALUE.
Some tags are reserved as predefined standard tags for specific uses.
To access optional fields stored in tags, use getindex indexing syntax on the record object. Note that accessing optional tag fields results in type instability in Julia, because the type of the optional data is not known until run time, when the tag is read. This can have a significant impact on performance. If you know the type of a value in advance, adding a type assertion to the access alleviates the problem.
Below is an example of looping over the records in a BAM file and using indexing syntax to get the data stored in the "NM" tag. Note the UInt8 type assertion, which alleviates the type instability.
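The TAG:TYPE:VALUE layout described above can be sketched by splitting a single textual field. This is plain Julia for illustration only, not how XAM parses records; `split_aux_field` is a hypothetical helper name.

```julia
# Hypothetical helper (not part of XAM): split one SAM-style optional field
# into its tag, type letter, and raw value text.
function split_aux_field(field::AbstractString)
    # Limit the split to three parts because VALUE may itself contain ':'.
    parts = split(field, ':'; limit = 3)
    length(parts) == 3 || throw(ArgumentError("expected TAG:TYPE:VALUE, got \"$field\""))
    tag, typ, value = parts
    length(tag) == 2 || throw(ArgumentError("TAG must be two characters"))
    length(typ) == 1 || throw(ArgumentError("TYPE must be a single letter"))
    return String(tag), typ[1], String(value)
end

tag, typ, value = split_aux_field("NM:i:1")
```

The type letter is returned as a `Char` and compared as-is, which keeps the case-sensitivity of TYPE intact.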
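using XAM

reader = open(BAM.Reader, "data.bam")
for record in reader
    nm = record["NM"]::UInt8  # the assertion tells the compiler the value's type
    # do something with nm
end
close(reader)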
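The effect of the type assertion can be seen without a BAM file. The sketch below uses a `Dict{String,Any}` as a stand-in for a record's optional fields (an assumption made for illustration, since a real record is not a `Dict`) and asks Julia what return type it infers with and without the assertion. The function names are hypothetical.

```julia
# Stand-in for a record's optional fields: values of mixed types force the
# container's value type to be abstract (here, Any).
aux = Dict{String,Any}("NM" => UInt8(1), "RG" => "sample1")

# Without an assertion, the compiler can only infer `Any` for the result.
untyped_nm(d) = d["NM"]

# With the assertion, the result is known to be a UInt8; a value of any
# other type raises a TypeError at run time instead.
typed_nm(d) = d["NM"]::UInt8

inferred_untyped = first(Base.return_types(untyped_nm, (Dict{String,Any},)))
inferred_typed = first(Base.return_types(typed_nm, (Dict{String,Any},)))
```

Code that consumes the result of `typed_nm` can be compiled for a concrete `UInt8`, whereas code that consumes `untyped_nm` has to dispatch dynamically on whatever comes back.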
The XAM package supports the BAI index for fetching records in a specific range from a BAM file. Samtools provides the index subcommand to create an index file (.bai) from a sorted BAM file.
$ samtools index -b SRR1238088.sort.bam
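# Assumes the BAI index created above sits next to the BAM file.
reader = open(BAM.Reader, "SRR1238088.sort.bam", index = "SRR1238088.sort.bam.bai")
for record in eachoverlap(reader, "Chr2", 10000:11000)
    # `record` is a BAM.Record object
    # ...
end
close(reader)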
The eachoverlap method also accepts the Interval type defined in GenomicFeatures.jl.
This allows you to do things like first read in the genomic features from a GFF3 file, and then for each feature, iterate over all the BAM records that overlap with that feature.
using GenomicFeatures
using GFF3
using XAM
SAM.Writer(IOStream(<file my-data.sam>))
julia> bamw = BAM.Writer(BGZFStream(open("my-data.bam", "w"), "w"))
BAM.Writer(BGZFStreams.BGZFStream{IOStream}(<mode=write>))
Once you have a BAM or SAM writer, you can use the write method to write BAM.Records or SAM.Records to file:
julia> write(bamw, rec) # Here rec is a `BAM.Record`
330780
This document was generated with Documenter.jl on Friday 28 February 2020. Using Julia version 1.3.0.