explainshell.com - samtools(1) - Utilities for the Sequence Alignment/Map (SAM) format

-b      Output in the BAM format.

-f INT  Only output alignments with all bits in INT present in the FLAG field. INT  can  be  in
        hex in the format of /^0x[0-9A-F]+/ [0]

-F INT  Skip alignments with bits present in INT [0]
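
The interaction of `-f' and `-F' can be illustrated with a small Python sketch. It is not the samtools implementation: the function names are hypothetical, `-F' is interpreted here as "skip if any bit of INT is set", and the hex pattern quoted above is additionally anchored at the end of the string.

```python
import re

# Hex pattern quoted in the -f description; the trailing "$" is an
# assumption of this sketch so that trailing junk is rejected.
HEX_INT = re.compile(r"^0x[0-9A-F]+$")


def parse_flag_int(text):
    """Parse INT given either as decimal or as hex such as 0x10."""
    if HEX_INT.match(text):
        return int(text, 16)
    return int(text, 10)


def passes_flag_filters(flag, required=0, excluded=0):
    """Return True if an alignment with this FLAG would be kept.

    required models -f: every bit in it must be set in FLAG.
    excluded models -F: assumed to skip when any of its bits is set.
    """
    has_all_required = (flag & required) == required
    has_no_excluded = (flag & excluded) == 0
    return has_all_required and has_no_excluded
```

With both values left at their default of 0, every alignment passes, which matches the [0] defaults listed for the two options.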

-h      Include the header in the output.

-H      Output the header only.

-l STR  Only output reads in library STR [null]

-o FILE Output file [stdout]

-q INT  Skip alignments with MAPQ smaller than INT [0]

-r STR  Only output reads in read group STR [null]

-R FILE Output reads in read groups listed in FILE [null]

-S      Input is in SAM. If @SQ header lines are absent, the `-t' option is required.

-c      Instead  of  printing  the  alignments, only count them and print the total number. All
        filter options, such as `-f', `-F' and `-q', are taken into account.

-t FILE This file is TAB-delimited. Each line must contain the reference name and the length of
        the  reference,  one  line  for each distinct reference; additional fields are ignored.
        This file also defines the order of the reference sequences  in  sorting.  If  you  run
        `samtools  faidx  <ref.fa>',  the resultant index file <ref.fa>.fai can be used as this
        <in.ref_list> file.
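
The file layout described for `-t' can be read with a few lines of Python. This is a minimal sketch, not samtools code: the function name is hypothetical, and skipping blank lines is an assumption of the sketch.

```python
def parse_ref_list(lines):
    """Parse TAB-delimited lines into an ordered list of (name, length).

    Only the first two fields are used; any additional fields (such as
    the extra columns of a .fai index) are ignored. The order of the
    returned list follows the order of the lines.
    """
    refs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # assumption: blank lines are skipped
        fields = line.split("\t")
        refs.append((fields[0], int(fields[1])))
    return refs
```

Because a list is returned rather than a dictionary keyed by name, the file order is kept explicit, which is the order the description says is used for sorting.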

-u      Output uncompressed BAM. This option saves time spent on  compression/decompression  and
        is thus preferred when the output is piped to another samtools command.

-6        Assume the quality is in the Illumina 1.3+ encoding.

-A        Do not skip anomalous read pairs in variant calling.

-B        Disable  probabilistic  realignment  for  the  computation  of base alignment quality
          (BAQ). BAQ is the Phred-scaled probability of a read base being misaligned.  Applying
          BAQ greatly helps to reduce false SNPs caused by misalignments.

-b FILE   List of input BAM files, one file per line [null]

-C INT    Coefficient   for   downgrading   mapping  quality  for  reads  containing  excessive
          mismatches. Given a read with a phred-scaled probability q of  being  generated  from
          the  mapped  position, the new mapping quality is about sqrt((INT-q)/INT)*INT. A zero
          value disables this functionality; if enabled, the recommended value for BWA  is  50.
          [0]
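
The approximate formula given for `-C' can be evaluated directly. The sketch below only restates that formula in Python; the function name is hypothetical, and restricting q to the range 0..INT is an assumption of the sketch, since the text does not say how other values are handled.

```python
import math


def adjusted_mapq(q, coef):
    """Evaluate sqrt((INT - q) / INT) * INT for a positive coefficient.

    q is the Phred-scaled quantity named in the -C description and
    coef is the INT argument of -C. A zero coefficient means the
    feature is disabled, so it is rejected here.
    """
    if coef <= 0:
        raise ValueError("coef must be positive; 0 disables the adjustment")
    if not 0 <= q <= coef:
        raise ValueError("this sketch only handles 0 <= q <= coef")
    return math.sqrt((coef - q) / coef) * coef
```

With the value of 50 recommended for BWA, q = 0 gives 50, and the result shrinks towards 0 as q approaches 50.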

-d INT    At a position, read maximally INT reads per input BAM. [250]

-E        Extended  BAQ computation. This option helps sensitivity especially for MNPs, but may
          hurt specificity a little bit.

-f FILE   The faidx-indexed reference file in the FASTA format.  The  file  can  be  optionally
          compressed by razip.  [null]

-l FILE   BED  or  position list file containing a list of regions or sites where pileup or BCF
          should be generated [null]

-q INT    Minimum mapping quality for an alignment to be used [0]

-Q INT    Minimum base quality for a base to be considered [13]

-r STR    Only generate pileup in region STR [all sites]

-D        Output per-sample read depth

-g        Compute genotype likelihoods and output them in the binary call format (BCF).

-S        Output per-sample Phred-scaled strand bias P-value

-u        Similar to -g except that the output is uncompressed  BCF,  which  is  preferred  for
          piping.

-e INT    Phred-scaled gap extension sequencing error probability. Reducing INT leads to longer
          indels. [20]

-h INT    Coefficient for modeling homopolymer errors. Given an  l-long  homopolymer  run,  the
          sequencing error of an indel of size s is modeled as INT*s/l.  [100]
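
The homopolymer model stated for `-h' is a single expression, INT*s/l. The following Python sketch evaluates it; the function and parameter names are hypothetical, and the result is returned as given by the formula without interpreting its scale.

```python
def homopolymer_indel_error(coef, indel_size, run_length):
    """Return INT * s / l for an indel of size s in an l-long homopolymer run."""
    if run_length <= 0:
        raise ValueError("run_length must be positive")
    return coef * indel_size / run_length
```

For example, with the default coefficient of 100, a 1-base indel in a 4-base run evaluates to 25.0, and the value falls as the run gets longer.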

-I        Do not perform INDEL calling

-L INT    Skip INDEL calling if the average per-sample depth is above INT.  [250]

-o INT    Phred-scaled  gap open sequencing error probability. Reducing INT leads to more indel
          calls. [40]
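
Several options above, including `-e' and `-o', take Phred-scaled values. The standard Phred relation is Q = -10*log10(P); the sketch below converts in both directions. The function names are hypothetical.

```python
import math


def phred_to_prob(q):
    """Convert a Phred-scaled value Q to a probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Convert a probability P (0 < P <= 1) to Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)
```

Under this relation the default of 40 for `-o' corresponds to a probability of about 1 in 10,000, and the default of 20 for `-e' to about 1 in 100.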

-P STR    Comma-delimited list of platforms (determined by @RG-PL) from which indel  candidates
          are  obtained.  It  is  recommended  to  collect  indel  candidates  from  sequencing
          technologies that have low indel error rate such as ILLUMINA. [all]

-o      Output the final alignment to the standard output.

-n      Sort by read names rather than by chromosomal coordinates

-m INT  Approximately the maximum required memory. [500000000]

merge     samtools merge [-nur1f] [-h inh.sam] [-R reg] <out.bam> <in1.bam> <in2.bam> [...]

          Merge  multiple  sorted alignments.  The header reference lists of all the input BAM files, and
          the @SQ headers of inh.sam, if any, must all refer to the same set of reference sequences.  The
          header  reference  list  and (unless overridden by -h) `@' headers of in1.bam will be copied to
          out.bam, and the headers of other files will be ignored.

-1      Use zlib compression level 1 to compress the output

-f      Force to overwrite the output file if present.

-h FILE Use the lines of FILE as `@' headers to be copied  to  out.bam,  replacing  any  header
        lines  that  would  otherwise be copied from in1.bam.  (FILE is actually in SAM format,
        though any alignment records it may contain are ignored.)

-n      The input alignments are sorted by read names rather than by chromosomal coordinates

-R STR  Merge files in the specified region indicated by STR [null]

-r      Attach an RG tag to each alignment. The tag value is inferred from file names.

-u      Uncompressed BAM output

-s      Remove duplicates for single-end reads. By default, the command works for paired-end
        reads only.

-S      Treat paired-end reads as single-end reads.

calmd     samtools calmd [-EeubSr] [-C capQcoef] <aln.bam> <ref.fasta>

          Generate the MD tag. If the MD tag is already present, this command will give a warning if  the
          MD tag generated is different from the existing tag. Output SAM by default.

-A      When used jointly with -r this option overwrites the original base quality.

-e      Convert the read base to = if it is identical to the aligned reference base. The indel
        caller does not support the = bases at the moment.

-u      Output uncompressed BAM

-b      Output compressed BAM

-S      The input is SAM with header lines

-C INT  Coefficient to cap mapping quality of poorly mapped reads. See the pileup  command  for
        details. [0]

-r      Compute the BQ tag (without -A) or cap base quality by BAQ (with -A).

-E      Extended BAQ calculation. This option trades specificity for sensitivity, though the
        effect is minor.

targetcut samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] <in.bam>

          This command identifies target regions by examining the  continuity  of  read  depth,  computes
          haploid  consensus sequences of targets and outputs a SAM with each sequence corresponding to a
          target. When option -f is in use, BAQ will be  applied.  This  command  is  only  designed  for
          cutting fosmid clones from fosmid pool sequencing [Ref. Kitzman et al. (2010)].

phase     samtools phase [-AF] [-k len] [-b prefix] [-q minLOD] [-Q minBaseQ] <in.bam>

          Call and phase heterozygous SNPs.  OPTIONS:

-A      Drop reads with ambiguous phase.

-b STR  Prefix  of  BAM output. When this option is in use, phase-0 reads will be saved in file
        STR.0.bam and phase-1 reads  in  STR.1.bam.   Phase  unknown  reads  will  be  randomly
        allocated  to  one of the two files. Chimeric reads with switch errors will be saved in
        STR.chimeric.bam.  [null]
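
The output file names produced by `-b' follow directly from the prefix. A small Python sketch of that naming rule, with a hypothetical function name and hypothetical dictionary keys:

```python
def phase_output_files(prefix):
    """Map each read category to the file name described for -b STR."""
    return {
        "phase0": f"{prefix}.0.bam",
        "phase1": f"{prefix}.1.bam",
        "chimeric": f"{prefix}.chimeric.bam",
    }
```

Reads of unknown phase have no file of their own; as described above, they are randomly allocated to one of the two phase files.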

-F      Do not attempt to fix chimeric reads.

-k INT  Maximum length for local phasing. [13]

-q INT  Minimum Phred-scaled LOD to call a heterozygote. [40]

-Q INT  Minimum base quality to be used in het calling. [13]

-A        Retain all possible alternate alleles at variant sites. By default, the view  command
          discards unlikely alleles.

-b        Output in the BCF format. The default is VCF.

-D FILE   Sequence dictionary (list of chromosome names) for VCF->BCF conversion [null]

-F        Indicate PL is generated by r921 or before (ordering is different).

-G        Suppress all individual genotype information.

-l FILE   List of sites at which information is output [all sites]

-N        Skip sites where the REF field is not A/C/G/T

-Q        Output the QCALL likelihood format

-s FILE   List  of samples to use. The first column in the input gives the sample names and the
          second gives the ploidy, which can only be 1 or 2. When the 2nd column is absent, the
          sample  ploidy  is  assumed  to  be 2. In the output, the ordering of samples will be
          identical to the one in FILE.  [null]
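
The sample list format described for `-s' can be parsed as follows. This is a sketch only: the function name is hypothetical, and splitting columns on whitespace and skipping blank lines are assumptions, since the text does not name the delimiter.

```python
def parse_sample_list(lines):
    """Parse sample lines into an ordered list of (name, ploidy).

    The first column is the sample name and the optional second column
    is the ploidy, which must be 1 or 2. A missing second column means
    a ploidy of 2. The output keeps the order of the input lines.
    """
    samples = []
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # assumption: blank lines are skipped
        ploidy = int(fields[1]) if len(fields) > 1 else 2
        if ploidy not in (1, 2):
            raise ValueError(f"ploidy must be 1 or 2, got {ploidy}")
        samples.append((fields[0], ploidy))
    return samples
```

Keeping the result as an ordered list mirrors the statement that the output sample ordering is identical to the one in FILE.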

-S        The input is VCF instead of BCF.

-u        Uncompressed BCF output (force -b).

-c        Call variants using Bayesian inference. This option automatically invokes option -e.

-d FLOAT  When -v is in use, skip loci where the fraction of samples covered by reads is  below
          FLOAT. [0]

-e        Perform   max-likelihood   inference  only,  including  estimating  the  site  allele
          frequency, testing Hardy-Weinberg equilibrium and testing associations with LRT.

-g        Call per-sample genotypes at variant sites (force -c)

-i FLOAT  Ratio of INDEL-to-SNP mutation rate [0.15]

-p FLOAT  A site is considered to be a variant if P(ref|D)<FLOAT [0.5]

-P STR    Prior or initial allele frequency spectrum. STR can be full, cond2, flat or the
          file consisting of error output from a previous variant calling run.

-t FLOAT  Scaled mutation rate for variant calling [0.001]

-T STR    Enable pair/trio calling. For trio calling, option -s usually needs to be applied
          to configure the trio members and their ordering.  In the file supplied to the option
          -s,  the  first  sample  must  be  the child, the second the father and the third the
          mother.  The valid values of STR are `pair', `trioauto', `trioxd' and `trioxs', where
          `pair' calls differences between two input samples, and `trioxd' (`trioxs') specifies
          that the input is from the X chromosome non-PAR regions and the  child  is  a  female
          (male). [null]

-v        Output variant sites only (force -c)

-1 INT    Number  of  group-1  samples.  This  option is used for dividing the samples into two
          groups for contrast SNP calling or association test.  When this option is in use, the
          following VCF INFO fields will be output: PC2, PCHI2 and QCHI2. [0]

-U INT    Number of permutations for association test (effective only with -1) [0]

-X FLOAT  Only perform permutations for P(chi^2)<FLOAT (effective only with -U) [0.01]