Purpose: This document describes how data flows through the scripts our lab uses.
Process (each script is listed with its description and standard parameters):
CONVERT_AXT.PL
This script processes an axt file that contains a two-species alignment. It takes two parameters, species and version, and adds that information to the axt file to create an axte file. The axt and axte formats are very similar; see the axte file format for details. This script has no standard parameters, because the species and version change with each run.
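The reading half of this step can be sketched in Python. This is a minimal sketch, not convert_axt.pl itself. It assumes the standard UCSC AXT layout: a summary line, two aligned sequence lines, and a blank separator. It simply attaches the run's species and version to each block. `AxtBlock` and `read_axt` are hypothetical names. Writing the actual axte output is not shown, because its exact layout is defined by the axte file format rather than here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class AxtBlock:
    """One pairwise alignment block from a UCSC-style AXT file."""
    header: str        # summary line, e.g. "0 chr1 1 4 chr2 10 13 + 100"
    primary_seq: str   # aligned sequence of the first species
    aligning_seq: str  # aligned sequence of the second species
    species: str       # hypothetical field: species supplied for this run
    version: str       # hypothetical field: assembly version supplied for this run


def read_axt(lines: Iterable[str], species: str, version: str) -> Iterator[AxtBlock]:
    """Yield AXT blocks tagged with the run's species and version.

    `lines` can be an open file handle or any iterable of text lines.
    """
    it = (line.rstrip("\n") for line in lines)
    for line in it:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank separators and comment lines
        primary = next(it)   # second line of the block
        aligning = next(it)  # third line of the block
        yield AxtBlock(line, primary, aligning, species, version)
```

Because `read_axt` accepts any iterable of lines, it can be called as `read_axt(open(path), species, version)`. It can also be called with an in-memory list in tests.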
SOOPER_XML.PL
This script accepts several input formats (axte, PipMaker, BLAST, and FASTA). It breaks the sequences into ungapped, non-repetitive alignments and, where possible, generates a 36-basepair probe from each alignment.
Standard usage: sooper_xml.pl -r -i.88 -a<outputfile> <inputfile>
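The segmentation and probe-selection idea can be illustrated with a short sketch. It rests on two assumptions that do not come from sooper_xml.pl. First, repeats are soft-masked (lowercase) in the alignment. Second, one probe is taken from the middle of each clean segment, using the first species' sequence. `clean_segments` and `pick_probes` are hypothetical names.

```python
from typing import List, Optional, Tuple

PROBE_LEN = 36  # probe length used by sooper_xml.pl


def clean_segments(seq_a: str, seq_b: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges in which neither aligned
    sequence has a gap ('-') or a lowercase base (assumed soft-masked repeat)."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    n = min(len(seq_a), len(seq_b))
    for i in range(n):
        a, b = seq_a[i], seq_b[i]
        ok = a != "-" and b != "-" and a.isupper() and b.isupper()
        if ok and start is None:
            start = i                    # a clean run begins
        elif not ok and start is not None:
            segments.append((start, i))  # a clean run ends
            start = None
    if start is not None:
        segments.append((start, n))      # run reaches the end of the alignment
    return segments


def pick_probes(seq_a: str, seq_b: str, length: int = PROBE_LEN) -> List[str]:
    """Take one probe from the middle of each clean segment long enough to hold it."""
    probes: List[str] = []
    for start, end in clean_segments(seq_a, seq_b):
        if end - start >= length:
            mid = start + (end - start - length) // 2
            probes.append(seq_a[mid:mid + length])
    return probes
```

Segments shorter than 36 columns yield no probe, which matches the "when possible" qualifier in the description.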
CREATE_MEGABLAST_FILE.PL
This script takes a probe XML file and reformats it for use with megablast. Because megablast runs take a long time, we recommend limiting each file to 3000 probes.
Standard usage: create_megablast_file.pl -count 3000 -file <inputfile> -output <outputfile>
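A sketch of the batching step, assuming the probes have already been read out of the XML file as (id, sequence) pairs. The probe XML layout is not documented here, so that parsing is omitted. Megablast reads FASTA queries, so each batch is rendered as FASTA text. `batch_probes` and `to_fasta` are hypothetical helper names.

```python
from typing import Iterable, List, Tuple

BATCH_SIZE = 3000  # recommended number of probes per megablast input file


def batch_probes(probes: List[Tuple[str, str]],
                 size: int = BATCH_SIZE) -> List[List[Tuple[str, str]]]:
    """Split (probe_id, sequence) pairs into consecutive batches of at most `size`."""
    return [probes[i:i + size] for i in range(0, len(probes), size)]


def to_fasta(batch: Iterable[Tuple[str, str]]) -> str:
    """Render one batch as FASTA text, one record per probe."""
    return "".join(f">{probe_id}\n{seq}\n" for probe_id, seq in batch)
```

For example, 7000 probes produce three files holding 3000, 3000, and 1000 probes.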
MEGABLAST
This program takes a file of sequences and returns a list of scores for matching sequences in the genome. Because it is computationally expensive, it is generally run from a script so that the load can be spread across all processors.
Standard usage: megablast -t16 -N2 -W11 -e0.6 -i <infile> -o <outfile> -d <database> -FF -D3
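One way to spread megablast jobs across processors is a small driver that runs one process per batch file. This sketch is an assumption, not the lab's actual driver script. It reuses the standard flags listed for megablast. The `.out` output naming, the worker count, and the names `megablast_cmd`/`run_all` are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def megablast_cmd(infile: str, outfile: str, database: str) -> List[str]:
    """Build the argument list for the standard megablast invocation."""
    return ["megablast", "-t16", "-N2", "-W11", "-e0.6",
            "-i", infile, "-o", outfile, "-d", database, "-FF", "-D3"]


def _run(cmd: List[str]) -> int:
    """Run one external command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def run_all(infiles: Sequence[str], database: str, workers: int = 4,
            runner: Callable[[List[str]], int] = _run) -> List[int]:
    """Run one megablast job per input file, `workers` at a time.

    Returns exit codes in the same order as `infiles`.
    """
    cmds = [megablast_cmd(f, f + ".out", database) for f in infiles]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, cmds))
```

Threads are sufficient here because each thread only waits on an external megablast process. The `runner` parameter lets the scheduling be tested without megablast installed.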
VALIDATE_MEGABLAST_OUTPUT.PL
This script compares the megablast output with the create_megablast_file.pl output to determine whether each probe is unique and located where we expect. If a hit scores 70+ and its location matches, the probe is located where we expect. If any other hit scores 40+, or five or more other hits score 30+, the probe is not considered unique. The script writes an output file containing the result and the header for each probe.
Run this script via PROCESS_MEGABLAST_OUTPUT.SH.
Standard usage: validate_megablast_output.pl -file <input_file> > <output_file>
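The uniqueness rule can be expressed as a small classifier. This sketch rests on several assumptions:

- the -D3 tabular output uses the usual 12 columns (query, subject, % identity, length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score);
- "score" means the bit score;
- a location "matches" when the hit lies on the expected chromosome within a hypothetical tolerance of 10 bp.

Applying the 40+ and 30+ thresholds only to hits away from the expected location is also an interpretation. `Hit`, `parse_tabular`, and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    query: str     # probe id (column 1)
    subject: str   # sequence/chromosome hit (column 2)
    s_start: int   # subject start (column 9)
    score: float   # bit score (column 12), assumed to be "the score"


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tab-delimited output, skipping blank and '#' lines."""
    hits: List[Hit] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        hits.append(Hit(cols[0], cols[1], int(cols[8]), float(cols[11])))
    return hits


def classify(hits: List[Hit], chrom: str, start: int, tolerance: int = 10) -> str:
    """Classify one probe's hits as "valid" or "invalid"."""
    def at_expected(h: Hit) -> bool:
        return h.subject == chrom and abs(h.s_start - start) <= tolerance

    on_target = any(h.score >= 70 and at_expected(h) for h in hits)
    off_target = [h for h in hits if not at_expected(h)]
    strong = any(h.score >= 40 for h in off_target)     # any other hit 40+
    weak = sum(1 for h in off_target if h.score >= 30)  # count of other hits 30+
    return "valid" if on_target and not strong and weak < 5 else "invalid"
```

In practice the parsed hits would be grouped by `query` so that `classify` sees one probe's hits at a time.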
ZZZOOM_PROBES.PL
This script takes several files and separates the valid (unique) probes from the invalid (non-unique) ones. Given the -mask flag, it also masks out the corresponding sequence in the axte file so that it is not selected again. The script also generates a mini-axte file that contains only the alignments that held invalid probes. As the number of invalid probes shrinks, later runs therefore do not regenerate probes from alignments that were already processed successfully. The script automatically scans the current directory for all chr??.probes.xml.??.(in)valid files and processes them.
Standard usage: separate_probes.pl -mask -c <chr num> -b <batch size> (batch size = 3000)
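The directory scan can be sketched with shell-style pattern matching, where each `?` matches exactly one character. This is an illustrative sketch, not the script's own code. Only the file-name patterns come from the description. `group_result_files` and `scan_current_directory` are hypothetical names.

```python
import fnmatch
import os
from typing import Dict, Iterable, List

# File-name patterns for per-batch probe results, as described for this script.
PATTERNS = {
    "valid": "chr??.probes.xml.??.valid",
    "invalid": "chr??.probes.xml.??.invalid",
}


def group_result_files(names: Iterable[str]) -> Dict[str, List[str]]:
    """Sort file names into 'valid' and 'invalid' buckets by pattern."""
    groups: Dict[str, List[str]] = {kind: [] for kind in PATTERNS}
    for name in sorted(names):
        for kind, pattern in PATTERNS.items():
            if fnmatch.fnmatchcase(name, pattern):
                groups[kind].append(name)
    return groups


def scan_current_directory() -> Dict[str, List[str]]:
    """Apply group_result_files to the files in the current directory."""
    return group_result_files(os.listdir("."))
```

Because `?` matches exactly one character, a single-digit name such as `chr1.probes.xml.01.valid` is not picked up. That behavior is consistent with the two-character `chr??` pattern in the description.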