Input data and files#
khloraascaf takes as input data that can be generated from a previous assembly of the raw reads.
The data consist of a set of contigs, enriched with some specific attributes, and a set of links between two oriented contigs.
Contigs’ attributes#
For each contig, it is necessary to provide:
Its identifier (string)
Its multiplicity (integer)
A score that can be interpreted as the probability that the contig belongs to the chloroplast genome and is repeated at least once in it
Note
The multiplicities and the scores can be obtained, for example, by mapping the raw reads and some specific chloroplast genes onto the contigs. Make sure the data you generate can be interpreted as described above.
File format#
These attributes must be formatted in a tab-separated values file. Each line corresponds to a contig and its attributes:
<contig_id>\t<mult>\t<score>\n
Example:
C0 1 50.72
C1 1 33.33
C2 2 15.60
C3 2 18.39
C4 1 28.67
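The following is a minimal sketch of how such a file could be read and checked before it is given to khloraascaf. It is not part of the khloraascaf API: the names `ContigAttributes` and `read_contig_attributes` are hypothetical and only illustrate the three-column format described above.

```python
import csv
import io
from typing import NamedTuple


class ContigAttributes(NamedTuple):
    """Hypothetical record for one line of the contig attributes file."""
    identifier: str
    multiplicity: int
    score: float


def read_contig_attributes(handle):
    """Parse tab-separated (contig_id, mult, score) lines into a dict."""
    contigs = {}
    for row in csv.reader(handle, delimiter='\t'):
        if not row:
            continue  # tolerate empty lines
        contig_id, mult, score = row  # raises ValueError if not 3 columns
        contigs[contig_id] = ContigAttributes(
            contig_id, int(mult), float(score))
    return contigs


# The example above, written with explicit tab separators.
example = (
    "C0\t1\t50.72\n"
    "C1\t1\t33.33\n"
    "C2\t2\t15.60\n"
    "C3\t2\t18.39\n"
    "C4\t1\t28.67\n"
)
contigs = read_contig_attributes(io.StringIO(example))
```

With a real file, the same function can be called on `open(path, newline='')` instead of the in-memory string.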
Links between oriented contigs#
A link between two oriented contigs means that the second oriented contig can be placed after the first one in the scaffolding.
For each link, it is necessary to provide:
Its identifier (string)
The identifier of the first contig (string)
The orientation of the first contig (symbol + or -)
The identifier of the second contig (string)
The orientation of the second contig (symbol + or -)
Note
A link can correspond to an overlap between one oriented contig and another oriented contig, or it can represent the existence of a sequence that connects the two oriented contigs in the previous assembly data.
File format#
These attributes must be formatted in a tab-separated values file. Each line corresponds to a link and its attributes:
<link_id>\t<contig1_id>\t<contig1_or>\t<contig2_id>\t<contig2_or>\n
Example:
L01 C0 + C1 -
L12 C2 + C1 +
L23 C2 - C3 +
L34 C3 + C4 +
L43 C4 + C3 -
L20 C2 + C0 +
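As a sketch under the same assumptions as the format above, the links file can be parsed and indexed by its first oriented contig, which makes the "second contig after the first" reading explicit. The names `Link` and `read_links` are hypothetical and do not belong to khloraascaf.

```python
import csv
import io
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    """Hypothetical record for one line of the links file."""
    identifier: str
    first_contig: str
    first_orientation: str
    second_contig: str
    second_orientation: str


def read_links(handle):
    """Parse tab-separated link lines and validate the orientations."""
    links = []
    for row in csv.reader(handle, delimiter='\t'):
        if not row:
            continue  # tolerate empty lines
        link = Link(*row)  # raises TypeError if not exactly 5 columns
        for orientation in (link.first_orientation, link.second_orientation):
            if orientation not in ('+', '-'):
                raise ValueError(
                    f"{link.identifier}: bad orientation {orientation!r}")
        links.append(link)
    return links


# The example above, written with explicit tab separators.
example = (
    "L01\tC0\t+\tC1\t-\n"
    "L12\tC2\t+\tC1\t+\n"
    "L23\tC2\t-\tC3\t+\n"
    "L34\tC3\t+\tC4\t+\n"
    "L43\tC4\t+\tC3\t-\n"
    "L20\tC2\t+\tC0\t+\n"
)
links = read_links(io.StringIO(example))

# Oriented contigs that may directly follow a given oriented contig.
successors = defaultdict(list)
for link in links:
    successors[(link.first_contig, link.first_orientation)].append(
        (link.second_contig, link.second_orientation))
```

Only the links written in the file are indexed here; the sketch does not infer any additional link.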
A starter contig#
The identifier of the contig that starts the scaffolding.
Example: C0
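Before running the scaffolding, it may help to check that the three inputs agree with each other. The sketch below assumes that the starter contig and every contig named in a link should also appear in the contig attributes file; this is an assumption made for illustration, and `check_inputs` is a hypothetical helper, not a khloraascaf function.

```python
def check_inputs(contig_ids, link_endpoints, starter_id):
    """Return a list of human-readable inconsistencies (empty if none).

    contig_ids: identifiers found in the contig attributes file.
    link_endpoints: (link_id, first_contig_id, second_contig_id) triples.
    starter_id: identifier of the contig that starts the scaffolding.
    """
    known = set(contig_ids)
    problems = []
    if starter_id not in known:
        problems.append(
            f"starter contig {starter_id!r} has no attributes")
    for link_id, first, second in link_endpoints:
        for contig_id in (first, second):
            if contig_id not in known:
                problems.append(
                    f"link {link_id!r} refers to unknown contig {contig_id!r}")
    return problems


# Identifiers taken from the examples in this section.
contig_ids = ["C0", "C1", "C2", "C3", "C4"]
link_endpoints = [
    ("L01", "C0", "C1"), ("L12", "C2", "C1"), ("L23", "C2", "C3"),
    ("L34", "C3", "C4"), ("L43", "C4", "C3"), ("L20", "C2", "C0"),
]
problems = check_inputs(contig_ids, link_endpoints, "C0")
```

For the example data of this section the returned list is empty.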