Khloraa scaffolding: scaffolding method for chloroplast genomes

khloraascaf is a Python3 package that implements a dedicated scaffolding method for chloroplast genomes.

From the input data files, it solves combinations of Integer Linear Programming (ILP) models and writes the result of the best one to output files.

Quick installation

To install the khloraascaf package from the PyPI repository, run the following pip command:

pip install khloraascaf

You can find more installation details on the installation page.
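A quick way to confirm that the package is installed is to query its distribution metadata. The sketch below relies only on the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is hypothetical and not part of khloraascaf.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of `distribution`, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment
        return None


if __name__ == '__main__':
    found = installed_version('khloraascaf')
    if found is None:
        print('khloraascaf is not installed')
    else:
        print(f'khloraascaf {found} is installed')
```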

Quick usage example

from pathlib import Path

from khloraascaf import SOLVER_CBC, IOConfig, MetadataAllSolutions, scaffolding

# ---------------------------------------------------------------------------- #
# Run the example
# ---------------------------------------------------------------------------- #
#
# Prepare the scaffolding result directory
#
outdir = Path('scaffolding_result')
outdir.mkdir(exist_ok=True)
#
# Compute the scaffolding using the assembly data
#
outdir_gen = scaffolding(
    Path('tests/data/ir_sc/contig_attrs.tsv'),
    Path('tests/data/ir_sc/contig_links.tsv'),
    'C0',
    solver=SOLVER_CBC,
    outdir=outdir,
)
#
# khloraascaf creates a uniquely named directory
#   and writes all the files it produces into it
#
assert outdir_gen in outdir.glob('*')
print(outdir_gen)

# ---------------------------------------------------------------------------- #
# Dive into the results
# ---------------------------------------------------------------------------- #
#
# Use metadata class to easily dive into the results
# (you can also inspect the produced solutions.yaml file by hand)
#
all_solutions_metadata = MetadataAllSolutions.from_run_directory(outdir_gen)
#
# * How many solutions has the scaffolding produced?
#
print(len(all_solutions_metadata))
#   = 1, so let's pick its metadata
sol_metadata = tuple(all_solutions_metadata)[0]
#
# See which files the scaffolding has produced:
#
files = set(outdir_gen.glob('*'))
assert len(files) == 4
#
# * The list of oriented contigs for each region
#
assert sol_metadata.contigs_of_regions() in files
#
# * The list of oriented regions
#
assert sol_metadata.map_of_regions() in files
#
# * YAML file containing all the arguments and options you used
#   to run khloraascaf
#
assert outdir_gen / IOConfig.YAML_FILE in files
#
# * YAML file that contains metadata on the solutions
#
assert outdir_gen / MetadataAllSolutions.YAML_FILE in files
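Because khloraascaf writes each run into a uniquely named directory, running the example several times with the same `outdir` may leave several run directories side by side. If you only need the most recent one, a small standard-library helper can select it by modification time. This is a sketch: `latest_run_directory` is a hypothetical helper, not part of the khloraascaf API, and it assumes that every subdirectory of `outdir` is a run directory.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def latest_run_directory(outdir: Path) -> Optional[Path]:
    """Return the most recently modified subdirectory of `outdir`, if any.

    Hypothetical helper: assumes every subdirectory of `outdir`
    is a khloraascaf run directory.
    """
    # Keep only directories; plain files in outdir are ignored
    run_dirs = [path for path in outdir.iterdir() if path.is_dir()]
    if not run_dirs:
        return None
    # Pick the directory with the largest modification timestamp
    return max(run_dirs, key=lambda path: path.stat().st_mtime)


if __name__ == '__main__':
    # Self-contained demo on a temporary directory with two fake runs
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        old_run = root / 'run_old'
        new_run = root / 'run_new'
        old_run.mkdir()
        new_run.mkdir()
        # Force distinct modification times (seconds since the epoch)
        os.utime(old_run, (1_000_000, 1_000_000))
        os.utime(new_run, (2_000_000, 2_000_000))
        print(latest_run_directory(root).name)  # prints "run_new"
```

In the usage example, `latest_run_directory(outdir)` would then return the directory to pass to `MetadataAllSolutions.from_run_directory`.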

Changelog

You can refer to the changelog page for details.

What next?

You can find a list of ideas on the to-do page.

Contributing

  • If you find any errors or missing documentation or tests, or if you want to discuss features you would like to have, please open an issue (using the corresponding predefined template) here.

  • If you want to contribute code, please open an issue or contact me. You can find the coding conventions on the contributing page.

References

  • A part of the scaffolding method is described in this preprint:

    📰 Victor Epain, Dominique Lavenier, and Rumen Andonov, ‘Inverted Repeats Scaffolding for a Dedicated Chloroplast Genome Assembler’, 3 June 2022, https://doi.org/10.4230/LIPIcs.

Licence

This work is licensed under a GNU-GPLv3 licence.