Modules
Most of OakVar's functionality is performed by Python modules. OakVar orchestrates their execution as well as their management.
To understand OakVar's modules, let's first see which modules are already installed in the system.
ov module ls
This will show a list of the modules installed in the system. These modules are stored under the OakVar modules directory, which can be found with
ov system md
Inside the modules directory, there are subdirectories such as the following.
annotators
commons
converters
mappers
postaggregators
reporters
webapps
webviewerwidgets
Each subdirectory represents a type of OakVar module. Inside each type subdirectory, another level of subdirectories exists, one for each module of that type. For example, the reporter module type directory may contain the following subdirectories, which correspond to the reporter modules in the system.
csvreporter
excelreporter
stdoutreporter
textreporter
tsvreporter
vcfreporter
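The two-level layout described above (type directories containing module directories) can be walked with the standard library. The following is a minimal sketch, not part of OakVar: list_modules is a hypothetical helper, and the directory tree is a temporary stand-in for the path printed by ov system md.

```python
import tempfile
from pathlib import Path


def list_modules(modules_dir):
    """Map each module type directory to the sorted names of the modules inside it."""
    found = {}
    for type_dir in sorted(Path(modules_dir).iterdir()):
        if type_dir.is_dir():
            found[type_dir.name] = sorted(
                m.name for m in type_dir.iterdir() if m.is_dir()
            )
    return found


# Build a throwaway tree that mimics the layout described above
# (a stand-in for the real modules directory).
with tempfile.TemporaryDirectory() as tmp:
    for module in ("csvreporter", "stdoutreporter"):
        (Path(tmp) / "reporters" / module).mkdir(parents=True)
    (Path(tmp) / "annotators").mkdir()
    layout = list_modules(tmp)

print(layout)
```

Pointing list_modules at the real modules directory should give a similar overview to ov module ls, although the command itself may show more detail.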
Anatomy of a module
Details of a specific module can be shown with ov module info. Let's take a look at the information of the stdoutreporter module.
ov module info stdoutreporter
In the output, the directory where the module is installed can be found in the location field. The stdoutreporter directory contains the following three files.
stdoutreporter.md
stdoutreporter.py
stdoutreporter.yml
Among these files, the .py and .yml files are the two essential files of any OakVar module. The .py file is a Python module file which handles the operation of the module. The .yml file is a YAML file which holds the information and the configuration of the module. The .md file is a Markdown file which is displayed on the OakVar web store.
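As a quick illustration of the rule that every module needs a .py and a .yml file named after the module, the sketch below reports which of the two is missing from a module directory. missing_essential_files is a hypothetical helper written for this example, not an OakVar function, and the module directory is created in a temporary location.

```python
import tempfile
from pathlib import Path


def missing_essential_files(module_dir):
    """Return the essential file names (.py and .yml) absent from a module directory."""
    module_dir = Path(module_dir)
    required = [f"{module_dir.name}.py", f"{module_dir.name}.yml"]
    return [name for name in required if not (module_dir / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    module_dir = Path(tmp) / "stdoutreporter"
    module_dir.mkdir()
    # Only the .py file exists at first, so the .yml file is reported as missing.
    (module_dir / "stdoutreporter.py").write_text("# module code\n")
    missing_before = missing_essential_files(module_dir)
    # After adding the .yml file, nothing essential is missing.
    (module_dir / "stdoutreporter.yml").write_text("type: reporter\n")
    missing_after = missing_essential_files(module_dir)

print(missing_before, missing_after)
```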
Reporter
Let's take a look at each of these files. Below are the essential parts of stdoutreporter.yml.
title: Standard Output Reporter
version: 1.2.0
type: reporter
OakVar uses this information to manage modules.
Below is stdoutreporter.py.
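from oakvar import BaseReport


class Reporter(BaseReport):
    def setup(self):
        # Levels to report can be given as an argument; default to the variant level.
        if self.args:
            self.levels_to_write = self.args.get("level")
        if not self.levels_to_write:
            self.levels_to_write = ['variant']

    def write_preface(self, level):
        # Remember which level is being written.
        self.level = level

    def write_header(self, level):
        # Print a tab-separated header line, prefixed with '#'.
        line = '#' + '\t'.join(
            [col['col_name'] for col in self.extracted_cols[level]])
        print(line)

    def write_table_row(self, row):
        # Print one tab-separated row, with empty strings for missing values.
        print('\t'.join(
            [str(v) if v is not None else '' for v in list(row)]))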
stdoutreporter.py does not contain the code that filters and fetches annotated variants from an OakVar annotation database file. OakVar connects to stdoutreporter.py and calls its setup, write_preface, write_header, and write_table_row functions. Different reporter modules can be made by defining these functions differently. More on these functions is explained in the Workflow section.
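To make the calling pattern concrete, here is a self-contained sketch in which a stand-in base class drives the four functions. SketchBaseReport and its run method are hypothetical and written only for this illustration; the real oakvar.BaseReport does considerably more (filtering, database access), and the order of calls shown here is an assumption. The reporter collects lines in a list instead of printing them so that the result can be inspected.

```python
class SketchBaseReport:
    """Hypothetical stand-in for oakvar.BaseReport; not OakVar's real class."""

    def __init__(self, extracted_cols, rows_by_level, args=None):
        self.extracted_cols = extracted_cols  # level -> list of column definitions
        self.rows_by_level = rows_by_level    # level -> list of row tuples
        self.args = args or {}

    def run(self):
        # Assumed order of calls: setup once, then preface, header, and rows per level.
        self.setup()
        for level in self.levels_to_write:
            self.write_preface(level)
            self.write_header(level)
            for row in self.rows_by_level[level]:
                self.write_table_row(row)


class Reporter(SketchBaseReport):
    def setup(self):
        self.levels_to_write = self.args.get("level") or ["variant"]
        self.lines = []

    def write_preface(self, level):
        self.level = level

    def write_header(self, level):
        names = [col["col_name"] for col in self.extracted_cols[level]]
        self.lines.append("#" + "\t".join(names))

    def write_table_row(self, row):
        self.lines.append("\t".join("" if v is None else str(v) for v in row))


reporter = Reporter(
    extracted_cols={"variant": [{"col_name": "chrom"}, {"col_name": "pos"}, {"col_name": "hugo"}]},
    rows_by_level={"variant": [("chr1", 19834895, None)]},
)
reporter.run()
print("\n".join(reporter.lines))
```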
Annotator
Check out a guide on interactive annotation module development here.
Watch a webinar on making OakVar annotation modules here.
The essential function for annotator modules is annotate. Below is target.py of the target annotation module.
from oakvar import BaseAnnotator
import sqlite3


class Annotator(BaseAnnotator):
    def annotate(self, input_data):
        # Bind the gene name as a query parameter instead of formatting it into the SQL string.
        self.cursor.execute(
            'select rationale, agents_therapy from target where gene=?;',
            (input_data['hugo'],))
        row = self.cursor.fetchone()
        if row:
            out = {'rationale': row[0], 'therapy': row[1]}
        else:
            # No record for this gene.
            out = None
        return out
The annotate function receives input_data, which is a dict describing a variant, such as
{"uid": 1834,
"chrom": "chr1",
"pos": 19834895,
"ref_base": "A",
"alt_base": "G"}
If you add input_format: crx to your module's yml file, input_data will have additional information pulled from the .crx file, such as
{"uid": 1834,
"chrom": "chr1",
"pos": 19834895,
"ref_base": "A",
"alt_base": "G",
"hugo": "GeneA",
"transcript": "ENST0000038472",
"so": "missense_variant",
...}
OakVar feeds the variants of the input file, one by one, to the annotate function of an annotator module and expects a dict of the module's output for each given variant. In the above example, the output is a dict with two keys, rationale and therapy. OakVar collects the output dicts for the input variants and feeds them to the downstream steps.
The two keys rationale and therapy in the above example are the output columns of the module. The output columns of a module should be defined in the module's config (.yml) file. target module's config file, target.yml, has the following output columns definition.
output_columns:
- name: therapy
  title: Recommended Therapy
  type: string
- name: rationale
  title: Rationale
  type: string
name, title, and type are the essential components of an output column. name must be the same as a key in the dict returned by the annotate function: OakVar expects the return value of a module's annotate function to contain the keys defined as name under output_columns in the module's config file.
Once a module's output columns are defined in its config file and the module's annotate function returns a dict with those output columns' name as its keys, OakVar will do the rest to include the module's output in the annotation database and reports.
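While developing a module, it can help to check that the dict returned by annotate and the output_columns definition agree. The sketch below is a hypothetical development aid, not an OakVar function; how OakVar itself reacts to a mismatch is not covered here. The output_columns list is the target.yml definition above, written as the Python structure a YAML parser would produce.

```python
def compare_with_output_columns(output, output_columns):
    """Return (undeclared keys, missing keys) for an annotate() return value."""
    declared = {col["name"] for col in output_columns}
    returned = set(output)
    return sorted(returned - declared), sorted(declared - returned)


# target.yml's output_columns, as a YAML parser would load them.
output_columns = [
    {"name": "therapy", "title": "Recommended Therapy", "type": "string"},
    {"name": "rationale", "title": "Rationale", "type": "string"},
]

# A return value that matches the definition (placeholder values).
good = compare_with_output_columns(
    {"rationale": "example rationale", "therapy": "example therapy"}, output_columns
)
# A return value with a misspelled key.
bad = compare_with_output_columns(
    {"rational": "example rationale", "therapy": "example therapy"}, output_columns
)
print(good, bad)
```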
OakVar provides convenience variables to each module's annotate function. If a module has a data subdirectory and that subdirectory has an SQLite database file named <module name>.sqlite (in the above example, target/data/target.sqlite), self.conn and self.cursor are provided as an SQLite database connection object and a cursor object, respectively.
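The following sketch imitates this setup without OakVar so that it can be run on its own. SketchAnnotator is a hypothetical stand-in for a BaseAnnotator subclass, an in-memory SQLite database replaces target/data/target.sqlite, and the table contents are made-up placeholder values. The loop at the end mimics OakVar passing variants to annotate one at a time.

```python
import sqlite3


class SketchAnnotator:
    """Hypothetical stand-in for a BaseAnnotator subclass with self.conn and self.cursor."""

    def __init__(self, conn):
        self.conn = conn
        self.cursor = conn.cursor()

    def annotate(self, input_data):
        # Parameterized query: the gene name is bound, not formatted into the SQL.
        self.cursor.execute(
            "select rationale, agents_therapy from target where gene = ?",
            (input_data["hugo"],),
        )
        row = self.cursor.fetchone()
        if row is None:
            return None
        return {"rationale": row[0], "therapy": row[1]}


# In-memory database standing in for target/data/target.sqlite (placeholder data).
conn = sqlite3.connect(":memory:")
conn.execute("create table target (gene text, rationale text, agents_therapy text)")
conn.execute(
    "insert into target values (?, ?, ?)",
    ("GeneA", "example rationale", "example therapy"),
)

annotator = SketchAnnotator(conn)
variants = [{"uid": 1, "hugo": "GeneA"}, {"uid": 2, "hugo": "GeneB"}]
outputs = [annotator.annotate(variant) for variant in variants]
print(outputs)
```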
Mapper
The essential function for mapper modules is map. A typical mapper module is structured as follows.
...
class Mapper(BaseMapper):
    ...
    def map(self, input_data):
        ...
A mapper module's map function is similar to an annotator module's annotate function in that it receives a dict of an input variant (with keys such as uid, chrom, pos, ref_base, and alt_base) and is expected to return a dict of its output. One difference is that its output dict is supposed to have a pre-defined set of keys. First of all, the output dict of the map function should have the keys in input_data. Then, the following keys should be defined as well.
coding
hugo
transcript
so
cchange
achange
all_mappings
More details will be explained here. Until then, you can take a look at the gencode module's gencode.py to learn more.
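The key requirements listed above can be expressed as a small check. This is a sketch for module development only: missing_mapper_keys is a hypothetical helper, and the values in the example output are None placeholders because the meaning and format of each key are not described here.

```python
REQUIRED_MAPPER_KEYS = (
    "coding", "hugo", "transcript", "so", "cchange", "achange", "all_mappings",
)


def missing_mapper_keys(input_data, output):
    """List the keys a map() return value lacks: input_data keys plus the required keys."""
    expected = list(input_data) + [k for k in REQUIRED_MAPPER_KEYS if k not in input_data]
    return [key for key in expected if key not in output]


input_data = {"uid": 1834, "chrom": "chr1", "pos": 19834895, "ref_base": "A", "alt_base": "G"}

# Start from the input keys, then add every required key with a placeholder value.
complete_output = dict(input_data)
complete_output.update({key: None for key in REQUIRED_MAPPER_KEYS})

# An output that dropped an input key and two required keys.
incomplete_output = {k: v for k, v in complete_output.items() if k not in ("uid", "so", "achange")}

print(missing_mapper_keys(input_data, complete_output))
print(missing_mapper_keys(input_data, incomplete_output))
```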
Converter
The essential function for converter modules is convert_line. A typical converter module will be structured as follows.
...
class Converter(BaseConverter):
    ...
    def convert_line(self, l):
        ...
More will be explained later.
Postaggregator
The essential function for postaggregator modules is annotate. A typical postaggregator module will be structured as follows.
...
class Postaggregator(BasePostaggregator):
    ...
    def annotate(self, input_data):
        ...
More will be explained later.
Dependency control
Module dependency
An OakVar module can depend on other OakVar modules for it to properly function. Let's say module annotator1 uses the output of annotator2 and annotator3 as its input. For annotator1 to properly function, annotator2 and annotator3 should be already installed. This installation requirement is specified in the config file of annotator1 (annotator1.yml) as the following:
requires:
- annotator2
- annotator3
With this in place, when annotator1 is installed with ov module install annotator1, the two dependency modules also will be installed automatically, if not already present in the system.
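The effect of requires can be pictured as a walk over the dependency lists. The sketch below is an illustration under assumptions, not OakVar's installer: modules_to_install is a hypothetical function, and both the dependencies-first order and the recursion into dependencies of dependencies are assumptions of this example.

```python
def modules_to_install(name, requires_by_module, installed):
    """Return the modules to install for `name`, dependencies first, skipping installed ones."""
    order = []
    seen = set(installed)

    def visit(module):
        if module in seen:
            return  # already installed or already scheduled
        seen.add(module)
        for dependency in requires_by_module.get(module, []):
            visit(dependency)
        order.append(module)

    visit(name)
    return order


# The requires list from annotator1.yml above; annotator3 is assumed to be installed already.
requires_by_module = {"annotator1": ["annotator2", "annotator3"]}
plan = modules_to_install("annotator1", requires_by_module, installed={"annotator3"})
print(plan)
```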
As mentioned, annotator1 uses the output of annotator2 and annotator3 as its input. This dependency should be defined in annotator1.yml as the following.
secondary_inputs:
  annotator2: {}
  annotator3: {}
With this simple definition, the output of annotator2 and annotator3 will be available as the secondary_data variable to the annotate function of the annotator1 module. For example,
def annotate(self, input_data, secondary_data=None):
    ...
of annotator1.py will have secondary_data["annotator2"] and secondary_data["annotator3"] available. If annotator2.yml has the following output column definition,
output_columns:
- name: value1
  title: Value 1
  type: string
- name: value2
  title: Value 2
  type: string
annotator1's annotate function will be able to access those with secondary_data["annotator2"]["value1"] and secondary_data["annotator2"]["value2"].
Finer control of secondary input is possible. For example, the following in annotator1.yml
secondary_inputs:
  annotator2:
    match_columns:
      primary: uid
      secondary: uid
    use_columns:
      - value1
will mean three things. First, annotator2 output will be available as secondary_data["annotator2"] to the annotate function of annotator1. Second, for each variant, the uid field in the output of annotator2 and the uid field in input_data to the annotate function of annotator1 will be matched to find the correct secondary_data for the variant. Third, only the value1 field will be available to the function.
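The matching and column selection can be sketched in a few lines. select_secondary is a hypothetical function that imitates the behavior described above; it is not how OakVar implements secondary inputs, and the rows standing in for annotator2's output contain made-up values.

```python
def select_secondary(input_data, secondary_rows, primary="uid", secondary="uid", use_columns=None):
    """Pick the secondary row matching input_data and keep only the requested columns."""
    for row in secondary_rows:
        if row[secondary] == input_data[primary]:
            if use_columns is None:
                return dict(row)  # no use_columns: pass every column through
            return {column: row[column] for column in use_columns}
    return None  # no matching row for this variant


# Made-up annotator2 output rows.
annotator2_rows = [
    {"uid": 1, "value1": "a", "value2": "b"},
    {"uid": 2, "value1": "c", "value2": "d"},
]

input_data = {"uid": 2, "chrom": "chr1", "pos": 19834895, "ref_base": "A", "alt_base": "G"}
secondary_data = {
    "annotator2": select_secondary(input_data, annotator2_rows, use_columns=["value1"])
}
print(secondary_data)
```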
PyPI dependency
If an OakVar module needs packages from PyPI, such requirement can be specified in the module's yml file. For example, if annotator1's annotator1.yml has the following,
pypi_dependencies:
- numpy
ov module install annotator1 will automatically perform pip install numpy while installing the annotator1 module.
Template generation
The ov new module command will generate a module template folder, with template .py, .yml, and .md files for the module. The template will be in the correct folder for the module and recognized by OakVar automatically. Its usage is
ov new module -n MODULE_NAME -t MODULE_TYPE
where MODULE_NAME is the name of the new module (numbers and lowercase letters, with up to one underscore between two letters) and MODULE_TYPE is the type of the new module (converter, mapper, annotator, postaggregator, or reporter).
Below are examples.
ov new module -n annotator_1 -t annotator
will generate the template for the annotator module annotator_1 in the folder <MODULES_DIR>/annotators/annotator_1.
ov new module -n converter_1 -t converter
will generate the template for the converter module converter_1 in the folder <MODULES_DIR>/converters/converter_1.
ov new module -n postaggregator_1 -t postaggregator
will generate the template for the postaggregator module postaggregator_1 in the folder <MODULES_DIR>/postaggregators/postaggregator_1.
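The naming rule and the folder pattern in these examples can be summarized in code. This is a sketch under assumptions: the regular expression is one reading of the naming rule (digits are allowed next to an underscore, as in annotator_1), and is_valid_module_name and template_dir are hypothetical helpers rather than OakVar functions.

```python
import re
from pathlib import PurePosixPath

MODULE_TYPES = ("converter", "mapper", "annotator", "postaggregator", "reporter")

# Lowercase letters and digits, with single underscores only between two such characters.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_module_name(name):
    return NAME_PATTERN.fullmatch(name) is not None


def template_dir(modules_dir, name, module_type):
    """Return the folder where a template for the module would be expected."""
    if module_type not in MODULE_TYPES:
        raise ValueError(f"unknown module type: {module_type}")
    if not is_valid_module_name(name):
        raise ValueError(f"invalid module name: {name}")
    # Type directories are the plural of the type name, e.g. annotator -> annotators.
    return PurePosixPath(modules_dir) / f"{module_type}s" / name


print(template_dir("<MODULES_DIR>", "annotator_1", "annotator"))
print(is_valid_module_name("Annotator1"), is_valid_module_name("a__b"))
```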
Module options
Modules can receive custom options through CLI or GUI. In ov run, such options to modules can be given with --module-options option. For example,
ov run input.vcf --module-options annotator_1.option_1=value_1
will send the following Python dict to the annotator_1 module as self.module_options.
{ "option_1": "value_1" }
In annotator_1.py, self.module_options will have the above dict.
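The relation between the command-line string and the dict can be sketched as follows. parse_module_options is a hypothetical function that illustrates the mapping only; OakVar's own parsing may differ, for example in how it treats several options or values containing special characters.

```python
def parse_module_options(pairs):
    """Turn 'module.option=value' strings into {module: {option: value}}."""
    options = {}
    for pair in pairs:
        key, _, value = pair.partition("=")     # split at the first '='
        module, _, option = key.partition(".")  # split at the first '.'
        options.setdefault(module, {})[option] = value
    return options


parsed = parse_module_options(["annotator_1.option_1=value_1"])
# The part for annotator_1 corresponds to what the module sees as self.module_options.
print(parsed["annotator_1"])
```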
Module options should be defined in the module's yml file. Thus, annotator_1.yml should have the following.
module_options:
  annotator_1:
    option_1:
      title: Option 1
      type: string
      help: Option 1 help
Under the module_options key comes the name of the annotator module, and under it the name of the option. For each option, title, type, and help are mandatory keys. type can be string, int, or float. With these defined, when annotator_1 is selected on the OakVar GUI (launched by ov gui), an input box for option_1 will appear under the title and will have a tooltip with the text of help.
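The requirements on an option definition (three mandatory keys and three allowed types) can be checked before shipping a module. The sketch below is a hypothetical development aid, not part of OakVar; the dict mirrors the option_1 entry of annotator_1.yml as a YAML parser would load it.

```python
MANDATORY_KEYS = ("title", "type", "help")
ALLOWED_TYPES = ("string", "int", "float")


def option_definition_problems(definition):
    """Return a list of problems found in one option definition."""
    problems = [f"missing key: {key}" for key in MANDATORY_KEYS if key not in definition]
    if "type" in definition and definition["type"] not in ALLOWED_TYPES:
        problems.append(f"unsupported type: {definition['type']}")
    return problems


# The option_1 definition from annotator_1.yml above.
option_1 = {"title": "Option 1", "type": "string", "help": "Option 1 help"}
# A definition with no help text and a type outside the allowed set.
broken = {"title": "Option 2", "type": "bool"}

print(option_definition_problems(option_1))
print(option_definition_problems(broken))
```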