Adding new tracks

Adding new tracks to pyGenomeTracks only requires adding a new class to the pygenometracks/tracks folder. The name of the file must end with Track.py. The class must inherit from GenomeTrack (or another available track class) and must have a plot method. To work well with the config checker, it should also define the following class-level variables:

- DEFAULTS_PROPERTIES is a dictionary where each key is a parameter and each value is the default used when the parameter is not set or when something goes wrong.
- NECESSARY_PROPERTIES is an array with all the parameters that are required for this track (usually 'file').
- SYNONYMOUS_PROPERTIES is a dictionary where each key is a parameter and each value is a dictionary that maps a string to the value that should replace it (for example, SYNONYMOUS_PROPERTIES = {'max_value': {'auto': None}}).
- POSSIBLE_PROPERTIES is a dictionary where each key is a parameter and each value is an array with the only values allowed for this parameter. If the value specified by the user is not one of them, it is substituted by the default value.
- BOOLEAN_PROPERTIES is an array with all parameters that should have a boolean value (a boolean value can be 0, 1, true, false, on, off).
- STRING_PROPERTIES is an array with all parameters that have string values. It should always contain title and file_type.
- FLOAT_PROPERTIES is a dictionary where each key is a parameter and each value is an array with the minimum and maximum values (both inclusive) allowed for the parameter (you can use [- np.inf, np.inf] if there is no restriction). This dictionary should always contain 'height': [0, np.inf].
- INTEGER_PROPERTIES is the same as FLOAT_PROPERTIES, but for integer values.
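
To make the role of these variables concrete, here is a minimal, self-contained sketch of how a checker could apply them to the raw strings read from a configuration file. It is not the pyGenomeTracks config checker: ExampleTrack, check_properties, TRUE_WORDS and FALSE_WORDS are hypothetical names used only for illustration, and INTEGER_PROPERTIES handling is left out because it would mirror the float case.

```python
import math


class ExampleTrack:
    # Hypothetical track; only the property tables matter for this sketch.
    DEFAULTS_PROPERTIES = {'max_value': None, 'orientation': None,
                           'show_data_range': True, 'height': 2.0}
    NECESSARY_PROPERTIES = ['file']
    SYNONYMOUS_PROPERTIES = {'max_value': {'auto': None}}
    POSSIBLE_PROPERTIES = {'orientation': [None, 'inverted']}
    BOOLEAN_PROPERTIES = ['show_data_range']
    STRING_PROPERTIES = ['file', 'file_type', 'title', 'orientation']
    FLOAT_PROPERTIES = {'max_value': [-math.inf, math.inf],
                        'height': [0, math.inf]}
    INTEGER_PROPERTIES = {}


TRUE_WORDS = {'1', 'true', 'on'}
FALSE_WORDS = {'0', 'false', 'off'}


def check_properties(track_class, raw):
    """Return a cleaned copy of raw (parameter name -> string value).

    Illustrative only: the real checker may behave differently.
    """
    for name in track_class.NECESSARY_PROPERTIES:
        if name not in raw:
            raise ValueError('missing necessary parameter: ' + name)

    props = dict(track_class.DEFAULTS_PROPERTIES)
    for name, value in raw.items():
        default = track_class.DEFAULTS_PROPERTIES.get(name)
        synonyms = track_class.SYNONYMOUS_PROPERTIES.get(name, {})
        if value in synonyms:
            # e.g. max_value = auto becomes None
            props[name] = synonyms[value]
            continue
        if name in track_class.BOOLEAN_PROPERTIES:
            word = value.lower()
            if word in TRUE_WORDS:
                value = True
            elif word in FALSE_WORDS:
                value = False
            else:
                value = default
        elif name in track_class.FLOAT_PROPERTIES:
            low, high = track_class.FLOAT_PROPERTIES[name]
            try:
                number = float(value)
            except ValueError:
                number = None
            # both bounds are inclusive, as described above
            in_range = number is not None and low <= number <= high
            value = number if in_range else default
        allowed = track_class.POSSIBLE_PROPERTIES.get(name)
        if allowed is not None and value not in allowed:
            value = default
        props[name] = value
    return props


cleaned = check_properties(ExampleTrack, {
    'file': 'scores.bedgraph',
    'max_value': 'auto',          # synonym, replaced by None
    'orientation': 'sideways',    # not allowed, falls back to the default
    'show_data_range': 'off',     # boolean word
    'height': '-3',               # out of range, falls back to the default
})
```

In this sketch every invalid value silently falls back to its default; a real checker would more likely also warn the user about the substitution.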

Additionally, a basic description of the track should be added.

For example, a track that prints 'hello world' at a given location looks like this:

# -*- coding: utf-8 -*-
from . GenomeTrack import GenomeTrack
import numpy as np


class TextTrack(GenomeTrack):
    SUPPORTED_ENDINGS = ['.txt']  # this is used by make_tracks_file to guess the type of track based on file name
    TRACK_TYPE = 'text'
    OPTIONS_TXT = """
height = 3
title =
text =
# x position of text in the plot (in bp)
x_position =
"""
    DEFAULTS_PROPERTIES = {'text': 'hello world'}
    NECESSARY_PROPERTIES = ['x_position']
    SYNONYMOUS_PROPERTIES = {}
    POSSIBLE_PROPERTIES = {}
    BOOLEAN_PROPERTIES = []
    STRING_PROPERTIES = ['text', 'title', 'file_type']
    FLOAT_PROPERTIES = {'height': [0, np.inf],
                        'x_position': [0, np.inf]}
    INTEGER_PROPERTIES = {}

    def plot(self, ax, chrom, region_start, region_end):
        """
        This example simply plots the given text at a fixed
        location in the axis. The chrom, region_start and region_end
        variables are not used.
        Args:
            ax: matplotlib axis to plot
            chrom: chromosome name
            region_start: start coordinate of genomic position
            region_end: end coordinate
        """
        # print text at position x = self.properties['x_position'] and y = 0.5 (center of the plot)
        ax.text(self.properties['x_position'], 0.5, self.properties['text'])

The OPTIONS_TXT should contain the text to build a default configuration file. This information, together with the information about SUPPORTED_ENDINGS is used by the program make_tracks_file to create a default configuration file based on the endings of the files given.
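
As a rough illustration of that lookup, the sketch below matches a file name against the SUPPORTED_ENDINGS of a list of track classes and emits a default section built from OPTIONS_TXT. It is an assumption about the general idea, not the make_tracks_file implementation: guess_track_class and default_section are hypothetical helpers, and the two classes are stripped-down stand-ins for the tracks in this section.

```python
class TextTrack:
    # Stand-in carrying only the attributes needed for the lookup.
    SUPPORTED_ENDINGS = ['.txt']
    TRACK_TYPE = 'text'
    OPTIONS_TXT = "height = 3\ntitle =\ntext =\nx_position =\n"


class BedGraphMatrixTrack:
    SUPPORTED_ENDINGS = ['.bm', '.bm.gz']
    TRACK_TYPE = 'bedgraph_matrix'
    OPTIONS_TXT = "height = 3\ntitle =\nfile_type = bedgraph_matrix\n"


def guess_track_class(file_name, track_classes):
    """Return the first class whose SUPPORTED_ENDINGS match file_name."""
    lowered = file_name.lower()
    for track_class in track_classes:
        if lowered.endswith(tuple(track_class.SUPPORTED_ENDINGS)):
            return track_class
    return None


def default_section(file_name, track_classes):
    """Build a default configuration section, or None if no class matches."""
    track_class = guess_track_class(file_name, track_classes)
    if track_class is None:
        return None
    return "[{0}]\nfile = {0}\n{1}".format(file_name, track_class.OPTIONS_TXT)


available = [TextTrack, BedGraphMatrixTrack]
section = default_section('tad_separation_score.bm.gz', available)
```

Matching on the ending alone means that two classes listing the same ending would be resolved by list order in this sketch.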

The configuration file for the text track is:

[x-axis]
where = top

[new track]
file = 
height = 4
title = new pyGenomeTrack
file_type = text
text = hello world
x_position = 3100000

$ pyGenomeTracks --tracks new_track.ini --region X:3000000-3200000 -o new_track.png
../_images/new_track.png

Notice that the resulting track already includes a y-axis (to the left) and a label (to the right). These are defaults that can be changed by adding plot_y_axis and plot_label methods to the class.
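
A minimal sketch of such overrides follows. The plot_y_axis(self, ax, plot_axis) signature is the one used by the bedgraph matrix track later in this section; the arguments of plot_label are an assumption (the sketch only relies on the first one being the axis reserved for the label), and StubGenomeTrack is a hypothetical stand-in for GenomeTrack so that the example runs on its own.

```python
class StubGenomeTrack:
    """Hypothetical stand-in for GenomeTrack; it only stores the properties."""

    def __init__(self, properties):
        self.properties = properties


class PlainTextTrack(StubGenomeTrack):
    def plot_y_axis(self, ax, plot_axis):
        """Draw nothing, so the y-axis panel stays empty."""
        pass

    def plot_label(self, label_ax, *args, **kwargs):
        """Write the title in upper case instead of the default label.

        Assumption: label_ax is the matplotlib axis reserved for the label;
        any further arguments are accepted and ignored.
        """
        label_ax.text(0.05, 0.5, self.properties['title'].upper(),
                      horizontalalignment='left',
                      verticalalignment='center')
```

In a real track the class would inherit from GenomeTrack and keep its plot method; only the two methods shown here would be added.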

A more complex example is plotting multi-column bedgraph data as a matrix. The HiCExplorer tool hicFindTADs produces a file whose format is similar to a bedgraph but with more value columns. We call this a bedgraph matrix. The following track plots this bedgraph matrix:

# -*- coding: utf-8 -*-
import numpy as np
from . BedGraphTrack import BedGraphTrack
from . GenomeTrack import GenomeTrack


class BedGraphMatrixTrack(BedGraphTrack):
    # this track class extends a BedGraphTrack that is already part of
    # pyGenomeTracks. The advantage of extending this class is that
    # we can re-use the code for reading a bedgraph file
    SUPPORTED_ENDINGS = ['.bm', '.bm.gz']
    TRACK_TYPE = 'bedgraph_matrix'
    OPTIONS_TXT = GenomeTrack.OPTIONS_TXT + """
        # a bedgraph matrix file is like a bedgraph, except that per bin there
        # is more than one value (separated by tab). This file type is
        # produced by the HiCExplorer tool hicFindTADs and contains
        # the TAD-separation score at different window sizes.
        # E.g.
        # chrX	18279	40131	0.399113	0.364118	0.320857	0.274307
        # chrX	40132	54262	0.479340	0.425471	0.366541	0.324736
        #min_value = 0.10
        #max_value = 0.70
        file_type = {}
        """.format(TRACK_TYPE)
    DEFAULTS_PROPERTIES = {'max_value': None,
                           'min_value': None,
                           'show_data_range': True,
                           'orientation': None}
    NECESSARY_PROPERTIES = ['file']
    SYNONYMOUS_PROPERTIES = {'max_value': {'auto': None},
                             'min_value': {'auto': None}}
    POSSIBLE_PROPERTIES = {'orientation': [None, 'inverted']}
    BOOLEAN_PROPERTIES = ['show_data_range']
    STRING_PROPERTIES = ['file', 'file_type', 'overlay_previous',
                         'orientation', 'title']
    FLOAT_PROPERTIES = {'max_value': [- np.inf, np.inf],
                        'min_value': [- np.inf, np.inf],
                        'height': [0, np.inf]}
    INTEGER_PROPERTIES = {}

    # In BedGraphTrack the method set_properties_defaults
    # has been adapted to a coverage track. Here we want to
    # go back to the initial method:
    def set_properties_defaults(self):
        GenomeTrack.set_properties_defaults(self)

    def plot(self, ax, chrom_region, start_region, end_region):
        """
        Args:
            ax: matplotlib axis to plot
            chrom_region: chromosome name
            start_region: start coordinate of genomic position
            end_region: end coordinate
        """
        start_pos = []
        matrix_rows = []

        # the BedGraphTrack already has methods to read files
        # in which the first three columns are chrom, start, end.
        # Here we use the interval_tree attribute inherited from the
        # BedGraphTrack class
        for region in sorted(self.interval_tree[chrom_region][start_region - 10000:end_region + 10000]):
            start_pos.append(region.begin)
            # the region.data contains all the values for a given region
            # In the following code, such list is converted to floats and
            # appended to a new list.
            values = list(map(float, region.data))
            matrix_rows.append(values)

        # using numpy, the list of values per line in the bedgraph file
        # is converted into a matrix whose columns contain
        # the bedgraph values for the same line (notice that
        # the matrix is transposed to achieve this)
        matrix = np.vstack(matrix_rows).T

        # using meshgrid we get x and y positions to plot the matrix at
        # corresponding positions given in the bedgraph file.
        x, y = np.meshgrid(start_pos, np.arange(matrix.shape[0]))

        # shading adds some smoothing to the plot
        shading = 'gouraud'
        vmax = self.properties['max_value']
        vmin = self.properties['min_value']

        img = ax.pcolormesh(x, y, matrix, vmin=vmin, vmax=vmax, shading=shading)
        img.set_rasterized(True)

    def plot_y_axis(self, ax, plot_axis):
        """turn off y_axis plot"""
        pass

    def __del__(self):
        if self.tbx is not None:
            self.tbx.close()
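
The array handling in plot can be tried in isolation. The sketch below repeats the vstack, transpose and meshgrid steps on three bins; the first two rows are the example lines from the OPTIONS_TXT comment above, while the third row and its start position are made up for illustration.

```python
import numpy as np

# one entry per bin: the start position and the list of values of that bin
start_pos = [18279, 40132, 54263]  # the last start is made up
matrix_rows = [[0.399113, 0.364118, 0.320857, 0.274307],
               [0.479340, 0.425471, 0.366541, 0.324736],
               [0.510000, 0.450000, 0.400000, 0.350000]]  # made-up bin

# after transposing, each row holds one value column of the file
# and each column corresponds to one bin
matrix = np.vstack(matrix_rows).T

# x repeats the bin start positions for every row,
# y holds the row index for every bin
x, y = np.meshgrid(start_pos, np.arange(matrix.shape[0]))

print(matrix.shape)      # (4, 3)
print(x.shape, y.shape)  # (4, 3) (4, 3)
```

The three arrays end up with the same shape, which is what pcolormesh expects when shading='gouraud' is used.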

Let’s create a configuration file that uses the bedgraph matrix track:

[bedgraph matrix]
file = tad_separation_score.bm.gz
file_type = bedgraph_matrix
title = bedgraph matrix
height = 5

[spacer]

[x-axis]

$ pyGenomeTracks --tracks bedgraph_matrix.ini --region X:2000000-3500000 -o bedgraph_matrix.png
../_images/bedgraph_matrix.png

Although this image looks interesting, another way to plot the data is as overlapping lines with the mean value highlighted. Using the bedgraph version of pyGenomeTracks, the following image can be obtained:

../_images/bedgraph_matrix_lines.png