6mA motif identification and methylome characterization - tutorial

This workflow demonstrates how to use tools I created for the identification and analysis of RM target motifs from Oxford Nanopore sequencing reads. The workflow starts with the BED file generated by ONT's ModKit program, and is broken down into 3 modules:

1. Motif Discovery
Identifies target motifs of the RM system(s) harbored in your genome and calculates the percentage of sites within the motif genome wide that are modified.

2. Methylome Profiler
Maps all modified bases to your genome, counts modified bases within all coding and intergenic regions, counts across all genomic elements, sliding window methylation rates

3. Motif Analysis
Statistical testing on whether identified RM target motifs are enriched or avoided in different areas of your genome

To get started, clone the github repo to your local machine:

MODULE 1: MOTIF DISCOVERY

Get sequences surrounding all modified adenines of length k

Script: Akmaker.py

Inputs: BED file containing modified base call data and the reference genome that was used to map your reads

Output: fasta of all 6mA centered kmers

Use -len argument to specify the size of the kmer, as well as -depth for miniumum sequencing depth to consider (default >10) and -mod_thresh to set a frequency threshold for calling a base modified (default >50). The -modified option is used in this example. Use --help for more output options.

Find most commonly modfied short sequences containing your modified adenine

Script: motif_discovery.py

Inputs: BED file, reference genome, set of all 6mA centered kmers (Akmaker.py output)

Output: List of most abundant short sequences, total number of occurences, and modifification frequency genome wide

Specify the 0-indexed position of the 6mA with -mod_pos_index. The -target_lengths argument specifies the motif lengths to consider (separate different numbers with a comma). Use --help to customize other parameters such as miniumum sequencing depths and mod call thresholds.

Fine tune short sequences to a definitive R-M target motif

Script: mod_calc.py

Inputs: motif or list of motifs and the position of the modified adenine

Output: Total number of occurences, total number modified, and modifification frequency genome wide for each specified motif

Based on results from motif_discovery.py, look up modification frequencies inferred motif(s). Motifs can be entered with -motif with ambiguous nucleotide symbols. Use 1-based index for specifiying the position of the modified adenine in your motif with -mod_pos. Use --help to see options for custom parameters/thresholds. Run multiple times, substituting N's at different positions to hammer down exact specificity of the R-M target and determine a consensus motif.

MODULE 2: METHYLOME PROFILER

Generate a table of all modified base positions genome wide

Script: mod_mapper.py

Inputs: BED file, reference genome, modified motif, or file containing a list of multiple modified motifs and the position of the modified adenine

Output: table containing coordinates of all modified target and off-target sites, as well as target sites that are not modified

Use --help to customize other parameters such as miniumum sequencing depth to include and mod call thresholds.

View sites table

Link modified sites to features in the genome

Script: annotate_methylome.py

Inputs: .gbff file and modified sites table

Output:
3 tsv files named as below following user assigned prefix mod_annotations.tsv: table of modified base counts and normalized rates within all cds and intergenic regions of genome
annotated_sites.tsv: table of modified sites with column for the gene ID corresponding to the coordinate (or interegenic if not within cds)
summary.tsv: table of modified base counts and normalized rates for each chromosome/plasmid in the genome

View feature counts table

View site annotation table

View methylation summary table