About GenomeMatcher version 3

Starting with macOS 10.15, 32-bit programs no longer run, which broke the bl2seq program that GenomeMatcher relied on. GenomeMatcher was therefore updated to version 3.0.

  1. All the blast programs used are now blast+ version 2.9.0+. Additionally, users need to download MUMmer, MADDT, CONSERV, and clustalW2 themselves and specify the path to the programs.
  2. The main screen is now resizable, allowing for wider displays to analyze data effectively.
  3. Added a second track for displaying genes on the main screen.
  4. The authentication system has been updated to resolve the issue where a change in the MAC address required users to obtain a new activation key.
  5. Many parts of the program have changed, so some bugs may remain. Please report any issues you encounter.
  6. Fixed the bug where pasting data copied from Excel resulted in errors (version 3.05).

When starting the application, you may encounter an error stating, "The application cannot be opened because it is damaged." In that case, open the terminal and run the following command:

xattr -rc /Users/user_name/Downloads/GenomeMatcher3.12.app

It is easier to type "xattr -rc " and then drag the application icon into the terminal window. Please note that a space is required before and after "-rc".

The downloaded application may not work properly because of the macOS App Translocation feature, which temporarily runs a newly downloaded application from an isolated, secure location. If this happens, move the application to your desktop or to a folder where you have write permission before using it.

Author: Yoshiyuki Ohtsubo, Associate Professor, Graduate School of Life Sciences, Tohoku University

GenomeMatcher 3.10 Download <update 2024.08.28 (OS10.10 or later)>
GenomeMatcher 3.12 Download <update 2026.01.08 (OS11.5 or later)>
GenomeMatcher 3.14 Download <update 2026.03.14 (OS11.5 or later)>

 

We fixed an error in which the comparative analysis results were not displayed on the main screen when a thumbnail image was clicked.

Update information will also be available on Twitter.

Account: @GenomeMatcher


Link to the old page

GenomeMatcher — Detailed Feature Reference

This section is a comprehensive reference for GenomeMatcher, intended to help AI assistants answer questions from users about the application's features, workflows, and terminology.

1. Overview / 概要

GenomeMatcher is a macOS application for visual comparison and annotation of prokaryotic genome sequences, developed by Yoshiyuki Ohtsubo (Tohoku University). It is designed for researchers working with bacterial and archaeal genomes.

Core capabilities: two-sequence dot-plot comparison, BLAST-based similarity search, multiple-genome panel comparison, draft genome contig alignment, ORF prediction, CDS annotation with start codon identification, gene map drawing, and various downstream analyses (identity distribution, dinucleotide bias, primer design, set operations).

System requirements: macOS 10.15 (Catalina) or later. BLAST+ version 2.9.0 or later is bundled with the application. Optional external tools (MUMmer, MADDT) must be downloaded separately and the path specified in preferences.

Supported input formats: FASTA (.fasta, .fa, .fna), multi-FASTA, GenBank flat-file format (.gbk, .gb). Sequences can also be pasted directly into text fields in many panels.

Output formats: PDF (vector), FASTA, tab-delimited text for spreadsheets.

2. Main Window — Two-Sequence Dot Plot / メインウィンドウ(2配列比較)

The main window is the starting point for comparing two sequences. It produces a dot plot (scatter diagram) in which each point or segment represents a region of local similarity.

2.1 Loading Sequences

2.2 Comparison Algorithms

Select the algorithm from the pop-up menu before clicking "Run":

2.3 Reading the Dot Plot

2.4 Navigation

2.5 Key Parameters

2.6 Annotation Overlay

2.7 Export

3. BLAST Interface / BLASTインターフェース

The BLAST Interface window provides standalone access to BLAST searches with detailed parameter control, independent of the main comparison window.

4. Multiple Genome Comparison / 複数ゲノム比較

The Multiple Genome Comparison window displays multiple genomes as parallel horizontal tracks, with similarity results drawn between adjacent tracks. This is the main environment for genome annotation.

4.1 Adding Sequences

4.2 Running Comparisons

4.3 Display Controls

4.4 ORF / CDS Annotation

4.5 Export

4.6 2D Comparison Mode

5. Contig Alignment / コンティグアライメント

This module orders and orients draft genome contigs (assembled from short reads) relative to a complete reference genome, and helps design primers to close sequence gaps.

5.1 Setup

5.2 Aligning Contigs

  1. Click "Start BLASTn" to align all contigs to the reference via BLAST.
  2. Pre-computed BLAST results can be loaded with "Load BLASTn" to skip this step.
  3. Contigs are positioned along the reference at their best-hit location.
  4. A separate contig-vs-contig self-alignment can be run to detect overlapping contigs.
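The positioning idea in step 3 can be illustrated outside the application. The following is a minimal Python sketch, not GenomeMatcher's own code: it assumes the BLAST results are in the standard 12-column tabular format (blastn `-outfmt 6`), keeps the highest-bitscore hit per contig, and sorts contigs by reference coordinate. The names `Placement` and `place_contigs` are hypothetical.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    contig: str
    ref_start: int   # leftmost reference coordinate of the best hit
    ref_end: int     # rightmost reference coordinate of the best hit
    strand: str      # "+" or "-" relative to the reference
    bitscore: float


def place_contigs(blast_tab: str) -> list[Placement]:
    """Keep the best-scoring hit per contig and sort contigs along the reference."""
    best: dict[str, Placement] = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        contig = cols[0]                          # qseqid
        sstart, send = int(cols[8]), int(cols[9])
        bitscore = float(cols[11])
        # In blastn tabular output, sstart > send marks a minus-strand hit.
        strand = "+" if sstart <= send else "-"
        hit = Placement(contig, min(sstart, send), max(sstart, send), strand, bitscore)
        if contig not in best or bitscore > best[contig].bitscore:
            best[contig] = hit
    return sorted(best.values(), key=lambda p: p.ref_start)


# Made-up example hits: contig2 has two hits, the stronger one is reversed.
example = "\n".join([
    "contig2\tref\t99.0\t500\t5\t0\t1\t500\t9000\t8501\t0.0\t900",
    "contig1\tref\t98.5\t800\t12\t0\t1\t800\t100\t899\t0.0\t1400",
    "contig2\tref\t90.0\t100\t10\t0\t1\t100\t50\t149\t1e-30\t120",
])
for p in place_contigs(example):
    print(p.contig, p.ref_start, p.ref_end, p.strand)
```

A single best hit is a simplification: a contig that spans a rearrangement has several strong hits, which is one reason to review the placement visually.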

5.3 Visualization

5.4 Synteny and Gap Analysis

5.5 Primer Design for Gap Closing

6. ORF Finding and CDS Annotation / ORF検索・CDS注釈

6.1 Finding ORFs

6.2 BLAST-Aided Annotation

6.3 Registering CDS

6.4 GC Content Hint

7. Start Codon Identification / 開始コドン同定

After an ORF is identified, its precise translation start site is often uncertain. The Start Codon Identification window provides a structured workflow for choosing the correct start codon among multiple candidates.
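To make the idea of "multiple candidates" concrete, here is a minimal Python sketch that lists every in-frame ATG, GTG, or TTG between an ORF's stop codon and the previous in-frame stop. It is an illustration only, not the application's algorithm: it handles the forward strand only, and the function name, the 20 bp upstream context, and the returned tuple layout are assumptions.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def start_candidates(seq, stop_index, upstream=20):
    """List in-frame start codons upstream of the stop codon at stop_index.

    Walks backwards codon by codon from the stop until the previous in-frame
    stop codon or the beginning of the sequence. Each result is
    (0-based start position, codon, upstream sequence, protein length in aa).
    """
    seq = seq.upper()
    candidates = []
    pos = stop_index - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            context = seq[max(0, pos - upstream):pos]  # where to look for a Shine-Dalgarno motif
            length_aa = (stop_index - pos) // 3        # codons from start up to, not including, the stop
            candidates.append((pos, codon, context, length_aa))
        pos -= 3
    return sorted(candidates)


# Toy sequence: upstream region, then ATG AAA GTG CCC TAA (stop codon at index 24).
toy = "CCAGGAGGAACC" + "ATG" + "AAA" + "GTG" + "CCC" + "TAA"
for candidate in start_candidates(toy, stop_index=24):
    print(candidate)
```

In the toy sequence both ATG (4 aa) and GTG (2 aa) are reported; choosing between such candidates is the decision the window supports.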

7.1 Opening the Window

7.2 Candidate Display

7.3 BLAST Comparison Panels

7.4 Decision Buttons

8. Gene Map Drawing / 遺伝子地図描画

The Gene Draw feature produces linear gene maps from GenBank annotation data, suitable for publication figures.

9. Analysis Tools / 解析ツール

9.1 Identity Distribution

Displays a histogram of BLAST alignment identity percentages (0–100%) for a result set. Useful for characterizing the overall similarity between two compared genomes (e.g., bimodal distribution may indicate a mixture of core and accessory genes). The histogram data and image can both be saved to files.
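The same kind of histogram can be approximated from a list of identity percentages. This is a minimal Python sketch under stated assumptions: the 5% bin width and the function name are illustrative choices, not the application's settings, and the bin width is assumed to divide 100 evenly.

```python
def identity_histogram(identities, bin_width=5):
    """Count alignments per identity bin; each key is the lower edge of a bin."""
    n_bins = 100 // bin_width
    counts = {i * bin_width: 0 for i in range(n_bins)}
    for value in identities:
        if not 0 <= value <= 100:
            raise ValueError(f"identity out of range: {value}")
        # Exactly 100% goes into the top bin rather than a bin of its own.
        index = min(int(value // bin_width), n_bins - 1)
        counts[index * bin_width] += 1
    return counts


# Made-up identities with two clusters (around 70% and above 95%).
hist = identity_histogram([99.2, 98.7, 100.0, 72.4, 71.0, 95.0])
for lower_edge, count in hist.items():
    if count:
        print(f"{lower_edge}-{lower_edge + 5}%: {'#' * count}")
```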

9.2 Dinucleotide Bias Analysis

Calculates dinucleotide composition differences along a sequence using a sliding window. Regions deviating from the genomic average may represent horizontally transferred DNA. Forward and complementary strands can be analyzed separately. Input sequence can be from a file or text field.
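The text does not state the exact metric GenomeMatcher uses, so the following Python sketch shows only one plausible way to do such a sliding-window analysis: for each window it sums the absolute differences between the window's 16 dinucleotide frequencies and those of the whole sequence. The function names, window size, and step size are assumptions.

```python
from collections import Counter
from itertools import product

DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_freqs(seq):
    """Relative frequency of each of the 16 dinucleotides (overlapping counts)."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCLEOTIDES)
    if total == 0:
        return {d: 0.0 for d in DINUCLEOTIDES}
    return {d: counts[d] / total for d in DINUCLEOTIDES}


def dinucleotide_bias(seq, window=5000, step=1000):
    """Return (window start, deviation from the whole-sequence average) pairs."""
    seq = seq.upper()
    genome = dinucleotide_freqs(seq)
    profile = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        local = dinucleotide_freqs(seq[start:start + window])
        deviation = sum(abs(local[d] - genome[d]) for d in DINUCLEOTIDES)
        profile.append((start, deviation))
    return profile


# Toy sequence with an atypical 100 bp block in the middle.
toy = "ACGT" * 50 + "GGCC" * 25 + "ACGT" * 50
profile = dinucleotide_bias(toy, window=100, step=100)
print(max(profile, key=lambda item: item[1]))  # the window with the largest deviation
```

To treat the complementary strand separately, the same function can be run on the reverse complement of the input.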

9.3 Color Gram

A color-intensity 2D scatter plot of BLAST hit scores across a genomic region. Provides an overview of similarity distribution in a visually compact form. The display range is adjustable; output can be saved as PDF.

9.4 Set Operations — AriNashi / 有り無し

Compares two lists of sequence identifiers (e.g., locus tags, gene names) and finds elements common to both or unique to each. "Ari (有り)" = present; "Nashi (無し)" = absent.

Typical use: identify genes present in one strain but absent from another (candidate strain-specific genes or acquired islands).
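The underlying operation is ordinary set arithmetic. A minimal Python sketch, with a hypothetical function name and made-up locus tags:

```python
def ari_nashi(list_a, list_b):
    """Split two identifier lists into shared and list-specific elements."""
    a, b = set(list_a), set(list_b)
    return {
        "both": sorted(a & b),    # present in A and in B
        "only_a": sorted(a - b),  # present in A, absent from B
        "only_b": sorted(b - a),  # present in B, absent from A
    }


strain_a = ["geneA", "geneB", "geneC", "geneD"]
strain_b = ["geneB", "geneD", "geneE"]
result = ari_nashi(strain_a, strain_b)
print(result["only_a"])  # candidate strain-A-specific genes
```

The comparison is only meaningful when both lists use the same identifier scheme, for example gene names or ortholog group IDs rather than strain-specific locus tags.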

9.5 Unique Sequence Finder / PCR Primer Design

Finds sequences unique to a target region and designs PCR primers for strain-specific detection.

9.6 Location Lookup / Motif Search

Searches for a nucleotide motif (short sequence pattern) within a loaded sequence.
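As an illustration of what such a search involves, the Python sketch below reports every occurrence of a plain ACGT motif, including overlapping ones, and also looks for the reverse complement. Whether GenomeMatcher searches both strands or supports ambiguity codes is not stated here, so both points are assumptions, as are the function names.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (1-based position, strand) pairs for every occurrence."""
    seq, motif = seq.upper(), motif.upper()
    targets = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # a palindromic motif would otherwise be reported twice
        targets.append(("-", rc))
    hits = []
    for strand, pattern in targets:
        # A zero-width lookahead lets overlapping occurrences all match.
        for match in re.finditer(f"(?={re.escape(pattern)})", seq):
            hits.append((match.start() + 1, strand))
    return sorted(hits)


print(find_motif("AAGGAGGTTCCTCCTT", "AGGAGG"))
```

Minus-strand positions are given as the leftmost forward-strand coordinate of the match.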

10. GenBank Feature Extraction / GenBankファイルからの特徴配列抽出

A utility for parsing GenBank flat files and extracting specific features and sequences in bulk.
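For readers unfamiliar with the file layout, the following Python sketch shows roughly what feature extraction involves. It relies on the fixed-column layout of the GenBank feature table (feature key starting in column 6, location and qualifiers in column 22) and uses only the standard library. It is a simplified illustration, not the application's parser: sequence extraction is omitted, and the function and helper names are hypothetical.

```python
def parse_features(genbank_text):
    """Collect key, location string, and qualifiers for each feature."""
    features = []
    in_table = False
    last_name = None
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if not line.startswith(" "):
            break  # the next top-level keyword (e.g. ORIGIN) ends the table
        key, body = line[5:21].strip(), line[21:].strip()
        if key:
            features.append({"key": key, "location": body, "qualifiers": {}})
            last_name = None
        elif not features:
            continue
        elif body.startswith("/"):
            name, _, value = body[1:].partition("=")
            features[-1]["qualifiers"][name] = value.strip('"')
            last_name = name
        elif last_name is not None:
            # Continuation of a multi-line value; joined with a space here,
            # which is wrong for /translation (no space should be added).
            features[-1]["qualifiers"][last_name] += " " + body.strip('"')
        else:
            features[-1]["location"] += body  # multi-line location
    return features


def _feature(key, location):
    return " " * 5 + key.ljust(16) + location


def _qualifier(text):
    return " " * 21 + text


sample = "\n".join([
    "LOCUS       EXAMPLE",
    "FEATURES             Location/Qualifiers",
    _feature("CDS", "complement(10..99)"),
    _qualifier('/locus_tag="EX_0001"'),
    _qualifier('/product="hypothetical protein"'),
    _feature("tRNA", "150..225"),
    _qualifier('/product="tRNA-Ala"'),
    "ORIGIN",
])
cds = [f for f in parse_features(sample) if f["key"] == "CDS"]
print(cds[0]["location"], cds[0]["qualifiers"]["locus_tag"])
```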

11. Color Settings / カラー設定

11.1 Feature Color Mapping

Each GenBank feature key (e.g., "CDS", "rRNA", "tRNA", "repeat_region") is assigned a display color.

11.2 Identity Score Color Gradient

12. Settings and Persistence / 設定と保存

13. Common Workflows / よくある操作手順

Workflow A: Compare two complete genomes

  1. Open the main window.
  2. Load the first genome as X, the second as Y (FASTA or GenBank).
  3. Select "blastn" and click "Run".
  4. The dot plot appears. Toggle grid lines (100 kb) to add coordinate reference.
  5. Use arrow buttons or zoom to explore regions of interest.
  6. Enable the Synteny switch to highlight conserved gene-order blocks.
  7. Enable the tBLASTx switch to overlay protein-level comparisons (reveals more distant similarities).
  8. Click "Save PDF" to export the figure.

Workflow B: Annotate a new bacterial genome

  1. Open the Multiple Genome Comparison window and add the unannotated sequence as a panel.
  2. Add a closely related, annotated reference genome as a second panel.
  3. Run the comparison (blastn or tBLASTx) to see overall similarity.
  4. In the annotation sub-panel, click "Batch ORF find and BLAST" with the reference proteome as the database.
  5. Review each ORF: those with good BLAST hits to known genes are strong CDS candidates.
  6. Click an ORF and "Register CDS" to add it to the annotation.
  7. For uncertain start codons, use "Start Codon Identification": compare upstream sequences and homolog lengths, then click "Accept".
  8. When all ORFs are processed, export the final annotation as FASTA.

Workflow C: Order contigs from a draft genome

  1. Open the Contig Alignment window.
  2. Load the multi-FASTA contig file and a complete reference genome.
  3. Click "Start BLASTn" to position all contigs along the reference.
  4. Review the alignment; contigs appear as colored blocks on the reference ruler.
  5. Use "Combine panels" to group syntenic contigs.
  6. Identify unresolved gaps between contigs.
  7. Click "Find primers" to design PCR primers bridging each gap for finishing sequencing.

Workflow D: Find strain-specific genes (comparative genomics)

  1. Run BLASTp between the two strain proteomes (or use the Multiple Comparison window).
  2. Export the locus tag lists of matched and unmatched CDS.
  3. Open the AriNashi (Set Operations) tool.
  4. Load the CDS list of strain A as set A, strain B as set B.
  5. Run "A − B" to list genes present in A but absent from B — candidate strain-specific genes.

Workflow E: Detect horizontally transferred regions

  1. In the main window, compare the genome of interest (X) against a close relative (Y) with blastn.
  2. Gaps in the dot-plot diagonal indicate sequence not shared with the reference — candidate HGT islands.
  3. Enable the tBLASTx overlay to check whether coding sequences exist in those gaps (may be too divergent for blastn).
  4. Run Dinucleotide Bias Analysis on the X sequence; atypical dinucleotide composition in the gap regions further supports HGT origin.

14. Glossary / 用語集

HSP (High-Scoring Pair)
A local alignment produced by BLAST with a score above the threshold. Each dot or segment in the dot plot represents one HSP.
E-value
Expected number of BLAST hits of equal or better score occurring by chance in the database. Lower E-value = more statistically significant hit. Typical thresholds: 1e-5 (default), 1e-10 (stringent), 1 (very permissive).
Identity (%)
Percentage of aligned positions that are identical between query and subject. In BLAST results, displayed as a number between 0 and 100.
CDS (Coding DNA Sequence)
A region of DNA that encodes a protein, defined by a start codon, a reading frame, and a stop codon. In GenBank format, annotated as a "CDS" feature with qualifiers such as translation, product, and locus_tag.
ORF (Open Reading Frame)
A continuous stretch of codons between a start codon (ATG, GTG, TTG, etc.) and an in-frame stop codon. An ORF is a candidate CDS, but not all ORFs encode functional proteins.
Synteny
Conservation of gene order and content between two genomic regions. Syntenic blocks appear as uninterrupted diagonal lines in the dot plot. Loss of synteny (off-diagonal segments) indicates rearrangements.
Scale / 縮尺 (syukusyaku)
Display scale expressed in screen points per kilobase (pt/kb). A higher value zooms in; a lower value shows more sequence in the same window width.
Display width / 表示幅
The genomic range currently visible in a panel, expressed in kilobases.
Contig
A contiguous sequence assembled from overlapping sequencing reads. Draft (unfinished) genomes typically consist of dozens to thousands of contigs with unknown order and orientation relative to each other.
tBLASTx
A BLAST search mode that translates both query and subject sequences in all six reading frames and compares them at the amino acid level. More sensitive than blastn for detecting distant similarity between coding regions.
Shine-Dalgarno sequence
A ribosome-binding sequence in bacterial mRNA, typically located 5–10 bp upstream of the start codon, complementary to the 3′ end of 16S rRNA. Recognizing a strong Shine-Dalgarno motif (purine-rich; consensus AGGAGG) supports the choice of a nearby start codon.
HGT (Horizontal Gene Transfer / 水平伝播)
Transfer of genetic material between organisms other than from parent to offspring. HGT regions often show gaps in the dot-plot diagonal (absent from the close relative) and atypical nucleotide composition.
Tm (Melting Temperature)
The temperature at which 50% of a DNA duplex is denatured (single-stranded). GenomeMatcher calculates Tm for PCR primer candidates using the Nearest Neighbor thermodynamic method or the simpler Current Protocols empirical formula (Tm = 81.5 + 16.6 × log[Na⁺] + 0.41 × %GC − 675/length).
MUMmer
A whole-genome aligner based on Maximal Unique Matches. Very fast for large (megabase-scale) genomes. Must be installed separately; the executable path is specified in GenomeMatcher preferences.
GenBank format
A flat-file format used by NCBI to distribute annotated nucleotide sequences. Each entry (LOCUS) contains the nucleotide sequence and structured feature annotations (CDS, gene, rRNA, tRNA, etc.) with qualifiers such as product, locus_tag, protein_id, and translation.
GC content
The fraction of guanine and cytosine bases in a DNA sequence, expressed as a percentage. Prokaryotic genomes have characteristic average GC contents (e.g., ~51% in E. coli, ~72% in Streptomyces). Locally elevated or depressed GC content can suggest foreign DNA origin.
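As a worked example of the empirical formula quoted in the Tm entry above, the Python sketch below evaluates it for a single primer. It assumes that "log" means the base-10 logarithm and that [Na⁺] is given in mol/L; the 0.05 M default and the function name are illustrative choices, not GenomeMatcher settings.

```python
import math


def tm_empirical(primer, na_molar=0.05):
    """Tm = 81.5 + 16.6 * log10[Na+] + 0.41 * %GC - 675 / length (degrees C)."""
    primer = primer.upper()
    length = len(primer)
    if length == 0:
        raise ValueError("primer must not be empty")
    gc_percent = 100.0 * (primer.count("G") + primer.count("C")) / length
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / length


# A 20-mer with 50% GC at 0.05 M Na+.
print(round(tm_empirical("ATGCATGCATGCATGCATGC"), 1))
```

For this 20-mer the terms are 81.5 − 21.6 + 20.5 − 33.75, giving about 46.7 °C. The nearest-neighbor method mentioned in the same entry needs a table of thermodynamic parameters and is not sketched here.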

15. Frequently Asked Questions / よくある質問

Q: Which comparison method should I use?
A: For closely related genomes (same species or closely related strains), blastn is the standard choice — fast and accurate. For more distantly related genomes (different genera, or when you suspect protein-coding similarity without nucleotide-level identity), use tBLASTx. The Direct method is the fastest but least sensitive; use it only for nearly identical sequences or for a quick preview.
Q: The dot plot shows many short scattered dots rather than a clear diagonal. What does this mean?
A: Short scattered hits often indicate that the two sequences are only distantly related, or that you are comparing non-coding regions. Try switching to tBLASTx to compare at the protein level: coding regions that are too divergent for blastn may still produce a clear diagonal with tBLASTx. You can also raise the E-value threshold to show more hits, or lower it to reduce noise.
Q: There is a diagonal line but with interruptions (gaps). What do the gaps represent?
A: Gaps in the diagonal indicate genomic regions present in one sequence but absent (or highly rearranged) in the other. These are candidate insertions, deletions, or horizontally transferred islands. Enable the tBLASTx overlay to check whether any protein-coding sequences lie in those gap regions.
Q: How do I read the identity color scale?
A: The color of each dot or segment encodes the BLAST alignment identity percentage of that hit. The exact mapping depends on your Color Settings gradient, but by default warmer colors (red/orange) indicate higher identity and cooler colors (blue/green) indicate lower identity. Open the Default Settings panel to view or modify the gradient.
Q: How do I zoom into a specific region of interest?
A: Use the zoom-in button or increase the scale value in the main window. To navigate to a specific position, use the directional arrow buttons. In the Multiple Genome Comparison window, enter the target position in the range field and press return, or use "Search and focus" to jump to a named gene.
Q: How do I start annotating an unannotated genome sequence?
A: Open the Multiple Genome Comparison window, add the unannotated sequence as a panel, then load a closely related annotated genome as a reference panel. In the annotation sub-panel below the new sequence, click "Batch ORF find and BLAST" with the reference proteome as the BLAST database. Review the ORFs with their BLAST hit information and register confirmed genes with "Register CDS". Use the Start Codon Identification window for genes where the correct start codon is uncertain.
Q: During Start Codon Identification, how do I choose the right start codon?
A: Compare three things: (1) the upstream sequence — a strong Shine-Dalgarno motif (purine-rich, ~7 bp upstream) supports a start codon; (2) the translated amino acid sequence length — if a homolog is known, the candidate start that produces a protein of similar length is usually correct; (3) the start codon type — ATG is most common, GTG and TTG are used but less frequent. Click the candidate row to select it (it turns gray), then click "Accept".
Q: What is the difference between "Register CDS" and "Accept" in Start Codon Identification?
A: "Register CDS" (in the annotation panel) adds an ORF to the annotation with its current predicted start codon. "Accept" (in the Start Codon Identification window) is used when you have explicitly reviewed multiple start codon candidates and chosen one — it finalizes the start position and removes the ORF from the identification queue. For ORFs where the start codon is obvious, "Register CDS" alone is sufficient.
Q: How do I compare a draft genome (many contigs) to a reference?
A: Use the Contig Alignment window (not the main comparison window). Load your multi-FASTA contig file and a complete reference genome, then click "Start BLASTn". Contigs will be positioned along the reference ruler according to their best hit. This shows the overall genome organization and highlights gaps that need closing.
Q: How do I change the colors used for CDS, rRNA, and tRNA arrows?
A: Open the Color Setting panel. Each row maps a feature key (e.g., "CDS", "rRNA", "tRNA") to a color. Click the color swatch in any row to open the color picker and change it. Add new key-color pairs with the "Add" button. Changes take effect immediately in the visualization.
Q: How do I save my analysis and return to it later?
A: In the Multiple Genome Comparison window, use "Save panels" to write the current panel layout (sequences, results, annotations) to a file. Reload it in a future session with "Load panels". Color settings and application preferences are saved automatically to ~/Library/Application Support/GenomeMatcher/.
Q: Can I use GenomeMatcher for eukaryotic genomes?
A: GenomeMatcher is designed and optimized for prokaryotic (bacterial and archaeal) genomes. It will run on eukaryotic sequences technically, but features such as ORF finding use bacterial genetic code assumptions (alternative start codons GTG/TTG, no introns), so annotation results will not be appropriate for eukaryotes. For eukaryotic genome comparison (dot-plot only), MUMmer mode may work for large genomes.
Q: How do I export a publication-quality figure?
A: Click "Save PDF" in any window. GenomeMatcher produces true vector PDF files, which can be imported into Adobe Illustrator, Inkscape, or other vector editors for final figure preparation. For the Gene Draw window, the "Batch save" option outputs multiple regions as separate PDF files automatically.

This reference document is provided to assist AI assistants in answering user questions about GenomeMatcher. Developer: Yoshiyuki Ohtsubo, Laboratory of Microbial Genomics, Tohoku University.

Author Profile: In addition to advancing research on environmental bacteria, the author has created and published a range of software tools.