
published in: Journal of Proteome Research

MegaGO

Calculate semantic distance for sets of Gene Ontology terms.

Getting Started

These instructions will get you a copy of the project up and running on your local machine.

Prerequisites

The scripts are written in Python 3. One easy way to get started is to install Miniconda 3.

On Linux:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Installing

Clone the repository:

git clone https://github.com/MEGA-GO/Mega-Go.git

Install the package:

cd Mega-Go
pip install -U .

Run the example analysis:

megago sample7.txt sample8.txt

These files can be found here:

How does it work?

MegaGO calculates the similarity between GO terms with the Lin semantic similarity (simLin) metric [1].
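
In its standard form, Lin's measure for two terms go_1 and go_2 reads as follows (a reconstruction based on the published definition, using the symbols defined in this section):

$$
\mathrm{sim}_{Lin}(go_1, go_2) = \frac{2 \cdot IC(MICA)}{IC(go_1) + IC(go_2)}
$$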

where:

  • MICA: most informative common ancestor.
  • IC(go_i): information content of the term go_i.

The information content of a GO term is calculated as follows:
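
Assuming the usual information-theoretic definition (the logarithm base is not stated here, and it cancels out in the simLin ratio anyway):

$$
IC(go) = -\log p(go)
$$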

The frequency p of a term go is defined as:
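
One form consistent with the definitions that follow, offered as a reconstruction rather than the authors' exact notation:

$$
p(go) = \frac{\sum_{go' \in \{go\} \cup c} n_{go'}}{N}
$$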

where:

  • c: children of go.
  • N: total number of terms in the GO corpus.
  • n_go': number of occurrences of a term go' in a reference data set.
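
The frequency and information content described above can be sketched in a few lines of Python. This is a minimal illustration, not MegaGO's implementation: the toy ontology `CHILDREN`, the counts in `COUNTS`, and the function names are all hypothetical. It also assumes that "children" means all descendants of a term and that N is the total number of term occurrences in the reference data set, so that the root has p = 1.

```python
import math

# Hypothetical toy ontology: each term maps to its direct children.
CHILDREN = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": [],
    "A1": [],
    "A2": [],
}

# Hypothetical occurrence counts in a reference data set (n_go').
COUNTS = {"root": 0, "A": 2, "B": 5, "A1": 1, "A2": 2}


def descendants(term):
    """Return the term itself plus all of its descendants."""
    seen = {term}
    stack = [term]
    while stack:
        for child in CHILDREN[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def frequency(term):
    """p(go): occurrences of the term and its descendants, divided by N."""
    total = sum(COUNTS.values())  # N, assumed to be the total occurrence count
    return sum(COUNTS[t] for t in descendants(term)) / total


def information_content(term):
    """IC(go) = -log(p(go)); natural log is an arbitrary choice here."""
    return -math.log(frequency(term))
```

With these toy counts, term `A` and its descendants account for half of all occurrences, so `frequency("A")` is 0.5. A term whose subtree never occurs in the reference data would have p = 0 and would need special handling before taking the logarithm.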

To calculate the similarity of two sets of terms, the best match average (BMA) [1] is used.
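
In its usual form, the BMA averages, for every term in either set, the best similarity found in the other set (a reconstruction of the standard definition, using the symbols defined in this section):

$$
BMA(g_i, g_j) = \frac{\sum_{i=1}^{m} \max_{1 \le j \le n} \mathrm{sim}(go_{1i}, go_{2j}) + \sum_{j=1}^{n} \max_{1 \le i \le m} \mathrm{sim}(go_{1i}, go_{2j})}{m + n}
$$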

where:

  • m, n: number of terms in sets g_i and g_j, respectively.
  • sim(go_1i, go_2j): similarity between two GO terms.
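
The two measures can be combined in a short Python sketch. It is an illustration under stated assumptions, not MegaGO's code: the `IC` values, the `ANCESTORS` sets, and the function names `sim_lin` and `bma` are hypothetical, and returning 0 when both terms have zero information content is a choice made here for the sketch.

```python
# Hypothetical information content values for a toy ontology.
IC = {"root": 0.0, "A": 0.7, "B": 0.7, "A1": 2.3, "A2": 1.6}

# Hypothetical ancestor sets; each term counts as its own ancestor.
ANCESTORS = {
    "root": {"root"},
    "A": {"A", "root"},
    "B": {"B", "root"},
    "A1": {"A1", "A", "root"},
    "A2": {"A2", "A", "root"},
}


def sim_lin(go1, go2):
    """Lin similarity: 2 * IC(MICA) / (IC(go1) + IC(go2))."""
    common = ANCESTORS[go1] & ANCESTORS[go2]
    ic_mica = max(IC[t] for t in common)  # IC of the most informative common ancestor
    denominator = IC[go1] + IC[go2]
    # Assumption: two terms with zero information content get similarity 0.
    return 2 * ic_mica / denominator if denominator > 0 else 0.0


def bma(set1, set2):
    """Best match average of two non-empty collections of GO terms."""
    best1 = sum(max(sim_lin(a, b) for b in set2) for a in set1)
    best2 = sum(max(sim_lin(a, b) for a in set1) for b in set2)
    return (best1 + best2) / (len(set1) + len(set2))
```

Comparing a set with itself, for example `bma(["A1"], ["A1"])`, gives 1.0. Two terms whose only common ancestor is the root, such as `A1` and `B`, get a similarity of 0.0 because the root carries no information.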

[1] Lin, Dekang. 1998. “An Information-Theoretic Definition of Similarity.” In Proceedings of the 15th International Conference on Machine Learning, 296-304.

Interpretation

The relative similarity ranges between 0 and 1.

| sim(go_1i, go_2j) value | Interpretation |
|---|---|
| > 0.9 | highly similar functions |
| 0.3 - 0.9 | functionally related |
| < 0.3 | not functionally similar |
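
The thresholds above can be applied programmatically with a small helper. The function name `interpret` is hypothetical and not part of the megago package. The sketch assigns the boundary values 0.3 and 0.9 to the middle category, following the table as written.

```python
def interpret(similarity):
    """Map a similarity score in [0, 1] to the interpretation table above."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie between 0 and 1")
    if similarity > 0.9:
        return "highly similar functions"
    if similarity >= 0.3:
        return "functionally related"
    return "not functionally similar"
```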

License

This project is licensed under the MIT License. See the LICENSE file for details.