Handling master data

The PyHPO package relies on several master data files provided by the HPO Consortium. These files always ship with the package in the data subfolder. I try to release a PyHPO update with every HPO data update, but I might fall behind at times and can’t guarantee long-term support.

Here you will find the easiest procedures to update the data yourself.

PyHPO requires three data files:

  • HPO_ONTOLOGY: This is the OBO file describing the HPO Ontology. Let’s all hope that the file format will never change. This file is mandatory.
  • HPO_GENE: This is a custom file provided by the HPO Consortium that contains links between HPO terms and genes.
  • HPO_PHENO: This is a custom file provided by the HPO Consortium that contains links between HPO terms and diseases.

The HPO_GENE and HPO_PHENO files are not mandatory per se. The ontology itself will work without them, but the HPO terms will not be annotated. That means you won’t be able to calculate the information content, similarity and some other features.
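
Before loading a data folder, it can help to check which of the three files are actually there. The following is a minimal sketch, not part of PyHPO: the helper check_master_data and the file names in ASSUMED_FILENAMES are assumptions made for illustration, since this page does not state what the files are called on disk.

```python
from pathlib import Path

# Hypothetical file names: this page does not state what the files are
# called on disk, so adjust them to match your copy of the data.
ASSUMED_FILENAMES = {
    "HPO_ONTOLOGY": "hp.obo",
    "HPO_GENE": "phenotype_to_genes.txt",
    "HPO_PHENO": "phenotype.hpoa",
}

# Only the ontology file is mandatory; the other two add annotations.
MANDATORY = {"HPO_ONTOLOGY"}


def check_master_data(folder, filenames=ASSUMED_FILENAMES):
    """Return {key: bool} telling which master data files exist in `folder`.

    Raises FileNotFoundError if a mandatory file is missing.
    """
    folder = Path(folder)
    present = {key: (folder / name).is_file() for key, name in filenames.items()}
    missing = sorted(key for key in MANDATORY if not present.get(key, False))
    if missing:
        raise FileNotFoundError(
            f"Missing mandatory file(s) in {folder}: {', '.join(missing)}"
        )
    return present
```

A result such as {'HPO_ONTOLOGY': True, 'HPO_GENE': False, 'HPO_PHENO': False} would mean the ontology can be built, but the terms will carry no gene or disease annotations.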

Auto update

You can try to auto-update the data from the HPO Jenkins servers and OBO-Library via the built-in script update_data.py.

from pyhpo.update_data import download_data
download_data()

Error handling

If the URLs of the files change, you will need to modify the URLS dict in the update_data module.

from pyhpo import update_data

# URLS is a dict in the update_data module, so set it on the module, not on the function
update_data.URLS['HPO_ONTOLOGY'] = 'https://custom-url.com'
update_data.download_data()
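
If you would rather control the download yourself, the same idea (a dict of URLs keyed by file type) can be sketched with the standard library alone. This is not PyHPO's implementation: download_files, its parameters and the way failures are reported are assumptions made for illustration.

```python
import urllib.request
from pathlib import Path


def download_files(urls, dest_folder, filenames, fetch=urllib.request.urlretrieve):
    """Download every URL in `urls` into `dest_folder`.

    urls:      {key: url}, for example {'HPO_ONTOLOGY': 'https://custom-url.com'}
    filenames: {key: local file name} for the same keys
    fetch:     callable(url, local_path); replaceable for testing
    Returns {key: error message} for every download that failed.
    """
    dest = Path(dest_folder)
    dest.mkdir(parents=True, exist_ok=True)
    failures = {}
    for key, url in urls.items():
        target = dest / filenames[key]
        try:
            fetch(url, str(target))
        except OSError as err:
            # urllib's URLError and HTTPError are both subclasses of OSError
            failures[key] = f"{url}: {err}"
    return failures
```

An empty return value means all files were fetched. Any key that shows up in the result points at a URL that probably needs to be changed.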

Manual update

Of course, you can manually download the files and replace them in the data subfolder. However, this is not recommended: it overwrites the files bundled with the package, which might cause issues and is not easy to undo.

Instead, you can download the files and store them somewhere in your home folder. Upon initializing the pyhpo.ontology.OntologyClass, you can specify the path to the files.

from pyhpo.ontology import Ontology

_ = Ontology(data_folder='/path/to/master/data')