The chembl-orm#

version: 34.0.0a0

The chembl-orm enables interaction with the ChEMBL database via SQLAlchemy and Python. It contains an object relational mapper (ORM) module and a separate query module with some preset queries against ChEMBL.

There is online documentation for chembl-orm.

This is a third-party package and I have no affiliation with the EBI or the ChEMBL team. Please note that I have only tested this against the ChEMBL SQLite version.

Installation instructions#

At present, chembl-orm is under development and no packages exist yet on PyPI. Therefore, it is recommended that you install it in one of the two ways below.

Installation using conda#

I maintain a conda package in my personal conda channel. To install from this please run:

conda install -c cfin -c bioconda -c conda-forge chembl-orm

There are currently builds for Python v3.8, v3.9, v3.10 for Linux-64 and Mac-osX.

Please keep in mind that all development is carried out on Linux-64 and Python v3.8/v3.9. I do not own a Mac so I can’t test on one; the conda build does run some import tests, but that is it.

Installation using pip#

You can install using pip from the root of the cloned repository. First, clone the repository and cd into its root:

git clone git@gitlab.com:cfinan/chembl-orm.git
cd chembl-orm

Install the dependencies:

python -m pip install --upgrade -r requirements.txt

Then install using pip:

python -m pip install .

Or, for an editable (developer) install, run the command below from the root of the repository. The difference is that you can just do a git pull to update, or switch branches, without re-installing:

python -m pip install -e .

Conda dependencies#

There are also conda yaml environment files in ./resources/conda/envs that list the same prerequisites as requirements.txt, but as conda packages. I use these to install all the requirements via conda and then install the package as an editable pip install.

If you find these useful then please use them. There are conda environments for Python v3.8, v3.9 and v3.10.

Run the tests#

After installation you may want to run the tests. If you have cloned the repository you can run the command below from the root of the repository:

pytest ./tests

These are only import tests. If you want to test the actual ORM against a working copy of the database, you can use orm-test-connect from the SQLAlchemy-config package, which should be installed when you install the ChEMBL ORM. It can be run as follows (using the connection parameters defined in a ~/.db.ini file, as described in the following section):

$ orm-test-connect -vv "chembl_33_mysql" "chembl_orm.orm"
=== orm-test-connect (sqlalchemy_config v0.2.0a3) ===
[info] 10:52:17 > config value: None
[info] 10:52:17 > db value: chembl_mysql
[info] 10:52:17 > module value: chembl_orm.orm
[info] 10:52:17 > verbose value: 2
[info] running queries: 100%|------------| 79/79 [00:00<00:00, 207.43 queries/s]
[info] 10:52:18 > *** END ***
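The import tests mentioned above amount to checking that each module loads without error. A minimal sketch of that idea follows. The helper name can_import is hypothetical, and the module list is taken from names mentioned elsewhere in this document rather than from the actual test suite.

```python
import importlib


def can_import(module_name):
    """Return True if the module imports cleanly, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError and missing dependencies
        return False
    return True


# Module names mentioned in this README; the real tests may cover others
modules = ["chembl_orm.orm", "chembl_orm.queries", "chembl_orm.index"]
results = {name: can_import(name) for name in modules}

for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'FAILED'}")
```

A check like this only shows that the package is installed with its dependencies. It says nothing about whether the ORM matches your database, which is what orm-test-connect is for.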

Database configuration#

The package uses SQLAlchemy to handle database interaction. This means that in theory you are not restricted to a particular database backend. In practice most testing/development will be performed against SQLite and MySQL. So, if you use something else and run into issues, please submit an issue or get in contact.

Any database connection parameters can be supplied on the command line or via a configuration file. You can supply a full connection URL directly on the command line. However, this is not a good idea if your database is password protected. In this case you should supply the connection parameters in an .ini configuration file, set out as below:

[chembl_33_sqlite]
# An SQLAlchemy connection URL, see:
# https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.engine_from_config
# All connection options here should be prefixed with "db."
# Make sure passwords are URL escaped:
# import urllib.parse
# PW = urllib.parse.quote_plus(PW)
# Also, don't forget to escape any % that are in the URL (with a second %)
db.url = sqlite:////data/chembl_33.db

[chembl_33_mysql]
# Connection to localhost
db.url = mysql+pymysql://user:password@127.0.0.1/chembl_33

[chembl_33_postgres]
# Connection to localhost
db.url = postgresql+psycopg2://user:password@127.0.0.1/chembl_33
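The comments in the config above mention two separate escaping steps: URL-escaping the password and doubling any % for the .ini file. The sketch below shows both, using only the standard library. The helper names and the credentials are hypothetical, and it assumes the file is read with Python's configparser using its default interpolation, which may not be exactly how the package reads it.

```python
import configparser
import urllib.parse


def build_db_url(user, password, host, database, driver="mysql+pymysql"):
    """Return an SQLAlchemy-style URL with the password URL-escaped."""
    escaped = urllib.parse.quote_plus(password)
    return f"{driver}://{user}:{escaped}@{host}/{database}"


def ini_escape(value):
    """Double every % so configparser interpolation leaves it alone."""
    return value.replace("%", "%%")


# Hypothetical credentials, for illustration only
url = build_db_url("user", "p@ss/word%1", "127.0.0.1", "chembl_33")

# The text as it would be written into the .ini file
ini_text = "[chembl_33_mysql]\ndb.url = " + ini_escape(url) + "\n"

# Reading it back with default interpolation recovers the original URL
parser = configparser.ConfigParser()
parser.read_string(ini_text)
round_trip = parser["chembl_33_mysql"]["db.url"]
```

The URL escaping has to happen first, because it introduces the % characters that the second step then doubles.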

Then, to use these from the command line, supply the section header from the config file to the script, for example chembl_33_mysql for the connection to the MySQL database.
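To illustrate how a section header maps to connection parameters, here is a rough sketch that looks up a section and collects its db.-prefixed options. The function get_db_options is hypothetical and is not part of chembl-orm or SQLAlchemy-config; those packages may resolve sections differently.

```python
import configparser

# Example config text, mirroring the layout shown above
INI_TEXT = """
[chembl_33_sqlite]
db.url = sqlite:////data/chembl_33.db

[chembl_33_mysql]
db.url = mysql+pymysql://user:password@127.0.0.1/chembl_33
"""


def get_db_options(ini_text, section, prefix="db."):
    """Return the options in `section` whose names start with `prefix`."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(section):
        raise KeyError(f"no section named '{section}' in config")
    return {
        key: value
        for key, value in parser[section].items()
        if key.startswith(prefix)
    }


options = get_db_options(INI_TEXT, "chembl_33_mysql")
print(options["db.url"])
```

A dictionary in this shape can be handed to sqlalchemy.engine_from_config with prefix="db." to create an engine, assuming SQLAlchemy and the relevant database driver are installed.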

Versioning#

The major version of the chembl-orm package tracks the ChEMBL release number. I will endeavor to keep it current. If you are using an older version of ChEMBL then you should switch to the branch that matches the version of the database you have.
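As a small illustration of this scheme, the sketch below compares the major component of a package version string with a ChEMBL release number. Both function names are hypothetical helpers written for this example, not part of the chembl-orm API.

```python
import re


def major_version(version):
    """Extract the leading integer from a version such as '34.0.0a0'."""
    match = re.match(r"(\d+)\.", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1))


def matches_release(package_version, chembl_release):
    """Return True if the package major version equals the ChEMBL release."""
    return major_version(package_version) == int(chembl_release)


# The version at the top of this document against two ChEMBL releases
print(matches_release("34.0.0a0", 34))
print(matches_release("34.0.0a0", 33))
```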

Change log#

version 33.0.0a0#

  • Initial build and push

version 33.1.0a0#

  • API - Added a dictionary of drug salts to the example data.

  • API - Added index tables to the ORM schema chembl_orm.orm.TermIndexLookup and chembl_orm.orm.TermIndexMap.

  • API - Added function (chembl_orm.index.build_chembl_index) to index the drug_indications table into a separate index table.

  • API - Added instance method chembl_orm.queries.ChemblQuery.get_drugs_for_indication, which performs a search of the index for drugs matching a supplied indication string.

  • API - Added instance method chembl_orm.queries.ChemblQuery.map_drug_name, which attempts to map a free text drug name to a ChEMBL ID.

  • SCRIPTS - Added a command line program chembl-index to build the ChEMBL index tables.

version 33.2.0a0#

  • BUILD - Updated to use SQLAlchemy>=2.

  • API - Fixed SQLAlchemy 2 incompatibilities in the ORM.

version 34.0.0a0#

  • API - Updated ORM for ChEMBL 34, two columns added.