Python Scripts#

chembl-index#

Creates and queries index tables based on the drug_indication table. These can be used to query the drug indications for non-exact matches.

This indexes both the EFO terms and the MeSH terms. Each time the script is run, it will drop and re-build the indexes.

usage: chembl-index [-h] [-c CONFIG] [-T TMP] [--commit-every COMMIT_EVERY]
                    [--chunksize CHUNKSIZE] [-v]
                    [dburl]

Positional Arguments#

dburl

An SQLAlchemy connection URL, or a filename if using SQLite. If you do not want to put full connection parameters on the command line, use the config file (--config) and config section (--config-section) to supply the parameters.
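As a rough illustration of keeping connection parameters out of the command line, the sketch below reads a connection URL from an INI-style section using the standard library. The section name chembl_latest is borrowed from the example usage; the key name url, the file contents and the helper read_url are assumptions made for illustration and may not match what chembl-index actually expects.

```python
import configparser

# Hypothetical config contents. The "url" key is an assumption for this
# sketch, not a documented key of the real config file.
EXAMPLE_CONFIG = """
[chembl_latest]
url = sqlite:////data/chembl/chembl_latest.db
"""


def read_url(config_text, section, key="url"):
    """Return the connection URL stored under ``section``."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser[section][key]


url = read_url(EXAMPLE_CONFIG, "chembl_latest")
print(url)
```

The four slashes after sqlite: follow the SQLAlchemy convention for an absolute SQLite file path.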

Named Arguments#

-c, --config

The location of the config file

Default: “~/.db.cnf”

-T, --tmp

The location of the temporary directory. If not provided, the system temporary directory is used.

--commit-every

Commit to the database after every --commit-every rows.

Default: 10000
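The general idea behind a commit interval can be sketched as follows. This is not the script's own code: it uses an in-memory SQLite database, and the table demo_index and the function insert_in_batches are hypothetical names chosen for the example.

```python
import sqlite3


def insert_in_batches(conn, rows, commit_every=10000):
    """Insert rows, committing after every ``commit_every`` rows.

    Returns the number of commits issued, including the final one.
    """
    commits = 0
    cur = conn.cursor()
    for n, row in enumerate(rows, start=1):
        cur.execute(
            "INSERT INTO demo_index (term, record_id) VALUES (?, ?)", row
        )
        if n % commit_every == 0:
            conn.commit()
            commits += 1
    # Commit whatever is left over from the last partial batch.
    conn.commit()
    return commits + 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_index (term TEXT, record_id INTEGER)")
rows = [(f"term{i}", i) for i in range(25)]
n_commits = insert_in_batches(conn, rows, commit_every=10)
n_rows = conn.execute("SELECT COUNT(*) FROM demo_index").fetchone()[0]
```

A larger interval means fewer commits and usually faster loading, at the cost of more work being lost if the run is interrupted.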

--chunksize

The maximum number of rows to keep in memory when sorting.

Default: 100000
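A row limit of this kind, together with a temporary directory, is typical of an external merge sort. The sketch below shows that general technique under the assumption that this is roughly what the options control; it is not taken from the chembl-index source, and external_sort and its helpers are hypothetical names.

```python
import heapq
import pickle
import tempfile


def _dump_chunk(sorted_rows, tmp_dir):
    """Write an already sorted chunk to a temporary file and rewind it."""
    fh = tempfile.TemporaryFile(dir=tmp_dir)
    for row in sorted_rows:
        pickle.dump(row, fh)
    fh.seek(0)
    return fh


def _load_chunk(fh):
    """Yield the rows of one chunk file in the order they were written."""
    while True:
        try:
            yield pickle.load(fh)
        except EOFError:
            return


def external_sort(rows, chunksize=100000, tmp_dir=None):
    """Yield rows in sorted order, holding at most ``chunksize`` rows
    in memory while the chunks are being built."""
    chunk_files, chunk = [], []
    for row in rows:
        chunk.append(row)
        if len(chunk) >= chunksize:
            chunk_files.append(_dump_chunk(sorted(chunk), tmp_dir))
            chunk = []
    if chunk:
        chunk_files.append(_dump_chunk(sorted(chunk), tmp_dir))
    try:
        # heapq.merge lazily merges the sorted chunk streams.
        yield from heapq.merge(*(_load_chunk(fh) for fh in chunk_files))
    finally:
        for fh in chunk_files:
            fh.close()


result = list(external_sort([5, 3, 9, 1, 7, 2, 8], chunksize=3))
```

Passing tmp_dir=None lets the tempfile module choose the system temporary directory, which mirrors the behaviour described for -T.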

-v, --verbose

Give more output; -vv turns on progress monitoring.

Default: 0

Example usage#

To index a ChEMBL database that is listed under the section heading chembl_latest in the database config file, with progress updates:

chembl-index -vv chembl_latest

chembl-parse-schema#

A utility for parsing the ChEMBL schema documentation and producing a table info file and a column info file. The schema documentation file can be downloaded from the ChEMBL downloads page.

usage: chembl-parse-schema [-h] [-t OUT_TABLE_INFO] [-c OUT_COLUMN_INFO] [-v]
                           schema_doc

Positional Arguments#

schema_doc

The schema documentation file.

Named Arguments#

-t, --out-table-info

The path to the table info output file. If not supplied then a default file name of <ChEMBL_version>_table_info.txt will be used and written to the current directory.

-c, --out-column-info

The path to the column info output file. If not supplied then a default file name of <ChEMBL_version>_column_info.txt will be used and written to the current directory.

-v, --verbose

Give more output; -vv turns on progress monitoring.

Default: 0

Example usage#

To parse the schema documentation file and write the column info and table info files to their default locations:

chembl-parse-schema schema_documentation.txt

To write the column info and table info files to custom locations:

chembl-parse-schema --out-table-info ~/chembl_latest_table.txt --out-column-info ~/chembl_latest_column.txt schema_documentation.txt

Output files#

The output columns for the table info and column info files are appropriate for use with the orm-create-src script within sqlalchemy-config.

Table info file#

  1. table_name (string) - The current name of the database table, must not contain any spaces.

  2. new_table_name (string) - A new name for the database table, must not contain any spaces.

  3. class_name (string) - A class name by which the ORM class will be named, usually camel case, must not contain any spaces.
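To make the three columns concrete, the sketch below writes a small table info file. The tab delimiter, the presence of a header row and the camel-case rule in to_class_name are assumptions for illustration; the files produced by chembl-parse-schema may differ in these details.

```python
import csv
import io


def to_class_name(table_name):
    """Convert a snake_case table name into a camel-case class name."""
    return "".join(part.capitalize() for part in table_name.split("_"))


def write_table_info(table_names, out):
    """Write table_name, new_table_name and class_name, one row per table."""
    # Assumed layout: tab-delimited with a header row.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["table_name", "new_table_name", "class_name"])
    for name in table_names:
        # The table keeps its current name in this example.
        writer.writerow([name, name, to_class_name(name)])


buffer = io.StringIO()
write_table_info(["drug_indication", "molecule_dictionary"], buffer)
table_info_text = buffer.getvalue()
print(table_info_text)
```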

Column info file#

  1. table_name (string) - The current name of the database table, must not contain any spaces. If the table has been renamed in the table info file, then this must be the new name.

  2. column_name (string) - The name of the column.

  3. dtype (string) - The column data type. Currently supported data types are: int, numeric, big_int, small_int, text, varchar, date, datetime, float.

  4. max_len (integer) - The maximum string length in a column.

  5. is_primary_key (boolean) - Whether the column is a primary key column.

  6. is_index (boolean) - Whether the column is indexed.

  7. is_nullable (boolean) - Whether the column values can be undefined.

  8. is_unique (boolean) - Whether the column values should be unique.

  9. column_doc (string) - A description for the column.
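The column layout above could be read back with a short parser along these lines. This is a sketch only: the tab delimiter, the header row, the boolean encoding accepted by as_bool and the sample rows are all assumptions for illustration, not values taken from a real ChEMBL release.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO

# The data types listed in the column description above.
SUPPORTED_DTYPES = {
    "int", "numeric", "big_int", "small_int", "text",
    "varchar", "date", "datetime", "float",
}


@dataclass
class ColumnInfo:
    table_name: str
    column_name: str
    dtype: str
    max_len: Optional[int]
    is_primary_key: bool
    is_index: bool
    is_nullable: bool
    is_unique: bool
    column_doc: str


def as_bool(value: str) -> bool:
    # Assumed boolean encoding; the real files may use another convention.
    return value.strip().lower() in {"1", "true", "t", "yes", "y"}


def read_column_info(handle: TextIO) -> Iterator[ColumnInfo]:
    """Yield one ColumnInfo per row of a tab-delimited file with a header."""
    for rec in csv.DictReader(handle, delimiter="\t"):
        if rec["dtype"] not in SUPPORTED_DTYPES:
            raise ValueError(f"unsupported dtype: {rec['dtype']!r}")
        yield ColumnInfo(
            table_name=rec["table_name"],
            column_name=rec["column_name"],
            dtype=rec["dtype"],
            max_len=int(rec["max_len"]) if rec["max_len"] else None,
            is_primary_key=as_bool(rec["is_primary_key"]),
            is_index=as_bool(rec["is_index"]),
            is_nullable=as_bool(rec["is_nullable"]),
            is_unique=as_bool(rec["is_unique"]),
            column_doc=rec["column_doc"],
        )


# Illustrative rows only.
SAMPLE = (
    "table_name\tcolumn_name\tdtype\tmax_len\tis_primary_key\tis_index\t"
    "is_nullable\tis_unique\tcolumn_doc\n"
    "drug_indication\tdrugind_id\tbig_int\t\t1\t1\t0\t1\tRow identifier\n"
    "drug_indication\tmesh_heading\tvarchar\t200\t0\t1\t1\t0\tMeSH heading\n"
)

columns = list(read_column_info(io.StringIO(SAMPLE)))
```

Rejecting unknown dtype values early keeps a typo in the file from surfacing later as a confusing error during ORM generation.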

Known Issues#

None reported.