Python Scripts#

`chembl-index`#

For creating and querying index tables based on the drug_indication table. These can be used to query the drug indications for non-exact matches.

This indexes both the EFO terms and the MeSH terms. Each time the script is run, it will drop and re-build the indexes.

usage: chembl-index [-h] [-c CONFIG] [-T TMP] [--commit-every COMMIT_EVERY]
                    [--chunksize CHUNKSIZE] [-v]
                    [dburl]

Positional Arguments#

dburl: An SQLAlchemy connection URL, or a filename if using SQLite. If you do not want to put the full connection parameters on the command line, use the config file (--config) and config section (--config-section) to supply the parameters.

Named Arguments#

-c, --config

The location of the config file

Default: “~/.db.cnf”

-T, --tmp

The location of the temporary directory. If not provided, the system temporary directory is used.

--commit-every

Commit to the database after every --commit-every rows.

Default: 10000

--chunksize

The maximum number of rows to keep in memory when sorting.

Default: 100000

-v, --verbose

Give more output; -vv turns on progress monitoring.

Default: 0
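
The --tmp and --chunksize options suggest a chunked (external) sort, where only a bounded number of rows is held in memory and sorted chunks are spilled to temporary files before being merged. The following is a minimal sketch of that general technique, not the script's actual implementation; the function name `external_sort` and its parameters are hypothetical.

```python
import heapq
import os
import tempfile


def external_sort(rows, chunksize=100000, tmpdir=None):
    """Yield rows (strings without newlines) in sorted order.

    At most `chunksize` rows are held in memory at once; each full chunk
    is sorted and written to a temporary file in `tmpdir` (the system
    temporary directory when `tmpdir` is None).
    """
    chunk_paths = []
    chunk = []

    def spill(buffer):
        # Sort the in-memory chunk and write it out, one row per line
        fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".chunk", text=True)
        with os.fdopen(fd, "w") as handle:
            handle.writelines(line + "\n" for line in sorted(buffer))
        chunk_paths.append(path)

    try:
        for row in rows:
            chunk.append(row)
            if len(chunk) >= chunksize:
                spill(chunk)
                chunk = []
        if chunk:
            spill(chunk)

        # Merge the sorted chunk files lazily, comparing rows without
        # their trailing newline so the order matches the in-memory sort
        handles = [open(path) for path in chunk_paths]
        try:
            merged = heapq.merge(*handles, key=lambda line: line.rstrip("\n"))
            for line in merged:
                yield line.rstrip("\n")
        finally:
            for handle in handles:
                handle.close()
    finally:
        # Always remove the temporary chunk files
        for path in chunk_paths:
            os.remove(path)


terms = ["pear", "apple", "fig", "kiwi", "banana"]
sorted_terms = list(external_sort(terms, chunksize=2))
```

A smaller chunk size lowers peak memory use at the cost of more temporary files to merge.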

Example usage#

To index a ChEMBL database that is listed under the section heading chembl_latest in the database config file, with progress updates.

chembl-index -vv chembl_latest
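
The example above passes a config section name in place of a connection URL. The layout of the config file is not described here, so the sketch below only illustrates the general idea of looking up a connection URL by section name in an INI-style file. The key name `db.url`, the file contents and the helper `read_db_url` are all assumptions made for illustration.

```python
import configparser

# Hypothetical config contents; the key names that chembl-index actually
# expects are not documented here, so treat these as placeholders.
EXAMPLE_CONFIG = """
[chembl_latest]
db.url = sqlite:////data/chembl/chembl_latest.db
"""


def read_db_url(config_text, section, key="db.url"):
    """Return the connection URL stored under `section` in an INI-style config."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(section):
        raise KeyError(f"no section named {section!r} in config")
    return parser.get(section, key)


url = read_db_url(EXAMPLE_CONFIG, "chembl_latest")
```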

`chembl-parse-schema`#

A utility for parsing ChEMBL schema documentation and producing a table info file and a column info file. The schema documentation file can be downloaded from the ChEMBL downloads page.

usage: chembl-parse-schema [-h] [-t OUT_TABLE_INFO] [-c OUT_COLUMN_INFO] [-v]
                           schema_doc

Positional Arguments#

schema_doc: The schema documentation file.

Named Arguments#

-t, --out-table-info

The path to the table info output file. If not supplied then a default file name of <ChEMBL_version>_table_info.txt will be used and written to the current directory.

-c, --out-column-info

The path to the column info output file. If not supplied then a default file name of <ChEMBL_version>_column_info.txt will be used and written to the current directory.

-v, --verbose

give more output, -vv will turn on progress monitoring

Default: 0

Example usage#

To parse the schema documentation file and write the column info and table info files to their default locations.

chembl-parse-schema schema_documentation.txt

To write to custom column info and table info locations.

chembl-parse-schema --out-table-info ~/chembl_latest_table.txt --out-column-info ~/chembl_latest_column.txt schema_documentation.txt

Output files#

The output columns for the table info and column info files are appropriate for use with the orm-create-src script within sqlalchemy-config.

Table info file#

table_name (string) - The current name of the database table, must not contain any spaces.
new_table_name (string) - A new name for the database table, must not contain any spaces.
class_name (string) - A class name by which the ORM class will be named, usually camel case, must not contain any spaces.
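
As an illustration of the camel case convention mentioned for class_name, a table name in snake case could be converted as sketched below. This is an assumption about one reasonable naming scheme, not necessarily how chembl-parse-schema derives its class names; the helper `to_class_name` is hypothetical.

```python
def to_class_name(table_name):
    """Convert a snake_case table name into a CamelCase class name."""
    # Skip empty parts so doubled or leading underscores do not matter
    return "".join(part.capitalize() for part in table_name.split("_") if part)


class_name = to_class_name("drug_indication")
```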

Column info file#

table_name (string) - The current name of the database table, must not contain any spaces. If the table has been renamed in the table info file, then this table name must be the new name.
column_name (string) - The name of the column.
dtype (string) - The column data type. Currently supported data types are: int, numeric, big_int, small_int, text, varchar, date, datetime, float.
max_len (integer) - The maximum string length in a column.
is_primary_key (boolean) - Is the column a primary key column.
is_index (boolean) - Is the column indexed.
is_nullable (boolean) - Can the column values be undefined.
is_unique (boolean) - Should the column values be unique.
column_doc (string) - A description for the column.
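
The column info fields listed above can be pictured as a simple record. The sketch below models one row and checks it against the rules stated in the field descriptions (no spaces in table_name, dtype drawn from the supported list). The `ColumnInfo` class, its default values and the `validate` method are hypothetical and are not part of chembl-parse-schema or sqlalchemy-config.

```python
from dataclasses import dataclass
from typing import Optional

# Data types listed in the dtype field description
SUPPORTED_DTYPES = {
    "int", "numeric", "big_int", "small_int", "text",
    "varchar", "date", "datetime", "float",
}


@dataclass
class ColumnInfo:
    table_name: str
    column_name: str
    dtype: str
    max_len: Optional[int] = None   # only meaningful for string columns
    is_primary_key: bool = False    # defaults here are assumptions
    is_index: bool = False
    is_nullable: bool = True
    is_unique: bool = False
    column_doc: str = ""

    def validate(self):
        """Return a list of problems; an empty list means the row looks fine."""
        problems = []
        if any(ch.isspace() for ch in self.table_name):
            problems.append("table_name must not contain spaces")
        if self.dtype not in SUPPORTED_DTYPES:
            problems.append(f"unsupported dtype: {self.dtype}")
        return problems


good = ColumnInfo("drug_indication", "efo_term", "varchar", max_len=200)
bad = ColumnInfo("drug indication", "efo_term", "blob")
```

Calling `good.validate()` returns an empty list, while `bad.validate()` reports both the space in the table name and the unsupported data type.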

Known Issues#

None reported.
