Python Scripts#
chembl-index#
For creating and querying index tables based on the drug_indication
table. These can be used to query the drug indications for
non-exact matches.
This indexes both the EFO terms and the MeSH terms. Each time the script is run, it will drop and re-build the indexes.
usage: chembl-index [-h] [-c CONFIG] [-T TMP] [--commit-every COMMIT_EVERY]
[--chunksize CHUNKSIZE] [-v]
[dburl]
Positional Arguments#
- dburl
An SQLAlchemy connection URL, or a filename if using SQLite. If you do not want to put full connection parameters on the command line, use the config file (--config) and config section (--config-section) to supply the parameters.
Named Arguments#
- -c, --config
The location of the config file
Default: “~/.db.cnf”
- -T, --tmp
The location of the temporary directory. If not provided, the system temporary directory is used.
- --commit-every
Commit to the database after every --commit-every rows
Default: 10000
- --chunksize
The max rows to keep in memory when sorting
Default: 100000
- -v, --verbose
Give more output, -vv turns on progress monitoring
Default: 0
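The --commit-every option describes a common batching pattern: rows are written in groups and the transaction is committed once per group, not once per row. The following is a minimal sketch of that pattern using the standard-library sqlite3 module. The table `demo_index`, the function `insert_in_batches`, and the row layout are hypothetical and are not taken from chembl-index itself, so treat this as an illustration of the idea, not the script's implementation.

```python
import sqlite3


def insert_in_batches(conn, rows, commit_every=10000):
    """Insert rows and commit after every `commit_every` rows.

    Returns the number of commits issued. The table `demo_index` is
    hypothetical and only illustrates the batching pattern.
    """
    commits = 0
    pending = 0
    cur = conn.cursor()
    for term, indication_id in rows:
        cur.execute(
            "INSERT INTO demo_index (term, indication_id) VALUES (?, ?)",
            (term, indication_id),
        )
        pending += 1
        if pending >= commit_every:
            conn.commit()
            commits += 1
            pending = 0
    # Commit whatever is left over from the final, partial batch.
    if pending:
        conn.commit()
        commits += 1
    return commits


# Small in-memory demonstration with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_index (term TEXT, indication_id INTEGER)")
rows = [("term_%d" % i, i) for i in range(25)]
n_commits = insert_in_batches(conn, rows, commit_every=10)
n_rows = conn.execute("SELECT COUNT(*) FROM demo_index").fetchone()[0]
```

A smaller batch size limits how much work is lost if a run is interrupted, while a larger one reduces the per-commit overhead.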
Example usage#
To index a ChEMBL database that is listed under the section heading chembl_latest in the database config file, and give progress updates:
chembl-index -vv chembl_latest
chembl-parse-schema#
A utility for parsing ChEMBL schema documentation and producing a table info file and a column info file. The schema documentation file can be downloaded from the ChEMBL downloads page.
usage: chembl-parse-schema [-h] [-t OUT_TABLE_INFO] [-c OUT_COLUMN_INFO] [-v]
schema_doc
Positional Arguments#
- schema_doc
The schema documentation file.
Named Arguments#
- -t, --out-table-info
The path to the table info output file. If not supplied then a default file name of <ChEMBL_version>_table_info.txt will be used and written to the current directory.
- -c, --out-column-info
The path to the column info output file. If not supplied then a default file name of <ChEMBL_version>_column_info.txt will be used and written to the current directory.
- -v, --verbose
give more output, -vv will turn on progress monitoring
Default: 0
Example usage#
To parse the schema documentation file and write the column info and table info files to their default locations:
chembl-parse-schema schema_documentation.txt
To write to custom column info and table info locations.
chembl-parse-schema --out-table-info ~/chembl_latest_table.txt --out-column-info ~/chembl_latest_column.txt schema_documentation.txt
Output files#
The output columns for the table info and column info files are appropriate for use with the orm-create-src script within sqlalchemy-config.
Table info file#
- table_name (string) - The current name of the database table, must not contain any spaces.
- new_table_name (string) - A new name for the database table, must not contain any spaces.
- class_name (string) - A class name by which the ORM class will be named, usually camel case, must not contain any spaces.
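As an illustration of the class_name convention described above, a camel case class name can be derived from a snake_case table name. This is a minimal sketch only. The helper `to_class_name` is hypothetical, and chembl-parse-schema may derive its class names differently.

```python
def to_class_name(table_name):
    """Convert a snake_case table name into a camel case class name.

    Hypothetical helper: empty parts (from doubled or leading
    underscores) are skipped so the result contains no separators.
    """
    parts = [p for p in table_name.split("_") if p]
    return "".join(p[:1].upper() + p[1:] for p in parts)


example_class = to_class_name("drug_indication")
```

Because the result is built only from the underscore-separated parts, it cannot contain spaces as long as the input table name has none.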
Column info file#
- table_name (string) - The current name of the database table, must not contain any spaces. If the table has been renamed in the table info file, then this table name must be the new name.
- column_name (string) - The name of the column.
- dtype (string) - The column data type. Currently supported data types are: int, numeric, big_int, small_int, text, varchar, date, datetime, float.
- max_len (integer) - The maximum string length in a column.
- is_primary_key (boolean) - Is the column a primary key column.
- is_index (boolean) - Is the column indexed.
- is_nullable (boolean) - Can the column values be undefined.
- is_unique (boolean) - Should the column values be unique.
- column_doc (string) - A description for the column.
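The column layout above can be checked with a small reader before the file is passed to other tools. The sketch below makes several assumptions that the documentation does not confirm: that the file is tab-delimited, that it has a header row using the column names listed above, and that booleans are written as values such as True/False or 1/0. The names `ColumnInfo` and `read_column_info` are hypothetical, and the sample rows are invented for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Data types listed as supported in the column info description.
SUPPORTED_DTYPES = {
    "int", "numeric", "big_int", "small_int", "text",
    "varchar", "date", "datetime", "float",
}


@dataclass
class ColumnInfo:
    table_name: str
    column_name: str
    dtype: str
    max_len: Optional[int]
    is_primary_key: bool
    is_index: bool
    is_nullable: bool
    is_unique: bool
    column_doc: str


def _as_bool(value):
    # Assumption: booleans are serialised as one of these spellings.
    return value.strip().lower() in {"1", "true", "t", "yes", "y"}


def read_column_info(handle):
    """Parse an (assumed) tab-delimited column info file with a header row."""
    columns = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        if " " in rec["table_name"]:
            raise ValueError("table name contains a space: %r" % rec["table_name"])
        if rec["dtype"] not in SUPPORTED_DTYPES:
            raise ValueError("unsupported dtype: %r" % rec["dtype"])
        max_len = rec["max_len"].strip()
        columns.append(ColumnInfo(
            table_name=rec["table_name"],
            column_name=rec["column_name"],
            dtype=rec["dtype"],
            max_len=int(max_len) if max_len else None,
            is_primary_key=_as_bool(rec["is_primary_key"]),
            is_index=_as_bool(rec["is_index"]),
            is_nullable=_as_bool(rec["is_nullable"]),
            is_unique=_as_bool(rec["is_unique"]),
            column_doc=rec["column_doc"],
        ))
    return columns


# Invented sample content; a real file would be opened with open(path).
sample = (
    "table_name\tcolumn_name\tdtype\tmax_len\tis_primary_key\t"
    "is_index\tis_nullable\tis_unique\tcolumn_doc\n"
    "drug_indication\tdrugind_id\tbig_int\t\tTrue\tTrue\tFalse\tTrue\tPrimary key\n"
    "drug_indication\tmesh_heading\tvarchar\t200\tFalse\tFalse\tTrue\tFalse\tMeSH heading\n"
)
columns = read_column_info(io.StringIO(sample))
```

Validating dtype and table_name on read surfaces problems early, before the file is used with orm-create-src.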
Known Issues#
None reported.