Python Scripts#
chembl-index#
For creating and querying index tables based on the drug_indication
table. These can be used to query the drug indications for
non-exact matches.
This indexes both the EFO terms and the MeSH terms. Each time the script is run, it will drop and re-build the indexes.
usage: chembl-index [-h] [-c CONFIG] [-T TMP] [--commit-every COMMIT_EVERY]
[--chunksize CHUNKSIZE] [-v]
[dburl]
Positional Arguments#
- dburl
An SQLAlchemy connection URL, or a filename if using SQLite. If you do not want to put full connection parameters on the command line, use the config file (--config) and config section (--config-section) to supply the parameters.
Named Arguments#
- -c, --config
The location of the config file
Default: “~/.db.cnf”
- -T, --tmp
The location of the temporary directory. If not provided, the system temporary directory is used.
- --commit-every
Commit to the database after every --commit-every rows
Default: 10000
- --chunksize
The max rows to keep in memory when sorting
Default: 100000
- -v, --verbose
Give more output; -vv turns on progress monitoring
Default: 0
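The --tmp, --chunksize and --commit-every options describe a common pattern: sort more rows than fit in memory by spilling sorted chunks to temporary files, then load the result with periodic commits. The following is a minimal sketch of that pattern using only the standard library. It is not the chembl-index implementation; the function names and the term_index table are hypothetical and chosen only for illustration.

```python
import heapq
import pickle
import sqlite3
import tempfile


def external_sort(rows, chunksize=100000, tmpdir=None):
    """Yield rows in sorted order, holding at most `chunksize` rows in memory."""
    chunk_files = []
    chunk = []

    def spill():
        # Sort the in-memory chunk and write it to a temporary file.
        # tmpdir=None means the system default temporary directory.
        chunk.sort()
        fh = tempfile.TemporaryFile(dir=tmpdir)
        for row in chunk:
            pickle.dump(row, fh)
        fh.seek(0)
        chunk_files.append(fh)
        chunk.clear()

    for row in rows:
        chunk.append(row)
        if len(chunk) >= chunksize:
            spill()
    if chunk:
        spill()

    def read(fh):
        # Stream rows back from one sorted chunk file.
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                return

    try:
        # Merge the already-sorted chunks lazily.
        yield from heapq.merge(*(read(fh) for fh in chunk_files))
    finally:
        for fh in chunk_files:
            fh.close()


def insert_in_batches(conn, rows, commit_every=10000):
    """Insert rows, committing after every `commit_every` rows."""
    pending = 0
    for row in rows:
        # term_index is a hypothetical table used only in this sketch.
        conn.execute(
            "INSERT INTO term_index (term, indication_id) VALUES (?, ?)", row
        )
        pending += 1
        if pending >= commit_every:
            conn.commit()
            pending = 0
    if pending:
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term_index (term TEXT, indication_id INTEGER)")
rows = [("asthma", 3), ("diabetes", 1), ("anaemia", 2)]
insert_in_batches(conn, external_sort(rows, chunksize=2), commit_every=2)
result = conn.execute("SELECT term FROM term_index ORDER BY rowid").fetchall()
```

A smaller chunk size lowers peak memory at the cost of more temporary files, and a smaller commit interval shortens transactions at the cost of more commits.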
Example usage#
To index a ChEMBL database whose connection parameters are under the section heading chembl_latest
in the database config file, with progress updates.
chembl-index -vv chembl_latest
chembl-parse-schema#
A utility for parsing the ChEMBL schema documentation and producing a table info file and a column info file. The schema documentation file can be downloaded from the ChEMBL downloads page.
usage: chembl-parse-schema [-h] [-t OUT_TABLE_INFO] [-c OUT_COLUMN_INFO] [-v]
schema_doc
Positional Arguments#
- schema_doc
The schema documentation file.
Named Arguments#
- -t, --out-table-info
The path to the table info output file. If not supplied, a default file name of <ChEMBL_version>_table_info.txt will be used and written to the current directory.
- -c, --out-column-info
The path to the column info output file. If not supplied, a default file name of <ChEMBL_version>_column_info.txt will be used and written to the current directory.
- -v, --verbose
Give more output; -vv turns on progress monitoring
Default: 0
Example usage#
To parse the schema documentation file and write the column info and table info files to their default locations.
chembl-parse-schema schema_documentation.txt
To write to custom column info and table info locations.
chembl-parse-schema --out-table-info ~/chembl_latest_table.txt --out-column-info ~/chembl_latest_column.txt schema_documentation.txt
Output files#
The output columns for the table info and column info files are appropriate for use with the orm-create-src script within sqlalchemy-config.
Table info file#
- table_name (string) - The current name of the database table, must not contain any spaces.
- new_table_name (string) - A new name for the database table, must not contain any spaces.
- class_name (string) - The name to give the ORM class, usually camel case, must not contain any spaces.
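As an illustration of the camel case convention for class_name, the sketch below converts a snake case table name into a class name. This is an assumption about one reasonable mapping, not a description of how chembl-parse-schema derives its class names, and to_class_name is a hypothetical helper.

```python
def to_class_name(table_name):
    """Convert a snake_case table name to a CamelCase class name."""
    # Empty parts (from doubled or leading underscores) are skipped.
    return "".join(part.capitalize() for part in table_name.split("_") if part)


class_name = to_class_name("drug_indication")
```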
Column info file#
- table_name (string) - The current name of the database table, must not contain any spaces. If the table has been renamed in the table info file, then this must be the new name.
- column_name (string) - The name of the column.
- dtype (string) - The column data type. Currently supported data types are: int, numeric, big_int, small_int, text, varchar, date, datetime, float.
- max_len (integer) - The maximum string length in a column.
- is_primary_key (boolean) - Is the column a primary key column.
- is_index (boolean) - Is the column indexed.
- is_nullable (boolean) - Can the column values be undefined.
- is_unique (boolean) - Should the column values be unique.
- column_doc (string) - A description for the column.
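The constraints listed above can be checked before passing a column info file to other tools. The following is a minimal sketch of such a check. It assumes the file is tab-delimited with a header row naming the columns, which this page does not state, and check_column_info is a hypothetical helper rather than part of the package. The sample rows are illustrative and carry only a subset of the columns.

```python
import csv
import io

# Data types listed as supported for the dtype column.
SUPPORTED_DTYPES = {
    "int", "numeric", "big_int", "small_int", "text",
    "varchar", "date", "datetime", "float",
}


def check_column_info(handle, delimiter="\t"):
    """Return a list of (line_number, message) problems found in the file."""
    problems = []
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row in reader:
        line_no = reader.line_num
        if " " in row["table_name"]:
            problems.append((line_no, "table_name contains a space"))
        if row["dtype"] not in SUPPORTED_DTYPES:
            problems.append((line_no, f"unsupported dtype: {row['dtype']}"))
        if row["max_len"] and not row["max_len"].isdigit():
            problems.append((line_no, "max_len is not an integer"))
    return problems


# Assumed layout: tab-delimited with a header row (illustrative data only).
sample = (
    "table_name\tcolumn_name\tdtype\tmax_len\n"
    "drug_indication\tmesh_id\tvarchar\t20\n"
    "drug indication\tefo_id\tstring\t20\n"
)
problems = check_column_info(io.StringIO(sample))
```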
Known Issues#
None reported.