esis package

Submodules

esis.cli module

Command Line Interface.

esis.cli.clean(args)[source]

Remove all indexed documents.

esis.cli.configure_logging(log_level)[source]

Configure logging based on command line argument.

Parameters:log_level (int) – Log level passed from the command line
esis.cli.count(args)[source]

Print indexed documents information.

esis.cli.index(args)[source]

Index database information into elasticsearch.

esis.cli.main(argv=None)[source]

Entry point for the esis.py script.

esis.cli.parse_arguments(argv)[source]

Parse command line arguments.

Returns:Parsed arguments
Return type:argparse.Namespace
esis.cli.search(args)[source]

Send query to elasticsearch.

esis.cli.valid_directory(path)[source]

Directory validation.
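
A directory validator of this kind is typically plugged into argparse as a type callable. The following is only a minimal sketch under that assumption: the body of valid_directory, the build_parser helper and the positional argument name "directory" are illustrative guesses, not the actual esis implementation.

```python
import argparse
import os
import tempfile


def valid_directory(path):
    """Return path unchanged if it is an existing directory (assumed behaviour)."""
    if not os.path.isdir(path):
        # argparse turns this exception into a regular usage error message.
        raise argparse.ArgumentTypeError(
            '{!r} is not a valid directory'.format(path))
    return path


def build_parser():
    """Hypothetical parser with a single 'index' subcommand."""
    parser = argparse.ArgumentParser(prog='esis')
    subparsers = parser.add_subparsers(dest='command')
    index_parser = subparsers.add_parser('index')
    # The validator runs while the arguments are being parsed.
    index_parser.add_argument('directory', type=valid_directory)
    return parser


args = build_parser().parse_args(['index', tempfile.gettempdir()])
print(args.command, args.directory)
```

Validating at parse time means a bad path is reported as a command line error before any subcommand code runs.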

esis.db module

Database related tools.

class esis.db.DBReader(database)[source]

Bases: object

Iterate through all db tables and rows easily.

Parameters:database (esis.db.Database) – Database to traverse
FTS_SUFFIXES = ('content', 'segdir', 'segments', 'stat', 'docsize')
tables()[source]

Generator that traverses all tables in a database.

Returns:Table name
Return type:str
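
The suffixes in FTS_SUFFIXES match the shadow tables that SQLite creates for full-text search virtual tables (for example messages_content next to messages). One plausible use, sketched below with the standard sqlite3 module, is to skip those shadow tables while listing table names. The user_tables function and its filtering rule are assumptions for illustration, not the DBReader code.

```python
import sqlite3

FTS_SUFFIXES = ('content', 'segdir', 'segments', 'stat', 'docsize')


def user_tables(connection):
    """Yield table names, skipping what look like FTS shadow tables."""
    names = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    known = set(names)
    for name in names:
        base, _, suffix = name.rpartition('_')
        # Assumption: a table is a shadow table only when its suffix is a
        # known FTS suffix and the table it would belong to also exists.
        if suffix in FTS_SUFFIXES and base in known:
            continue
        yield name
```
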
class esis.db.Database(db_filename)[source]

Bases: object

Generic database object.

Parameters:db_filename (str) – Path to the sqlite database file
connect()[source]

Create connection.

disconnect()[source]

Close connection.

reflect(table_names)[source]

Get table metadata through reflection.

sqlalchemy already provides a reflect method, but it stops at the first failure, whereas this method tries to reflect as many tables as possible.

Parameters:table_names (list(str)) – Table names to inspect
run_quick_check()[source]

Check database integrity.

Some files, especially those files created after carving, might not contain completely valid data.
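
An integrity check like this can be sketched with SQLite's PRAGMA quick_check, which returns a single row containing 'ok' when no problems are found. The standalone function below uses the standard sqlite3 module rather than sqlalchemy, and its boolean return value is an assumption; the real method may report results differently.

```python
import sqlite3


def run_quick_check(db_filename):
    """Return True if SQLite reports the file as a consistent database."""
    try:
        connection = sqlite3.connect(db_filename)
        try:
            rows = connection.execute('PRAGMA quick_check').fetchall()
        finally:
            connection.close()
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is not a database at all.
        return False
    return rows == [('ok',)]
```

Treating any database error as a failed check keeps carved or truncated files from aborting a larger indexing run.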

class esis.db.DatetimeDecorator(*args, **kwargs)[source]

Bases: sqlalchemy.sql.type_api.TypeDecorator

A datetime class that translates data to ISO strings.

ISO strings are used instead of datetime objects or integer timestamps because that is the format elasticsearch handles as a datetime value. Internally it seems to store the value as an integer timestamp, but that’s transparent to the user.

impl

alias of TEXT

process_result_value(value, _dialect)[source]

Translate datetime/timestamp to ISO string.
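
The translation can be illustrated without sqlalchemy. In this sketch the function name to_iso_string is hypothetical, numeric values are assumed to be Unix timestamps in seconds, and they are interpreted as UTC; the actual decorator may handle other input formats.

```python
from datetime import datetime, timezone


def to_iso_string(value):
    """Translate a datetime or a Unix timestamp to an ISO 8601 string."""
    if value is None:
        return None
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, (int, float)):
        # Assumption: numbers are seconds since the epoch, in UTC.
        moment = datetime.fromtimestamp(value, tz=timezone.utc)
        return moment.replace(tzinfo=None).isoformat()
    # Anything else (for example an existing string) is passed through.
    return value
```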

class esis.db.IntegerDecorator(*args, **kwargs)[source]

Bases: sqlalchemy.sql.type_api.TypeDecorator

An integer class that translates ‘null’ values to None.

This is needed because some tables store the string ‘null’ instead of NULL, and elasticsearch fails to index documents with strings where integers should be found.

impl

alias of INTEGER

process_result_value(value, _dialect)[source]

Translate ‘null’ to None if needed.
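
The coercion itself is small. A minimal sketch, assuming an exact match on the lowercase string and using the hypothetical name null_to_none:

```python
def null_to_none(value):
    """Translate the literal string 'null' to None, leave the rest as is."""
    if value == 'null':
        return None
    return value


# Example: cleaning one row read from an integer column.
row = [1, 'null', 42]
print([null_to_none(value) for value in row])
```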

class esis.db.TableReader(database, table_name)[source]

Bases: esis.db.TypeCoercionMixin

Iterate over all rows easily.

Parameters:
  • database (esis.db.Database) – Database being explored
  • table_name (str) – Database table name
get_schema()[source]

Return table schema.

Returns:Column names and their type
Return type:dict(str, sqlalchemy.types.*)
rows()[source]

Generator that traverses all rows in a table.

Returns:All rows in the table
Return type:generator(sqlalchemy.engine.result.RowProxy)
class esis.db.TypeCoercionMixin[source]

Bases: object

A mixin to transform database values.

This is useful to get safe values from sqlalchemy when data types are not very well defined in SQLite.

COERCIONS = {<class 'sqlalchemy.sql.sqltypes.BOOLEAN'>: <class 'esis.db.IntegerDecorator'>, <class 'sqlalchemy.sql.sqltypes.TIMESTAMP'>: <class 'esis.db.DatetimeDecorator'>, <class 'sqlalchemy.sql.sqltypes.NUMERIC'>: <class 'sqlalchemy.sql.sqltypes.TEXT'>, <class 'sqlalchemy.sql.sqltypes.DATE'>: <class 'esis.db.DatetimeDecorator'>, <class 'sqlalchemy.sql.sqltypes.BIGINT'>: <class 'esis.db.IntegerDecorator'>, <class 'sqlalchemy.sql.sqltypes.INTEGER'>: <class 'esis.db.IntegerDecorator'>, <class 'sqlalchemy.sql.sqltypes.DATETIME'>: <class 'esis.db.DatetimeDecorator'>, <class 'sqlalchemy.sql.sqltypes.SMALLINT'>: <class 'esis.db.IntegerDecorator'>}

esis.es module

Elasticsearch related functionality.

class esis.es.Client(host, port)[source]

Bases: object

Elasticsearch client wrapper.

Parameters:
  • host (str) – Elasticsearch host
  • port (int) – Elasticsearch port
INDEX_NAME = 'sqlite'
clean()[source]

Remove all indexed documents.

count()[source]

Return indexed documents information.

Returns:Indexed documents information
Return type:dict
index(directory)[source]

Index all the information available in a directory.

In elasticsearch there will be an index for each database and a document type for each table in the database.

Parameters:directory (str) – Directory that should be indexed
search(query)[source]

Yield all documents that match a given query.

Parameters:query (str) – A simple query with data to search in elasticsearch
Returns:Records that matched the query as returned by elasticsearch
Return type:list(dict)
class esis.es.Mapping(document_type, table_schema)[source]

Bases: object

Elasticsearch mapping.

Parameters:
  • document_type (str) – Document type used for the database table
  • table_schema (dict(str, sqlalchemy.types.*)) – Database table schema from sqlalchemy
SQL_TYPE_MAPPING = {<class 'sqlalchemy.sql.sqltypes.SMALLINT'>: 'integer', <class 'sqlalchemy.sql.sqltypes.DATETIME'>: 'date', <class 'sqlalchemy.sql.sqltypes.VARCHAR'>: 'string', <class 'sqlalchemy.sql.sqltypes.CHAR'>: 'string', <class 'sqlalchemy.sql.sqltypes.CLOB'>: 'string', <class 'sqlalchemy.sql.sqltypes.NullType'>: None, <class 'sqlalchemy.sql.sqltypes.BIGINT'>: 'long', <class 'sqlalchemy.sql.sqltypes.NCHAR'>: 'string', <class 'sqlalchemy.sql.sqltypes.INTEGER'>: 'long', <class 'sqlalchemy.sql.sqltypes.REAL'>: 'double', <class 'sqlalchemy.sql.sqltypes.BOOLEAN'>: 'boolean', <class 'sqlalchemy.sql.sqltypes.TIMESTAMP'>: 'date', <class 'sqlalchemy.sql.sqltypes.NUMERIC'>: None, <class 'sqlalchemy.sql.sqltypes.DECIMAL'>: None, <class 'sqlalchemy.sql.sqltypes.NVARCHAR'>: 'string', <class 'sqlalchemy.sql.sqltypes.FLOAT'>: 'float', <class 'sqlalchemy.sql.sqltypes.TIME'>: 'date', <class 'sqlalchemy.sql.sqltypes.DATE'>: None, <class 'sqlalchemy.sql.sqltypes.TEXT'>: 'string'}
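
To show how a table like this could be turned into a mapping body, the sketch below uses plain type names as keys instead of sqlalchemy classes so that it runs without dependencies. The build_mapping function, the reduced table and the reading of None as "leave the field for elasticsearch to detect" are all assumptions. The body layout follows the legacy elasticsearch format with one mapping per document type.

```python
# Reduced, string-keyed stand-in for SQL_TYPE_MAPPING (illustrative only).
SQL_TYPE_MAPPING = {
    'INTEGER': 'long',
    'TEXT': 'string',
    'DATETIME': 'date',
    'NUMERIC': None,
}


def build_mapping(document_type, table_schema):
    """Build a legacy-style mapping body from column names and SQL types."""
    properties = {}
    for column_name, sql_type in table_schema.items():
        es_type = SQL_TYPE_MAPPING.get(sql_type)
        if es_type is None:
            # Assumption: no explicit type means the column is left out
            # and elasticsearch infers the type on its own.
            continue
        properties[column_name] = {'type': es_type}
    return {document_type: {'properties': properties}}
```
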
esis.es.get_document(db_filename, table_name, row)[source]

Get document to be indexed from row.

Parameters:
  • db_filename (str) – Path to the database file
  • table_name – Database table name
  • row (sqlalchemy.engine.result.RowProxy) – Database row
esis.es.get_index_action(index_name, document_type, document)[source]

Generate index action for a given document.

Parameters:
  • index_name (str) – Elasticsearch index to use
  • document_type – Elasticsearch document type to use
  • document – Document to be indexed
Returns:

Action to be passed in bulk request

Return type:

dict
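
A bulk action in the format accepted by the bulk helpers of the elasticsearch Python client is a plain dictionary. The sketch below shows one plausible shape; the exact keys that esis sets are not documented here, so treat the body as an assumption.

```python
def get_index_action(index_name, document_type, document):
    """Build one 'index' action for a bulk request (assumed layout)."""
    return {
        '_op_type': 'index',
        '_index': index_name,
        '_type': document_type,
        '_source': document,
    }


# A list or generator of such actions would then be handed to the
# client's bulk helper, for example elasticsearch.helpers.bulk.
action = get_index_action('sqlite', 'messages', {'id': 1, 'body': 'hello'})
print(action)
```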

esis.fs module

Filesystem functionality.

class esis.fs.TreeExplorer(directory, blacklist=None)[source]

Bases: object

Look for sqlite files in a tree and return the valid ones.

Parameters:
  • directory (str) – Base directory for the tree to be explored.
  • blacklist (list(str)) – List of relative directories to skip
paths()[source]

Return paths to valid databases found under directory.

Returns:Paths to valid databases
Return type:list(str)
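
A tree walk of this kind can be sketched with os.walk. Here a file counts as a database when it starts with the 16-byte SQLite header string, which is a simplifying assumption; the real class may apply stricter checks. The function name sqlite_paths is hypothetical.

```python
import os

# Every SQLite 3 database file starts with this 16-byte header string.
SQLITE_MAGIC = b'SQLite format 3\x00'


def sqlite_paths(directory, blacklist=None):
    """Return paths of files under directory that carry the SQLite header."""
    skipped = {os.path.normpath(path) for path in (blacklist or [])}
    found = []
    for root, dirs, files in os.walk(directory):
        rel_root = os.path.relpath(root, directory)
        # Prune blacklisted directories in place so os.walk never enters them.
        dirs[:] = sorted(
            name for name in dirs
            if os.path.normpath(os.path.join(rel_root, name)) not in skipped)
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                with open(path, 'rb') as handle:
                    header = handle.read(len(SQLITE_MAGIC))
            except OSError:
                continue  # unreadable files are ignored
            if header == SQLITE_MAGIC:
                found.append(path)
    return found
```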

esis.util module

Utility functionality.

esis.util.datetime_to_timestamp(datetime_obj)[source]

Return a timestamp for the given datetime object.

Parameters:datetime_obj (datetime.datetime) – datetime object to be converted
Returns:timestamp from the passed datetime object
Return type:int
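
One common way to obtain an integer timestamp is calendar.timegm, sketched below. It assumes that naive datetime objects represent UTC; the actual function may treat time zones differently.

```python
import calendar
from datetime import datetime


def datetime_to_timestamp(datetime_obj):
    """Return seconds since the Unix epoch, treating naive values as UTC."""
    return calendar.timegm(datetime_obj.utctimetuple())


print(datetime_to_timestamp(datetime(2015, 1, 1)))
```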

Module contents

Elastic Search Index & Search.