src.features.utils

Created on Mon Mar 15 14:13:51 2021

@author: Paolo Cozzi <paolo.cozzi@ibba.cnr.it>

class src.features.utils.TqdmToLogger(logger, level=None)

Bases: StringIO

Output stream for tqdm that writes progress to a logger (logging module) instead of the terminal.

__init__(logger, level=None)
buf = ''
flush()

Flush write buffers, if applicable.

This is not implemented for read-only and non-blocking streams.

level = None
logger = None
write(buf)

Write string to file.

Returns the number of characters written, which is always equal to the length of the string.
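The logger-backed stream pattern behind TqdmToLogger can be sketched as below. This is a minimal re-implementation under stated assumptions, not the module's source: write() keeps only the latest message with carriage returns and whitespace stripped, flush() sends it to the logger, and INFO is assumed as the fallback when level is None.

```python
import io
import logging
from typing import Optional


class TqdmToLogger(io.StringIO):
    """Sketch (assumed behaviour): a text stream that forwards tqdm output to a logger."""

    def __init__(self, logger: logging.Logger, level: Optional[int] = None):
        super().__init__()
        self.logger = logger
        # assumption: fall back to INFO when no level is given
        self.level = level if level is not None else logging.INFO
        self.buf = ""

    def write(self, buf: str) -> int:
        # tqdm redraws the bar with carriage returns: keep only the visible text
        self.buf = buf.strip("\r\n\t ")
        return len(buf)

    def flush(self) -> None:
        # emit the buffered progress line, ignoring empty redraws
        if self.buf:
            self.logger.log(self.level, self.buf)
            self.buf = ""
```

With tqdm installed, such a stream would be passed through tqdm's `file` argument, for example `tqdm(items, file=TqdmToLogger(logger), mininterval=30)`, so progress lines reach the log; a large `mininterval` keeps the log from being flooded with redraws.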

class src.features.utils.UnknownCountry

Bases: object

Deal with an unknown country

__init__()

src.features.utils.camelCase(string: str) -> str

Convert a string into camel case

Parameters

string (str) – the string to convert

Returns

the camel case version of the string

Return type

str
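The splitting rules are not spelled out here, so the sketch below assumes that whitespace, underscores, hyphens and dots separate words; the first letter of the first word is lowercased and every following word is capitalized. It illustrates the idea and is not the module's implementation.

```python
import re


def camelCase(string: str) -> str:
    """Sketch (assumed rules): split on whitespace, '_', '-' and '.',
    lowercase the first letter of the first word and uppercase the first
    letter of every following word."""
    words = [w for w in re.split(r"[\s_.\-]+", string) if w]
    if not words:
        return ""
    head, *tail = words
    # inner letters are kept as-is, so an existing camelCase word is preserved
    return head[0].lower() + head[1:] + "".join(w[0].upper() + w[1:] for w in tail)


print(camelCase("breed code"))    # breedCode
print(camelCase("Country_Name"))  # countryName
```

Keeping the inner letters untouched means an input that is already camel case passes through unchanged; the trade-off is that acronyms such as "GPS" are not normalized.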

src.features.utils.find_duplicates(header: list) -> list

Find duplicate columns in a list. Returns the indexes to remove, keeping only the first occurrence of each column

Parameters

header (list) – a list like the header read from a CSV file

Returns

a list of numeric indexes

Return type

list
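A minimal sketch of the described behaviour, assuming columns are compared by exact equality (so they must be hashable, as CSV header strings are): every repeated name after its first occurrence contributes its index, and those indexes can then be used to drop the extra columns.

```python
def find_duplicates(header: list) -> list:
    """Sketch: return the indexes of repeated columns, keeping the first occurrence."""
    seen = set()
    duplicates = []
    for index, column in enumerate(header):
        if column in seen:
            duplicates.append(index)
        else:
            seen.add(column)
    return duplicates


header = ["sample", "breed", "sample", "country", "breed"]
to_remove = find_duplicates(header)  # [2, 4]
clean = [col for idx, col in enumerate(header) if idx not in to_remove]
# clean == ["sample", "breed", "country"]
```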

src.features.utils.get_interim_dir() -> PosixPath

Return smarter data temporary dir

Returns

the smarter data temporary dir

Return type

pathlib.PosixPath

src.features.utils.get_processed_dir() -> PosixPath

Return smarter data processed dir (final processed data)

Returns

the smarter data final processed dir

Return type

pathlib.PosixPath

src.features.utils.get_project_dir() -> PosixPath

Return smarter project dir (three levels above the module in which this function is defined)

Returns

the smarter project base dir

Return type

pathlib.PosixPath
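The "three levels above the module" rule maps naturally onto `Path.parents`. The sketch below adds a hypothetical `module_file` parameter so the logic can be exercised outside the package; the documented function takes no arguments and would presumably rely on its own `__file__`.

```python
from pathlib import Path
from typing import Union


def get_project_dir(module_file: Union[str, Path]) -> Path:
    """Sketch: return the directory three levels above the module file.

    ``module_file`` is a hypothetical parameter added for illustration.
    """
    # for <project>/src/features/utils.py:
    # parents[0] -> src/features, parents[1] -> src, parents[2] -> <project>
    return Path(module_file).resolve().parents[2]
```

Inside the module the call would be `get_project_dir(__file__)`; calling `resolve()` first makes the result absolute even when the module was imported through a relative path.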

src.features.utils.get_raw_dir() -> PosixPath

Return smarter data raw dir

Returns

the smarter data raw directory

Return type

pathlib.PosixPath
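The raw, interim and processed directories all hang off the project dir. The sketch below assumes the Cookiecutter Data Science layout `<project>/data/raw`, `<project>/data/interim` and `<project>/data/processed`; that layout is suggested by the `src.features` package path but is not stated in this reference. The `project_dir` parameter and the `get_data_dir` helper are hypothetical.

```python
from pathlib import Path

DATA_STAGES = {"raw", "interim", "processed"}  # assumed sub-directories of data/


def get_data_dir(project_dir: Path, stage: str) -> Path:
    """Hypothetical helper: <project>/data/<stage>."""
    if stage not in DATA_STAGES:
        raise ValueError(f"unknown data stage: {stage!r}")
    return project_dir / "data" / stage


def get_raw_dir(project_dir: Path) -> Path:
    return get_data_dir(project_dir, "raw")


def get_interim_dir(project_dir: Path) -> Path:
    return get_data_dir(project_dir, "interim")


def get_processed_dir(project_dir: Path) -> Path:
    return get_data_dir(project_dir, "processed")
```

On Linux and macOS these `Path` objects are `PosixPath` instances, matching the documented return type.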

src.features.utils.sanitize(word: str, chars=['.', ',', '-', '/', '#'], check_mongoengine=True) -> str

Sanitize a word by removing unwanted characters and lowercasing it.

Parameters
  • word (str) – the word to sanitize

  • chars (list) – a list of characters to remove

  • check_mongoengine (bool) – True to append ‘_’ to a word that is a mongoengine reserved word

Returns

the sanitized word

Return type

str
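A minimal sketch of the three documented steps, assuming unwanted characters are simply deleted. The set of reserved words is a hypothetical placeholder, since this reference does not list which words the module checks.

```python
from typing import Iterable

# hypothetical placeholder: the reserved words actually checked are not listed in this reference
MONGOENGINE_RESERVED = {"type", "size"}


def sanitize(word: str, chars: Iterable[str] = (".", ",", "-", "/", "#"),
             check_mongoengine: bool = True) -> str:
    """Sketch: drop unwanted characters, lowercase, and suffix reserved words."""
    # str.translate with a deletion table removes every listed character at once
    sanitized = word.translate(str.maketrans("", "", "".join(chars))).lower()
    # append "_" so the result cannot clash with a reserved word
    if check_mongoengine and sanitized in MONGOENGINE_RESERVED:
        sanitized += "_"
    return sanitized


print(sanitize("Breed-Code."))  # breedcode
print(sanitize("Type"))         # type_
```

The sketch uses a tuple as the default for `chars`; a list default like the one in the documented signature is shared between calls and is a common Python pitfall if it is ever mutated.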

src.features.utils.skip_comments(handle: io.TextIOWrapper, comment_char='#') -> (int, list)

Ignore comment lines from an open file handle. Return the stream position immediately after the comments and all the comment lines in a list.

Parameters
  • handle (io.TextIOWrapper) – An open file handle.

  • comment_char (str, optional) – The comment character used in the file. The default is “#”.

Returns

The stream position after the comments and the ignored lines as a list.

Return type

(int, list)
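Reading comment lines while tracking the stream position can be sketched as follows, assuming the comments sit at the top of the file. Rewinding the handle with `seek()` to the first data line is an extra convenience of this sketch, not documented behaviour.

```python
import io
from typing import List, TextIO, Tuple


def skip_comments(handle: TextIO, comment_char: str = "#") -> Tuple[int, List[str]]:
    """Sketch: consume the leading comment lines of an open text handle."""
    skipped = []
    # tell()/readline() are used instead of iterating the handle, because
    # a real file object disables tell() while it is being iterated
    position = handle.tell()
    while True:
        line = handle.readline()
        if not line or not line.startswith(comment_char):
            break
        skipped.append(line.rstrip("\n"))
        position = handle.tell()
    # rewind to the first non-comment line so the caller can keep reading
    handle.seek(position)
    return position, skipped


handle = io.StringIO("# version 1\n# sample data\nid,breed\n1,Merino\n")
position, comments = skip_comments(handle)
# comments == ["# version 1", "# sample data"]
header = handle.readline()  # "id,breed\n"
```

For a real text file the value from `tell()` is an opaque cookie rather than a character count, but it can always be passed back to `seek()` on the same handle.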

src.features.utils.text_or_gzip_open(path: str, mode: Optional[str] = None) -> TextIOWrapper

Open a file that may or may not be gzip-compressed. Returns a text file handle
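One way to decide whether a file is compressed is to sniff the two-byte gzip magic number rather than trust the file extension. The sketch below does that and, as a simplifying assumption, supports only text reading; it is not the module's implementation.

```python
import gzip
import io
from pathlib import Path
from typing import Optional, Union

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def text_or_gzip_open(path: Union[str, Path], mode: Optional[str] = None) -> io.TextIOWrapper:
    """Sketch: open a plain or gzip-compressed file for text reading."""
    # assumption: only text reading is supported, so normalize to "rt"
    if mode not in (None, "r", "rt"):
        raise ValueError(f"this sketch only supports text reading, got {mode!r}")
    # sniff the magic number instead of relying on a ".gz" suffix
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt")  # gzip.open in text mode returns a TextIOWrapper
    return open(path, "rt")
```

Both branches return an `io.TextIOWrapper`, so callers can treat compressed and uncompressed inputs the same way, for instance by passing the handle to a `skip_comments`-style helper or to `csv.reader`.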