src.features.utils

Created on Mon Mar 15 14:13:51 2021

@author: Paolo Cozzi <paolo.cozzi@ibba.cnr.it>

class src.features.utils.TqdmToLogger(logger, level=None)

Bases: StringIO

Output stream for tqdm that writes to a logger (from the logging module) instead of stdout.

__init__(logger, level=None)
buf = ''
flush()

Flush write buffers, if applicable.

This is not implemented for read-only and non-blocking streams.

level = None
logger = None
write(buf)

Write string to file.

Returns the number of characters written, which is always equal to the length of the string.
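A stream of this kind is commonly built by buffering the latest progress line in `write()` and forwarding it to the logger in `flush()`. The sketch below is a minimal illustration under that assumption, not the module's actual code: `TqdmToLoggerSketch`, `ListHandler` and the logger name are hypothetical, and the default level of `logging.INFO` is an assumed choice.

```python
import io
import logging


class TqdmToLoggerSketch(io.StringIO):
    """Illustrative stand-in for a logger-backed progress stream."""

    logger = None
    level = None
    buf = ''

    def __init__(self, logger, level=None):
        super().__init__()
        self.logger = logger
        # Assumed default: fall back to INFO when no level is given.
        self.level = level or logging.INFO

    def write(self, buf):
        # Keep only the latest progress line, without carriage returns.
        self.buf = buf.strip('\r\n\t ')
        return len(buf)

    def flush(self):
        # Forward the buffered progress line to the logger.
        self.logger.log(self.level, self.buf)


class ListHandler(logging.Handler):
    """Hypothetical handler that collects messages for the demo."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


demo_logger = logging.getLogger("tqdm_to_logger_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
handler = ListHandler()
demo_logger.addHandler(handler)

stream = TqdmToLoggerSketch(demo_logger, level=logging.INFO)
progress_line = "\r 50%|#####     | 5/10"
written = stream.write(progress_line)
stream.flush()
```

With tqdm installed, such an object would typically be handed to a progress bar through tqdm's `file` argument, so that each refresh ends up in the log rather than on the terminal.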

class src.features.utils.UnknownCountry

Bases: object

Deal with unknown country

__init__()

src.features.utils.camelCase(string: str) -> str

Convert a string into camel case

Parameters:

string (str) – the string to convert

Returns:

the camel case version of the string

Return type:

str
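The documentation does not say how `camelCase` treats separators or existing capitals, so the following is only one plausible reading: split on any non-alphanumeric run, lowercase the first word, and capitalise the rest. The name `camel_case_sketch` and the splitting rule are assumptions.

```python
import re


def camel_case_sketch(string: str) -> str:
    """One possible camel-case conversion (assumed rules, see lead-in)."""
    # Split on runs of characters that are neither letters nor digits.
    words = [word for word in re.split(r"[^0-9A-Za-z]+", string) if word]
    if not words:
        return ""
    first, *rest = words
    # First word all lowercase, each following word capitalised.
    return first.lower() + "".join(word.capitalize() for word in rest)


example = camel_case_sketch("sample name")
```

Here `example` holds `"sampleName"`; the real function may behave differently on inputs such as acronyms or strings that are already camel case.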

src.features.utils.find_duplicates(header: list) -> list

Find duplicate columns in a list. Returns the indices to remove, i.e. every occurrence after the first.

Parameters:

header (list) – a list like the header read from a CSV file

Returns:

a list of numeric indices

Return type:

list
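The described behaviour (keep the first occurrence, report the positions of later repeats) can be sketched with a set of already-seen names. `find_duplicates_sketch` is a hypothetical re-implementation written from the description above, not the module's source.

```python
def find_duplicates_sketch(header: list) -> list:
    """Return the positions of every repeated item after its first occurrence."""
    seen = set()
    to_remove = []
    for index, column in enumerate(header):
        if column in seen:
            # Already met this column name: mark this position for removal.
            to_remove.append(index)
        else:
            seen.add(column)
    return to_remove


header = ["id", "breed", "id", "country", "breed"]
duplicates = find_duplicates_sketch(header)  # [2, 4]

# Dropping the reported positions leaves each column name once.
deduplicated = [col for i, col in enumerate(header) if i not in duplicates]
```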

src.features.utils.get_interim_dir() -> PosixPath

Return smarter data temporary dir

Returns:

the smarter data temporary dir

Return type:

pathlib.PosixPath

src.features.utils.get_processed_dir() -> PosixPath

Return smarter data processed dir (final processed data)

Returns:

the smarter data final processed dir

Return type:

pathlib.PosixPath

src.features.utils.get_project_dir() -> PosixPath

Return smarter project dir (which is three levels above the module in which this function is stored)

Returns:

the smarter project base dir

Return type:

pathlib.PosixPath

src.features.utils.get_raw_dir() -> PosixPath

Return smarter data raw dir

Returns:

the smarter data raw directory

Return type:

pathlib.PosixPath
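The directory helpers above can be pictured with `pathlib`. The sketch below is an illustration under stated assumptions, not the module's code: the path `/work/project/src/features/utils.py` is hypothetical, and the `data/raw`, `data/interim` and `data/processed` sub-folder names are assumed, since the documentation does not name them.

```python
from pathlib import PurePosixPath

# Hypothetical location of the module inside a project checkout.
MODULE_FILE = PurePosixPath("/work/project/src/features/utils.py")


def project_dir_sketch(module_file: PurePosixPath) -> PurePosixPath:
    """Return the directory three levels above the module file."""
    # parents[0] -> /work/project/src/features
    # parents[1] -> /work/project/src
    # parents[2] -> /work/project (the project base dir)
    return module_file.parents[2]


project_dir = project_dir_sketch(MODULE_FILE)

# Assumed layout: these sub-folder names are illustrative only.
raw_dir = project_dir / "data" / "raw"
interim_dir = project_dir / "data" / "interim"
processed_dir = project_dir / "data" / "processed"
```

Inside a real module the starting point would more likely be `pathlib.Path(__file__).resolve()`, which yields a `PosixPath` on POSIX systems; `PurePosixPath` is used here only so the example runs without touching the filesystem.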

src.features.utils.sanitize(word: str, chars=['.', ',', '-', '/', '#'], check_mongoengine=True) -> str

Sanitize a word by removing unwanted characters and lowercase it.

Parameters:
  • word (str) – the word to sanitize

  • chars (list) – a list of characters to remove

  • check_mongoengine (bool) – if True, append ‘_’ to a mongoengine reserved word

Returns:

the sanitized word

Return type:

str
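As a rough sketch of the behaviour described above: remove each unwanted character, lowercase the result, and append an underscore when the result collides with a reserved word. `sanitize_sketch` and its `reserved` parameter are hypothetical, and the two words in the default set are placeholders rather than mongoengine's actual reserved list; how the real function handles whitespace is not documented and is not modelled here.

```python
def sanitize_sketch(
    word: str,
    chars=(".", ",", "-", "/", "#"),
    reserved=frozenset({"pk", "objects"}),  # placeholder set, NOT mongoengine's list
) -> str:
    """Strip unwanted characters, lowercase, and guard reserved words."""
    for char in chars:
        # Remove every occurrence of the unwanted character.
        word = word.replace(char, "")
    word = word.lower()
    if word in reserved:
        # Avoid a clash with a reserved attribute name.
        word += "_"
    return word


cleaned = sanitize_sketch("Sample-ID#1")
guarded = sanitize_sketch("PK")
```

The sketch uses a tuple for the default `chars` to sidestep Python's shared mutable default arguments; the documented signature uses a list.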

src.features.utils.skip_comments(handle: TextIOWrapper, comment_char='#') -> Tuple[int, List[str]]

Skip comment lines read from an open file handle. Return the stream position immediately after the comments, together with all the comment lines in a list.

Parameters:
  • handle (io.TextIOWrapper) – An open file handle.

  • comment_char (str, optional) – The comment character used in the file. The default is “#”.

Returns:

The stream position after the comments and the ignored lines as a list.

Return type:

Tuple[int, List[str]]
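One way to obtain such a position is to read with `readline()` and record `tell()` after every comment line. The sketch below follows that idea; `skip_comments_sketch` is a hypothetical re-implementation, and rewinding the handle to the returned position before returning is an assumed design choice that the documentation does not confirm.

```python
import io
from typing import List, Tuple


def skip_comments_sketch(handle, comment_char: str = "#") -> Tuple[int, List[str]]:
    """Collect leading comment lines and report where the data begins."""
    skipped = []
    position = handle.tell()
    while True:
        line = handle.readline()
        # Stop at the first non-comment line (or at end of file, where line == "").
        if not line.startswith(comment_char):
            break
        skipped.append(line)
        position = handle.tell()
    # Assumed behaviour: leave the handle at the first non-comment line.
    handle.seek(position)
    return position, skipped


handle = io.StringIO("# generated by tool\n# version 1\nid,breed\n1,Merino\n")
position, skipped = skip_comments_sketch(handle)
header_line = handle.readline()  # first line after the comments
```

`readline()` is used rather than iterating over the handle because text files opened with `open()` disable `tell()` while they are being iterated.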

src.features.utils.text_or_gzip_open(path: str, mode: str = None) -> TextIOWrapper

Open a file that may or may not be gzip-compressed and return its file handle.
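The documentation does not state how compression is detected. A minimal sketch, assuming detection by the two-byte gzip magic number (the real function might rely on the file extension instead) and assuming text mode as the default when `mode` is `None`; `text_or_gzip_open_sketch` is a hypothetical name.

```python
import gzip
import os
import tempfile


def text_or_gzip_open_sketch(path: str, mode: str = None):
    """Open a plain or gzip-compressed file in text mode by default."""
    # Assumption: detect gzip by its magic number rather than by extension.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, mode or "rt")
    return open(path, mode or "r")


# Demo: the same content is read back from a plain and a compressed file.
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.txt")
    gzip_path = os.path.join(tmp, "compressed.txt.gz")
    with open(plain_path, "w") as out:
        out.write("id,breed\n")
    with gzip.open(gzip_path, "wt") as out:
        out.write("id,breed\n")

    with text_or_gzip_open_sketch(plain_path) as handle:
        plain_text = handle.read()
    with text_or_gzip_open_sketch(gzip_path) as handle:
        gzip_text = handle.read()
```

Both branches return an object usable as a context manager, so callers do not need to know which kind of file they were given.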