Welcome to ‘namefiles’ documentation!

Name-files is an approach to standardized file naming for multiple files from different sources with equal formatting, all of which relate to the same entity.

Installation

Install the latest release with pip.

$ pip install namefiles

Concept

The filename is defined by 6 parts. Each part carries a different piece of information, and all of them relate to one entity.

identifier: The mandatory name (identification) of an entity.
sub_id: A branch of this entity.
source_id: The source from which the file (data) originates.
vargroup: Optional variables (meta attributes) stated within the filename.
context: Context of the file’s content. What is in there, not how it is stored in there. The context must always be accompanied by an extension.
extension: The file extension, which should state the format of the file. How it is stored in there.

All filename parts except the identifier are optional.

Within namefiles, file naming conventions are defined by a JSON Schema using the Python jsonschema module. namefiles proposes a standard naming convention, which is used if no custom naming convention is defined.

The EBNF of the namefiles naming convention is

filename     ::= identifier ["#" sub_id] ["#" source_id] ["#" vargroup] ["." context] ["." extension]
identifier   ::= [0-9a-zA-Z-_]{1,36}
sub_id       ::= [0-9A-Z]{1,4}
source_id    ::= [0-9A-Z]{5,12}
vargroup     ::= ("_" var_value)+
var_value    ::= [a-zA-Z0-9,.+-\ ]+
context      ::= [a-zA-Z]+[0-9a-zA-Z-]+
extension    ::= common file extension (.csv, .txt, ...)
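
The grammar above can be approximated with a single regular expression. The following is a minimal sketch using only Python's standard `re` module; it is not the namefiles implementation. The names `FILENAME_PATTERN` and `parse_filename` are hypothetical, and the extension pattern is an assumption, because the grammar only says "common file extension". Since `var_value` permits dots, a vargroup followed by a context or extension can be ambiguous; this sketch does not try to resolve that overlap.

```python
import re
from typing import Dict, Optional

# Regex transcription of the EBNF above (illustrative, not the library's code).
FILENAME_PATTERN = re.compile(
    r"(?P<identifier>[0-9a-zA-Z_-]{1,36})"
    r"(?:#(?P<sub_id>[0-9A-Z]{1,4}))?"
    r"(?:#(?P<source_id>[0-9A-Z]{5,12}))?"
    r"(?:#(?P<vargroup>(?:_[a-zA-Z0-9,.+ -]+)+))?"
    # A context is only accepted if an extension follows it.
    r"(?:\.(?P<context>[a-zA-Z]+[0-9a-zA-Z-]+)(?=\.))?"
    # Assumption: an extension is a dot followed by letters or digits.
    r"(?P<extension>\.[0-9a-zA-Z]+)?"
)


def parse_filename(filename: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the filename parts as a dict, or None if nothing matches."""
    match = FILENAME_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None


parts = parse_filename("Zeb-a#1#CANON.thumbnail.jpg")
print(parts)
```

Parts that are absent from the filename are reported as `None`, which mirrors the rule that only the identifier is mandatory.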

Implementation

The recommended namefiles.FilenameParts implements the default namefiles file naming convention, providing access to each part via properties.

FilenameParts.identifier

The mandatory name of the entity to which multiple files relate. The identifier is the leading filename part.

Notes

The identifier has a maximum length of 36 characters and can consist of word characters [a-zA-Z0-9_] with the addition of the hyphen-minus ‘-’ (U+002D), which should be the default on keyboards.

Its regular expression is ^[0-9a-zA-Z-_]+$

Examples

Minimal to maximal identifier examples.

a                                       # At least 1 character is needed.
1044e098-7bfb-11eb-9439-0242ac130002    # 36 chars allows a UUID

Returns: str
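
The identifier rules above (allowed characters plus the 36 character limit) can be checked with a few lines of standard-library Python. This is a sketch under the stated rules, not the validation namefiles performs internally; `IDENTIFIER_PATTERN` and `is_valid_identifier` are hypothetical names.

```python
import re
import uuid

# Allowed characters from the notes above, limited to 1..36 characters.
IDENTIFIER_PATTERN = re.compile(r"[0-9a-zA-Z_-]{1,36}")


def is_valid_identifier(candidate: str) -> bool:
    """Check a candidate identifier against the character set and length limit."""
    return IDENTIFIER_PATTERN.fullmatch(candidate) is not None


# A UUID in its canonical text form is exactly 36 characters long.
print(is_valid_identifier(str(uuid.uuid4())))
print(is_valid_identifier("a" * 37))
```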

FilenameParts.sub_id

The sub id is the first branch of the identifier.

Notes

The sub identifier allows uppercase word characters without the underscore [A-Z0-9], with a maximum length of 4.

Its regular expression is ^[0-9A-Z-]{1,4}+$

The sub identifier’s task is to distinguish different states of the same context. In this sense, a context could be different video captures of the same object with multiple cameras, or just different file versions.

The sub identifier should be seen as a branch of the identifier, not as a version within a sequence.

Examples

Multiple different video captures of the same object.

ant#CAM0.avi
ant#CAM1.avi
ant#CAM2.avi

Different children (versions).

a#1
a#1ST
a#2ND
a#RAW

Returns: str
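
Because the identifier is the leading part and may contain neither ‘#’ nor ‘.’, files that belong to the same entity can be collected by cutting each filename at the first of these separators. The helper below is an illustrative sketch with the hypothetical name `group_by_identifier`; it uses plain string operations and is not part of namefiles.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_identifier(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group filenames by their leading identifier part."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for filename in filenames:
        # The identifier ends at the first "#" or ".", whichever comes first.
        identifier = filename.split("#", 1)[0].split(".", 1)[0]
        groups[identifier].append(filename)
    return dict(groups)


captures = ["ant#CAM0.avi", "ant#CAM1.avi", "ant#CAM2.avi", "a#1", "a#RAW"]
print(group_by_identifier(captures))
```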

FilenameParts.source_id

The source id states where this file came from.

Notes

The source identifier allows words without underscores [a-zA-Z0-9] with the addition of the hyphen-minus ‘-’ (U+002D), which should be the default on keyboards.

Its regular expression is ^[0-9A-Z-]{5,12}+$

The source identifier distinguishes different sources whenever the context alone would lead to equal filenames. It might be the name of the program or device which made this file.

Examples

A comparison of two sources for 2 different sub versions of Zeb-a.

Zeb-a#1#canon.jpg
Zeb-a#2#canon.jpg
Zeb-a#1#nikon.jpg
Zeb-a#2#nikon.jpg

Returns: str

FilenameParts.vargroup

The group of variables (vargroup) contains meta attributes.

Notes

Each variable of the group is a string. It allows words [a-zA-Z0-9_] with the addition of:

‘-’ hyphen-minus (U+002D)

‘+’ plus

‘,’ comma

‘.’ dot

Its regular expression is ^#(_[a-zA-Z0-9+-,. ]+)+$

Examples in which meta attributes are stored in the filename:

number of a subsequent sequence e.g. image sequences

a date neither being the creation nor the change date

Examples

>>> from namefiles import FilenameParts
>>> FilenameParts.disassemble("Zeb-a#_000000_ffffff_1.9m_no color").vargroup
['000000', 'ffffff', '1.9m', 'no color']

Returns: List[str]
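
The relation between the raw vargroup text in a filename and the resulting list can be sketched with plain string handling. The functions `split_vargroup` and `join_vargroup` below are hypothetical helpers for illustration only; they assume the raw vargroup is the text after the ‘#’ separator and that every variable is introduced by a single underscore, as in the grammar.

```python
from typing import List


def split_vargroup(raw: str) -> List[str]:
    """Turn '_a_b_c' into ['a', 'b', 'c']."""
    if not raw.startswith("_"):
        raise ValueError("a vargroup must start with an underscore")
    # Drop the leading underscore, then split on the remaining ones.
    return raw[1:].split("_")


def join_vargroup(values: List[str]) -> str:
    """Inverse of split_vargroup: prefix every value with an underscore."""
    return "".join(f"_{value}" for value in values)


values = split_vargroup("_000000_ffffff_1.9m_no color")
print(values)
print(join_vargroup(values))
```

Values themselves must not contain an underscore, otherwise the split would no longer be reversible.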

FilenameParts.context

Context of the file’s content. What is this file about?

Notes

The context allows words without underscores [a-zA-Z0-9] starting with an alphabetic character.

Its regular expression is ^[a-zA-Z]+[0-9a-zA-Z-]+$

The file extension only states the formatting of the file, like ‘.txt’ being a text file or ‘.csv’ being a specifically formatted text file. It does not state any information about the file’s content; that is the task of the context.

Returns: str

FilenameParts.extension

The common file extension with a leading dot.

Notes

The extension states how the content is encoded and which structure it has.

Examples

A file ending with ‘.txt’ is a plain text file, which is ideally encoded with ‘utf-8’.

A file ending with ‘.csv’ is a plain text file, which contains a table having ‘comma separated values’. Other examples are common formats like .json or .yml.

Instead of creating uncommon file endings for custom text-based file formats, such text files should end with ‘.txt’. The context filename part can be used to state the custom content.

Returns: str
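
A filename that pairs a context with a common extension, such as a custom text format stored as ‘.txt’, can be taken apart with the standard library's `pathlib`. The sketch below is not the namefiles API; `context_and_extension` is a hypothetical helper, and the filename "Zeb-a#1.measurements.txt" is an invented example. It assumes the filename carries no vargroup, since variable values may themselves contain dots.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def context_and_extension(filename: str) -> Tuple[Optional[str], str]:
    """Return (context, extension) based on the dot-separated suffixes."""
    path = PurePosixPath(filename)
    suffixes = path.suffixes
    # With two or more suffixes, the one before the last is the context.
    context = suffixes[-2].lstrip(".") if len(suffixes) >= 2 else None
    # The extension keeps its leading dot; it is "" if there is none.
    return context, path.suffix


print(context_and_extension("Zeb-a#1.measurements.txt"))
print(context_and_extension("ant#CAM0.avi"))
```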
