Welcome to ‘namefiles’ documentation!
namefiles is an approach to standardized file naming for multiple files that come from different sources, share the same formatting, and all relate to the same entity.
Installation
Install the latest release from PyPI using pip.
$ pip install namefiles
Concept
A filename consists of 6 parts, each describing a different aspect of the same entity.
identifier: The mandatory name (identification) of an entity.
sub_id: A branch of this entity.
source_id: The source from which the file (data) originates.
vargroup: The possibility to state variables.
context: Context of the file’s content. What is in there, not how it is stored in there. The context must always be accompanied by an extension.
extension: The file extension, which should state the format of the file. How is it stored in there.
All filename parts except the identifier are optional.
Within namefiles, file naming conventions are defined by a JSON Schema using the Python jsonschema module. namefiles proposes a standard naming convention, which is used if no custom naming convention is defined.
The EBNF of the namefiles naming convention is
filename ::= identifier ["#" sub_id] ["#" source_id] ["#" vargroup] ["." context] ["." extension]
identifier ::= [0-9a-zA-Z-_]{1,36}
sub_id ::= [0-9A-Z]{1,4}
source_id ::= [0-9A-Z]{5,12}
vargroup ::= ("_" var_value)+
var_value ::= [a-zA-Z0-9,.+-\ ]+
context ::= [a-zA-Z]+[0-9a-zA-Z-]+
extension ::= common file extension (.csv, .txt, ...)
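The grammar above can be approximated with a single regular expression. The following is a minimal sketch that uses only Python's re module; it is not the parser shipped with namefiles. The names FILENAME_PATTERN and split_filename are hypothetical, and the extension pattern [0-9a-zA-Z]+ is an assumption, since the grammar only says "common file extension". Because var_value may contain dots, the sketch prefers the shortest possible vargroup, and it accepts a context only when an extension follows.

```python
import re

# Hypothetical pattern derived from the EBNF; not taken from namefiles itself.
FILENAME_PATTERN = re.compile(
    r"^(?P<identifier>[0-9a-zA-Z_-]{1,36})"
    r"(?:#(?P<sub_id>[0-9A-Z]{1,4}))?"
    r"(?:#(?P<source_id>[0-9A-Z]{5,12}))?"
    # Lazy quantifiers: take the shortest vargroup that still lets the rest match.
    r"(?:#(?P<vargroup>(?:_[a-zA-Z0-9,.+\- ]+?)+?))?"
    # A context is only accepted together with an extension (assumed pattern).
    r"(?:(?:\.(?P<context>[a-zA-Z]+[0-9a-zA-Z-]+))?\.(?P<extension>[0-9a-zA-Z]+))?$"
)


def split_filename(filename):
    """Split a filename into its parts; missing optional parts are None."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"'{filename}' does not follow the naming convention.")
    parts = match.groupdict()
    if parts["vargroup"] is not None:
        # '_000000_ffffff' -> ['000000', 'ffffff']
        parts["vargroup"] = parts["vargroup"][1:].split("_")
    return parts


print(split_filename("ant#CAM0.avi"))
print(split_filename("Zeb-a#1#CANON#_000000_ffffff.colors.csv"))
```

Note that a dot inside a variable value can be ambiguous with a following context, so a real implementation may resolve such cases differently.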
Implementation
The recommended namefiles.FilenameParts
implements the default namefiles
file naming convention, providing access to each part via properties.
- FilenameParts.identifier
The mandatory entity’s name which relates to multiple files. The identifier is the leading filename part.
Notes
The identifier has a maximum length of 36 characters and can consist of words [a-zA-Z0-9_] with the addition of the hyphen-minus ‘-‘ (U+002D), which should be the default on keyboards.
Its regular expression is
^[0-9a-zA-Z-_]+$
Examples
Minimal to maximal identifier examples.
a                                     # At least 1 character is needed.
1044e098-7bfb-11eb-9439-0242ac130002  # 36 chars allow a UUID
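As a minimal sketch, an identifier can be checked against both the allowed characters and the length limit of 36 with Python's re module. The helper name is_valid_identifier is hypothetical, and the length bound is taken from the grammar rather than from the regular expression shown above, which does not limit the length.

```python
import re

# Allowed characters plus the 1..36 length limit from the naming convention.
IDENTIFIER_PATTERN = re.compile(r"[0-9a-zA-Z_-]{1,36}")


def is_valid_identifier(candidate):
    """Return True if the whole string is a valid identifier."""
    return IDENTIFIER_PATTERN.fullmatch(candidate) is not None


print(is_valid_identifier("a"))
print(is_valid_identifier("1044e098-7bfb-11eb-9439-0242ac130002"))
print(is_valid_identifier("not valid"))
```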
- Returns
str
- FilenameParts.sub_id
The sub id is the first branch of the identifier.
Notes
The sub identifier allows uppercase words without the underscore [A-Z0-9] with a maximum length of 4.
Its regular expression is
^[0-9A-Z-]{1,4}$
The sub identifier’s task is to distinguish different states of the same context. In this sense, a context could be different video captures of the same object with multiple cameras, or just different file versions.
The sub identifier should be seen as a branch of the identifier, not as a version within a sequence.
Examples
Multiple different video captures of the same object.
ant#CAM0.avi
ant#CAM1.avi
ant#CAM2.avi
Different children (versions).
a#1
a#1ST
a#2ND
a#RAW
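To illustrate the branch idea, the following sketch collects the sub identifiers found for each identifier in a list of filenames. It relies only on Python's standard library and on the character sets given in the grammar; SUB_ID_PATTERN and collect_branches are hypothetical names, not part of namefiles.

```python
import re
from collections import defaultdict

# The lookahead makes sure the sub id ends at '#', '.' or the end of the
# name, so a longer source id is not mistaken for a sub id.
SUB_ID_PATTERN = re.compile(
    r"^(?P<identifier>[0-9a-zA-Z_-]{1,36})#(?P<sub_id>[0-9A-Z]{1,4})(?=[#.]|$)"
)


def collect_branches(filenames):
    """Map each identifier to the sub ids found in the given filenames."""
    branches = defaultdict(list)
    for filename in filenames:
        match = SUB_ID_PATTERN.match(filename)
        if match:
            branches[match["identifier"]].append(match["sub_id"])
    return dict(branches)


print(collect_branches(["ant#CAM0.avi", "ant#CAM1.avi", "ant#CAM2.avi", "bee.avi"]))
```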
- Returns
str
- FilenameParts.source_id
The source id states where this file came from.
Notes
The source identifier allows words without underscores [a-zA-Z0-9] with the addition of the hyphen-minus ‘-‘ (U+002D), which should be the default on keyboards.
Its regular expression is
^[0-9A-Z-]{5,12}$
The source identifier distinguishes different sources whenever the context would otherwise lead to equal filenames. It might be the name of the program or device which made this file.
Examples
A comparison of two sources for two different sub versions of Zeb-a.
Zeb-a#1#canon.jpg
Zeb-a#2#canon.jpg
Zeb-a#1#nikon.jpg
Zeb-a#2#nikon.jpg
- Returns
str
- FilenameParts.vargroup
The group of variables (vargroup) contains meta attributes.
Notes
Each variable of the group is a string. It allows words [a-zA-Z0-9_] with the addition of:
‘-‘ hyphen-minus (U+002D)
‘+’ plus
‘,’ comma
‘.’ dot
Its regular expression is
^#(_[a-zA-Z0-9+-,. ]+)+$
Examples in which meta attributes are stored in the filename:
number of a subsequent sequence e.g. image sequences
a date neither being the creation nor the change date
Examples
>>> from namefiles import FilenameParts
>>> FilenameParts.disassemble("Zeb-a#_000000_ffffff_1.9m_no color").vargroup
['000000', 'ffffff', '1.9m', 'no color']
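The mapping between the vargroup string and its list of values can also be sketched without the library. The helpers join_vargroup and split_vargroup below are hypothetical; they only assume that each value is prefixed by an underscore and uses the characters named in the notes above (including the space that appears in the example).

```python
import re

# Characters allowed within a single variable value (assumed from the notes).
VAR_VALUE_PATTERN = re.compile(r"[a-zA-Z0-9,.+\- ]+")


def join_vargroup(values):
    """['000000', 'ffffff'] -> '_000000_ffffff'"""
    for value in values:
        if not VAR_VALUE_PATTERN.fullmatch(value):
            raise ValueError(f"'{value}' is not a valid variable value.")
    return "".join("_" + value for value in values)


def split_vargroup(vargroup):
    """'_000000_ffffff' -> ['000000', 'ffffff']"""
    if not vargroup.startswith("_"):
        raise ValueError("A vargroup must start with an underscore.")
    return vargroup[1:].split("_")


# Numbering an image sequence through the vargroup.
frames = ["clip#" + join_vargroup([f"{index:04d}"]) + ".png" for index in range(3)]
print(frames)
print(split_vargroup("_000000_ffffff_1.9m_no color"))
```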
- Returns
List[str]
- FilenameParts.context
Context of the file’s content. What is this file about?
Notes
The context allows words without underscores [a-zA-Z0-9], starting with an alphabetic character.
Its regular expression is
^[a-zA-Z]+[0-9a-zA-Z-]+$
The file extension only states the formatting of the file, like ‘.txt’ being a text file or ‘.csv’ being a specifically formatted text file. It does not state any information about the context of the content.
- Returns
str
- FilenameParts.extension
The common file extension with a leading dot.
Notes
The extension states how the content is encoded and which structure it has.
Examples
A file ending with ‘.txt’ is a plain text file, which is encoded with ‘utf-8’ in the best case.
A file ending with ‘.csv’ is a plain text file, which contains a table of ‘comma separated values’. Other examples are common formats like .json or .yml.
Instead of creating uncommon file endings for custom text-based file formats, such text files should end with ‘.txt’. The context filename part can be used to state the custom content.
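A small sketch of this recommendation: a custom text format keeps the ‘.txt’ extension and states its content through the context part. The function text_filename and the context "notes" are hypothetical examples; only the standard library is used.

```python
import re
from pathlib import Path

# Context pattern as given in the naming convention.
CONTEXT_PATTERN = re.compile(r"[a-zA-Z]+[0-9a-zA-Z-]+")


def text_filename(identifier, context):
    """Build '<identifier>.<context>.txt' for a custom text format."""
    if not CONTEXT_PATTERN.fullmatch(context):
        raise ValueError(f"'{context}' is not a valid context.")
    return f"{identifier}.{context}.txt"


name = text_filename("Zeb-a", "notes")
# pathlib sees the context as the first of two suffixes.
context_suffix, extension = Path(name).suffixes
print(name, context_suffix, extension)
```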
- Returns
str