1. Installation
1.1 Linux Installation
1.2 Windows Installation
1.3 Helper SQL Script Installation
2. RitmarkFS concept
3. Table Naming Rules
4. Drivers
4.1 XML Driver
5. FILESYSTEM Table structure
6. LEGACY_FILESYSTEM Table Type
7. Replication and Directory Changes Tracking
8. Some HOWTOs (was Apllication Examples)
9. Feedback
1. Installation
|
Important: FILESYSTEM tables from v.0.3.4 and are incompatible with v.0.4.0 and above
and will crash them. Before upgrading you'll need to drop your FILESYSTEM tables (dropping a table will
not affect your data in files) and recreate after upgrade with LEGACY_FILESYSTEM engine. You can create tables
either manually or using mysql.ritmarkfs_create_legacy routine.
|
1.1 Linux Installation
On Linux RitmarkFS engine is statically linked into MySQL server so to make it available you need to build and
install a modified server version. You can download sources for RitmarkFS-enabled MySQL server
from RitmarkFS download page
(mysql-5.1.9-ritmarkfs-0.4.0-src.tar.bz2).
You don't need any special configure flags to enable RitmarkFS. Basically you need to run
shell# ./configure
shell# make
shell# make install
1.2 Windows Installation
On Windows RitmarkFS engine is statically linked into MySQL server so to make it available you need to install
a modified server version. You can either build it from sources or download a pre-built binary.
Source installation
Download RitmarkFS-enabled MySQL server from
RitmarkFS download page
(mysql-5.1.9-ritmarkfs-0.4.0-src.zip). Open mysql.sln file in MS Visual Studio (you need MSVC 7.1) and build solution.
Binary installation
Download binaries from RitmarkFS download page.
and unpack it somewhere (mysql-ritmarkfs-0.4.0-win32.zip).
1.3 Helper SQL Script Installation
The script creates ritmarkfs_config table, which is currently used to specify directory changes tracking options.
The script also adds ritmarkfs_create_table and ritmarkfs_create_legacy stored procedures which make creation of ritmarkfs
tables less complicated.
The routines take table name as the only parameter. Of course you can create ritmarkfs tables with usual DDL commands, but these
procedures make it unnecessary to remember the correct strcture of tables. The ritmarkfs_config table, ritmarkfs_create_table and
ritmarkfs_create_legacy routine are created in the 'mysql' system schema.
mysql> source ritmarkfs_config.sql
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.05 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
2. RitmarkFS concept
The concept behind the RitmarkFS is that the whole filesystem can be represented in a form of a relational table.
Every file is viewed as a set of properties. Every property has name and value. This way every file
is represented by zero or more table rows of form file name-property name-property value.
Format-specific file drivers are responsible for handling particualr file formats. E.g. if you have
a directory with mp3 files and want to get id3 tags from them you need a specific driver for that.
3. Table Naming Rules
The FILESYSTEM engine uses table names to determine how table-to-filesystem mapping should
be done. Basically table name consists of 2 parts: driver name and target directory name.
Driver name consists of ASCII letters and is delimited from directory name by '_' (underscore symbol).
All path delimiters in directory name should be replaced with '$' (dollar symbol).
Some table name examples:
| jpg_c$$ritmarktest |
- |
jpg driver for c:/ritmarktest directory |
| mp3_$home$vkolesnikov$music |
- |
mp3 driver for /home/vkolesnikov/music directory |
The LEGACY_FILEYSTEM engine doesn't use any drivers, so you need just to specify target directory name.
E.g. c$$ritmarktest.
4. Drivers
Structured file access is implemented based on format-specific drivers. Every driver supports
some specific file format such as MP3 ID3 or (in a future release) XML. There's one to one relation
between tables and drivers. This means that if you want to use several different drivers on the
same directory you need to create a table for every driver. Currently all drivers are read-only.
The following drivers are supported:
Driver abbreviation (for use in table name) |
Supported file types |
Operations |
Description |
| atb |
Any |
SELECT |
Driver for general file attributes. Provides info about file name, size, timestamps, etc |
| xml |
XML files |
SELECT |
Driver provides structured access to XML documents. Uses libxml2.
|
| mp3 |
MP3 files |
SELECT |
Driver provides information about MP3 audiostream from generic attributes and ID3v1 tags.
RitmarkFS uses TagLib to process MP3 tags.
|
| jpg |
JPG files |
SELECT |
Driver provides information from EXIF tags of JPG images.
RitmarkFS uses LibEXIF to process EXIF tags.
|
File format detection is driver specific but usually is based on file extension. This is the fastest way.
Files not supported by driver are ignored. So if you use mp3 driver on a directory with no .mp3 files
empty set will be returned. One exception here is XML driver which performs basic file format validation.
4.1 XML Driver
XML driver represents XML document as a set of key/value pairs where key is path to entity or attribute
(similar to xpath but simpler without indexing, expressions, etc)
and value represents the entitty itself. For example the following small XML document
<?xml version="1.0"?>
<doc att1="1">
<subtag subatt1="s1">text value</subtag>
</doc>
will be represented this way
mysql> select * from xml_c$$ritmarktest where filename = "1.xml";
+------+----------+-----------+---------------------+------------------------+------------+
| path | filename | spathtype | spath | valtype | value |
+------+----------+-----------+---------------------+------------------------+------------+
| | 1.xml | path | | element | doc |
| | 1.xml | path | doc/att1/ | attribute | 1 |
| | 1.xml | path | doc/ | significant-whitespace | |
| | 1.xml | path | doc/ | element | subtag |
| | 1.xml | path | doc/subtag/subatt1/ | attribute | s1 |
| | 1.xml | path | doc/subtag/ | text | text value |
| | 1.xml | path | doc/ | significant-whitespace | |
+------+----------+-----------+---------------------+------------------------+------------+
7 rows in set (0.08 sec)
Unlike other entities XML attribute names never appear in the 'value' column. This allows
to represent an attribute as a single row. Also attribute rows always immediately follow
parent element row. This sometimes makes XML processing much easier.
An important thing to notice is that this representation preserves all information,
so that the original document can be imported into database and later restored back to file.
There's an utility available called dump[.exe] to dump XML data from relational representation
back into XML file. Using read-only XML driver and this utility it is possible to
modify XML files using SQL. The typical workflow could look like:
1. import XML into a HEAP/MyISAM/etc table using INSERT ... SELECT * FROM xml_$datadir;
2. INSERT/UPDATE/DELETE data in table
2a. optional: for very large docs you might want to extract a part of
document into a separate table, and merge back after processing
3. dump data into XML file (possibly on demand)
This scenario could be very effective especially for large XML datasets, as
modifying a row in a table is much faster than modifying (a large) XML file on disk.
5. FILESYSTEM Table Structure
All FILESYSTEM tables have constant driver-independent structure:
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| path | varchar(191) | NO | | | |
| filename | varchar(64) | NO | | | |
| spathtype | text | YES | | NULL | |
| spath | text | YES | | NULL | |
| valtype | text | YES | | NULL | |
| value | text | YES | | NULL | |
+-----------+--------------+------+-----+---------+-------+
path and filename fields represent path to file and its name. spath and value fields
used to represent format-specific file data in key-value format. spathtype and valtype
fields are used by driver to provide type info for data stored in spath and value fields.
Lets look at an example:
mysql> select * from mp3_c$$ritmarktest;
+------+------------------------------------------+---------------+--------------+---------+---------------------+
| path | filename | spathtype | spath | valtype | value |
+------+------------------------------------------+---------------+--------------+---------+---------------------+
| | 01-two_gallants-las_cruces_jail-just.mp3 | mpeg-property | length | numeric | 346 |
| | 01-two_gallants-las_cruces_jail-just.mp3 | mpeg-property | bitrate | numeric | 220 |
| | 01-two_gallants-las_cruces_jail-just.mp3 | mpeg-property | samplerate | numeric | 44100 |
| | 01-two_gallants-las_cruces_jail-just.mp3 | mpeg-property | channels | numeric | 2 |
| | 01-two_gallants-las_cruces_jail-just.mp3 | mpeg-property | version | text | 1.0 |
| | 01-two_gallants-las_cruces_jail-just.mp3 | mpeg-property | layer | numeric | 3 |
| | 01-two_gallants-las_cruces_jail-just.mp3 | mpeg-property | channel-mode | text | Stereo |
| | 01-two_gallants-las_cruces_jail-just.mp3 | mpeg-property | copyrighted | boolean | N |
| | 01-two_gallants-las_cruces_jail-just.mp3 | mpeg-property | original | boolean | N |
| | 01-two_gallants-las_cruces_jail-just.mp3 | id3v1-tag | title | text | Las Cruces Jail |
| | 01-two_gallants-las_cruces_jail-just.mp3 | id3v1-tag | artist | text | Two Gallants |
| | 01-two_gallants-las_cruces_jail-just.mp3 | id3v1-tag | album | text | What the Toll Tells |
| | 01-two_gallants-las_cruces_jail-just.mp3 | id3v1-tag | comment | text | - |
| | 01-two_gallants-las_cruces_jail-just.mp3 | id3v1-tag | genre | text | Indie |
| | 01-two_gallants-las_cruces_jail-just.mp3 | id3v1-tag | year | numeric | 2006 |
| | 01-two_gallants-las_cruces_jail-just.mp3 | id3v1-tag | track | numeric | 1 |
+------+------------------------------------------+---------------+--------------+---------+---------------------+
16 rows in set (0.28 sec)
As you can see mp3 driver found 16 informational properties in this file. All properties
are classified in a driver-specific way - some are mpeg core properties (spathtype == mpeg-property)
while others are ID3v1 tags (spathtype == id3v1-tag). Values not like keys are of common
datatypes - text, numbers, booleans, etc.
6. LEGACY_FILESYSTEM Table Type
Starting from version 0.4.0 FILESYSTEM table format was changed to support structured
file access. The old (incompatible) table format is now available as LEGACY_FILESYSTEM engine.
The legacy engine supports some operations that are unavailable at the moment
in the new FILESYSTEM engine. These include data modification via INSERT, UPDATE, DELETE
and replication of external events via directory changes tracking.
Table names for LEGACY_FILESYSTEM consist only of target directory name where path delimiters
are replaced with '$' (dollar) symbol. E.g. c$$ritmarktest or $tmp$ritmarktest.
If you installed helper scripts you can use
mysql.ritmarkfs_create_legacy routine to create LEGACY_FILESYSTEM tables.
7. Replication and Directory Changes Tracking
Starting from version 0.3.0 RitmarkFS implements directory changes tracking.
All tracked changes are written into the binlog and can be replicated to other nodes.
RitmarkFS replication setup consists of usual MySQL replication setup steps, plus some
additional operations.
First you need to perform replication setup as described in MySQL manual.
Next create source and target tables. E.g.:
mysql> call mysql.ritmarkfs_create_legacy('test.c$$dir1');
Query OK, 0 rows affected (0.05 sec)
At the moment external replication is supported only for LEGACY_FILESYSTEM engine.
Remember that source and target tables should have the same name. This means that source
and target paths should be equal. If this is not desired you can use views for source
and/or target. For example to replicate directory c:/dir1 to c:/dir2 in target database
create view 'c$$dir1' and map it to the table coresponding to c:/dir2:
mysql> CREATE VIEW test.`c$$dir1` AS SELECT * FROM test.`c$$dir2`;
Next there is a RitmarkFS confguration table, which was created when you performed
helper SQL script installation. The table has the following
structure:
mysql> desc mysql.ritmarkfs_config;
+--------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| schema_name | varchar(255) | YES | | | |
| table_name | varchar(255) | YES | | | |
| option_name | varchar(255) | YES | | | |
| option_value | varchar(255) | YES | | | |
+--------------+--------------+------+-----+---------+-------+
4 rows in set (0.02 sec)
This table is intended to contain all RitmarkFS configuration options. Every option
has name and value as well as an associated table and schema, to which it applies (for
engine-wide options table and schema should be NULLs). Option names and values can be
both lowercase and uppercase but table and schema names are restricted by MySQL naming
rules and by target operating system restrictions for path names.
At the moment there's only one replication-related option:
REPLICATE
Applies to:TABLE
Values: YES/NO
Description: This option controls whether a table's changes will be written
to the binlog and therefore will be replicated. Currently the value of this option
is read only during startup, so to apply any changes you'll need to restart the
server. In future there will be a trigger that applies the new configutation on the fly.
For example to enable table replication for table `c$$ritmarktest` from schema
`test` you need to run the following query:
mysql> INSERT INTO mysql.ritmarkfs_config VALUES ('test', 'c$$ritmarktest', 'REPLICATE', 'YES');
and restart the server.
If you're going to replicate files larger than 1 MByte, you should probably check
the value of max_allowed_packet server variable on both master and slave. For master
you might also want to check the value of max_binlog_size variable, although usually
it has quite large value.
The replication mechanism is designed to work correctly in the cases when
changes to a table are made concurrently via MySQL interface and by making changes
directly to the underlying directory tree.
9. Feedback
I do my best to keep this documentation up-to-date and consistent but currently
RitmarkFS is under intensive development so it's hard to keep docs in sync especially
being very limited in time. So feel free to contact me by e-mail. I cannot promise
anything but usually I answer the same day or the next day.
|