Gnome Tracker Issues
Modern desktop environments let users quickly locate files based on their metadata as well as their contents. Arguably the first successful implementation of such an indexer, tightly integrated into the desktop, was Apple's Spotlight. Several Linux desktop environments ship similar software, among them Gnome's Tracker.
Privacy Implications
By design, Tracker (like pretty much every other indexer) collects information about your files in its own database so that it can answer searches quickly. By default this information is stored in $HOME/.cache/tracker. From a privacy point of view there are a number of takeaways:
- potentially sensitive data is stored in a location separate from the actual files. Deleting one doesn't necessarily delete the other.
- there is a time lag between changing file contents and the Tracker miner coming along to also update the cache.
- combining the two: creating a file on disk, then encrypting it and deleting the unencrypted original can leave traces of the cleartext in the cache for a significant amount of time. In fact, whether this information is ever deleted depends on the implementation of the database used for caching.
As always, user friendliness comes with tradeoffs.
If you are concerned about this, consider excluding certain areas of your file systems from indexing, or disabling (and potentially uninstalling) the entire mechanism.
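Whether a given piece of cleartext has survived in the cache can be checked directly. The following Python sketch does a raw byte search over every file below a directory such as $HOME/.cache/tracker. It searches the files rather than querying them, on the assumption that the cache is a database in which deleted rows may linger in unused pages (SQLite, for instance, keeps them unless its secure_delete option is enabled) and that the text is stored uncompressed as UTF-8. The function name find_traces is made up for this example.

```python
import os
from pathlib import Path


def find_traces(cache_dir, needle):
    """Return the paths of files under cache_dir whose raw bytes contain needle.

    Plain byte search: it only finds text stored uncompressed and in the
    same encoding as the needle (UTF-8 here). Deleted-but-not-overwritten
    database pages are included because the file is read as raw bytes.
    """
    pattern = needle.encode("utf-8")
    hits = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = Path(root) / name
            try:
                data = path.read_bytes()  # whole file in memory; fine for a sketch
            except OSError:
                # Files may vanish or be unreadable while the indexer runs.
                continue
            if pattern in data:
                hits.append(path)
    return sorted(hits)
```

For example, find_traces(Path.home() / ".cache" / "tracker", "a phrase from the deleted file") lists the cache files that still contain that phrase. Each file is read into memory in one go, which keeps the code short but will be slow for very large databases.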
Note that temporary files on disk pose a risk even without this kind of indexing. Creating a file and then simply deleting it can leave behind the information it contained. Deleting a file is just an unlink() call to the operating system, which removes the file's entry from the directory structure and, once no other links or open handles refer to the file, marks its blocks on disk as free. Until these blocks are overwritten by new data, their contents remain recoverable. Therefore, a secure erase tool such as shred(1) is needed to actually get rid of the contents.
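For illustration, here is a rough Python analogue of shred -u: it overwrites a file in place with random bytes several times, syncs each pass to the device, and only then unlinks it. The function name is made up, and the default of three passes simply mirrors GNU shred's default. It inherits shred's limitations: on journaling or copy-on-write file systems and on SSDs with wear leveling, old copies of the data may survive elsewhere on the device, and it does nothing about copies already sitting in an indexer's cache.

```python
import os


def overwrite_and_unlink(path, passes=3):
    """Overwrite a regular file in place with random bytes, then unlink it.

    Same caveats as shred(1): no guarantee on journaling or copy-on-write
    file systems, or on SSDs that remap blocks internally.
    """
    size = os.path.getsize(path)
    chunk = 1 << 16  # write in 64 KiB pieces to bound memory use
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            # Push this pass to the device before starting the next one.
            os.fsync(f.fileno())
    os.unlink(path)
```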
Operational Notes
From a practical point of view, Tracker usually runs very unobtrusively. It consists of multiple processes whose status can be viewed using:
$ tracker daemon status
Store:
29 Jul 2018, 16:20:56: 0% Store - Idle
Miners:
29 Jul 2018, 16:20:56: 1% File System - Crawling recursively directory 'file:///home/bla/Pictures'
29 Jul 2018, 16:20:56: ✗ Extractor - Not running or is a disabled plugin
29 Jul 2018, 16:20:56: ✓ Applications - Idle
29 Jul 2018, 16:20:56: ✓ RSS/ATOM Feeds - Idle
The file system data miner, tracker-miner-fs, crawls through the mounted file system(s) looking for new and changed files and hands them over to tracker-extract, which digs through them for indexable information. These processes run at nice level 19, the lowest scheduling priority: they can use spare CPU time freely but yield to almost any other process that wants the CPU. This is all fine and dandy and usually works well.
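To check the nice level of these processes without reading through top, a small Linux-only Python sketch can walk /proc, match process names by prefix and ask the kernel for each one's priority via os.getpriority. Matching by prefix is deliberate: the kernel truncates names in /proc/<pid>/comm to 15 characters, so tracker-miner-fs shows up as tracker-miner-f. The function name nice_values is hypothetical.

```python
import os
from pathlib import Path


def nice_values(prefix):
    """Map PID -> nice value for processes whose name starts with prefix.

    Linux-only: process names come from /proc/<pid>/comm.
    """
    result = {}
    for entry in Path("/proc").iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            name = (entry / "comm").read_text().strip()
            if name.startswith(prefix):
                pid = int(entry.name)
                result[pid] = os.getpriority(os.PRIO_PROCESS, pid)
        except OSError:
            # The process exited between listing and reading; ignore it.
            continue
    return result
```

On a running desktop, nice_values("tracker") should report the same numbers as the NI column of top.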
Apart from the desktop integration, users are given a number of command-line tools to query and manipulate data. tracker info <file> displays all information Tracker has amassed about a file, and tracker extract <file> makes Tracker extract whatever it can find in a file.
Apart from the factual file data and metadata, such as the GPS coordinates or resolution of a photo, arbitrary keywords called tags can be created and assigned to files. For example:
# add a tag to a file, creating the tag if necessary
$ tracker tag -a screenshot Pictures/tracker.png
Tag was added successfully
Tagged: file:///home/bla/Pictures/tracker.png
# show all tags on a given file
$ tracker tag Pictures/tracker.png
file:///home/bla/Pictures/tracker.png
screenshot
# show all files with a given tag
$ tracker tag -s -t screenshot
Tags (shown by name):
screenshot
file:///home/bla/Pictures/tracker.png
At the time of this post, there doesn't seem to be a mechanism to correlate, e.g., darktable tags written to XMP sidecar files with Tracker tags. In fact, Tracker has no predefined way to parse these XML sidecar files, though given the open nature of the system an extractor for them could certainly be written.
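The parsing half of such a bridge is small. The Python sketch below assumes the sidecar stores keywords in the standard XMP dc:subject bag (the usual place for XMP keywords; hierarchical tags kept in other properties are ignored) and turns them into tracker tag -a command lines of the form shown earlier. Both function names are made up, and the commands are only built, not run.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XMP sidecar files.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def xmp_tags(xmp_text):
    """Return the keywords listed under dc:subject in an XMP document."""
    root = ET.fromstring(xmp_text)
    items = root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    return [li.text.strip() for li in items if li.text and li.text.strip()]


def tracker_tag_commands(image_path, tags):
    """Build (but do not run) one `tracker tag -a TAG FILE` argv per tag."""
    return [["tracker", "tag", "-a", tag, image_path] for tag in tags]
```

Each command list could then be passed to subprocess.run to apply the tags to the image.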
Problems
Sometimes tracker-extract will choke on certain files and crash. The whole process then enters an endless loop: it tries to re-index the file(s), crashes again, and so on. The usual telltale sign is that the extractor periodically climbs to the top of the CPU usage statistics:
$ top
top - 16:23:02 up 2:46, 3 users, load average: 1.03, 1.03, 0.88
Tasks: 334 total, 4 running, 250 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.1 us, 0.9 sy, 11.3 ni, 83.5 id, 0.0 wa, 0.1 hi, 0.1 si, 0.0 st
KiB Mem : 16300324 total, 157680 free, 2595000 used, 13547644 buff/cache
KiB Swap: 8220668 total, 8141308 free, 79360 used. 12959260 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10720 bla 39 19 1968536 49412 28052 R 92.4 0.3 0:02.78 tracker-extract
There will also be telltale messages in the system log:
$ journalctl -f
-- Logs begin at Wed 2018-01-10 12:30:34 CET. --
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: Could not insert metadata for item "file:///home/bla/Documents/something.jpg": 89.19: invalid UTF-8 character
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: If the error above is recurrent for the same item/ID, consider running "tracker-extract" in the terminal with the TRACKER_VERBOSITY=3 environment variable, and filing a bug with the additional information
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: Could not insert metadata for item "file:///home/bla/Documents/else.pdf": 24.32: invalid UTF-8 character
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: If the error above is recurrent for the same item/ID, consider running "tracker-extract" in the terminal with the TRACKER_VERBOSITY=3 environment variable, and filing a bug with the additional information
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: Could not insert metadata for item "file:///home/bla/Documents/foobar.pdf": 24.32: invalid UTF-8 character
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: If the error above is recurrent for the same item/ID, consider running "tracker-extract" in the terminal with the TRACKER_VERBOSITY=3 environment variable, and filing a bug with the additional information
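Rather than reading these messages by eye, the affected files can be collected programmatically. This Python sketch assumes log lines in the format shown above: it extracts the file:// URIs from the "Could not insert metadata" messages, counts failures per file and reports their parent directories as candidates for exclusion. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Captures the item URI in tracker-extract's "Could not insert metadata" lines.
FAILED_ITEM = re.compile(r'Could not insert metadata for item "([^"]+)"')


def failing_files(log_lines):
    """Count failures per local file path found in tracker-extract log lines."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_ITEM.search(line)
        if not match:
            continue  # e.g. the "If the error above is recurrent" hint lines
        uri = urlparse(match.group(1))
        if uri.scheme == "file":
            counts[unquote(uri.path)] += 1
    return counts


def failing_directories(log_lines):
    """Return the set of directories containing at least one failing file."""
    return {str(PurePosixPath(p).parent) for p in failing_files(log_lines)}
```

The input can be any iterable of lines, for instance journalctl output split into lines.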
The crash-and-retry loop continues indefinitely until Tracker is stopped via tracker daemon -t. So far these issues seem to be caused by (and limited to) files with embedded invalid UTF-8 characters. Either way, it would be useful to add a flag that tells Tracker about an incompatible file and causes it to mark the file as excluded after a number of failed attempts to index it. Cue Graceful Degradation.
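The proposed behaviour is easy to sketch. The Python class below is hypothetical and not part of Tracker: it counts failed extraction attempts per file and stops offering a file for indexing once a threshold is reached.

```python
class FailureLedger:
    """Track per-file extraction failures and give up after a limit."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.failures = {}  # uri -> number of failed attempts

    def should_index(self, uri):
        """True while the file has failed fewer than max_attempts times."""
        return self.failures.get(uri, 0) < self.max_attempts

    def record_failure(self, uri):
        """Count one failed attempt; return True if the file is now excluded."""
        self.failures[uri] = self.failures.get(uri, 0) + 1
        return not self.should_index(uri)

    def record_success(self, uri):
        """Forget past failures, e.g. after the file changed or extracted cleanly."""
        self.failures.pop(uri, None)

    def excluded(self):
        """URIs that have hit the failure limit, sorted for stable output."""
        return sorted(u for u, n in self.failures.items() if n >= self.max_attempts)
```

Calling record_success when a file is modified or extracts cleanly gives a repaired file another chance instead of excluding it forever.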
There is a bug filed for this and hopefully a fix will be available soon. Meanwhile, if only a few files are causing problems, it may help to exclude the directories containing them from indexing:
# list all Tracker settings
$ gsettings list-recursively | grep Tracker
[...]
org.freedesktop.Tracker.Miner.Files index-on-battery true
org.freedesktop.Tracker.Miner.Files sched-idle 'first-index'
org.freedesktop.Tracker.Miner.Files ignored-directories ['po', 'CVS', 'core-dumps', 'lost+found']
org.freedesktop.Tracker.Miner.Files crawling-interval -1
[...]
# add "Documents" to the exclusions
$ gsettings set org.freedesktop.Tracker.Miner.Files ignored-directories "['Documents', 'po', 'CVS', 'core-dumps', 'lost+found']"
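Editing that quoted list by hand is error-prone. The Python sketch below takes the current value as printed by gsettings get org.freedesktop.Tracker.Miner.Files ignored-directories, adds a directory name and builds the matching gsettings set command as an argument list, so no shell quoting is involved. It assumes the value is a plain string array like the default shown above (an empty one prints as @as []) and deliberately rejects names that would need escaping; the function names are made up.

```python
import ast


def parse_string_array(gvariant_text):
    """Parse `gsettings get` output for a plain string-array key into a list."""
    text = gvariant_text.strip()
    if text.startswith("@as"):
        text = text[len("@as"):].strip()  # empty arrays carry a type annotation
    return list(ast.literal_eval(text))


def format_string_array(items):
    """Render names as a GVariant string-array literal such as ['a', 'b']."""
    for item in items:
        if "'" in item or "\\" in item:
            raise ValueError(f"name needs escaping, not handled here: {item!r}")
    return "[" + ", ".join(f"'{item}'" for item in items) + "]"


def add_ignored_directory(current, name):
    """Return the `gsettings set` argv that prepends name to the exclusions."""
    items = parse_string_array(current)
    if name not in items:
        items.insert(0, name)
    return ["gsettings", "set", "org.freedesktop.Tracker.Miner.Files",
            "ignored-directories", format_string_array(items)]
```

The resulting list can be handed to subprocess.run. Running it twice with the same name yields the same command, since names already present are not added again.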