fedops blog

Privacy in Computing

Sun 29 July 2018

Gnome Tracker Issues

Posted by fedops in Software   

Modern desktop environments let users locate files by metadata as well as by content. Arguably the first successful data scraper integrated into a desktop was Apple's Spotlight. Several Linux desktop environments ship similar software, among them Gnome's Tracker.

Privacy Implications

By design, Tracker (like pretty much every other indexer) collects information about your files in its own database so that searches can be answered quickly later on. By default this information is stored in $HOME/.cache/tracker. From a privacy point of view there are a number of takeaways:

  • potentially sensitive data is stored in a location separate from the actual files. Deleting one doesn't necessarily delete the other.
  • there is a time lag between changing file contents and the Tracker miner coming along to also update the cache.
  • combining the two points above: creating a file on disk, then encrypting it and deleting the unencrypted original, can leave traces of the cleartext behind in the index for a significant amount of time. Whether this information is ever deleted depends on the implementation of the database used for caching.
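To get a rough idea of whether a given piece of cleartext has ended up in the index, the cache directory can be searched for it. The following is a minimal sketch: the function name cache_mentions is made up for this post, and the default path is simply the one mentioned above. A miss proves nothing, since the database may hold text in tokenized or otherwise encoded form; a hit, however, is a clear sign that the index knows more than you would like.

```shell
# List files below a cache directory that contain a given string.
# Usage: cache_mentions STRING [DIRECTORY]
# (hypothetical helper; the default directory is taken from the text above)
cache_mentions() {
    needle=$1
    dir=${2:-$HOME/.cache/tracker}
    # -r: recurse, -l: print file names only, -F: fixed string, not a regex
    found=$(grep -rlF -- "$needle" "$dir" 2>/dev/null)
    if [ -n "$found" ]; then
        printf '%s\n' "$found"
    else
        echo "no match under $dir"
    fi
}
```

Called as cache_mentions "some phrase from the deleted file", it prints either the matching database files or a "no match" line.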

As always, user friendliness comes with tradeoffs.

If you are concerned about this, consider excluding certain areas of your file systems from the indexing process, or disabling (and possibly uninstalling) the entire mechanism.

Note that temporary files on disk pose a risk even without an indexer. Creating a file and then simply deleting it leaves the information contained in the file behind. Deleting a file is just an unlink() call to the operating system, which removes the file's entry from the directory structure and, once no other links or open handles remain, marks its blocks on disk as free. Until these blocks are actually overwritten with new data, their contents remain recoverable. Therefore, a secure erase function such as shred(1) needs to be employed to actually get rid of the contents.
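As a sketch of what that looks like in practice, the wrapper below overwrites files with GNU shred before unlinking them. The name secure_delete is made up for this post. Keep in mind that shred relies on the file system overwriting data in place; its own manual page warns that this assumption does not hold on copy-on-write or data-journaling file systems, and flash storage with wear leveling adds further uncertainty.

```shell
# Overwrite each file's contents, then unlink it (instead of a plain rm).
# Usage: secure_delete FILE...
# (hypothetical wrapper around GNU shred)
secure_delete() {
    for f in "$@"; do
        # -z: finish with a pass of zeros, -u: remove the file afterwards
        shred -z -u -- "$f" || return 1
    done
}
```

For example, secure_delete notes.txt after the encrypted copy has been written and verified.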

Operational Notes

In day-to-day operation Tracker is usually very unobtrusive. It consists of several processes whose status can be viewed using:

$ tracker daemon status
Store:
29 Jul 2018, 16:20:56:    0%  Store                   - Idle 

Miners:
29 Jul 2018, 16:20:56:    1%  File System             - Crawling recursively directory 'file:///home/bla/Pictures' 
29 Jul 2018, 16:20:56:  ✗     Extractor               - Not running or is a disabled plugin
29 Jul 2018, 16:20:56:  ✓     Applications            - Idle 
29 Jul 2018, 16:20:56:  ✓     RSS/ATOM Feeds          - Idle 

The data miner for the file system, called tracker-miner-fs, crawls through the available mounted file system(s) looking for new and changed files, and hands them over to tracker-extract which digs through them for indexable information. These processes are niced to 19, the lowest scheduling priority (see the NI column in the top output further down), which means they will use whatever CPU time is otherwise idle but yield to practically everything else. This is all fine and dandy and usually works well.
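The nice value of the running Tracker processes can be checked directly; higher values mean lower scheduling priority. The helper below is a small sketch (show_nice is a made-up name) that filters the ps output by a command-name prefix. A prefix is used because Linux truncates the command name to 15 characters, so tracker-miner-fs would not match in full.

```shell
# Print PID, nice value and command name of every process whose command
# name starts with the given prefix. Prints nothing if there is none.
# Usage: show_nice PREFIX
# (hypothetical helper)
show_nice() {
    ps -e -o pid= -o nice= -o comm= |
        awk -v prefix="$1" 'index($3, prefix) == 1'
}
```

Running show_nice tracker- lists all Tracker processes along with their nice values in the second column.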

Apart from the desktop integration, users are given a number of command-line subcommands to query and manipulate the data. tracker info <file> displays all the information Tracker has amassed about a file. tracker extract <file> causes Tracker to extract whatever it can find in a file.

In addition to the factual file data and metadata, such as GPS coordinates and resolution in a photo, arbitrary keywords called tags can be created and assigned to files. For example:

# add a tag to a file, creating the tag if necessary
$ tracker tag -a screenshot Pictures/tracker.png 
Tag was added successfully
  Tagged: file:///home/bla/Pictures/tracker.png
# show all tags on a given file
$ tracker tag Pictures/tracker.png 
file:///home/bla/Pictures/tracker.png
  screenshot
# show all files with a given tag
$ tracker tag -s -t screenshot
Tags (shown by name):
  screenshot 
    file:///home/bla/Pictures/tracker.png

At the time of this post, there doesn't seem to be a mechanism to correlate e.g. darktable tags written to XMP sidecar files with Tracker tags. In fact, Tracker has no predefined way to parse these specific XML files, though given the open nature of the system one could certainly be written.

Problems

Sometimes, tracker-extract will choke on certain files and crash. The whole process will then enter an endless loop: it tries to re-index the file(s), crashes again, and so on. The usual symptom is that the extractor periodically climbs to the top of the CPU usage statistics:

$ top

top - 16:23:02 up  2:46,  3 users,  load average: 1.03, 1.03, 0.88
Tasks: 334 total,   4 running, 250 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.1 us,  0.9 sy, 11.3 ni, 83.5 id,  0.0 wa,  0.1 hi,  0.1 si,  0.0 st
KiB Mem : 16300324 total,   157680 free,  2595000 used, 13547644 buff/cache
KiB Swap:  8220668 total,  8141308 free,    79360 used. 12959260 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND 
10720 bla       39  19 1968536  49412  28052 R  92.4  0.3   0:02.78 tracker-extract

There will also be telltale messages in the system log:

$ journalctl -f
-- Logs begin at Wed 2018-01-10 12:30:34 CET. --
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: Could not insert metadata for item "file:///home/bla/Documents/something.jpg": 89.19: invalid UTF-8 character
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: If the error above is recurrent for the same item/ID, consider running "tracker-extract" in the terminal with the TRACKER_VERBOSITY=3 environment variable, and filing a bug with the additional information
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: Could not insert metadata for item "file:///home/bla/Documents/else.pdf": 24.32: invalid UTF-8 character
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: If the error above is recurrent for the same item/ID, consider running "tracker-extract" in the terminal with the TRACKER_VERBOSITY=3 environment variable, and filing a bug with the additional information
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: Could not insert metadata for item "file:///home/bla/Documents/foobar.pdf": 24.32: invalid UTF-8 character
Jul 29 16:24:32 airtuxi.mtnsub.org tracker-extract[10876]: If the error above is recurrent for the same item/ID, consider running "tracker-extract" in the terminal with the TRACKER_VERBOSITY=3 environment variable, and filing a bug with the additional information

This will loop indefinitely until Tracker is stopped via tracker daemon -t. So far these issues seem to be caused by (and limited to) files with embedded oddball characters that are rejected as invalid UTF-8. Either way it would be useful to have a flag that tells Tracker about an incompatible file and causes it to mark this file as excluded after a number of failed attempts to index it. Cue Graceful Degradation.
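To find out which files keep tripping the extractor, the log messages can be condensed into a list of offending items with a count each. This is a sketch that assumes the message format shown in the log excerpt above; the function name failing_items is made up for this post.

```shell
# Read log lines on stdin and print every item named in a
# "Could not insert metadata" message, prefixed by how often it occurred,
# most frequent first.
# (hypothetical helper; relies on the message format shown above)
failing_items() {
    grep 'Could not insert metadata for item' |
        sed 's/.*for item "\([^"]*\)".*/\1/' |
        sort | uniq -c | sort -rn
}
```

Fed with the messages of the current boot, for instance via journalctl -b -t tracker-extract | failing_items, it shows which files, and thereby which directories, are candidates for exclusion.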

There is a bug filed for this and hopefully a fix will be available soon. Meanwhile, if there are only a few files causing problems, it may help to exclude the directories containing these files from indexing:

# list all Tracker settings
$ gsettings list-recursively | grep Tracker
[...]
org.freedesktop.Tracker.Miner.Files index-on-battery true
org.freedesktop.Tracker.Miner.Files sched-idle 'first-index'
org.freedesktop.Tracker.Miner.Files ignored-directories ['po', 'CVS', 'core-dumps', 'lost+found']
org.freedesktop.Tracker.Miner.Files crawling-interval -1
[...]
# add a directory to the exclusions ('some/directory' is a placeholder)
$ gsettings set org.freedesktop.Tracker.Miner.Files ignored-directories "['some/directory', 'po', 'CVS', 'core-dumps', 'lost+found']"
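Retyping the whole list is error-prone, so the new value can also be derived from the current one. The helper below is a sketch (prepend_dir is a made-up name) that prepends one entry to a string list in the notation printed by gsettings get. It assumes the directory name contains no quote characters.

```shell
# Prepend a directory to a string list as printed by `gsettings get`,
# e.g. "['po', 'CVS']" becomes "['some/directory', 'po', 'CVS']".
# gsettings prints an empty string list as "@as []".
# Usage: prepend_dir DIRECTORY CURRENT_LIST
# (hypothetical helper)
prepend_dir() {
    dir=$1
    current=$2
    case $current in
        '@as []'|'[]') printf "['%s']\n" "$dir" ;;
        # strip the leading "[" and put the new entry in front of the rest
        *) printf "['%s', %s\n" "$dir" "${current#\[}" ;;
    esac
}

# Example, commented out so that nothing is changed by accident:
# schema=org.freedesktop.Tracker.Miner.Files
# old=$(gsettings get "$schema" ignored-directories)
# gsettings set "$schema" ignored-directories "$(prepend_dir some/directory "$old")"
```

Afterwards, gsettings get org.freedesktop.Tracker.Miner.Files ignored-directories shows whether the new entry has been stored.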