Check data pools for changes or manipulation

Data on storage media doesn't last forever. It can be lost through natural aging and defects of the storage medium, but also through mistakes or even intrusions into the system. It is therefore part of every responsible system administrator's job to audit whether the data is intact and whether anything has changed.

To prevent write access to a storage medium or directory, you can use a read-only medium like a DVD or activate write protection, for example, on an SD card. Experienced users often mount selected directories as read-only or switch on write protection for them (see the "Mount Filesystems as Read-Only" box).

Mount Filesystems as Read-Only

The file /etc/fstab lists all partitions the system mounts into the directory tree. In the fourth column of each row, you define with ro (read-only) or rw (read-write) whether the partition can only be read or also written to (Listing 1).

Another option is to use the immutable flag [1]. This flag is part of a directory entry's advanced attributes, which only a few Linux users know and even fewer use in practice. If File-based Access Control Lists (FACLs) come into play on top of that, for example in the context of SELinux [2] in distributions like Red Hat, Fedora, or CentOS, one can define access rights even more precisely.

With the immutable flag set, the directory entry cannot be changed any more: the file or directory is write protected, and the operating system denies every attempt to modify the data. Only the root user can set and remove this flag. You set it with chattr +i <file> and remove it with chattr -i <file>. The i in the fourth line of Listing 2 shows that the file example.txt carries the immutable flag.

Listing 1

fstab

$ grep data /etc/fstab
/dev/sdb1  /data  ext4  ro  0  0
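
Building on Listing 1, the following is a minimal sketch of how you could list every mount point that an fstab-style file marks as read-only. The function name list_ro_mounts and the second sample entry are assumptions made for illustration; the script only inspects the fourth column described above.

```shell
#!/bin/bash
# Print the mount point (column 2) of every entry whose option list
# (column 4) contains "ro".
list_ro_mounts() {
  awk '
    $1 ~ /^#/ || NF < 4 { next }      # skip comments and incomplete lines
    {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++)
        if (opts[i] == "ro") { print $2; break }
    }
  ' "$1"
}

# Demo with a sample file; the /home entry is hypothetical.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# <device>  <mount point>  <type>  <options>  <dump>  <pass>
/dev/sdb1  /data  ext4  ro  0  0
/dev/sdb2  /home  ext4  rw,noatime  0  2
EOF
ro_mounts=$(list_ro_mounts "$sample")
echo "$ro_mounts"
rm -f "$sample"
```

On a real system, you would call list_ro_mounts /etc/fstab instead of using the sample file.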

Listing 2

Attributes of example.txt

# touch example.txt
# chattr +i example.txt
# lsattr example.txt
----i--------e-- example.txt
# echo "# Comment" >> example.txt
bash: example.txt: Operation not permitted
# chattr -i example.txt
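
If you want to check for the flag in a script rather than by eye, a sketch along the following lines may help. The function names are hypothetical, and the sketch assumes that lsattr prints the attribute string in the first column, as in Listing 2.

```shell
#!/bin/bash
# Succeeds if an lsattr attribute string contains the lowercase "i".
attrs_have_immutable() {
  case "$1" in
    *i*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Succeeds if the given file or directory currently carries the flag.
is_immutable() {
  local attrs
  attrs=$(lsattr -d -- "$1" 2>/dev/null | awk '{ print $1 }')
  attrs_have_immutable "$attrs"
}

# Demo on the attribute string from Listing 2
if attrs_have_immutable "----i--------e--"; then
  echo "immutable flag is set"
fi
```

A call such as is_immutable example.txt && echo protected then works on the file itself, provided the filesystem supports these attributes.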

A change in the data pool can affect either its content, through additions and deletions, or its access privileges. Possible modifications also include adding, renaming, moving, and deleting files, directories, and (symbolic) links. As a system administrator, you need to understand which modifications happened at which point in time, which user made them, and, in case of errors, how you can repair things.

Catching modifications before they take effect and reacting appropriately to such an incident goes beyond the scope of this article, so I will focus on how you can detect such changes retroactively, after they happen. The examples below show the steps to follow with rsync and integrit.

In principle, the procedures can be used with other tools as well (e.g., Tripwire, Aide, and Iwatch). However, the configuration and the evaluation of the results differ from tool to tool.

Rsync

If you have two data pools (for example, an original and a backup), the standard tool rsync [3] is already enough to detect differences between the two datasets. Rsync was originally designed to synchronize two directories, and it prints to the terminal which entries differ.

Listing 3 demonstrates this with the two directories original/ and copy/. Each initially contained three identical files, but I have modified data in the copy/ directory: While alright.txt remains unchanged, I set the execute bit for the group on anything.txt and added content to somewhat.txt.

Listing 3

Using rsync to find changes

$ ls -la {original,copy}
copy:
total 16
drwxr-xr-x 2 frank frank 4096 Jun  1 14:28 .
drwxr-xr-x 4 frank frank 4096 Jun  1 14:25 ..
-rw-r-xr-- 1 frank frank   15 Jun  1 16:36 alright.txt
-rw-r-xr-- 1 frank frank   10 Jun  1 14:28 anything.txt
-rw-r--r-- 1 frank frank   24 Jun  1 14:30 somewhat.txt
original:
total 16
drwxr-xr-x 2 frank frank 4096 Jun  1 14:26 .
drwxr-xr-x 4 frank frank 4096 Jun  1 14:25 ..
-rw-r-xr-- 1 frank frank   15 Jun  1 16:36 alright.txt
-rw-r--r-- 1 frank frank   10 Jun  1 14:26 anything.txt
-rw-r--r-- 1 frank frank   10 Jun  1 14:26 somewhat.txt
$ rsync -anv --out-format="[%t]:%o:%f:Last Modified %M" copy/* original
sending incremental file list
[2016/06/01 16:40:25]:send:copy/alright.txt:Last Modified 2016/06/01-16:36:14
[2016/06/01 16:40:25]:send:copy/anything.txt:Last Modified 2016/06/01-14:28:49
[2016/06/01 16:40:25]:send:copy/somewhat.txt:Last Modified 2016/06/01-14:30:23
sent 137 bytes  received 25 bytes  324.00 bytes/sec
total size is 34  speedup is 0.21 (DRY RUN)

Rsync allows a dry run using the -n switch (long option --dry-run). Here, you use this mode to detect modifications without actually synchronizing the two directories. With -a (long option --archive), rsync compares both folders, taking into account the names of the existing entries, their size, and their access permissions.

Without any additional options, rsync is rather tight-lipped. Only with -v (long option --verbose) does it show details of the transactions that take place. You can set -v multiple times to increase the amount of detail. The additional switch --out-format defines how rsync reports the details of each transfer.

In my example, %t prints the transfer's timestamp, %o the action to be executed (send or receive), %f the file name, and %M the timestamp of the last modification (see Table 1 [4]). Additional help for rsync can be obtained from an introductory article [5], as well as the rsync man page.

Table 1

Rsync Format Placeholder

Placeholder Meaning
%a Remote IP address
%b Number of bytes actually transferred
%B Permission bits of the file (e.g., rwxrwxrwt )
%c Total size of the block checksums received for the basis file (only when sending)
%f File name (long form on sender; no trailing "/")
%G GID of the file (decimal) or DEFAULT
%h Remote hostname
%i Itemized list of what is being updated
%l Length of the file in bytes
%L String -> SYMLINK , => HARDLINK , or empty
%m Module name
%M Last-modified time of the file
%n Filename (short form; trailing "/" on dir)
%o Operation (send , recv , or del )
%p PID of the rsync session
%P Module path
%t Current date and time
%u Authenticated username or an empty string
%U UID of the file (decimal)

Because you only care about the modified entries, you can also combine rsync -i (long form --itemize-changes) with the filter tool grep. From rsync's detailed but still compact output, you keep only the lines that indicate modifications. All other lines are dropped.

The output contains one line per file, each of which starts with a >. The following 10 characters represent the file type (f for a regular file) and the properties rsync uses to compare the two entries. A dot in a position means the files do not differ in that property; a letter marks a modification. For example, c stands for checksum, meaning the files' checksums or hash values are different; s indicates different sizes; and p indicates different permissions.

You filter the relevant lines from the output by using grep and an appropriate regular expression. The expression used in Listing 4 matches sequences that start with an f, followed by an arbitrary character, which is followed by either a dot and tp, st and a dot, or three dots. The last two lines of the listing contain the matches.

Listing 4

Comparing checksums

$ rsync -acniv copy/* original | grep --color -E "f.(\.tp|st\.|\.\.\.)"
>f..tp..... anything.txt
>fcst...... somewhat.txt

The -c switch in the rsync call is a special case: It makes the program compare the files not only by their size but also by a checksum in the form of a hash value (see the "Hash Values" box). This way, you can also trace modifications to the content that do not change the size and where the timestamp has been set back to the original date afterwards.
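
To make the itemized strings easier to read, you could translate them into words. The following sketch assumes the usual layout of the string (update type, file type, then c, s, t, p, o, and g in this order) and only covers these six properties; the function name explain_itemize is made up for this example.

```shell
#!/bin/bash
# Translate the attribute part of an rsync --itemize-changes string
# (for example ">fcst......") into words.
explain_itemize() {
  local item=$1 out=""
  [ "${item:2:1}" = "c" ] && out+="checksum "
  [ "${item:3:1}" = "s" ] && out+="size "
  [ "${item:4:1}" = "t" ] && out+="mtime "
  [ "${item:5:1}" = "p" ] && out+="permissions "
  [ "${item:6:1}" = "o" ] && out+="owner "
  [ "${item:7:1}" = "g" ] && out+="group "
  echo "${out% }"
}

explain_itemize ">f..tp....."   # prints: mtime permissions
explain_itemize ">fcst......"   # prints: checksum size mtime
```

The two calls use the strings that Listing 4 reports for anything.txt and somewhat.txt.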

Hash Values

Hash functions are cryptographic methods that can be used to calculate checksums. On Linux, you can use the tools md5sum (MD5 with 128 bits), sha1sum (SHA1 with 160 bits), sha224sum (SHA2 with 224 bits), sha256sum (SHA2 with 256 bits), sha384sum (SHA2 with 384 bits), and sha512sum (SHA2 with 512 bits). The number in the name usually gives the length of the resulting hash value in bits; MD5 and SHA1 are the exceptions. If your system isn't equipped with any of the listed applications, you can use openssl, which also calculates hash values.
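
You can verify the bit lengths yourself with a short sketch like the one below. It hashes empty input with each tool that is installed and counts the hexadecimal digits of the result; the helper name digest_bits is an assumption for this example.

```shell
#!/bin/bash
# Print the digest length in bits that a checksum tool produces.
digest_bits() {
  local tool=$1 hex
  hex=$("$tool" < /dev/null | awk '{ print $1 }')
  echo $(( ${#hex} * 4 ))     # each hex digit encodes 4 bits
}

for tool in md5sum sha1sum sha224sum sha256sum sha384sum sha512sum; do
  command -v "$tool" > /dev/null || continue   # skip tools that are missing
  printf '%-10s %s bits\n' "$tool" "$(digest_bits "$tool")"
done
```

With openssl, a call such as openssl dgst -sha256 <file> should produce the same SHA256 value as sha256sum, although the output format differs.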

Content Parity

To check quickly whether two files have the same content, the Linux tools cmp, comm, diff, and sdiff can only partly help. They work line-by-line, byte-by-byte, or block-by-block and are excruciatingly slow in some cases. Instead, the shell script from Listing 5 uses the SHA256 operation on lines 3 and 4, because MD5 and SHA1 aren't considered to be safe anymore.

Listing 5

Comparing with SHA256

#! /bin/bash
# Create hash values
hashValue1=$(sha256sum "$1" | awk '{ print $1 }')
hashValue2=$(sha256sum "$2" | awk '{ print $1 }')
# Compare hash values
if [ "$(echo -e "$hashValue1\n$hashValue2" | uniq | wc -l)" -eq 1 ]; then
  echo "$1 and $2 are identical."
  exit 0
fi
echo "$1 and $2 are not identical."
exit 1

The more compact Listing 6 solves the problem with less computational cost but requires a deeper understanding of shell programming. You execute it with two files as parameters. Following usual Unix practice, the return value 0 in line 3 stands for parity, and the value 1 in line 6 for disparity.

Listing 6

Comparing with SHA256 (II)

#! /bin/bash
if [ "$(sha256sum "$1" | awk '{ print $1 }')" == "$(sha256sum "$2" | awk '{ print $1 }')" ]; then
  echo "$1 and $2 are identical."; exit 0
fi
echo "$1 and $2 are not identical."
exit 1

From Rsync to HIDS

To audit modifications between directories with large amounts of data, rsync is not advisable. On the one hand, it is too computationally intensive; on the other hand, it carries the risk of incompleteness. That is why developers created combinations of tools that automate this step further. Such tools are basically designed to detect break-ins locally on a system and are called host-based intrusion detection systems (HIDS) [6].

An HIDS toolset offers a wide range of functions. It comes with routines to detect file modifications, find rootkits, and spot suspicious network packets and interfaces as well as "mysterious" processes. Some of the tools can only be used on local systems, others on both local and remote systems. Table 2 gives a rough overview of the tools and their features.

Table 2

HIDS

Tool         File Modifications  Rootkits  Network  Processes  Remote
dpkg         yes                 no        no       no         no
rpm          yes                 no        no       no         no
integrit     yes                 no        no       no         no
tripwire     yes                 no        no       no         no
tiger        yes                 yes       yes      yes        no
rkhunter     yes                 yes       no       no         no
samhain      yes                 no        no       no         yes
debsums      yes                 no        no       no         no
chkrootkit   no                  yes       no       yes        no
aide         yes                 yes       no       no         no
fcheck       yes                 no        no       no         no
stealth (1)  yes                 yes       no       no         no
ossec        yes                 yes       yes      no         yes
unhide       no                  no        no       yes        no
suricata     no                  no        yes      no         no
inotify      yes                 no        no       no         no
(1) SSH-Based Trust Enforcement Acquired Through a Locally Trusted Host

Note that only inotify issues output at the very moment it detects a modification. All the other applications do so later. dpkg, dlocate, and debsums are package management tools that only exist on Debian and its derivatives. Strictly speaking, they do not match the definition of an HIDS and only check whether the files installed from a package are still unmodified (Listing 7). The same goes for rpm on Fedora and openSUSE.

Listing 7

Using Dpkg

$ dpkg -V openssh-server
??5??????      /usr/lib/tmpfiles.d/sshd.conf
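
In the output of dpkg -V, a 5 in the third position of the result column indicates that the file's checksum no longer matches the one recorded for the package. As a rough sketch, you could extract just the affected paths as follows; the function name failed_checksums is hypothetical, and the sketch assumes paths without spaces.

```shell
#!/bin/bash
# Read "dpkg -V" style lines on stdin and print the paths whose
# checksum test failed ("5" in the third position of the first column).
failed_checksums() {
  awk 'substr($1, 3, 1) == "5" { print $NF }'
}

# Demo with the line from Listing 7
modified=$(printf '%s\n' '??5??????      /usr/lib/tmpfiles.d/sshd.conf' | failed_checksums)
echo "$modified"
```

On a Debian system, you would feed the function directly, for example with dpkg -V openssh-server | failed_checksums.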

Modifications

For a directory-based analysis of modifications in a filesystem, you can use integrit, tripwire, samhain, and aide.

An initial inventory serves as a digital tell-tale for differences that are observed afterwards. In the first step, the program analyzes the directory you specified beforehand and generates a kind of snapshot of the current status. For every entry in the directory, it creates an entry in its internal database and remembers, for example, the file name, the date of creation and last modification, the access and user permissions, and the content.

However, the latter isn't saved as a complete copy but only as a hash value calculated from the content. It is highly unlikely that two different contents yield the same value, and the hash can be generated reasonably fast and without too much computational effort. In many cases, MD5 or a variant of the Secure Hash Algorithm (SHA) is used for the hash calculation.

The second step is the comparison of the current directory state with the one of the snapshot from the first step. The system registers all modifications between the two states and communicates them to you. This can happen via the standard output but also via email, Jabber (XMPP), or as an entry in a logfile.
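
The two steps can be imitated with standard tools. The following sketch is not how integrit or the other programs work internally; it merely illustrates the principle with sha256sum, and the function names take_snapshot and check_snapshot are made up for this example. It only records content hashes, so it will not notice new files or changed permissions.

```shell
#!/bin/bash
# Step 1: record a SHA256 hash for every regular file below a directory.
take_snapshot() {        # usage: take_snapshot <directory> <manifest>
  ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# Step 2: compare the current state with the manifest. Prints one line
# per file that changed or vanished and returns non-zero in that case.
check_snapshot() {       # usage: check_snapshot <directory> <manifest>
  local manifest
  manifest=$(readlink -f "$2")
  ( cd "$1" && sha256sum --check --quiet "$manifest" 2>/dev/null )
}

# Demo in a scratch directory
work=$(mktemp -d)
mkdir "$work/pool"
echo "first"  > "$work/pool/a.txt"
echo "second" > "$work/pool/b.txt"
take_snapshot "$work/pool" "$work/known.sha256"
echo "tampered" >> "$work/pool/b.txt"
report=$(check_snapshot "$work/pool" "$work/known.sha256")
status=$?
echo "$report"
rm -rf "$work"
```

In the demo, only b.txt is reported because a.txt still matches its recorded hash. The manifest should be stored outside the monitored directory and protected against modification.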

If you receive such a warning, you should react promptly and appropriately, for example, by restoring an earlier backup. Both steps create considerable I/O load on the storage medium, so it is better not to run them during high-load phases.

Integrit

You can get integrit either via the system's package management (e.g., with Debian, Ubuntu, Linux Mint, and Raspbian) or directly from the project's website [7]. Once installed, integrit doesn't run constantly as a daemon in the background; instead, you run it explicitly with admin rights for each analysis.

To tell the program where to look, you describe the job in a configuration file. You can freely choose the file name; in my example, I used integrit.conf (Listing 8). Your best bet is to save the file in the directory /root/integrit/ or /etc/integrit/ and set adequate access permissions to keep curious users from fiddling with it.

Listing 8

integrit.conf

known=/root/integrit/known.cdb
current=/root/integrit/current.cdb
root=/data/original

Integrit requires three lines in a configuration file: the path to the inventory database, the path to the database of the current state, and the directory to be monitored (line 3). As with the configuration file itself, the names and paths of the databases are up to you and can be set according to local circumstances.
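
If you prefer to create the configuration with a script, a sketch like the following writes the three lines from Listing 8 and restricts access to the file. The function name write_integrit_conf is hypothetical, and the modes 700 and 600 are one possible reading of "adequate access permissions."

```shell
#!/bin/bash
# Write the three-line configuration and make it readable by its owner only.
write_integrit_conf() {   # usage: write_integrit_conf <conf dir> <monitored dir>
  local dir=$1 root=$2
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  ( umask 077             # new files get mode 600
    cat > "$dir/integrit.conf" <<EOF
known=$dir/known.cdb
current=$dir/current.cdb
root=$root
EOF
  )
}

# Demo in a scratch directory; on a real system you would run this as
# root with /root/integrit and /data/original as arguments.
demo=$(mktemp -d)
write_integrit_conf "$demo/integrit" /data/original
conf_mode=$(stat -c '%a' "$demo/integrit/integrit.conf")
conf_lines=$(wc -l < "$demo/integrit/integrit.conf")
echo "mode $conf_mode, $conf_lines lines"
rm -rf "$demo"
```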

In the next step, you initialize the inventory database. For this, you execute integrit with the three switches -u for update, -v for verbose, and -C <configuration> (see Listing 9).

Listing 9

Executing integrit

# integrit -uv -C integrit.conf
integrit: ---- integrit, version 4.1 -----------------
integrit:            output : human-readable
integrit:         conf file : integrit.conf
integrit:          known db : /root/integrit/known.cdb
integrit:        current db : /root/integrit/current.cdb
integrit:              root : /data/original
integrit:          do check : no
integrit:         do update : yes
integrit: current-state db RMD160 --------------
integrit: a6fb12c69b773038f03987b7130ae07c0846fe02  /root/integrit/current.cdb

As you defined in the config file, integrit "remembers" the current state in the file current.cdb. Now you can copy this state with

cp current.cdb known.cdb

to the inventory in the file known.cdb. This file can now be used as a reference by integrit to detect modifications.

Now make some changes in the monitored directory. In the following example, I amended the content of a file and extended its permissions. In the next step, integrit compares the state of the directory with the previously registered inventory. For that, you execute it with the switch -c for check, and with -C followed by the name of the configuration file (Listing 10).

Listing 10

Integrit Looking for Changes

# integrit -c -C integrit.conf
integrit: ---- integrit, version 4.1 -----------------
integrit:            output : human-readable
integrit:         conf file : integrit.conf
integrit:          known db : /root/integrit/known.cdb
integrit:        current db : /root/integrit/current.cdb
integrit:              root : /data/original
integrit:          do check : yes
integrit:         do update : no
changed: /data/original/file2   s(9c1185a5c5e9fc54612808977ee8f548b2258d31:b17ae69f081657c9f0b5e810affbce44b1f7593f)
changed: /data/original/file2   p(0644:0654) m(20160602-175113:20160606-194552) c(20160602-175113:20160606-194610)
integrit: not doing update, so no check for missing files

The result of the comparison shows multiple modifications. In the last block of line 10, the s marks a changed checksum for the file /data/original/file2 (integrit uses z for the file size); the old and new hash values appear in parentheses next to it. Furthermore, you can find a p (for permissions) in line 11 because I changed the permissions from 0644 to 0654. The letter m (for modification time) marks the time of the last modification, in this case, June 6, 2016 at 19:45.

In the last line, integrit also notifies you that it doesn't update the inventory database in the same step but only checks for modifications. If you also want to adopt all changes into the inventory, add the -u (update) switch when executing.
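
For automated reports, you may only need the names of the affected files. The following sketch filters integrit's output accordingly; it assumes the "changed:" line format shown in Listing 10 and paths without spaces, and the function name changed_paths is made up for this example.

```shell
#!/bin/bash
# Read integrit check output on stdin and print each changed path once.
changed_paths() {
  awk '$1 == "changed:" { print $2 }' | sort -u
}

# Demo with shortened lines in the format of Listing 10
sample='changed: /data/original/file2   p(0644:0654)
changed: /data/original/file2   m(20160602-175113:20160606-194552)
integrit: not doing update, so no check for missing files'
paths=$(printf '%s\n' "$sample" | changed_paths)
echo "$paths"
```

In practice, you would pipe the real output into the function, for example with integrit -c -C integrit.conf | changed_paths.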

Load Comparison

Each tool generates a certain load to fulfill its tasks. On production systems, the integrity check must by no means overload the system. That is why I tested rsync and integrit under normal everyday conditions by storing and analyzing 16GB of data on an internal SSD.

In the course of the test, I saw that the parity test (as described in Listing 3) took an average of between one and two seconds. If rsync is supposed to calculate the checksum too (see Listing 4), then around one minute goes by. If you use other storage devices – e.g., connected via USB or network – you have to schedule more time. The same goes for SATA hard-disk drives, SD cards, and CDs/DVDs.

Integrit on the other hand requires around 150 seconds for the database inventory (see Listing 9) and writes 24MB of data during this step. As shown in Listing 10, the check for modifications on average lasts about the same time. For the additional update of the database, you'll have to schedule 5 to 10 percent more time.

Similar Tools

Filesystems like ZFS [8] and Btrfs [9] contain automatic integrity checks. If the system reads content from a storage medium, it automatically calculates a checksum for the content of the data block.

ZFS compares these with the checksum of the identical data block on a (mirrored) clone. If the two checksums are not equal, ZFS assumes the original data block's content to be damaged and throws a read error. If the system doesn't repair the problem automatically, you can decide to do so manually and have it replace the damaged data block with the content of the cloned filesystem [10].

Btrfs calculates a checksum for each block using a cyclic redundancy check (CRC32). Thus, the system detects bit errors and, in combination with a RAID, fixes them automatically as long as the mirror is intact. If you use ext3 or ext4 as filesystems, the Smartmontools [11] and the application badblocks (from the e2fsprogs package) can help.

Dangers exist not only on filesystems, but also in processes and data streams. For these, you can use unhide [12] and suricata [13], which monitor processes and network packets, respectively, in search of malicious behavior. Unhide monitors the running processes and tries to find those that want to hide from the ps command's output. To do so, it compares, among other things, the entries in the /proc filesystem with the running processes.

Outlook

None of the tools presented here is able to prevent modifications in the filesystem, but they do help you to detect such changes after they happen. With this information, you have the opportunity to take steps against suspicious modifications and set your system back to the correct state. If you run the programs as daemons in the background or as cron jobs, you may be able to save a lot of troublesome manual work.

However, learning how to interpret all the warnings may require a bit of skill. Systems also commonly report "false positives." A typical case is the monitoring of the directories /bin/ and /usr/bin/. If you install new software or update existing software, the contents of these directories change, and the HIDS will notice this and warn you about it. So you will still have to take a careful look at the reports.

Tip

In the BSD variants and on the Mac, the counterpart of chattr is called chflags. In Solaris, chmod covers this function; the same goes for lsattr, which is covered by an extension of the ls command.