Find binary-like files with ssdeep

Paul Cowan, 123RF.com

Pretty similar

Ordinary checksums identify files that are identical. The ssdeep tool finds files whose content is merely similar.

Checksums are a fine thing: Once created and saved, they make it possible to quickly detect even the slightest change to a "hashed" file. They are often used for system integrity checks and when installing new packages. To verify a file, you compare the checksum freshly calculated from its contents with the saved value.
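The create-save-compare cycle can be sketched in a few lines of shell. This is a minimal illustration, assuming the sha256sum tool from GNU coreutils (the article does not name a specific checksum program) and a hypothetical file called report.txt:

```shell
#!/bin/sh
# Work in a scratch directory; report.txt is a hypothetical sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'original contents\n' > report.txt

# Create the checksum once and save it next to the file.
sha256sum report.txt > report.txt.sha256

# Later: recompute the checksum and compare it with the saved value.
sha256sum -c report.txt.sha256

# Change a single character; the comparison now fails.
printf 'original content!\n' > report.txt
sha256sum -c report.txt.sha256 || echo "report.txt has changed"
```

Note that the check only answers yes or no: a one-character edit and a complete rewrite both produce the same "changed" verdict.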

In everyday use, however, there are many cases where what matters is not exact equality but similarity. Think, for example, of the different versions of a document, an image, source code, or a compiled program. In these cases, most parts of the files are identical, and differences exist in only a few places.

Here, similarity is a measure of the changes made to the files. Different versions of a file often differ by only a few bytes. With plain text documents, you can still determine this to some extent with standard tools such as wc, uniq, sort, and tr: Split the text into its individual words (tr ',.: ' '\n'), sort them (| sort |), and use uniq to display the frequency of each word. This works, for example, as follows:

[...]
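The article's own listing is not reproduced here. As a rough sketch of the pipeline described above, and not the author's listing, the following assumes GNU coreutils and two hypothetical sample files, v1.txt and v2.txt. The -s option, the grep filter, the word_freq helper, and the final diff step are illustrative additions:

```shell
#!/bin/sh
# Work in a scratch directory; v1.txt and v2.txt are hypothetical samples.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'the cat sat on the mat. the end.\n' > v1.txt
printf 'the cat sat on the rug. the end.\n' > v2.txt

word_freq() {
    # Turn separators into newlines (-s squeezes runs), drop empty lines,
    # sort so identical words are adjacent, then count them with uniq -c.
    tr -s ',.: ' '\n' < "$1" | grep -v '^$' | sort | uniq -c | sort -rn
}

word_freq v1.txt > v1.freq
word_freq v2.txt > v2.freq

# diff exits non-zero when the tables differ, which is expected here.
diff v1.freq v2.freq || true
```

Comparing the two frequency tables shows which words were added or removed, but it says nothing about word order and is of no use for binary files.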
