A FLAC Fingerprint is a small text file (ffp.txt) that contains the filename and the checksum information for one or more .flac files. The fingerprint is analogous to, yet somewhat different from, the .md5 files used for Shorten (.shn):
- Unlike an .md5 file, the FLAC Fingerprint ffp.txt file itself is not actually used in performing integrity checks on .flac files. Instead, when you decompress or use flac's Test feature, FLAC automatically verifies each file against an internal checksum stored in the file.
- The ffp.txt information is used to visually compare different .flac filesets for lineage purposes. Reference fingerprints can be listed in the ShnDatabase. In a similar way, the ShnDatabase lists .md5s as "fingerprints" for .shn filesets. In both cases, the LiveMusicArchive also uses the fingerprints stored in the ShnDatabase to speed up their archiving of the music. So, it makes good sense to include a FLAC fingerprint file with each .flac seed.
- A FLAC Fingerprint is generated only for the audio data portion of the file. (Therefore, changing the filename or the tags or FlacMetadata does not change the fingerprint calculation.) In contrast, an .md5 is generated against the whole file, including header portions.
- To create an ffp file using the standard flac command-line tools, type "metaflac --show-md5sum flac_file_names > ffp.txt". Generating a FLAC fingerprint file is merely a readout and compilation of the internally stored checksums from each of the .flac files.
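To illustrate where that internally stored checksum lives: in the FLAC format, the stream begins with the marker "fLaC" followed by a STREAMINFO metadata block, and the last 16 bytes of that block hold the MD5 of the unencoded audio. The Python sketch below reads that field directly. The function names stored_md5 and ffp_line are made up for this example, the "filename:checksum" line layout is an assumption about how ffp.txt files are usually laid out, and files with a tag prepended before the "fLaC" marker are not handled. It is a sketch of the idea, not a replacement for metaflac.

```python
def stored_md5(flac_bytes):
    """Return the MD5 kept in the STREAMINFO block, as a hex string."""
    if flac_bytes[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    # Metadata block header: 1 byte (last-block flag + 7-bit block type),
    # then a 3-byte big-endian length.
    block_type = flac_bytes[4] & 0x7F
    length = int.from_bytes(flac_bytes[5:8], "big")
    if block_type != 0 or length != 34:
        raise ValueError("first metadata block is not a 34-byte STREAMINFO")
    streaminfo = flac_bytes[8:8 + 34]
    if len(streaminfo) < 34:
        raise ValueError("file is truncated inside STREAMINFO")
    # The first 18 bytes hold block sizes, frame sizes, sample rate, etc.
    # The remaining 16 bytes are the MD5 of the unencoded audio data.
    return streaminfo[18:34].hex()


def ffp_line(filename, flac_bytes):
    """Format one fingerprint line (assumed layout: filename:checksum)."""
    return f"{filename}:{stored_md5(flac_bytes)}"


# Synthetic 42-byte header, for demonstration only (not a playable file).
fake = b"fLaC" + bytes([0x80]) + (34).to_bytes(3, "big") + bytes(18) + bytes(range(16))
print(ffp_line("track01.flac", fake))
# prints: track01.flac:000102030405060708090a0b0c0d0e0f
```

For a real file, the bytes could come from reading the first 42 bytes of the .flac file in binary mode; nothing is decoded, which is why producing the fingerprint file is fast.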
SPECIAL NOTE ABOUT .MD5 FILES AND FLAC: Whether or not to make an .md5 checksum file for a .flac fileset is a confusing topic!
- Why whole-file flac .md5s can be a hindrance: Under FLAC, you are allowed to change the compression level and add/remove metadata in .flac files without changing the actual audio. The audio may be identical, but the changed bytes will completely change the whole-file .md5 checksum. Checking the old .md5s against the new .flac files will report failure, even though there is nothing actually wrong with the new fileset. That can cause major confusion. Going by whole-file .md5 alone can also cause confusion when trying to compare the new fileset against others in a database (similar to the current situation with nonseeking vs. seek-appended .shn files).
- Why whole-file wav md5s can be a hindrance: wav files aren't perfectly standardized. Different applications can create different wav files with the same music data. Further, flac doesn't encode everything in the headers of the wav file, only what is necessary. So the file sets that are created using a wav > flac > wav conversion have a good chance of not having identical md5s.
- Why whole-file .md5 can be a help: They can still serve as a quick "parts list" since they will have a line for each flac file that is supposed to be in the fileset. If any files are missing, you can tell quickly. They help as a quick check for integrity of the whole file (not just the data part), so you can spot simple corruptions during uploads/downloads. It's also currently a good idea to add whole-file .md5 files to .flac sets that are uploaded to archive.org; it makes the whole upload/contribution process there run smoothly at several steps.
- A whole-file .md5 is really no longer necessary, since md5check.exe checks the .ffp file and will notify the user if any files are missing from the fileset. Ideally, this functionality will be incorporated into FlacFrontend.
- Because of these competing rationales, the community is still struggling to reach a consensus on generating whole-file .md5s. Currently, etree.org formally discourages the practice for trading, while the Internet Archive's Live Music Archive encourages it for their site.
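The whole-file versus audio-only distinction in the points above can be reproduced with nothing but Python's standard library. The sketch below builds a small 16-bit WAV file, then makes a second copy that carries an extra LIST/INFO chunk of the kind some applications append. The helper names (make_wav, add_list_chunk, whole_md5, audio_md5) are invented for this example, and the sample data is arbitrary bytes rather than real music. Hashing the raw PCM frames is only meant to illustrate the idea of an audio-only checksum; it is not claimed to reproduce the output of any particular tool.

```python
import hashlib
import io
import wave


def make_wav(pcm):
    """Wrap raw 16-bit stereo PCM bytes in a minimal WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)      # 2 bytes per sample = 16-bit audio
        w.setframerate(44100)
        w.writeframes(pcm)
    return buf.getvalue()


def add_list_chunk(wav):
    """Append a LIST/INFO chunk after the audio and fix the RIFF size."""
    payload = b"INFO" + b"ISFT" + (8).to_bytes(4, "little") + b"example\x00"
    chunk = b"LIST" + len(payload).to_bytes(4, "little") + payload
    out = bytearray(wav + chunk)
    out[4:8] = (len(out) - 8).to_bytes(4, "little")
    return bytes(out)


def whole_md5(data):
    """MD5 over every byte of the file, headers included."""
    return hashlib.md5(data).hexdigest()


def audio_md5(wav):
    """MD5 over the PCM frames only, ignoring headers and extra chunks."""
    with wave.open(io.BytesIO(wav), "rb") as r:
        return hashlib.md5(r.readframes(r.getnframes())).hexdigest()


pcm = bytes(range(256)) * 4          # 1024 bytes of stand-in sample data
plain = make_wav(pcm)
tagged = add_list_chunk(plain)

print(whole_md5(plain) == whole_md5(tagged))   # False: the files differ
print(audio_md5(plain) == audio_md5(tagged))   # True: the audio does not
```

The same reasoning applies to .flac files that have been re-tagged or recompressed: a checksum over the whole file changes, while a checksum over the audio alone stays put.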
Note that a flac fingerprint isn't a checksum of the encoded flac data - it's a checksum of the decoded music data. So to test the file, flac decodes the data in the file and verifies that the checksum of the music data matches the (internally stored) flac fingerprint.
This has a few interesting implications:
- The flac fingerprint should be identical to what is sometimes called a "shntool md5," which is a checksum calculated on just the music data.
- Since that checksum on just the music data can be verified regardless of the format the music is encoded into - shn, flac, or something else - it can serve as the one number needed to track a song in a file set. Beauty, eh? See the discussion under "1c. md5 mode" at the shntool tutorial page for another explanation of this.
- When flac decodes, it checks the file (and each part of the file) against the internal checksum data. If a flac file decodes without error, it's a good file - as long as you are using an application that reports decoding errors!
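Because the audio checksum is independent of filenames and container format, two filesets can be compared by their checksums alone. The sketch below does that for the text of two fingerprint files. It assumes each line has the layout "filename:checksum" (the last colon is taken as the separator, so colons in filenames are tolerated); parse_ffp and same_audio are hypothetical helper names, and the checksums shown are placeholder values, not real fingerprints.

```python
def parse_ffp(text):
    """Map filename -> checksum for each 'filename:checksum' line."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and anything that is not an entry
        name, checksum = line.rsplit(":", 1)
        entries[name] = checksum.lower()
    return entries


def same_audio(ffp_a, ffp_b):
    """True if both filesets carry the same audio checksums, whatever the names."""
    return sorted(parse_ffp(ffp_a).values()) == sorted(parse_ffp(ffp_b).values())


original = """\
band2001-01-01d1t01.flac:00112233445566778899aabbccddeeff
band2001-01-01d1t02.flac:ffeeddccbbaa99887766554433221100
"""

# Same audio, renamed files, checksums written in upper case.
renamed = """\
track02.flac:FFEEDDCCBBAA99887766554433221100
track01.flac:00112233445566778899AABBCCDDEEFF
"""

print(same_audio(original, renamed))   # True
```

Comparing sorted checksum lists, rather than whole lines, is what lets a renamed or re-tagged fileset still match its reference fingerprints.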
See also: FLAC, FlacFrontend, FlacFaq, SeedingGuidelines