distributed processing

Introduction

In this post, I will provide some initial impressions and findings. I do not endeavor to write a white paper, or to employ an industry standard, scientific methodology to evaluating the tool (if for no other reason than because I am constrained by time).

PostgreSQL

First, I note that it appears that no one has been able to get FTK to work with PostgreSQL, leading me to conclude that the product was shipped without being tested in this regard. (If a reader has been able to get it working, I encourage you to post a comment here). I was not able to get it to work, and I wasted two valuable —otherwise billable— days I had set aside for a client, only to make this discovery.

My review of the AccessData forums indicates identical experiences, and I haven’t found one poster there who yet claims to have finished an evidence load using FTK with Postgres. (Note: I am unable to determine whether it is an AccessData use-violation to excerpt comments from the private forums, so I am proceeding with an abundance of caution by not doing so).

Likewise, I conferred on Friday with another colleague, a lead examiner for a large company, and he replied:

I just had the same experience. I mistakenly upgraded to 4.0, removed Oracle completely, and installed PostgreSQL. That was a mistake . . . some of my run-of-the-mill cases that should only take a couple hours were taking days and had to be killed off. Then, after I removed PostgreSQL and re-installed Oracle I couldn’t get it to forget about the old connection and had all sorts of weirdness with it not finding Oracle some of the time. I eventually backed out FTK, Oracle, and PostgreSQL and did a complete manual cleanup of all garbage files and registry entries and then re-installed everything. I am back to Oracle with 4.0 and things are fine again, but what a mess to deal with this on 3 machines.

And, the same experiences are found on the ForensicFocus forums (e.g., http://www.forensicfocus.com/Forums/viewtopic/p=6557533/). Thus, based solely on these numerous anecdotes, and based on my understanding that new purchasers of FTK do not receive Oracle licensing, I have concluded that FTK 4.0 & Postgres is not merchantable (suitable for its intended purpose), although –as noted above– I may be incorrect, and would be pleasantly surprised to be proved wrong.

So, I, too, reverted back to Oracle. Unfortunately, I couldn’t get the Oracle KFF library for v4.0 posted on AccessData’s FTP site to work. In browsing through the AccessData forums, neither could anyone else. AccessData made available to me and certain others who complained a working KFF, which –last time I checked– is not the one available for download at the AccessData FTP site.

Now, like many of the others who have posted to the AccessData forums and elsewhere, I am able to use v4.0 with Oracle.

Hardware

The three machines I used for testing are, as follows:

(1) FTK & Oracle server (one box) – SuperMicro X8DTL-6F motherboard, LSI SAS2 2008 controller, two RAID-0 volumes each consisting of two OCz Vertex 3 Max IOPS 120GB SSDs (SATA III – one volume for Oracle data; the other for O/S and the adTemp directory), 24GB of DDR3 1333MHz ECC non-registered server memory, and two Intel Xeon X5650 hexacore processors.

(2) Distributed Processing Engine (“DPE”) #1 – Asus M4A89GTD Pro/USB3 motherboard, AMD Phenom II X6 1100t hexacore processor (watercooled, but not overclocked), 16Gb of DDR3 memory, O/S residing on an OCz Vertex 3 SSD (SATA III), and the temp directory used by the AccessData distributed processing engine residing on a separate OCz Vertex 3 Max IOPS edition SSD, and the pagefile residing on a Western Digital Raptor 10K RPM hard drive.

(3) DPE #2 – Hewlett Packard DV6 laptop, Intel core i7 720qm processor, O/S installed on an Intel SATA II SSD, temp directory used by AccessData distributed processing service residing on a separate OCz Vertex 2 SSD (SATA II), 8GB of RAM

Source Evidence Configuration

In an effort to find the fastest evidence load times, I experimented with various combinations of the foregoing. As a test image, I used a 186GB DD image (ultimately consisting of 1,052,891 evidence items), hosted on a Western Digital 4TB My Book Studio Edition II (SATA II – up to 3 Gb/sec) configured as RAID-0. I used both KFF alert and ignore, MD5 & SHA1 hashes (but not SHA256 or “fuzzy” hashes), expand compound files, flag bad extensions, entropy test, dtSearch index, create graphics thumbnails, data carve, meta carve, registry reports, include deleted files, and explicit image detection (X-DFT & X-FST). Using oradjuster, I tweaked the SGA_TARGET parameter to use only 18% of available physical memory during evidence processing.

Distributed Processing

Before continuing, I’d like to mention a few things about the Distributed Processing Engine, which are hard-learned lessons from either failing to read the user guides and appendices, or from experimentation:

(1) To get DPE working, you must have a system share as the path to the evidence in the Add Evidence dialogue box. Without it, no distributed processing will occur. Likewise, you need to have the Oracle working directory on a public share, the FTK-cases directory on a public share, and all systems using mirrored accounts (which Microsoft defines as “a matching user name and a matching password on two [or more] computers”). You also need to disable any Windows firewalls or other firewalls. Tip ► An easy way to make certain the port on a DPE is reachable is to install PortQuery v. 2, and run the command, “PortQry -n {machineName} -p tcp -e 34097″ where “machineName” is the name of your DPE, and where 34097 is the default port (configurable in the distributed processing configuration menu). AccessData ought to include a “test connection” button in the distributed processing configuration — it would probably save their help desk a lot of e-mails and calls.

(2) And, although the v4.0 System Specification Guide discusses how to configure the adTemp directory on the localhost processing engine (which directory should be located on its own, high i/o throughput drive, because it is the interface between the processing engine and Oracle), I have found no discussion about how to optimize the DPE machines.

Tip ► On a Windows Vista or Windows 7 machine, if you are logged on as “farkwark,” the distributed processing engine will write its files to Users/farkwark/appData/Local/Temp. To relocate this /Temp directory to a different drive, you need to create a junction, as follows. First, log off and log on as a different admin account user. Next, move (not copy) the /Temp directory to the different drive (say F:). Rename it, if you like (e.g., “FTKtemp”). Now, from a command prompt, type:

mklink /j “Users\farkwark\appData\Local\Temp” “F:\FTKtemp”

From this point forward, the processing engine will, in fact, be writing its temporary files to the F:-drive, thereby not competing with the O/S drive for i/o.

Hardware Configuration Experiment No. 1: FTK & Oracle Server + DPE #1 & DPE #2

With this three-machine configuration, I rarely saw the FTK & Oracle server’s 12 cores (24, if one counts hyperthreading) get above a collective 15% of load. DPE #2 (the HP laptop) processing load reached near 100% several times, with up to 6 of 8GB available memory in use. DPE #1 reached between 50-80% CPU utilization with extended periods of low utilization, and about 8GB (of 16GB available physical memory). Total time was 11 hours, 24 minutes.

Hardware Configuration Experiment No. 2: FTK & Oracle server (one box implementation), alone

With this one-box configuration, the dual xeon hexacore CPUs were pegged for extended periods of time at 99 or 100% (note, this differs from the experience of others, who have written,”We have multi cored, multi processor CPUs on our systems. What we’ve found is that typically, unless we are password cracking, that the I/O from the disks can’t keep up with resources available. Meaning our CPUs are never maxed out. So the CPUs are not the bottleneck for getting more speed“). The FTK/Oracle server used up to 20GB (of 24GB available) of physical memory (recall that SGA_TARGET was set to 18%). Total time elapsed was 9 hours, 4 minutes, an improvement of 20.5% over using a three-machine distributed processing configuration (no, that’s not a typo).

Hardware Configuration Experiment No. 3: FTK & Oracle server + DPE #1

With this two-machine configuration, the FTK Server’s CPU utilization was rarely above 40%, only occasionally reached 60%, and most usually was between 5 to 25%, and used between 8 to 12GB (of 24GB available) of physical memory. Meanwhile, DPE #1′s CPU utilization was pegged at 99 to 100% for extended periods of time, and used up to 10GB (of 16GB available) of physical memory. Total time elapsed was 8 hours, 33 minutes, a 25% improvement over the 3-machine distributed processing configuration, and only a 5.7% improvement over using the FTK/Oracle one-box solution.

Hardware Configuration Experiments Conclusion

Based on the type of hardware I am using, I found very little benefit (up to 6% processing time improvement) and, in fact, some detriment (over 20% in processing time loss) in stringing together numerous DPE workstations. My experience is inconsistent with AccessData’s findings of processing time differences between stand-alone boxes and distributed processing clusters (see http://accessdata.com/distributed-processing).

Add-on Modules: Data Visualization & Explicit Image Detection

Initially, I thought the Data Visualization module did not work. No matter whether I attempted to view a directory containing several score of files, or the entire 1 million + items, it never displayed more than a handful of results (sometimes zero or one, resulting in a pie graph that was just one big green circle). Turns out (of course) that It was my fault for failing to “select” the appropriate date range. Had I read the manual first, I would have noticed, “Information can only be displayed for the date that you have selected.” FTK 4.0 User Guide at 200. Apparently, when the Data Visualization tool first opens, it defaults to one day (the first day of the oldest evidence in the list) –not very inuitive.

AccessData claims that the Data Visualization add-on component “provides a graphical interface to enhance understanding and analysis of cases. It lets you view data sets in nested dashboards that quickly communicate information about the selected data profile and its relationships.” Among other things, it purportedly provides “a complete picture of the data profile and makeup,” empowers the examiner to “Understand the file volume and counts through an interactive interface,” and “Create a treemap of the underlying directory structure of the target machine for an understanding of relative file size and location” (similar to, but not as elegant as WinDirStat).

In summation, the tool appears to work as designed, although I haven’t done any substantive reporting off of it. One user posted on the AccessData forum that there appears to be no way to export the graphs in to a report, but this can be easily remedied by taking a screen clipping using SnagIt, Microsoft’s OneNote, or a screen print.

Also, I have been experimenting with the EID. AccessData states, “This image detection technology not only recognizes flesh tones, but has been trained on a library of more than 30,000 images to enable auto-identification of potentially pornographic images . . . AccessData will continue to integrate more advanced image search and analysis functionality into FTK. Customers who have added the explicit ID option to their Forensic Toolkit® license and are current on their SMS will automatically receive those new capabilities as they become available.” Notwithstanding this commitment, it appears that no additional functionality has been added since its release with FTK 3.0. I also note that the technology is unlike Microsoft’s “PhotoDNA,” which is reputed to process images in less than five milliseconds each and accurately detected target images 98 percent of the time, while reporting a false alarm one in a billion times. Comparatively, AccessData’s EID has been found to achieve 69.25% effectiveness with 35.5% false positives. Marcial-Basilio, Aguilar-Torres, Sáchez-Pérez, Toscano-Medina, Pérez-Meana, “Detection of Pornographic Images, 2 Int’l Journal of Computers 5 (2011).

My experience reveals many false positives (such as, “small_swatch_beige.png,” an image consisting of a plain beige coloured box, ranking at the very top of the list compiled by the X-ZFN algorithm, which is supposed to be the most accurate of the three algorithms), and seems to confirm that the algorithm is based on the presence of flesh tones (and nothing more, unless your system has one or more of the 30,000 images that became part of the library at the time the tool was introduced). Nevertheless, if one is short on time (and many law enforcement agencies’ examiners are), the tool does certainly help to reduce the data set that requires manual review.

Posted: 18 Mar, 2012 | Comments Off
Categories: Legal Techology | Tags: AccessData FTK 4.0, AMD hexacore processor, distributed processing, FTK & Oracle, FTKtemp, HP DV6, Intel hexacore processors, Intel i7, Minnesota Computer Forensics and Legal Technology, OCZ Vertex 3 Max IOPS, PhotoDNA, PortQuery, Solid state hard drive, WinDirStat, www.forensicfocus.com | By: Sean Harrington.

Computer Forensics and Legal Technology Blog

distributed processing

AccessData FTK 4.0: initial impressions