How DNS Prefetching and Preloading Can Lead to Incorrect Conclusions

Digital Evidence Pitfalls: The Impact of DNS Prefetching and Preloading

Dec 02, 2024

Intended Audience: Advanced-level digital forensic practitioners.

Imagine being wrongfully accused of visiting illegal websites, with digital evidence seemingly supporting the claim. What if those artifacts weren’t created by you but by your browser’s unseen background processes? Welcome to the world of DNS prefetching and preloading, where forensic misinterpretation can lead to life-altering consequences.

Overview

Code that web developers use to improve the user experience (UX) of web browsing can cause data to be stored on a user’s device without the user’s knowledge or interaction. An untrained digital forensic analyst, or a person reviewing forensic results that lack proper context, may make incorrect assumptions about a user’s activities.

Highlights

The Domain Name System (DNS) serves as the phonebook for the Internet, translating the domain names of Internet resources into Internet Protocol (IP) addresses.
DNS prefetch is a tool used by web developers to improve the UX while browsing a website.
DNS preloading is another tool used by web developers to anticipate the resources a user will need and download them to the user’s system before they are actually requested, in order to speed up the browsing experience.
DNS prefetching and DNS preloading can create Internet artifacts on a user’s system that were not searched for, requested, or knowingly placed on that system by the user.
An untrained person may misinterpret DNS prefetching and preloading as user-initiated activity and make incorrect assumptions.

Recommendations

Digital forensic analysts and those reviewing digital forensic reports should:

Scrutinize Internet artifacts before reaching any charging, disciplinary, or finding of fault decisions.
Understand the difference between cache, cookies, searches, typed uniform resource locators (URLs), and other forms of Internet evidence.
If reporting on Internet history for non-technical audiences, contextualize the forensic findings instead of simply providing a data dump of information and leaving the analysis to untrained individuals.
Be cautious of relying strictly on Internet artifacts such as the presence of cache, DNS entries, or cookies for decision making without other corroborating evidence.

Technical Details

DNS prefetching causes browsers to resolve IP addresses before a user requests the information, and DNS preloading causes a browser to connect to Internet resources and download information without any knowledge or interaction of a user. Prefetching and preloading create entries on a system that can be mistaken for user-initiated web activity and lead to incorrect conclusions during a digital forensic examination. This post describes how this can happen and the technology behind DNS prefetching and preloading.

Imagine this scenario: A law enforcement forensic analysis revealed multiple Internet browsing artifacts pointing to websites that appeared to be related to illegal activity, and these artifacts were used in part to support a criminal indictment. The defendant adamantly denied visiting any websites with names identical or even similar to those highlighted in the law enforcement report.

Law enforcement had essentially created a "data dump" of browsing artifacts and provided that to prosecutors with no contextualization of the data, leading an untrained prosecutor to the conclusion that the defendant was involved in criminal activity.

The criminal defense team hired a forensic analyst who examined the computer and found some Internet activity that matched the law enforcement report, but did not find the other artifacts that would normally support the prosecutor’s theory, such as search terms, downloads, visited pages, and typed URLs.

This case highlights a challenge with what is often called “triage forensics”: a cursory exam of the digital device intended to locate just enough evidence to support the allegation(s). A full forensic analysis (sometimes referred to as “trial forensics”) is sometimes not completed until a defendant disputes the allegations. By then, the wheels of the justice system are already well in motion against the defendant, often at immense reputational and financial expense.

There are undoubtedly issues with what has been described so far, but this post focuses on the digital evidence. While we’ve highlighted some shortcomings of a law enforcement process, our intent is not to insinuate that this represents all law enforcement analysts, because we know many who are exceptional. This case scenario is an actual situation that Natsar provided some assistance with.

There was no question that the forensic artifacts identified by law enforcement existed. The problem was the lack of context or true analysis to explain why they existed. After some testing and further evaluation, the cause of these artifacts was determined to be DNS prefetching and preloading.

To demonstrate how this technology works, we’ve created some videos and screenshots. Google Chrome was used as the browser for this example; however, all browsers we tested behaved the same way for this particular artifact.

We began with a browser session that had no Internet history associated with it and validated that no artifacts existed. The below screenshot shows that no cookies were present in Chrome. The same was done to ensure there were no downloads, browser history, or other artifacts from any previous sessions.

Next, Chrome was launched and navigated to the website www.msn.com. Using Google Chrome’s developer tools, we captured all of the content that is loaded in order to present the website to the user. This includes images, JavaScript, CSS, and other resources. All of this happens without user interaction beyond navigating to the single URL of msn.com.

In the video below, you will see the resources being loaded as the page loads. This process is transparent to the user (unless using a tool like Developer Tools). Just to load www.msn.com there were over 400 requests for resources.

Side note: websites often use content delivery networks (CDNs) to increase loading speed. In very basic terms, CDNs distribute commonly requested website assets across geographically dispersed servers. For example, a website may host static content like JavaScript, images, and CSS files in Amazon Web Services (AWS), or use a CDN provider like Cloudflare, to offload work from the web server and load content faster.

In the next video, we drill down into some of the content that was downloaded to my computer when pulling up www.msn.com. You will see the JavaScript, CSS, and image files as we click through them. All of the images that are clicked on and shown in preview mode would also be downloaded to the computer’s hard drive (private browsing can affect this, but for purposes of this blog, private browsing was not used). On the left side of the page, you will see sources of content such as bing.com, cdn.taboola.com, and others.

The below screenshot shows the DNS prefetch that occurs with this site. Similar in concept to Windows Prefetch in the Microsoft Windows operating system (OS), DNS prefetch tells the browser to resolve and make connections to other web resources early in the page loading process to speed things up. A practical example of this is when a web developer places a simple contact form at the bottom of a webpage. Part of the contact form might be Google reCAPTCHA, used to reduce the likelihood of spam submissions to the form. Instead of waiting until a user scrolls to the bottom of the page to reach out for the reCAPTCHA JavaScript, the web developer places a prefetch at the top of the page, so the browser has already done that work and there is no delay when the user gets to the bottom. Imperva has a nice writeup on DNS prefetching here.

Prefetching can be done for any domain, and it takes only a single line of code in the site. A screenshot below shows the DNS prefetching done on msn.com. A developer could hard-code any prefetch they wanted into a website and cause a browser navigating the site to reach out and resolve the domain names listed in the code.
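In a page's HTML, these hints typically appear as link tags whose rel attribute is set to a value such as dns-prefetch or preload. As a rough illustration of how an analyst might list the hints in a saved copy of a page's source, here is a minimal Python sketch. The sample page, its example.net and example.org domains, and the set of rel values checked are our assumptions for illustration, not data from the msn.com test.

```python
from html.parser import HTMLParser

# rel values treated as resource hints in this sketch (an assumption;
# adjust the set to whatever the examination calls for)
HINT_RELS = {"dns-prefetch", "preconnect", "prefetch", "preload"}


class ResourceHintParser(HTMLParser):
    """Collect (rel, href) pairs from <link> tags that carry a resource hint."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel can hold several space-separated tokens
        rels = set(attr_map.get("rel", "").lower().split())
        for rel in sorted(rels & HINT_RELS):
            self.hints.append((rel, attr_map.get("href", "")))


def find_resource_hints(html_text):
    """Return the resource hints found in a string of HTML, in document order."""
    parser = ResourceHintParser()
    parser.feed(html_text)
    return parser.hints


# Hypothetical page source; the domains are placeholders
SAMPLE_PAGE = """
<html><head>
<link rel="dns-prefetch" href="//cdn.example.net">
<link rel="preload" href="https://static.example.org/app.js" as="script">
<link rel="stylesheet" href="/site.css">
</head><body></body></html>
"""

if __name__ == "__main__":
    for rel, href in find_resource_hints(SAMPLE_PAGE):
        print(rel, href)
```

A list like this only shows what the page asked the browser to do. Whether a given browser acted on each hint still has to be confirmed through testing.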

Now just imagine if a website you visited was coded to prefetch malicious or criminal domain names. These prefetches would be done without your knowledge and would leave artifacts behind on your computer, from which an untrained forensic analyst (or one who didn’t take the time to do a true forensic examination) could draw incorrect conclusions.

Going back to the Google Chrome history and artifacts on the test system, below is a screenshot of the same view shown earlier of the cookies but after navigating only to msn.com:

After just going to the single website www.msn.com on my system, you can see there are 169 cookies present on the hard drive. The screenshot above shows multiple domains that were never intentionally or knowingly visited, but that the test computer contacted automatically because of how the website was coded.

The date/time stamps shown above are in WebKit format, so a simple conversion will show them in UTC or local time.
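For readers who want to do that conversion themselves: WebKit timestamps count microseconds since January 1, 1601 (UTC). A minimal Python sketch follows. The function name and the sample value are ours, chosen for illustration, and are not taken from the screenshot.

```python
from datetime import datetime, timedelta, timezone

# WebKit/Chrome timestamps count microseconds from this epoch
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def webkit_to_utc(webkit_microseconds):
    """Convert a WebKit timestamp (microseconds since 1601-01-01) to an aware UTC datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=webkit_microseconds)


if __name__ == "__main__":
    sample = 13377571200000000  # illustrative value, not from the test system
    as_utc = webkit_to_utc(sample)
    print(as_utc.isoformat())               # UTC
    print(as_utc.astimezone().isoformat())  # examiner's local time zone
```

Always record which time zone a converted value is reported in, since the local-time output depends on the settings of the machine running the conversion.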

Also now present on the test computer are additional files from some of these websites, such as Facebook, Twitter, Google, etc. Remember, these sites were never intentionally navigated to.

Although all of these files are now present on the computer, by looking at Google Chrome’s history from the application itself, it still only shows msn.com was visited:

The forensic tool Hindsight reports over 620 entries for a single visit to www.msn.com. The entries include cookies, cache, preferences, and URLs.

By looking at the artifacts in another forensic tool, similar results are found. The screenshot below shows Autopsy’s analysis of the Chrome history. According to Autopsy, there were 155 items of Internet cache, 197 cookies downloaded just from visiting msn.com, and then the single item of web history.
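The gap between many cookie and cache hosts and a single history entry suggests a simple cross-check: list the hosts that appear in cookies or cache but have no matching entry in the browsing history. The Python sketch below is one hedged way to do that. The function names and the naive parent/subdomain matching rule are our assumptions. It takes plain lists that an analyst has already exported from a tool, rather than reading any particular tool's output format.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased hostname of a URL, or an empty string if none."""
    return (urlsplit(url).hostname or "").lower()


def uncorroborated_hosts(artifact_hosts, visited_urls):
    """Return artifact hosts (cookie/cache) with no related host in visited_urls.

    Matching is deliberately naive: a host counts as corroborated when it
    equals a visited host or is a parent domain or subdomain of one.
    """
    visited = {host_of(u) for u in visited_urls}
    visited.discard("")
    flagged = set()
    for raw in artifact_hosts:
        # cookie host keys are often stored with a leading dot, e.g. ".msn.com"
        host = raw.lstrip(".").lower()
        if not host:
            continue
        related = any(
            host == v or v.endswith("." + host) or host.endswith("." + v)
            for v in visited
        )
        if not related:
            flagged.add(host)
    return sorted(flagged)


if __name__ == "__main__":
    cookie_hosts = [".msn.com", ".bing.com", "cdn.taboola.com"]
    history = ["https://www.msn.com/"]
    for host in uncorroborated_hosts(cookie_hosts, history):
        print("no history entry for:", host)
```

A flagged host is not proof of prefetching or preloading. It is a prompt to look for corroborating artifacts such as typed URLs, searches, and downloads before drawing a conclusion.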

The same test was done with Wireshark running to capture the network traffic from the test workstation. As expected, Wireshark showed the same as Chrome developer tools, with all of the DNS queries and responses being shown. Below is a screenshot from Wireshark showing some of the queries:

An untrained incident responder or forensic analyst looking at network logs may also come to an incorrect conclusion that a user searched for or navigated to these websites because of the DNS queries present on the network.

Conclusion

Based on the testing and analysis, we were able to show that the websites in question were not visited by the defendant, nor did the defendant search for those websites.

Performing a digital forensic analysis is much more than simply pressing the “find evidence” button and then handing over a few hundred pages of results to someone. Forensics should include a thorough analysis of the digital evidence by a trained analyst, along with properly contextualized results and explanations for stakeholders.

When performing their analysis, digital forensic analysts should look at the totality of the circumstances with a device: not only Internet cache and cookies, but also typed URLs, viewed pages, timelines of activity, downloads, and the user’s normal pattern of behavior, among other things. They should also act as subject matter expert consultants to those consuming their forensic reports and provide explanations, context, and opinions when necessary.

Bottom Line

DNS prefetching and preloading, tools designed to enhance user experience, can create misleading digital artifacts on a user’s device. These artifacts, such as DNS entries, cookies, and cached files, may appear to indicate user-initiated activity but are often the result of automated browser processes. Untrained forensic analysts or incomplete examinations can lead to incorrect assumptions, as shown in the real-world case described here, in which a defendant was wrongfully implicated. Proper digital forensic analysis must go beyond artifact identification to include context, corroboration, and a comprehensive understanding of web technologies. By doing so, analysts ensure accurate findings and protect individuals from unjust outcomes.