The Four Ways To Capture A Web Page
For the last several years, I have learned a lot about Open Source Intelligence (OSINT), and I want to share my knowledge of the field with others. My background is in Computer Science, and I have 20 years of experience at this point. In a series of blog posts, I will go deeper into each of these capture methods, but for now we will keep it simple. Our example web page is https://osintliar.com. It doesn’t have dynamic content or videos.
In future blog posts, we will go over the problem set involved in live data captures.
Screen Capture
Taking a screenshot of a web page is a quick and straightforward method. It captures the page exactly as it appears at a moment in time, including layout and images. However, it’s static and doesn’t preserve the interactivity or underlying code. Screenshots can be saved in various formats like PNG or JPEG.
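If you would rather script a screenshot than take it by hand, one option is a Chromium-based browser in headless mode. The sketch below only builds the command line. It assumes the browser binary is called google-chrome on your system (it may be chromium or chrome.exe instead), and the function names, output path, and window size are made up for illustration.

```python
import subprocess


def screenshot_command(url, out_path, width=1280, height=720,
                       browser="google-chrome"):
    """Build the argument list for a headless Chrome screenshot.

    `browser` is an assumption: adjust it to the binary name on your machine.
    """
    return [
        browser,
        "--headless",
        f"--screenshot={out_path}",          # where the PNG is written
        f"--window-size={width},{height}",   # viewport that gets captured
        url,
    ]


def take_screenshot(url, out_path):
    """Run the command; raises CalledProcessError if the browser fails."""
    subprocess.run(screenshot_command(url, out_path), check=True)


cmd = screenshot_command("https://osintliar.com", "capture.png")
print(" ".join(cmd))
```

Calling take_screenshot("https://osintliar.com", "capture.png") would actually launch the browser. Only the visible window area is captured, so a long page may need a taller window size.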
Saving As HTML
This method saves the HTML file of the web page. HTML (Hypertext Markup Language) is the standard markup language used to create web pages. Saving a page as HTML typically means saving the basic HTML file along with a folder containing the associated files, such as images, stylesheets (CSS), and JavaScript files.
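To get a feel for which associated files a browser has to put in that folder, here is a minimal sketch that lists the resources a page references, using only Python's standard library. The sample HTML and the ResourceLister class are invented for illustration. A real browser also follows resources referenced from CSS and scripts, which this sketch does not attempt.

```python
from html.parser import HTMLParser


class ResourceLister(HTMLParser):
    """Collect the URLs of images, scripts, and linked files in a page."""

    # Tag name -> attribute that holds the resource URL.
    # Every <link href> is collected, not only stylesheets.
    URL_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append(value)


# Hypothetical page, standing in for a real saved HTML file.
SAMPLE_HTML = """<html><head>
<link rel="stylesheet" href="style.css">
<script src="app.js"></script>
</head><body><img src="logo.png" alt="logo"></body></html>"""

lister = ResourceLister()
lister.feed(SAMPLE_HTML)
print(lister.resources)  # ['style.css', 'app.js', 'logo.png']
```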
Saving As MHTML
MHTML (MIME Encapsulation of Aggregate HTML Documents) is a web page archive format used to combine resources like images, JavaScript, CSS, and HTML into a single file. When you save a page as MHTML, it creates a single file that encapsulates the entire page.
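Because MHTML is built on MIME, the same format used for email, Python's standard email package can open it. The sketch below builds a tiny two-part archive in memory and then lists its parts. The sample content, the example.com URLs, and the function names are made up for illustration. Files saved by a real browser carry more headers and parts, but they should parse the same way.

```python
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_sample_mhtml():
    """Build a tiny MHTML-style archive: one HTML part and one image part."""
    root = MIMEMultipart("related")

    page = MIMEText("<html><body><img src='logo.png'></body></html>", "html")
    page["Content-Location"] = "https://example.com/"
    root.attach(page)

    # Placeholder bytes (just the PNG signature), not a viewable image.
    logo = MIMEImage(b"\x89PNG\r\n\x1a\n", _subtype="png")
    logo["Content-Location"] = "https://example.com/logo.png"
    root.attach(logo)

    return root.as_bytes()


def list_parts(mhtml_bytes):
    """Return (content type, original URL) for each resource in the archive."""
    message = message_from_bytes(mhtml_bytes)
    return [
        (part.get_content_type(), part["Content-Location"])
        for part in message.walk()
        if not part.is_multipart()
    ]


parts = list_parts(build_sample_mhtml())
for content_type, location in parts:
    print(content_type, location)
```

To inspect a capture from disk, read the file in binary mode and pass the bytes to list_parts.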
PDF Export
Exporting a web page as a PDF is a useful way to preserve its visual layout and text content. This method is widely supported and convenient for sharing and viewing. However, like screenshots, it creates a static record and doesn’t preserve the interactivity or the full functionality of the web page.
What Works Best?
My preference is for storing web pages as MHTML. MHTML files provide the underlying HTML source code, and they do not have JavaScript enabled in them. OSINT LIAR stores your web page captures as MHTML on your local computer, in an encrypted database, not in the cloud.
Did you know that OSINT LIAR can store all of these web page captures? If you have a favorite tool for doing your captures, awesome! OSINT LIAR is already compatible with it.