Author: Dan Cardin
Optical Character Recognition (OCR) with LIA
OCR, also known as text recognition, is the process of converting printed documents or images containing words into digital text that can be used for analysis. This is helpful for transforming PDFs that contain images of text into something that can be indexed and made searchable within LIA.
Once the text has been extracted, it can also be passed to a translation tool, which makes OCR a useful first step for translating foreign-language documents into one’s native language.
OCR functionality is not natively built into LIA, but in this article, I will demonstrate how LIA can readily be configured to interact with an online OCR service.
Step 1. Select an image
Let’s grab a meme off the internet using LIA.
Step 2. Push the image to the OCR API
From the drop-down menu in the LIA dashboard, you can select the option to push content to the OCR provider. After you click the “OCR” option, the API immediately responds with the following data.
{"ParsedResults":[{"TextOverlay":{"Lines":[{"LineText":"$ I DON'T THINK THAT MEMES","Words":[{"WordText":"$","Left":0.0,"Top":24.0,"Height":23.0,"Width":13.0},{"WordText":"I","Left":24.0,"Top":28.0,"Height":49.0,"Width":13.0},{"WordText":"DON'T","Left":54.0,"Top":27.0,"Height":51.0,"Width":141.0},{"WordText":"THINK","Left":206.0,"Top":28.0,"Height":50.0,"Width":150.0},{"WordText":"THAT","Left":367.0,"Top":28.0,"Height":50.0,"Width":125.0},{"WordText":"MEMES","Left":505.0,"Top":27.0,"Height":51.0,"Width":172.0}],"MaxHeight":51.0,"MinTop":24.0},{"LineText":"WHAT YOU THINK IT MEMES","Words":[{"WordText":"WHAT","Left":19.0,"Top":395.0,"Height":50.0,"Width":147.0},{"WordText":"YOU","Left":177.0,"Top":394.0,"Height":52.0,"Width":96.0},{"WordText":"THINK","Left":287.0,"Top":395.0,"Height":50.0,"Width":150.0},{"WordText":"IT","Left":450.0,"Top":394.0,"Height":51.0,"Width":45.0},{"WordText":"MEMES","Left":508.0,"Top":394.0,"Height":51.0,"Width":172.0}],"MaxHeight":52.0,"MinTop":394.0}],"HasOverlay":true,"Message":"Total lines: 2"},"TextOrientation":"0","FileParseExitCode":1,"ParsedText":"$ I DON'T THINK THAT MEMES\r\nWHAT YOU THINK IT MEMES\r\n","ErrorMessage":"","ErrorDetails":""}],"OCRExitCode":1,"IsErroredOnProcessing":false,"ProcessingTimeInMilliseconds":"328","SearchablePDFURL":"Searchable PDF not generated as it was not requested."}
If the “AutoCapture” feature is enabled, this content will be saved into LIA for you.
How to configure
In just a couple more steps you can add this functionality to your LIA installation. First, sign up for a free account at http://ocr.space/OCRAPI. Once you have your API key, add it on the API Keys page.
After the API key is saved, add the functionality by editing the JSON config file located at ~/AppData/Local/Lia/webroot/json-plugins.json. Don’t worry, this part will be automated sooner rather than later.
Copy and paste the following JSON into the file:
{
  "cmd": "PostForm",
  "label": "OCR",
  "url": "https://api.ocr.space/Parse/Image",
  "method": "POST",
  "mode": "cors",
  "cache": "default",
  "credentials": "omit",
  "formatter": "manual",
  "headers": {
    "Content-Type": "application/x-www-form-urlencoded"
  },
  "redirect": "follow",
  "referrerPolicy": "no-referrer-when-downgrade",
  "contentTypes": [],
  "fieldMappings": { "file": "ContentData", "apikey": "OCR_API_KEY", "language": "eng" }
}
Make sure the JSON is valid before saving. If you need help, contact support@bakerstreet.llc.
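One quick way to check validity before saving is to run the text through a strict JSON parser. The short Python sketch below is not part of LIA; the helper name check_plugin_json is made up for illustration. A common mistake in hand-edited JSON is a trailing comma after the last entry of an object, which strict parsers reject.

```python
import json


def check_plugin_json(text):
    """Return (True, parsed value) for valid JSON, else (False, error message).

    Hypothetical helper for illustration; LIA does not ship this function.
    """
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"


valid = '{"label": "OCR", "fieldMappings": {"language": "eng"}}'
# Same snippet with a trailing comma after the last mapping: not valid JSON.
invalid = '{"label": "OCR", "fieldMappings": {"language": "eng", }}'

print(check_plugin_json(valid)[0])
print(check_plugin_json(invalid))
```

To check the real file, read its contents and pass them to the same function; the reported line and column point at the place where parsing failed.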
This JSON configuration maps LIA’s internal fields onto the form parameters of the external API available at https://api.ocr.space/Parse/Image.
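To make the mapping more concrete, here is a rough Python sketch of the request that such a configuration describes, plus a helper that pulls the recognized text out of a response. It rests on assumptions: that “ContentData” and “OCR_API_KEY” are placeholders LIA replaces with the captured content and the stored key, and that every other value (such as “eng”) is sent literally. The function names are invented for this example, and LIA’s real implementation may differ. The request is only built, never sent, and whether the service accepts raw content in the “file” field of a URL-encoded body is not verified here.

```python
import json
import urllib.parse
import urllib.request

OCR_URL = "https://api.ocr.space/Parse/Image"  # endpoint from the plugin config

# Same mapping as the plugin config: form field -> placeholder or literal value.
FIELD_MAPPINGS = {"file": "ContentData", "apikey": "OCR_API_KEY", "language": "eng"}


def build_request(content_data, api_key):
    """Build (but do not send) a form POST shaped like the plugin config.

    Assumption: placeholder names are swapped for real values, and anything
    that is not a known placeholder is passed through unchanged.
    """
    placeholders = {"ContentData": content_data, "OCR_API_KEY": api_key}
    form = {field: placeholders.get(value, value)
            for field, value in FIELD_MAPPINGS.items()}
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        OCR_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def extract_text(response_json):
    """Join the ParsedText of every entry in ParsedResults."""
    payload = json.loads(response_json)
    if payload.get("IsErroredOnProcessing"):
        raise ValueError("OCR service reported a processing error")
    return "".join(result["ParsedText"] for result in payload["ParsedResults"])


# Trimmed version of the response shown in this article.
sample = json.dumps({
    "ParsedResults": [
        {"ParsedText": "$ I DON'T THINK THAT MEMES\r\nWHAT YOU THINK IT MEMES\r\n"}
    ],
    "IsErroredOnProcessing": False,
})

request = build_request("captured content goes here", "YOUR_KEY")
print(request.get_method(), request.full_url)
print(extract_text(sample))
```

The ParsedText field is usually all you need for indexing; the TextOverlay section of the response carries per-word coordinates if you want to locate the text within the image.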
Boolean Search with LIA
The Local Internet Archive or LIA now supports Boolean search. Boolean search can help you narrow or widen your search criteria depending on your needs. LIA supports the following Boolean search options:
- NOT
- AND
- OR
These provide the ability to focus a search, particularly when your topic contains multiple search terms.
For example, let’s imagine we want to filter search results by state name. One query may look like this:
“MICHIGAN NOT FLORIDA”
This returns all the results that contain the keyword “MICHIGAN” but do not contain the word “FLORIDA”. This is helpful for narrowing the search space.
If we needed to widen our scope and include all results that have either “MICHIGAN” or “FLORIDA” we would use the following query.
“MICHIGAN OR FLORIDA”
In our last example, we only want to get results that have “MICHIGAN” and “FLORIDA”. We would use the next query.
“MICHIGAN AND FLORIDA”
Using this search phrase will restrict the results to search hits that contain both “MICHIGAN” and “FLORIDA”.
Complex Boolean Searches
It is possible to chain Boolean searches together to expand or narrow the search space even more.
Here is an example that uses two Boolean operators:
“MICHIGAN AND FLORIDA NOT OHIO”
Operators can be chained together as many times as needed to narrow or widen the scope of your search.
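The behaviour of chained queries can be sketched in a few lines of Python. This is only an illustration, not LIA’s search engine: it assumes operators are written in upper case, terms are single words, matching is case-insensitive, “A NOT B” means “A and not B”, and a chain is evaluated strictly left to right. How LIA actually handles operator precedence is not covered in this article.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}


def matches(query, text):
    """Evaluate a chained Boolean query against text, left to right."""
    words = set(re.findall(r"\w+", text.lower()))
    tokens = query.split()
    if len(tokens) % 2 == 0:
        raise ValueError("query must look like: TERM (OPERATOR TERM)...")
    result = tokens[0].lower() in words
    # The remaining tokens come in (operator, term) pairs.
    for op, term in zip(tokens[1::2], tokens[2::2]):
        if op not in OPERATORS:
            raise ValueError(f"unknown operator: {op}")
        found = term.lower() in words
        if op == "AND":
            result = result and found
        elif op == "OR":
            result = result or found
        else:  # NOT keeps a hit only when the term is absent
            result = result and not found
    return result


def search(query, documents):
    """Return the documents that satisfy the query, in their original order."""
    return [doc for doc in documents if matches(query, doc)]


docs = [
    "Michigan lakes",
    "Florida beaches",
    "Michigan and Florida road trip",
    "Michigan, Florida and Ohio tour",
]

print(search("MICHIGAN NOT FLORIDA", docs))
print(search("MICHIGAN OR FLORIDA", docs))
print(search("MICHIGAN AND FLORIDA", docs))
print(search("MICHIGAN AND FLORIDA NOT OHIO", docs))
```

With this sample data, the NOT query keeps only the first document, the OR query keeps all four, the AND query keeps the last two, and the chained query keeps only the road trip entry.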