Posted on

Optical Character Recognition (OCR) with LIA

OCR, also known as text recognition, is the process of converting printed documents or images containing words into digital text that can be analyzed. This is helpful for turning PDFs that contain images of text into something that can be indexed and made searchable within LIA.

The extracted text can then be passed to a translation tool, for example to translate foreign-language documents into one’s native language.

OCR functionality is not natively built into LIA, but in this article, I will demonstrate how LIA can readily be configured to interact with an online OCR service.

Step 1. Select an image

Let’s grab a meme off the internet using LIA.

Step 2. Push the image to the OCR API

By clicking the drop-down menu in the LIA dashboard, you can select the option to push content to the OCR provider. After you click the “OCR” option, the API immediately responds with the following data.

{"ParsedResults":[{"TextOverlay":{"Lines":[{"LineText":"$ I DON'T THINK THAT MEMES","Words":[{"WordText":"$","Left":0.0,"Top":24.0,"Height":23.0,"Width":13.0},{"WordText":"I","Left":24.0,"Top":28.0,"Height":49.0,"Width":13.0},{"WordText":"DON'T","Left":54.0,"Top":27.0,"Height":51.0,"Width":141.0},{"WordText":"THINK","Left":206.0,"Top":28.0,"Height":50.0,"Width":150.0},{"WordText":"THAT","Left":367.0,"Top":28.0,"Height":50.0,"Width":125.0},{"WordText":"MEMES","Left":505.0,"Top":27.0,"Height":51.0,"Width":172.0}],"MaxHeight":51.0,"MinTop":24.0},{"LineText":"WHAT YOU THINK IT MEMES","Words":[{"WordText":"WHAT","Left":19.0,"Top":395.0,"Height":50.0,"Width":147.0},{"WordText":"YOU","Left":177.0,"Top":394.0,"Height":52.0,"Width":96.0},{"WordText":"THINK","Left":287.0,"Top":395.0,"Height":50.0,"Width":150.0},{"WordText":"IT","Left":450.0,"Top":394.0,"Height":51.0,"Width":45.0},{"WordText":"MEMES","Left":508.0,"Top":394.0,"Height":51.0,"Width":172.0}],"MaxHeight":52.0,"MinTop":394.0}],"HasOverlay":true,"Message":"Total lines: 2"},"TextOrientation":"0","FileParseExitCode":1,"ParsedText":"$ I DON'T THINK THAT MEMES\r\nWHAT YOU THINK IT MEMES\r\n","ErrorMessage":"","ErrorDetails":""}],"OCRExitCode":1,"IsErroredOnProcessing":false,"ProcessingTimeInMilliseconds":"328","SearchablePDFURL":"Searchable PDF not generated as it was not requested."}

If the “AutoCapture” feature is enabled, this content will be saved into LIA for you.

How to configure
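
For anyone post-processing the OCR output outside LIA, here is a minimal Python sketch of pulling the recognized text out of a response like the one shown earlier in this article. The field names (ParsedResults, ParsedText, IsErroredOnProcessing, ErrorMessage) come from that response; the extract_text helper and the trimmed sample are my own illustration, not part of LIA.

```python
import json

# Trimmed copy of the response shown in this article; only the fields
# this sketch reads are kept.
SAMPLE_RESPONSE = r'''{"ParsedResults":[{"FileParseExitCode":1,"ParsedText":"$ I DON'T THINK THAT MEMES\r\nWHAT YOU THINK IT MEMES\r\n","ErrorMessage":"","ErrorDetails":""}],"OCRExitCode":1,"IsErroredOnProcessing":false,"ProcessingTimeInMilliseconds":"328"}'''


def extract_text(raw: str) -> str:
    """Return the recognized text from an OCR response (hypothetical helper)."""
    data = json.loads(raw)
    if data.get("IsErroredOnProcessing"):
        raise ValueError(f"OCR failed: {data.get('ErrorMessage')}")
    # Each entry in ParsedResults carries its own ParsedText; join them.
    parts = [result.get("ParsedText", "") for result in data.get("ParsedResults", [])]
    # Normalize the \r\n line endings seen in the sample response.
    return "".join(parts).replace("\r\n", "\n").strip()


if __name__ == "__main__":
    print(extract_text(SAMPLE_RESPONSE))
```

The word-level entries under TextOverlay also carry Left, Top, Height, and Width values, which could be read the same way if you need positions rather than plain text.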

In just a couple more steps, you can add this functionality to your LIA installation. First, sign up for a free account with the OCR provider. After getting your API key, add it to LIA’s API Keys page.

LIA’s API Keys page

After the API key is added, you can enable the functionality by editing your JSON config file, located at ~/AppData/Local/Lia/webroot/json-plugins.json. Don’t worry, this part will be automated sooner rather than later.

Copy and paste the following JSON into the file:

    {
        "cmd": "PostForm",
        "label": "OCR",
        "url": "",
        "method": "POST",
        "mode": "cors",
        "cache": "default",
        "credentials": "omit",
        "formatter": "manual",
        "headers": {
            "Content-Type": "application/x-www-form-urlencoded"
        },
        "redirect": "follow",
        "referrerPolicy": "no-referrer-when-downgrade",
        "contentTypes": [],
        "fieldMappings": { "file": "ContentData", "apikey": "OCR_API_KEY", "language": "eng" }
    }

Make sure the JSON is valid before saving. If you need help, contact
This JSON maps LIA’s internal API to the external API made available through the OCR provider.
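
To check an entry before saving it, and to get a feel for what the mapping does, here is a small Python sketch. It parses a shortened copy of the plugin entry with the standard json module and then builds a form body from fieldMappings. How LIA actually resolves the mapping is not documented here, so the substitution rule in build_form_body is an assumption on my part, and the values passed in are placeholders.

```python
import json
from urllib.parse import urlencode

# Shortened copy of the plugin entry from this article, wrapped in braces
# so that it parses on its own. The empty "url" is left as published.
PLUGIN_JSON = """
{
  "cmd": "PostForm",
  "label": "OCR",
  "url": "",
  "method": "POST",
  "headers": {"Content-Type": "application/x-www-form-urlencoded"},
  "fieldMappings": {"file": "ContentData", "apikey": "OCR_API_KEY", "language": "eng"}
}
"""


def load_plugin(raw: str) -> dict:
    """Parse the entry; json.loads raises json.JSONDecodeError if it is invalid."""
    return json.loads(raw)


def build_form_body(plugin: dict, internal_values: dict) -> str:
    """Build a URL-encoded form body from fieldMappings.

    ASSUMPTION: a mapping value that names an internal LIA value
    (ContentData, OCR_API_KEY) is substituted; anything else ("eng")
    is sent literally. This is a guess, not LIA's documented behavior.
    """
    fields = {
        form_field: internal_values.get(source, source)
        for form_field, source in plugin["fieldMappings"].items()
    }
    return urlencode(fields)


if __name__ == "__main__":
    plugin = load_plugin(PLUGIN_JSON)
    # Placeholder values, not real data.
    print(build_form_body(plugin, {"ContentData": "<image data>", "OCR_API_KEY": "<your key>"}))
```

A stray trailing comma or a missing closing brace is enough to make json.loads raise an error, so running the file through a check like this catches the most common copy-and-paste mistakes.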

Boolean Search with LIA

The Local Internet Archive (LIA) now supports Boolean search. Boolean search can help you narrow or widen your search criteria depending on your needs. LIA supports the following Boolean search options:

  • NOT
  • AND
  • OR

These provide the ability to focus a search, particularly when your topic contains multiple search terms.

For example, let’s imagine we want to filter search results by state name. One query may look like this:


This gives all the results that contain the keyword “MICHIGAN” but do not contain the word “FLORIDA”. This is helpful for narrowing the search space.

If we needed to widen our scope and include all results that have either “MICHIGAN” or “FLORIDA”, we would use the following query.


In our last example, we only want results that have both “MICHIGAN” and “FLORIDA”. We would use the next query.


Using this search phrase will restrict the results to search hits that contain both “MICHIGAN” and “FLORIDA”.
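
One way to picture the three operators is as set operations on the documents that match each keyword. The Python sketch below illustrates that idea with a made-up four-document collection; it is not how LIA implements search, and the DOCS data and hits helper are hypothetical.

```python
# Hypothetical mini-collection: document id -> text.
DOCS = {
    1: "Lake MICHIGAN shoreline guide",
    2: "FLORIDA beach rentals",
    3: "Moving from MICHIGAN to FLORIDA",
    4: "OHIO road trip",
}


def hits(keyword: str) -> set:
    """Ids of documents whose text contains the keyword (case-insensitive)."""
    return {doc_id for doc_id, text in DOCS.items() if keyword.lower() in text.lower()}


michigan = hits("MICHIGAN")
florida = hits("FLORIDA")

not_result = michigan - florida  # NOT: MICHIGAN without FLORIDA
or_result = michigan | florida   # OR: either keyword
and_result = michigan & florida  # AND: both keywords

print(sorted(not_result))  # [1]
print(sorted(or_result))   # [1, 2, 3]
print(sorted(and_result))  # [3]
```

NOT and AND shrink the result set, while OR grows it, which is why the first and last queries narrow a search and the middle one widens it.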

Complex Boolean Searches

It is possible to chain Boolean searches together to expand or narrow the search space even more.

Here is a good example that uses two Boolean operators.


These can be chained together repeatedly to narrow or widen the scope of your search.
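
To illustrate chaining, here is a rough Python sketch of a tiny evaluator that applies operators from left to right. The query format (keywords separated by AND, OR, or NOT), the left-to-right order, and the sample documents are all assumptions made for this example; LIA's actual syntax and precedence rules may differ.

```python
# Hypothetical mini-collection: document id -> text.
DOCS = {
    1: "MICHIGAN lakes",
    2: "FLORIDA beaches",
    3: "MICHIGAN and OHIO rivalry",
    4: "OHIO road trip",
}

# Each operator combines the running result with the next keyword's hits.
OPS = {
    "AND": set.intersection,
    "OR": set.union,
    "NOT": set.difference,
}


def hits(keyword: str) -> set:
    """Ids of documents whose text contains the keyword (case-insensitive)."""
    return {doc_id for doc_id, text in DOCS.items() if keyword.lower() in text.lower()}


def search(query: str) -> set:
    """Evaluate 'KEYWORD OP KEYWORD OP KEYWORD ...' strictly left to right."""
    tokens = query.split()
    result = hits(tokens[0])
    # The remaining tokens come in (operator, keyword) pairs.
    for op, keyword in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op.upper()](result, hits(keyword))
    return result


if __name__ == "__main__":
    print(sorted(search("MICHIGAN OR FLORIDA NOT OHIO")))  # [1, 2]
```

Here the OR step widens the result to documents 1, 2, and 3, and the NOT step then removes document 3 because it mentions OHIO.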