Translating English games from screenshots with OCR in C#

This article is a memo of the code for the main technical parts of my attempt to make Arena, the first game in The Elder Scrolls series, playable in Japanese.
Background such as how the project started is described in this article on my Hatena blog:
The Elder Scrolls: Arena Japanese localization (1) - Programming and video editing memorandum

The development environment and the type of tool being built are as follows.

Windows 7
Microsoft Visual Studio Community 2019
Version 16.6.3

Windows Forms Application (.NET Framework)
.NET Framework 4.7.2

The translation accuracy achieved so far is described in the follow-up article on the Hatena blog:
The Elder Scrolls: Arena Japanese localization (2) - Programming and video editing memorandum

Main technologies used

OCR processing

Tesseract performs the OCR on the game screenshots.
I chose it because I had already looked into it before:
[Updated from time to time] List of articles referenced in Python, OpenCV, Tesseract - Qiita

Translation process

To translate the string recognized by OCR, the tool drives a browser with Selenium and uses Google Translate.
I have done the same thing before with the standard WebBrowser control, but script errors and similar problems kept it from working, so this time I use Selenium, which allows finer control over the browser.

If all you need is Google Translate, building your own API with GAS (Google Apps Script), as described in the following article, is a great approach.
How to make Google Translate API for free –Qiita

However, I plan to release this tool once it is usable, so rather than depend on an API of my own, I operate a browser and run Google Translate in it.

Image correction etc.

To improve OCR recognition accuracy, the image is corrected with OpenCvSharp first.
I have almost forgotten how to use OpenCV, but I chose it because OpenCV is the natural choice for image manipulation.

Search for Tesseract in the NuGet package manager.
Install the package named Tesseract. The version is 3.3.0.
The sample code uses training data that I created with jTessBoxEditor. If you have no such data, get trained data from the following repository, place it in a suitable folder, and specify that path.
GitHub - tesseract-ocr/tessdata: Trained models with support for legacy and LSTM OCR engine

Only the necessary parts are excerpted.

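using System.Drawing;
using Tesseract;

// "eng01" is the custom training data; with the stock tessdata files use "eng"
using (var tesseract = new TesseractEngine(@"C:\tesseract\training\tessdata", "eng01"))
using (var image = new Bitmap(@"C:\hoge.png"))
using (var page = tesseract.Process(image))
{
    var text = page.GetText();
}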
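
The text returned by GetText keeps the line breaks of the on-screen layout. Assuming you want to pass one continuous string to the translator, a minimal sketch of a cleanup step could look like the following. The `OcrText` class and its `Normalize` method are hypothetical names for illustration and are not part of the original tool.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OcrText
{
    // Hypothetical helper: turns multi-line OCR output into a single line.
    public static string Normalize(string ocrText)
    {
        if (string.IsNullOrWhiteSpace(ocrText))
        {
            return string.Empty;
        }
        // Collapse every run of whitespace, including line breaks, into one space.
        return Regex.Replace(ocrText, @"\s+", " ").Trim();
    }
}
```

Whether joining lines actually helps depends on the text being translated; lists or menus may translate better line by line.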

Search for Selenium in the NuGet package manager.
Install Selenium.WebDriver. The version is 3.141.0.
A separate browser driver is also required; Chrome is used here.
Install Selenium.Chrome.WebDriver. The version is 83.0.0.

Google Translate on Chrome

Using Selenium's Chrome driver, I created the following class, which takes a string and returns its translation.
The part that operates Google Translate is only at the level of "it works for now"; I will look for a better way to write it later.
One caveat: if Quit is not called, the hidden browser and console processes keep running. They are left behind in particular when the program is forcibly terminated, for example when a debugging session is stopped.

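using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;
using System;

namespace RTranslation
{
    public class RSeleniumChrome
    {
        IWebDriver driver;
        WebDriverWait wait;
        public string TranslationString;

        public RSeleniumChrome()
        {
            var driverService = ChromeDriverService.CreateDefaultService();
            // Hide the command prompt window
            driverService.HideCommandPromptWindow = true;

            // Headless (do not show the browser)
            var options = new ChromeOptions();
            options.AddArgument("headless");

            driver = new ChromeDriver(driverService, options);
            wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
        }

        public void Close()
        {
            // Quit ends both the browser and the driver process
            driver.Quit();
        }

        public string Translation(string src)
        {
            try
            {
                // Assumption: the lines that open the page and type the text are
                // missing from this listing; this URL and the SendKeys call are a guess.
                driver.Navigate().GoToUrl("https://translate.google.com/?sl=en&tl=ja");
                wait.Until(condition => condition.FindElement(By.Id("source")).Text == "");
                driver.FindElement(By.Id("source")).SendKeys(src);
                if (wait.Until(condition =>
                {
                    try
                    {
                        TranslationString = condition.FindElement(By.ClassName("tlid-translation")).Text;
                        return true;
                    }
                    catch (Exception)
                    {
                        return false;
                    }
                }))
                {
                    return TranslationString;
                }
            }
            catch (Exception)
            {
                // Timeouts and driver errors fall through to the empty result
            }
            return String.Empty;
        }
    }
}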
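
To illustrate the caveat about Quit, here is a minimal sketch that guarantees the cleanup call with try/finally. `ITranslator` and `TranslatorRunner` are hypothetical names introduced for this example; the interface only mirrors the method names of the class above (Translation and Close), so the sketch runs without Selenium.

```csharp
using System;

// Hypothetical interface with the same method names as the class above.
public interface ITranslator
{
    string Translation(string src);
    void Close();
}

public static class TranslatorRunner
{
    // Runs one translation and always calls Close, even if Translation throws.
    public static string TranslateOnce(Func<ITranslator> create, string src)
    {
        ITranslator translator = create();
        try
        {
            return translator.Translation(src);
        }
        finally
        {
            // For the Selenium class, this is where driver.Quit() would run.
            translator.Close();
        }
    }
}
```

This only covers exceptions and normal exits. If the process is killed, for example by stopping the debugger, the finally block does not run and the browser processes still have to be ended by hand. Starting a browser for every translation is also slow, so for a long-lived instance the same idea would be to call Close from the form's FormClosed handler.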


To be honest, I was not sure which OpenCvSharp package to choose, but I did the following.
Search for OpenCvSharp in the NuGet package manager. Install OpenCvSharp4.Windows. The version is
The project URL is as follows.


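// Requires: using OpenCvSharp; using OpenCvSharp.Extensions; (for ToBitmap)
private Bitmap Threshold(string filename)
{
    var src = Cv2.ImRead(filename, ImreadModes.Grayscale);
    var dst = new Mat();
    // Binarize: pixels brighter than 100 become 255, the rest become 0
    Cv2.Threshold(src, dst, 100, 255, ThresholdTypes.Binary);
    return dst.ToBitmap();
}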
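
To make the effect of the Threshold call concrete, the following sketch applies the same binary rule to a plain byte array without OpenCV: a value greater than the threshold becomes the maximum value, and everything else becomes 0. `BinaryThresholdSketch` is a hypothetical name used only for illustration; the real tool uses Cv2.Threshold as shown above.

```csharp
using System;

public static class BinaryThresholdSketch
{
    // Binary threshold rule: value > thresh -> maxValue, otherwise 0.
    public static byte[] Apply(byte[] gray, byte thresh, byte maxValue)
    {
        var result = new byte[gray.Length];
        for (int i = 0; i < gray.Length; i++)
        {
            result[i] = gray[i] > thresh ? maxValue : (byte)0;
        }
        return result;
    }
}
```

With the values from the listing (threshold 100, maximum 255), a pixel of exactly 100 becomes black and 101 becomes white, so the right threshold depends on how bright the game's text is against its background.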

Clipboard operation

While actually playing, a screenshot of the game window is taken with [Alt] + [Print Screen], which places the image on the clipboard.
The tool therefore monitors the clipboard so that it can pick up the image.

Monitor the clipboard and get the image

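// Requires: using System; using System.Drawing;
//           using System.Runtime.InteropServices; using System.Windows.Forms;
public partial class Form1 : Form
{
    [DllImport("user32.dll", SetLastError = true)]
    private extern static bool AddClipboardFormatListener(IntPtr hwnd);

    [DllImport("user32.dll", SetLastError = true)]
    private extern static bool RemoveClipboardFormatListener(IntPtr hwnd);

    private void Form1_Load(object sender, EventArgs e)
    {
        // Start receiving clipboard update messages for this window
        AddClipboardFormatListener(Handle);
    }

    private void Form1_FormClosed(object sender, FormClosedEventArgs e)
    {
        RemoveClipboardFormatListener(Handle);
    }

    protected override void WndProc(ref Message m)
    {
        if (m.Msg == 0x31D) // WM_CLIPBOARDUPDATE
        {
            GetImage();
        }
        else
        {
            base.WndProc(ref m);
        }
    }

    private void GetImage()
    {
        if (!Clipboard.ContainsImage())
        {
            return;
        }

        var img = Clipboard.GetImage() as Bitmap;
    }
}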
Referenced articles, etc.

- I want to hide the command prompt screen displayed by Selenium + ChromeDriver - Google Groups

- Clipboard monitoring with C# (Windows API) - Qiita