fl0rm

7 Posts, 2 Topics, 1 Following
A member registered Oct 12, 2015

Recent community posts

(2 edits)

No need to worry about how long replies take. Really. (I didn't get around to testing last week due to other things coming up, too, though I *will* do it this week)


As for the ID thing, yeah, it's more like your first example: just getting an extra value that can be passed along with the request for anyone who has a weird setup like mine. I don't think the LLM should ever see it, and some APIs might break because it's an unknown field. I also don't care if it's stable between sessions: it can be an integer or a UUID or something user-definable; my use-case and design just want a way to differentiate sources.


What I'm doing is intercepting every JSON post, adding its text to a list, making a Chat Completion request based on that list, and returning the result.


I have GameTranslate post something simple like this:

```
{
  "text": "<captured data>"
}
```


And then a small Python script builds the real request and sends back the response from a llama.cpp server (other parameters are set, too, but they're not important to illustrating the idea):

```
{
  "model": "aya-expanse",
  "temperature": 0.65,
  "messages": [
    {
      "role": "system",
      "content": [{
        "type": "text",
        "text": "<my-system-prompt, basically telling it that it's a casual translator that is willing to use slang and assume a conversational tone>"
      }]
    },
    {
      "role": "user",
      "content": [{
        "type": "text",
        "text": "<old captured data, held in memory, each as their own message, to build chat-history; a separate request to this proxy resizes the list's maximum capacity, letting me reset it on scene-changes or when more or less history is appropriate; another lets me drop recent messages, in case I don't want them influencing the next production>"
      }]
    },
    {
      "role": "user",
      "content": [{
        "type": "text",
        "text": "Translate the following into English without adding any commentary or explanations:\n\n<new captured data>"
      }]
    }
  ]
}
```

The response is intercepted and the newly captured data is appended to the "choice" block so I can see the translation and the original text when the overlay is opaque (I also sometimes add a kana line here). And then that slightly modified result is returned to GameTranslate to be presented.

Having the window-ID as an optional parameter that I can set in the initial request would let me easily direct the new text to an appropriate message-history list so that the main dialogue box doesn't get mixed up with a speaker-string. People who just pass things directly simply wouldn't set it, so there would be no impact on working setups.
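
To make the routing idea concrete, here is a minimal sketch of how the proxy side could keep one history list per source. It is only an illustration under assumptions: the `source_id` field name, the `HistoryStore` class, and the default capacity are all made up for this example and are not part of GameTranslate or my actual script.

```python
from collections import defaultdict, deque

SYSTEM_PROMPT = "<my-system-prompt>"
INSTRUCTION = (
    "Translate the following into English without adding any "
    "commentary or explanations:\n\n"
)


def text_message(role, text):
    """Wrap plain text in the Chat Completion message shape shown above."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


class HistoryStore:
    """Keeps one bounded history of captured text per source (hypothetical)."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._histories = defaultdict(lambda: deque(maxlen=self.capacity))

    def resize(self, capacity):
        # Rebuild every deque; a smaller capacity keeps the newest entries.
        self.capacity = capacity
        for key, old in list(self._histories.items()):
            self._histories[key] = deque(old, maxlen=capacity)

    def drop_recent(self, source_id, count=1):
        # Forget the newest entries so they don't influence the next request.
        history = self._histories[source_id]
        for _ in range(min(count, len(history))):
            history.pop()

    def build_request(self, text, source_id="default"):
        history = self._histories[source_id]
        messages = [text_message("system", SYSTEM_PROMPT)]
        messages += [text_message("user", old) for old in history]
        messages.append(text_message("user", INSTRUCTION + text))
        history.append(text)
        return {"model": "aya-expanse", "temperature": 0.65, "messages": messages}


def handle_post(store, body):
    # "source_id" is an assumed optional field; senders that omit it
    # all share the single "default" history.
    return store.build_request(body["text"], body.get("source_id", "default"))
```

With something like this, a post of `{"text": "...", "source_id": "speaker"}` would never see the main dialogue box's history, and a post without the field behaves exactly as it does today.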


I do think you could integrate something like what I'm doing into GameTranslate itself (with a bit less flexibility: no control over runtime buffer-tweaking because that would be very hard to present intuitively), but as illustrated by your use of Text Completion (which, yes, is more powerful), by now, people have surely created their own workflows around this feature. Just having REST-JSON opened up a ton of possibilities.

Well, I'll definitely use the Steam release as an opportunity to give you a bit more money for this system and add to its visibility through sales-figures.

It's been very useful as-is for my needs, particularly after you added the OpenAI-compatible LLM interface. That works really well with an 8B Aya-Expanse setup that I have behind another proxy that maintains message-history for better context. That could conceivably be integrated here if each individual translation panel could maintain its own state (%message[0]% for most recent, %message[1]% for second-most, with enough JSON awareness to delete the encapsulating object if a message is undefined), or if there were an ID substitution-variable (%panelId%) so my proxy could multiplex, though I think I could get good enough results by just retaining history based on minimum input length.

I don't necessarily want a separate panel for translations to be aggregated, but it seemed like an easy shortcut for implementing multiple capture-points if you couldn't refactor the implementation: specify multiple regions to scan, have everything just dump to a terminal window instead of trying to redraw over the UI elements, and let the user figure out which is which. From what you've said, though, it sounds like you've built something better and more sophisticated, so this lazy approach probably isn't needed.

I'm very much looking forward to testing it as soon as I can, probably tomorrow or Thursday. 


Thank you very much for your continuing work on this project!

I'm not sure how best to resolve this.

About a year ago, I donated $5, not realising (or maybe it wasn't clear at the time, or maybe I was just targeting that threshold for a number of games) that it was not enough to unlock the full game.

About a month and a half ago, I finally noticed the discrepancy and contributed $18, which unlocked everything. But it did so by creating a new transaction, and the Itch.io client, which I would prefer to use to not have to track updates manually, only shows the earlier transaction, for the demo version, and I've been unable to get it to resolve the new one after a number of time-separated attempts.

I am not seeking a refund or anything, just, like, guidance from anyone who may have gone through something similar before. Itch's support documentation suggests that a donation can be cancelled by messaging them or the creator, but that doesn't seem fair, at least without making the circumstances known.

There is indeed a disabled button and I figured that might be trying to communicate that it was disabled, as I saw in your response to another user about two weeks ago.

I will experiment with the other modes, though the path that seemed most intuitive was to use Desktop mode and to just identify the part of the screen (in a non-fullscreen application, at least in this case) that is to be translated. I am unsure whether a typical user would be confused if the GameTranslate window were to also be captured initially, since they would likely just readjust their settings to keep it away from the text to be translated.

(1 edit)

No problem at all for the delay, and those features as described sound like they'd provide pretty complete coverage of what I'd be hoping to see.

If it does already re-capture the selected area, I have not seen it automatically updating the translation as the text changes. It may be that the interval is longer than the time I have allowed (though I did leave it for over a minute).

My method has been to use Desktop mode and select a region using the default hotkey, L-CTRL. I believe the version I have installed is the latest, 0.3.5, and I do not have any other overlay systems in place that would be likely to interfere.

The system is, however, Windows 11, which I understand has not been thoroughly tested, but if there is a way to get lower-level debugging logs, I can experiment and share anything interesting with you, like if it seems to be the case that the capture-region drifts or that the operation fails for some reason. It's been a while since I last looked into Windows graphics calls, but I'm no stranger to unprocessed streams of verbose data.

Sorry, I just re-read your post and saw that you were clear that it does re-poll the area, but that it doesn't automatically act on the changes. And yeah, I can imagine why it would be tricky to make it smooth.

Keeping the panel where the translation appears in the exact same place that the user has positioned it should help (anchored to a top-left origin should be intuitive enough), but adjusting for width, height, and, depending on the game, the possibility of some background animation triggering a false-change in the OCR result might lead to unwanted flickering.

Maybe that could be mitigated by diffing the strings to see if the delta is more than one or two or some configurable number of characters, or by using a Levenshtein edit-distance in languages where that makes sense.
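
As a rough sketch of that check, assuming a plain character-level Levenshtein distance and a made-up default threshold of 2 (the function names here are just for illustration, not anything in GameTranslate):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def is_meaningful_change(old: str, new: str, threshold: int = 2) -> bool:
    """Treat small OCR jitter (distance <= threshold) as 'no change'."""
    return levenshtein(old, new) > threshold
```

A one-character OCR wobble would then be ignored, while a new line of dialogue would still trigger a re-translation. The threshold would probably need to be configurable, since a fixed character count means different things for short speaker-names and long dialogue.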

UX isn't really my thing, unfortunately, but I can draft a parsing function or logical flowchart for something if that might be useful to work through an idea.

With the addition of Desktop Mode, GameTranslate now seems to work for many of the RPG-style games I've wanted to play, so I've done the purchasing thing.

But there are two (or ideally three, depending on how the code and key-hooking are structured, since it may be way too much work) things that would really make it amazing, at least in my case:

  1. Allow both Japanese-to-English translations and the Romaji feature to be used concurrently
    1. This would be very helpful for people with intermediate language skills who can mostly read native text, but need help with unfamiliar characters
  2. Polling of a text-area to automatically update a panel showing the translation
    1. For RPGs and visual novels and the like, which tend to have a very static text-area, being able to set the text-region once with a hotkey, like we can do now, then either having that region polled on a timer or in response to another hotkey, would remove a lot of usability friction
      1. The logic here would be pretty simple:
        1. If the OCR-result string changed since the last scan, re-run translation
        2. If the OCR-result string did not change, do nothing
        3. If the region is blank, clear the panel
          1. Do not report that there was nothing to translate
          2. Ideally, do not resize it, either, so that it doesn't draw too much attention
      2. The panel showing the translation should be reused so that its position on the screen doesn't change
    2. I think this might be what another user requested as "real-time translation"
  3. If it is possible to implement the automatic polling feature (2), being able to set multiple regions to be monitored simultaneously would be outstanding
    1. For an RPG, it can be helpful to not only translate the dialogue region, but also something like the speaker's name or an in-game system like a weather indicator
      1. This might be a reasonably easy workaround to the problem of OCR string concatenation, where the speaker's name is too close to the dialogue itself, so there isn't a good way of hinting to the translation engine that it should be evaluating things separately
      2. If implementing multiple translation panels doesn't feel right, I'd imagine most users would be happy to see everything in the same panel, just separated by linebreaks or as part of a bullet-point list
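
The polling logic from (2) and the multi-region idea from (3) could be sketched roughly like this. Everything here is an assumption for illustration: `Panel`, `poll_region`, and the `translate` callback are hypothetical names, and the OCR step is represented only by the string it would return.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Panel:
    """A reused translation panel; its screen position never changes."""
    text: str = ""
    last_source: Optional[str] = None


def poll_region(panel: Panel, ocr_text: str,
                translate: Callable[[str], str]) -> str:
    """Apply one polling step to a panel and report what happened."""
    source = ocr_text.strip()
    if not source:
        # Blank region: clear quietly, with no "nothing to translate" notice.
        panel.text = ""
        panel.last_source = None
        return "cleared"
    if source == panel.last_source:
        return "unchanged"  # same OCR result, so do nothing
    panel.text = translate(source)
    panel.last_source = source
    return "translated"


def poll_all(panels: Dict[str, Panel], ocr_results: Dict[str, str],
             translate: Callable[[str], str]) -> Dict[str, str]:
    """Poll several named regions (e.g. dialogue, speaker) independently."""
    return {
        name: poll_region(panel, ocr_results.get(name, ""), translate)
        for name, panel in panels.items()
    }
```

Because each region has its own panel and its own last-seen string, the speaker's name and the dialogue are translated separately, which is the workaround for the concatenation problem described above.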

I maintain enough projects myself that I know how fatiguing feature-requests can be, but I'm certain that there are other people who would be sold immediately with at least one of these.