Potential homebrew request and technical questions, GPT Vision and Espeak for O3DS and N3DS systems. Is this even possible?

Mudb0y · Feb 28, 2024

So I've just spent a good part of an hour messing with my accidentally newly acquired O2DS. Originally I was going to install some seeds with FBI so I click the icon, then proceed to OCR the screen to know what I'm doing because none of these apps have any sound cues and sometimes confusing interfaces to a blind person like me.
The OCR was a bit shit so I moved on to Be My AI, which uses OpenAI's GPT4 Vision as it's backend and managed to get super good results. I even did manage to get to installing the seed for Rhythm Paradise Megamix but looks like they weren't found because I got an error.
I then proceeded to play with the settings app, and even got to getting through the account linking portion which led me to Mii Maker in which I partially succeeded in making a mii.
There were lots of hickups though, for example lots of unnecessary information was being red out about my surroundings given I was shooting photos of the screen and not getting a direct image from it. Sometimes I would shoot the photo incorrectly and would have to wait another 10 seconds for it to scan the image and give me the results.
This got me thinking about an idea I had before acquiring this console a couple days ago of a script which when trigger with a controller keybind, like the Rosalina menu as an example, would scan the current screen, then report back using a speech synthesizer such as Espeak or Flite. This would avoid the need for having my phone in my other hand basically at all times, as well as making some games a lot more playable.
Would this even technically be possible? I know it wouldn't work in AGB firm or TWL firm because these don't work in 3DS mode, but it would still give me a lot less frustration when I think about doing something then go "Shit, guess I have to find a sighted person again!"
If someone would be willing to attempt this, you would have my eternal grattitude and I would even be able to pay you for making such a tool! I'm not knowledgeable in C at all and given I found learning Python somewhat dificult, I don't imagine C will be a walk in the park in comparison lol.

ack · Feb 29, 2024

probably not, I doubt the 3ds is powerful enough to do OCR that is going to be useful for you. You could probably send a screenshot off to a server somewhere and have it send sound back but you'd have to implement WiFi and all that in luma. I think the best solution would be to get a 3ds with a capture card mod and then have OCR running on your computer for the feed it's being sent, and then have a script that runs the OCR and says the output when you press a key.

Deepdive543443 · Feb 29, 2024

From my previous experience on porting vision models to 3DS, models like OCR and Object Detection usually takes time and memory. Only a few extremely light-weight models will works. With operating system and gaming running in background, resource management will be a challenging task. Streaming 3DS graphic output to PC and have OCR and others running on PC would be a better approach

Mudb0y · Mar 2, 2024

Deepdive543443 said:
From my previous experience on porting vision models to 3DS, models like OCR and Object Detection usually takes time and memory. Only a few extremely light-weight models will works. With operating system and gaming running in background, resource management will be a challenging task. Streaming 3DS graphic output to PC and have OCR and others running on PC would be a better approach

My idea was to only take the screenshot on the console's end, the OCR would be done by GPT Vision and it would simply send back the result as speech but as @ack mentioned this might not be possible, I wasn't aware Luma doesn't have wi-fi capabilities. I still wish some apps had accessibility though, in cases like FBI it's possible to navigate them without it mostly fine to install CIAs but then you get to apps like Universal Updator which are basically unusable when you're blind.

Kwyjor · Mar 2, 2024

The question remains: why not just stream everything to your PC using Snickerstream, and run whatever OCR program you like on your PC?

Mudb0y · Mar 2, 2024

Kwyjor said:
The question remains: why not just stream everything to your PC using Snickerstream, and run whatever OCR program you like on your PC?

I was going to do this but you can't do that with the OG 3DS systems, and I was curious if a solution that was more portable than that was possible.

Kwyjor · Mar 2, 2024

Mudb0y said:
I was going to do this but you can't do that with the OG 3DS systems

Actually, you can use hzmod. People don't usually recommend it and it isn't well-developed since the frame rate is abysmal, but that hardly matters here.

Mudb0y · Mar 2, 2024

Kwyjor said:
Actually, you can use hzmod. People don't usually recommend it and it isn't well-developed since the frame rate is abysmal, but that hardly matters here.

Thanks, I'll be sure to check it out.

Potential homebrew request and technical questions, GPT Vision and Espeak for O3DS and N3DS systems. Is this even possible?

Member

Well-Known Member

New Member

Member

Well-Known Member

Member

Well-Known Member

Member

Similar threads

Popular threads in this forum