_nalply 9 hours ago [-]
I am Deaf. I lipread German, but I rarely lipread English. I have succeeded with people who speak English as a non-native language (Greek, Estonian and Marathi speakers), but never with anyone from the Anglophone world. As I get older, my lipreading competence is declining, and I prefer talking in my local Signed Language.
What this article is missing, I think, is that lipreading does not simply replace auditory input with visual input, because there are too many, let's just call them, homophones. I find the word "visemes" a bit too cute. You need a lot of context. So I always struggle when someone asks me a very short question, because I don't know what it's about: someone comes to me, says hello, asks the question, and I have no idea what they want to know.
Re gameplay: the context is special, because you can assume that the football coach is not shouting "HAKUNA MATATA" across the field. This simplifies lipreading during games. Essentially it reduces to something like a set of radio buttons.
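A toy sketch of the two points above (homophones and "radio buttons"), assuming a hand-made letter-to-viseme table. Real lipreading works on sounds rather than spelling and real viseme inventories differ, so `VISEME_CLASSES` and the `COACH_CALLS` list are purely illustrative. The same mouth shape has many readings in open conversation but only one among a few expected calls:

```python
# Toy letter-to-viseme table (an assumption for illustration: real lipreading
# works on sounds, not spelling, and real viseme inventories differ).
VISEME_CLASSES = {
    "p": "B", "b": "B", "m": "B",                      # lips pressed together
    "f": "F", "v": "F",                                # lower lip on upper teeth
    "t": "T", "d": "T", "n": "T", "s": "T", "z": "T",  # tongue mostly hidden
    "k": "K", "g": "K", "c": "K",                      # back of the mouth, nearly invisible
}

def to_visemes(word: str) -> str:
    """Map each letter to its viseme class; unlisted letters (e.g. vowels) stay as-is."""
    return "".join(VISEME_CLASSES.get(ch, ch) for ch in word.lower())

def candidates(observed: str, vocabulary: list[str]) -> list[str]:
    """Every word in the vocabulary that produces the observed viseme sequence."""
    return [w for w in vocabulary if to_visemes(w) == observed]

# Open conversation: a large vocabulary, so one mouth shape has many readings.
GENERAL = ["pat", "bat", "mat", "man", "pad", "ban", "fan", "cat"]
# Hypothetical coach calls: the context shrinks the choice to a few "radio buttons".
COACH_CALLS = ["man", "wide", "time", "press"]

observed = to_visemes("man")              # "BaT"
print(candidates(observed, GENERAL))      # ['pat', 'bat', 'mat', 'man', 'pad', 'ban']
print(candidates(observed, COACH_CALLS))  # ['man']
```

The point is not the table itself but the shape of the problem: without context the observation only narrows things down to a group of look-alike words.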
redbell 12 hours ago [-]
Football coaches and players, especially in European games, used to speak freely during matches, but in recent years they have been forced to cover their mouths; otherwise, specialists will decipher their conversations and reveal their intentions in the game.
irishcoffee 10 hours ago [-]
MLB has been doing this for a long time.
55555 9 hours ago [-]
I just realized the government probably already has a trained lipreading AI model. Training one would be super easy: download YouTube videos with uploader-provided captions, cut them down to scenes where only a single face is detected, and then use the lip points, facial landmarks and subtitle text (which has word-level timings) as training data. Then you can point a camera at anyone from a distance and know what they are saying. The longer they talk, the more accurate the output will be, as additional context is provided.
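A rough sketch of the alignment step described above, assuming the captions already come as (word, start, end) triples in seconds and that some face detector and landmark extractor have been run on every frame. `WordTiming`, `faces_per_frame` and `landmarks_per_frame` are hypothetical names, not any particular library's output:

```python
from dataclasses import dataclass

@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video

def word_frame_ranges(words: list[WordTiming], fps: float) -> list[tuple[str, int, int]]:
    """Convert word timings into half-open [first, last) frame index ranges."""
    ranges = []
    for w in words:
        first = int(w.start * fps)
        last = max(first + 1, int(w.end * fps))  # every word gets at least one frame
        ranges.append((w.word, first, last))
    return ranges

def training_pairs(words, fps, faces_per_frame, landmarks_per_frame):
    """Pair each word with its per-frame lip landmarks, keeping only words
    during which exactly one face is detected in every frame."""
    pairs = []
    for word, first, last in word_frame_ranges(words, fps):
        if last > len(faces_per_frame):
            continue  # caption extends past the end of the decoded video
        if all(faces_per_frame[i] == 1 for i in range(first, last)):
            pairs.append((word, landmarks_per_frame[first:last]))
    return pairs

# Stand-in data: 24 frames at 24 fps, a second face appears in frame 15,
# and each frame's "landmarks" are just its index.
words = [WordTiming("hello", 0.0, 0.5), WordTiming("world", 0.5, 1.0)]
faces = [1] * 24
faces[15] = 2
landmarks = list(range(24))
print(training_pairs(words, 24, faces, landmarks))  # only "hello" survives
```

Filtering per word rather than per scene is a design choice here: it keeps more usable data when a second face only appears briefly.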
kruffalon 10 hours ago [-]
What a delightful find, pairs well with the post about designing for a blind client[0].
Thank you for posting <3
serious_angel 4 hours ago [-]
It does not seem that the technologies currently in use rely on anything particularly serious for verification, yet some devices are being designed for the following, as described in public patents:
> Some disclosed embodiments involve determining an emotional state of an individual associated with the facial skin micromovements, and extracting meaning from the at least one subvocalized phoneme and the determined emotional state. The term “emotional state” refers to an individual's emotional condition and may be used as an indicator of the individual's behavior, cognition, and overall well-being...
>
> As another example, the additional data may include the individual making a vocal statement that is different from a statement associated with their subvocal facial skin micromovements. Using such a vocal statement in the authentication is desirable to indicate that the user does not intend to make that statement at that time, such as in situations of duress like being threatened to say that statement...
>
> The event may have an asymmetrical impact on the subject. In such cases, a sensing device may be used for monitoring/detecting muscle micromovements on both sides of the face and comparing said muscle micromovements. The comparison and differences between the facial muscle micromovements of each side of the face may be used to determine the extent of damage, as well as monitor deterioration or improvement in the subject's condition. As a difference in facial muscle micromovements may be determined and may be above a certain threshold, an indication of an illness/condition or episode may be generated.
>
> Source: <https://patents.google.com/patent/US20250173415A1/en>
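The last quoted passage boils down to comparing the two sides of the face against a threshold. A minimal sketch, assuming each side is already reduced to a time-aligned series of micromovement magnitudes; the mean-absolute-difference metric and the threshold values are illustrative choices, not taken from the patent:

```python
def asymmetry_index(left: list[float], right: list[float]) -> float:
    """Mean absolute difference between time-aligned left/right samples."""
    if len(left) != len(right) or not left:
        raise ValueError("expected two non-empty series of equal length")
    return sum(abs(a - b) for a, b in zip(left, right)) / len(left)

def flag_asymmetry(left: list[float], right: list[float], threshold: float) -> bool:
    """True when the left/right difference exceeds the (hypothetical) threshold."""
    return asymmetry_index(left, right) > threshold

# Re-running the check per session would show deterioration (index rising)
# or improvement (index falling) over time.
print(flag_asymmetry([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], threshold=0.5))  # False
print(flag_asymmetry([1.0, 1.0], [0.5, 0.0], threshold=0.5))            # True
```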
roysting 12 hours ago [-]
I guess they’ve never heard of the Bad Lip Reading YouTube channel.
_nalply 6 hours ago [-]
Lipreading Deaf here.
I tried to lipread the Inauguration, and where I succeeded I read the same nonsense, but often it wasn't possible because they weren't looking at the camera or the microphone was in the way.
With The Seagulls, however, the only thing I could read was amm-amm-ammmmm-am-am-aaaammm. Nothing else. Of course, the doll has very simple mouth mechanics.