Live video feed of hands

Synthesia is a living project. You can help by sharing your ideas.
Search the forum before posting your idea. :D

No explicit, hateful, or hurtful language. Nothing illegal.
Posts: 12644

Post by Nicholas »

So, I've finally tried to work a few sessions of actually playing Synthesia into the last week or so. And in the course of doing so, I found a feature *I'd* like to have. (I suppose in the industry it's called "eating your own dog food" or something like that.)

Anyhoo, I'm curious if anyone else has run into this, if it'd be super helpful, or if it's something you get used to after playing for more than 5 hours total, ever, and it's not actually necessary.

At least for larger jumps, I still have to look down at my hands to place them correctly. I'm getting a little better at taking the plunge and slamming my hands down like two octaves away from where they used to be and hoping for the best.

I'd like to look down for visual placement feedback... but I also don't want to look away from the music.

If I had a webcam perched above my monitor, facing down, and had the video feed "mapped" to the on-screen keyboard, I wouldn't have to. The display would take up no additional space... my hands would just appear to-scale on the screen in real-time.

Is that a reasonable solution?

Is it overkill? Am I missing something obvious and simpler?

Does the hand-placement thing just come with time?

Would it be a useless feature because now not only do you need a keyboard to play the game but also a webcam to get the most out of this particular facet? (Not just a camera... built-in laptop cameras wouldn't do. A flexible able-to-be-pointed-downward camera only.)

I personally think it'd be sweet. ;)

It'd also be a nice way to verify the upcoming manual note fingering thing too. You'd see a number on a note and look 10 pixels below and be able to see if you're using the correct finger.
Posts: 487

Post by Choul »

It would be very nice to see a video of you playing, with your hands visible. :D
Posts: 12644

Post by Nicholas »

I guess one benefit I didn't stress very strongly was the keyboard-to-screen scaling.

Right now I see a jump of maybe 4 inches on the screen between notes. That translates to maybe 10 inches on the keyboard. Early on in the process, reconciling those differences (unless you have a projector or super-huge screen you can line things up with manually) is challenging.

Having your physical input mapped back down to the notes on the screen should (in theory) help you improve much more quickly.
Posts: 762

Post by tommai78101 »

I haven't seen this added to the upcoming feature list yet.

Have you taken the size of the keyboard, and the zooming in/out of Synthesia's keyboard, into account? :?
Hardware Information: Windows Vista Home Premium SP1, 358MB Mobile Intel Graphics Media Accelerator X3100, Synthesia 0.7.1 preview r697, 2 GB DDRAM, 1.6 GHz Intel Pentium Dual-Core Processor T2330, Acer Aspire 5720-4126
New Hardware Information: Windows 10 Pro, 2GB Nvidia GeForce 860M, 8GB RAM, 1.7GHz Core-i5 4210U, Alienware 13 R1.
Posts: 899

Post by vicentefer31 »

Nicholas wrote:...the upcoming manual note fingering ...

By the way, I have written several posts about manual note fingering because I'm trying to get votes from other members for this feature, but so far with no luck. I always thought it would take first place on the feature voting list within a couple of days of voting opening, but it looks like I was wrong.
It's a pity Nicholas has so many good ideas and so little time to do them all. :(

Sorry...Grouchy Smurf wants to say something: "...but time enough to have waves in the Set Up Keyboard or a floaty-window tool-tippy thing to follow the mouse".
Picasso: I am always doing that which I cannot do, in order that I may learn how to do it.
Posts: 12644

Post by Nicholas »

I was holding off making a feature voting item until I found out whether I was crazy or not. If there is a simple alternative solution, adding an item would be egg on my face. ;) I'd prefer to let the idea simmer, percolate, and all those other cooking analogies before putting the final result on the voting list.

That said, up until now I don't feel like I've had a horse in this race. (I loved all the features on that voting list equally... they were all my children... or something. :D ) Now I feel like not having real-time hand position feedback is holding back my progress. I'm even falling into the trap of wanting a niche feature -- available only to people with MIDI keyboards and discrete web cams.

Anyway, we'll have to see how my bias affects when/if the feature gets completed. I've been trying to hide the fact that this is really a dictatorship despite the illusion of democracy. That I've already been poking around various video-input libraries is going to make that illusion difficult to maintain pretty soon. :D
Synthesia Donor
Posts: 1180

Post by TonE »

I like your video/webcam idea a lot. I would recommend a PS3 webcam, which is very cheap and has great features. I already use one in Windows XP; some crazy developers coded the right drivers for Windows XP and higher...

I am using it so far mostly via KMPlayer and its video capturing feature, but having even direct-in-Synthesia support would be great, especially if the "in video keyboard size" is matched to the Synthesia keyboard display.
Posts: 12644

Post by Nicholas »

TonE wrote: especially if the "in video keyboard size" is matched to the Synthesia keyboard display.
Yeah, there'd be a quick one-time (any time you moved the camera) calibration that would probably take only a few steps. Something along the lines of:
- Choose which camera device to use (would only pop up in the rare case there is more than one)
- Hit the lowest note that is completely visible in the frame.
- Hit the highest note that is completely visible in the frame.
- Click the top-left corner of the lowest note.
- Click the bottom-left corner of the lowest note.
- Click the top-right corner of the highest note.
- Click the bottom-right corner of the highest note.

Basically defining a rectangle that will be pulled from the feed and drawn on top of the keys.
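For the curious, those four corner clicks are exactly enough to pin down a perspective (homography) transform from webcam pixels to the on-screen key strip. Here's a minimal NumPy sketch of that math; the corner coordinates and strip size are made-up stand-ins for the calibration clicks, and on real frames OpenCV's getPerspectiveTransform/warpPerspective would do the same job:

```python
import numpy as np

def perspective_matrix(src, dst):
    """3x3 homography mapping four src points to four dst points
    (the same math behind OpenCV's getPerspectiveTransform)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def map_point(H, x, y):
    """Apply the homography to one webcam pixel."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Made-up calibration clicks: corners of the lowest/highest visible note
# in webcam coordinates, mapped onto a 1280x160 on-screen key strip.
src = [(42, 80), (38, 210), (598, 90), (602, 215)]
dst = [(0, 0), (0, 160), (1280, 0), (1280, 160)]
H = perspective_matrix(src, dst)
print(map_point(H, 42, 80))  # lands at the strip's top-left corner
```

Once the matrix exists, pulling the rectangle from each frame is a single warp per frame, which is cheap enough to do in real time.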

If I get fancy and use something like OpenCV I might be able to do something like:
- Completely remove your hands from the selected video area and then click the mouse.

At that point I could use some color-key'ing style technique where I "subtract" the (live) keyboard video so it's *just* your hands on top of the in-game keyboard so you could still see highlights, key markers, key labels, etc. That type of thing might push the system requirements up a little though.
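The color-key-style subtraction could look something like this sketch with synthetic frames (all pixel values here are invented for illustration; real footage would want blurring, a tuned threshold, and probably OpenCV's heavier background-subtraction machinery):

```python
import numpy as np

# Reference shot of the empty keyboard, taken during the proposed
# "remove your hands and click" calibration step (values invented).
reference = np.full((160, 1280, 3), 200, dtype=np.uint8)

# Live frame: same keys, plus a "hand" somewhere over them.
live = reference.copy()
live[40:120, 300:420] = (90, 60, 50)

# Any pixel that differs enough from the reference is assumed to be hand.
diff = np.abs(live.astype(int) - reference.astype(int)).sum(axis=2)
mask = diff > 30  # tunable threshold

# Composite: render the in-game keyboard (highlights, labels, ...) and
# paste only the hand pixels on top, so the game graphics stay visible.
game_render = np.zeros_like(live)  # stand-in for the rendered keyboard
composite = game_render.copy()
composite[mask] = live[mask]
print(mask.sum())  # number of pixels classified as "hand"
```

The appeal of this approach is exactly what the post describes: everything that isn't hand stays the game's own rendering, so note highlights and key labels remain visible underneath.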
Synthesia Donor
Posts: 1180

Post by TonE »

Additionally, there are already multi-PS3-webcam drivers available (which I haven't tested, as I only have one PS3 webcam) that could be used as follows:

- first webcam as TOP cam
- second webcam as SIDE cam

At least for future applications, beginners could get a 3D view of the hands.
Posts: 12644

Post by Nicholas »

That would be really, really niche. ;)

The side view would be helpful for keeping an eye on your hand posture and maybe helping you develop better technique. That might be nice for some learning module that tried to teach you how to hold your hands.

But, it doesn't strike me as the type of information feed I feel like I need before I can really play effectively like the top-down video does.

That, and the more complicated you make the setup, the less likely people are to use it. I know I've got some pro-audio recording stuff that I practically refuse to use because it means dragging out a hundred wires, assembling stands, plugging in a few rack boxes, getting mics set up... I'd rather just use the little junky mic on the gaming headphones I already have plugged into my computer.

I'd hate to come up with a solution that required building elaborate balancing-act stands where you had webcams and their associated cords going every which way.

If tipping it down is enough to get a decent angle on the keys, a 6 step calibration is probably the most hassle I'd want to dump on the user.
Posts: 51

Post by Frost »

hey everyone.
it's a good idea; I'd want that if there was a choice. But a few points may be problematic.
one, how does it fare from the pedagogic viewpoint? Seeing your fingers can make playing "with Synthesia" easier, but does the skill transfer well to normal playing? No change? Better? Worse? (Players won't be used to glancing back and forth between their fingers and the sheet. Also, seeing the fingers all the time could affect touch-playing for the worse?) But still, I'd like that feature to be available; I'd definitely try it. For science.

The other problem arises from practicality. The video feed can be calibrated (position, rotation, barrel distortion, etc.) and manipulated with enough work. What I'm thinking of is: where do I put the webcam? Most webcams aren't very wide-angle, so for one to capture all of the keys (at least 72 if not 88), it would need to be, what, 1 meter above the keyboard? Ropes hanging from the ceiling? :) Homemade fused plumbing pipes? What about the camera and data cables dangling just in front of my eyes? Maybe the cam could be worn instead: on a headband, around the neck, on a shoulder... It would also force good posture :)

...Ok, I just had an idea.

...aand ignored it instantly. Mocap-style passive electromagnetic transmitters placed above the hands, plus 4-5 sensors stuck to various points on the keyboard, is not a more practical approach, I would think. :)
Posts: 12644

Post by Nicholas »

Welcome back!
Frost wrote:how does it fare from the pedagogic viewpoint?
That's a fair point. I'd be building in a future handicap, like hunt-and-peck typing. While you could never carry it forward to playing from (real, paper) sheet music... I wonder if the need there is less. The note-heads in notation do technically map to keys on the keyboard, but it's nowhere near as 1:1 as the falling-note display is. To me that only widens the gulf for someone starting from zero skill who wants to play from sheet music alone, but I guess I'm not really in the business of helping people do that. If someone offered me a tiny LCD screen that could track my current position on a page of sheet music and show (I guess, rotated 90 degrees?) my hands mapped to the note-heads, I'd try it. For science.
Frost wrote:The other problem arises from practicality. The video feed may be calibrated (position, rotation, barrel distortion etc.) and manipulated with enough work.
It shouldn't be too much work. Assuming you don't move the webcam from play session to play session, it's only one-time, too.

I've done it before with about 16 "control points". (I was trying to get a robot's location mapped to a field of view I was capturing with a camcorder, using the same computer vision library I was expecting to use here.) The idea was the result improved with more control points. Luckily for us, we have a built-in "grid" in our playing surface. It could ask the user for the four corners, and then to maybe click on the very tip of each C# key to get some readings in the middle.

It takes care of all of the things you mentioned: barrel distortion, etc.
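For anyone curious, with more than four control points the mapping becomes an over-determined least-squares fit rather than an exact solve, which is how the extra C#-tip clicks would buy accuracy. A sketch of that fit in plain NumPy, checked against a synthetic known transform (all coordinates invented; OpenCV's findHomography does this for real, with outlier rejection on top):

```python
import numpy as np

def fit_homography(src_pts, dst_pts):
    """Least-squares homography from N >= 4 control-point pairs.
    Extra points average out click noise and mild lens distortion."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the right singular vector of A with the
    # smallest singular value, normalized so the last entry is 1.
    _, _, vt = np.linalg.svd(np.array(A, float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Synthetic check: a known transform and eight invented control points.
true_H = np.array([[2.0, 0.1, 5.0], [0.05, 1.8, -3.0], [1e-4, 2e-4, 1.0]])
src = [(60, 90), (180, 100), (300, 95), (420, 105),
       (540, 98), (120, 220), (360, 230), (500, 215)]
dst = []
for x, y in src:
    u, v, w = true_H @ np.array([x, y, 1.0])
    dst.append((u / w, v / w))
H = fit_homography(src, dst)
print(np.allclose(H, true_H, atol=1e-4))  # sanity check: recovers it
```

A plain homography won't model barrel distortion itself, but averaging over many well-spread control points keeps the residual error small across the key strip.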

Regarding camera angle, actually, webcams are the widest-angle cameras you're likely to find. They're made to be about 1m from a person, tops, and still show the entire upper body of the person with plenty of room to either side.

I looked one up quickly, did some quick napkin math, and it looked like just facing a typical monitor-mounted camera down toward your keyboard should get almost the entire range. 72 should be plenty. Minimally it would be able to show video of whatever you can get in there. The calibration process could include "Press the key that is furthest left in the video". And it would know how to map it from there. Any video is probably better than nothing. In extreme circumstances, you could even tip the camera and recalibrate for a particular song if you needed to.
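That napkin math is easy to redo. Assuming a fairly common ~60 degree horizontal field of view and a monitor-top camera about 0.9 m above the keys (both numbers invented for illustration):

```python
import math

fov_deg = 60.0      # assumed horizontal field of view of the webcam
distance_m = 0.9    # assumed camera-to-keys distance

# Width of keyboard visible at that distance.
visible_m = 2 * distance_m * math.tan(math.radians(fov_deg / 2))

# A piano octave is about 0.165 m wide (a full 88-key board is ~1.22 m).
keys_visible = visible_m / 0.165 * 12
print(f"{visible_m:.2f} m visible, about {keys_visible:.0f} keys")
```

With those assumptions you get just over a meter of keyboard in frame, roughly 75 keys, which squares with the estimate above that 72 should be reachable.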

Otherwise, yeah, I tried and couldn't think of anything better than suspended wires or some bizarre super-tall tripod/pipe style thing. I don't think it will come to that though.

As for wearing it... I'd have to check that library. Adding motion tracking/stabilization might not work very well. I'm hoping it can remain stationary.

Frost wrote:...Ok, I just had an idea ...and ignored it instantly.
The real solution is a ToF depth-tracking camera (like Project Natal) that is high-resolution enough to distinguish fingers. The software would be crazy. The processing requirements crazier. The camera cost craziest of all... but it would be amazing.
Posts: 313

Post by Lemo »

Hey guys,
I'm bringing back this thread with some testing of my own.
Lately I've been doing experiments with augmented reality, so I guess I was ready to try this thing too.

Before thinking about how you're going to include that feature inside Synthesia, I think it's better to do some practical testing to see if that's technically possible or not.
One important thing that I noticed with augmented reality is the video latency. That's probably the biggest thing you have to worry about, as a video feed has a lot more to carry than an audio one, and I'm afraid there won't be magical ASIO-like drivers to save the day this time.
So here is what I get for the moment with Synthesia, VLC, and an HDV camera.

1.Hardware setup
My computer is arranged in a quite compact layout inside what used to be a wardrobe (^^), so I've got plenty of space on top of it to place a camera.
I'm using a monopod placed horizontally, so I have some control over the angle and such.
At least I'm sure the whole thing won't fall on my head while I'm playing :p
My compact setup (and a crappy cellphone pic...)

2.Software setup
No need for fancy additional software; VLC can handle it.
Go to Media > Open capture device
You can find your DV or web camera there.
I reduced the caching a bit to get a little less delay.
Select everything inside the "Edit options" box and copy it.
You can close VLC.

Open the Synthesia config tool.
You have to run it in a window to be able to see VLC on top of it.
For some reason OpenGL works a lot better than DirectX in windowed mode for me.

3.Script setup
Open notepad and paste the following VLC command line:

Code: Select all

"C:\Program Files\VideoLAN\VLC\vlc.exe" dshow:// :dshow-vdev="Microsoft DV Camera and VCR" :dshow-adev=none  :dshow-caching=100 --no-video-deco --no-embedded-video  --video-on-top --deinterlace=1 --video-x=1 --video-y=580 --width=1680 --height=336 --crop=5:1 ---video-filter=rotate{angle=180}
You need to replace the content with your own settings:
Path to VLC | the camera options you copied earlier from VLC | stuff to make VLC borderless | video size and placement | rotation because my camera is upside down
(The crop ratio should match width/height: 1680/336 = 5/1)
Save this as VLCsynthesia.bat or whatever you like.
You can convert that into an .exe later if you like.

Run the script.
Launch Synthesia.

Synthesia window with VLC live frame on top of it

Random facts & limitations:
-I could use a better camera placement/alignment
Hard to go farther though with a short FireWire cable
-Need to alt-tab at the beginning of each song to bring back the live frame
Also, you can't control Synthesia while VLC has focus
-Delay issues with the video feed
I'm getting almost half a second of delay, which kills part of the concept
You may have a good preview of your hand positions with slow songs, but 0.5 sec is, for example,
way too long to check your placement when switching between fast notes
I also had to switch my HDV camera to DV mode to get a faster response

Now I'll be looking for a way to improve that delay. Hopefully some of you will also give this setup a try.
I used to have a crappy low-def webcam a few years ago, but apparently there are some nice cheap HD ones nowadays;
maybe those have better support and a shorter delay than FireWire systems.
Last edited by Lemo on 04-21-11 8:00 pm, edited 2 times in total.
Stuff & experiments for Synthesia: Gramp v0.2SkinboxFireSynthVideoWebradio
Posts: 12644

Post by Nicholas »

Yeah, I'm definitely overdue for running a test like that. Though, I think the latency is a surmountable problem. VLC must be doing some extra processing or something. A few years back when I was talking to cameras directly with Intel's OpenCV, I want to say the latency felt like only a couple frames (at 60 Hz). You could wave your hand in front of the camera -- even while it was still running various detection algorithms -- and the processed output still felt totally real-time.

I'm not sure though... that was a long time ago now. I might be making things up at this point. ;)

Very cool setup, by the way.
Posts: 1505

Post by aria1121 »

It's actually a smart way to combine video and audio through VLC. Also, nice setup!
Posts: 313

Post by Lemo »

I've actually heard about OpenCV while I was doing those augmented reality tests.
At some point I wanted to use it with Processing, but I quickly realized Processing wasn't optimized at all for video.
Do you know some way I could test it without programming anything?
So far I have tried VLC, VirtualDub, Premiere Pro, QTcap, Processing with JMyron, BuildAR, and ARive, and I get about the same latency everywhere.
Maybe it's more like 0.3 sec than 0.5, but that's still a lot. You may not notice it when you slowly wave your hand at the camera, but it's different when you clap your hands, for example (or play piano).
Or that could just be a matter of computer speed...
Posts: 313

Post by Lemo »

I found a way to check that latency precisely. Now we can have numbers :mrgreen:
I'm shooting the screen with the camera, which displays the reference time and the camera time, while my (still crappy) cellphone takes the picture.

Here are the results for some of the apps:
VLC (with transforms): 387 ms
VLC (base): 215 ms
VirtualDub: 215 ms
QTcap: 172 ms

I guess I should try to mount my camera the right way and remove "--video-filter=rotate{angle=180}" for a start.
If you want to test your system, give the same screen-clock trick a try.
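As a rough sanity check on numbers like these: in a buffered pipeline, end-to-end latency is roughly the number of buffered frames times the frame time. Here's a toy simulation of the screen-clock measurement, where the pipeline depth is an invented value (7 buffered frames at 30 fps):

```python
from collections import deque

FRAME_MS = 1000 / 30       # camera frame time at 30 fps
BUFFERED_FRAMES = 7        # invented pipeline depth (driver + app queues)

# The "screen" shows a running clock; the simulated camera pipeline
# delivers each captured frame several frames late. Reading the clock in
# the delivered frame against the live clock recovers the latency, just
# like photographing the reference clock next to the camera feed.
pipeline = deque(maxlen=BUFFERED_FRAMES)
latency_ms = None
for frame_no in range(60):
    clock_ms = frame_no * FRAME_MS          # what the screen shows now
    pipeline.append(clock_ms)               # camera grabs the screen
    if len(pipeline) == BUFFERED_FRAMES:
        delivered = pipeline[0]             # oldest frame reaches the app
        latency_ms = clock_ms - delivered
print(f"simulated latency: {latency_ms:.0f} ms")
```

Six extra frames at ~33 ms each comes out to 200 ms, right in the ballpark of the base VLC and VirtualDub figures, which is also why trimming the dshow-caching value shaves off delay.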
Posts: 830

Post by DC64 »

This just gave me an idea.
A song is posted on Synthesia and users play it on their piano/keyboard.
Then they assign fingerings, and it gets posted along with some info on hand sizes and such.
"And now for something completely different."
Posts: 149

Post by Kasper »

This feature would be absolutely awesome.

I mean, Synthesia is already the easiest way to learn to play the piano. But this feature would take Synthesia to a whole new level.

What about putting colour labels on your fingers for fingering?


Okay, I know, you need big hands for this kind of fingering; I did it a little too quickly, but I hope the idea is clear :P
English was my worst subject on school, so my language could be a bit awkward sometimes...
Posts: 321

Post by Pianotehead »

An impressive idea for a feature, though I wonder how complicated it would be to code. If we're talking about reading hand sizes and finger positions and marking them with dots, then we're no longer talking about any $25 price; the question is whether Synthesia will break the $100 barrier. I would like to see the faces of those who think $25 is too much, if it comes to that. :?