I like this idea! I think it works quite well with some of the other discussion, too. How does this sound:
Syntax for embedded media is like the syntax for links, with one difference. Instead of link text, we have a media description. The rules for this are the same as for link text, except that (a) a media description starts with

will continue to work, as well as items like 
. Markup like 
can be rendered as audio tags, as can `!audio[](music.php)`.
I like that the syntax is extensible, too: if there are types of embedded media that we haven’t thought of (or haven’t been invented yet!) then it can be incorporated into the scheme without rewriting the spec.
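To make the extensibility point concrete, here is a rough Python sketch of how a renderer might dispatch on the media type. It assumes the form is `!` + an optional type name + `[description](url)`, with no type name meaning a plain image. That grammar is my reading of the proposal, not settled syntax. `render_media`, `RENDERERS`, and the exact HTML each type produces are all hypothetical. It is also regex-based, so it ignores nesting, backslash escapes, and link titles, which a real CommonMark inline parser would have to handle.

```python
import html
import re

# Hypothetical grammar, assumed for illustration:
# "!" + optional media type + "[" description "]" + "(" url ")".
# An empty type (plain "![...](...)") is treated as an image.
MEDIA_RE = re.compile(r"!([A-Za-z]*)\[([^\]]*)\]\(([^)\s]*)\)")


def render_image(desc: str, url: str) -> str:
    return f'<img src="{url}" alt="{desc}">'


def render_audio(desc: str, url: str) -> str:
    return f'<audio src="{url}" controls>{desc}</audio>'


def render_video(desc: str, url: str) -> str:
    return f'<video src="{url}" controls>{desc}</video>'


# Registry of renderers keyed by media type. Supporting a new kind of
# media means adding an entry here; the matching rule stays the same.
RENDERERS = {
    "": render_image,
    "image": render_image,
    "audio": render_audio,
    "video": render_video,
}


def render_media(text: str) -> str:
    """Replace every recognised media reference in text with HTML."""

    def replace(m: re.Match) -> str:
        kind, desc, url = m.group(1).lower(), m.group(2), m.group(3)
        renderer = RENDERERS.get(kind)
        if renderer is None:
            # Unknown media type: leave the source text untouched.
            return m.group(0)
        # Escape description and URL before putting them into HTML.
        return renderer(html.escape(desc), html.escape(url, quote=True))

    return MEDIA_RE.sub(replace, text)


print(render_media("Listen: !audio[theme song](music.php)"))
```

Running it prints `Listen: <audio src="music.php" controls>theme song</audio>`. Adding a new type later (say, a hypothetical `!model[...]`) would just mean registering one more function in `RENDERERS`, which is the "no spec rewrite" property described above.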