There was an interesting article on Slashdot recently that described how Google is implementing its next-generation machine translation software. Rather than requiring humans to manually input rules for translating between various languages, Google is feeding learning algorithms a large collection of documents that have already been translated into many different languages. These documents act as a kind of Rosetta Stone, allowing the algorithms to correlate nouns, verbs, adjectives and grammar rules automatically.
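The core idea can be illustrated with a toy sketch: count how often words co-occur across aligned sentence pairs, and score each candidate translation against its overall frequency so that common function words don't dominate. This is a deliberately naive illustration with an invented three-word corpus, not a description of Google's actual system.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: English sentences paired with
# their French translations. A real system would use millions of
# documents in many languages.
parallel = [
    ("the cat sleeps", "le chat dort"),
    ("the cat eats", "le chat mange"),
    ("the dog sleeps", "le chien dort"),
    ("the dog eats", "le chien mange"),
]

# Count English/French word co-occurrences within aligned pairs,
# and overall French word frequencies.
cooc = defaultdict(lambda: defaultdict(int))
fr_freq = defaultdict(int)
for en, fr in parallel:
    for f in fr.split():
        fr_freq[f] += 1
    for e in en.split():
        for f in fr.split():
            cooc[e][f] += 1

def best_translation(word):
    # Score co-occurrence relative to the French word's overall
    # frequency, so "le" doesn't win just by appearing everywhere.
    return max(cooc[word], key=lambda f: cooc[word][f] / fr_freq[f])
```

With even this tiny corpus, `best_translation("cat")` recovers `"chat"` and `best_translation("sleeps")` recovers `"dort"`, without any hand-written rules.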
Although this approach is promising, it doesn't allow the learning system to tie nouns, verbs and adjectives back to the real world. For example, "cat", "chat" and "Katze" might all mean the same thing, but how does the system know that a cat is a cute fluffy animal?
This issue could be addressed by creating a database that associates language with all kinds of real-world input. For example, a picture of a cat could be annotated with the sentence "this is a cat". Similarly, a video of a cat jumping onto a table could be annotated with the sentence "the cat is jumping onto the table". With a large enough database at its disposal, a learning algorithm should be able to figure out what "cat", "jump" and "table" all refer to in the real world.
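A crude sketch of how such grounding might emerge: pair each annotated example with a set of detected features, then look for features that appear every time a given word does. Everything here is hypothetical, including the feature names and the hand-built four-entry database; real sensory input would be far richer and noisier.

```python
from collections import defaultdict

# Hypothetical annotated database: each entry pairs sensory data
# (stood in for by a set of detected features) with its annotation.
database = [
    ({"furry", "four_legs", "whiskers"}, "this is a cat"),
    ({"furry", "four_legs", "whiskers", "table_surface"},
     "the cat is jumping onto the table"),
    ({"furry", "four_legs", "floppy_ears"}, "this is a dog"),
    ({"table_surface", "wooden"}, "this is a table"),
]

# Count word/feature co-occurrences across all annotated examples.
cooc = defaultdict(lambda: defaultdict(int))
word_freq = defaultdict(int)
for features, caption in database:
    for w in caption.split():
        word_freq[w] += 1
        for feat in features:
            cooc[w][feat] += 1

def grounded_features(word):
    """Features present every time `word` appears - a crude guess
    at what the word refers to in the real world."""
    return {f for f, n in cooc[word].items() if n == word_freq[word]}
```

Here `grounded_features("cat")` yields the whisker-and-fur features, while `grounded_features("table")` narrows to the table surface, because that is the only feature shared by every annotated example mentioning "table".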
The fastest way to create such a database would probably be to use an approach similar to Wikipedia's: allow anyone to upload any combination of picture, video, audio, text, touch, taste and other sensory data that the learning algorithm could then use as input.
One of the great benefits of having such a database is that any new learning system could use it as educational input. A human child takes years to consume and process enough input to build a decent world view; a machine "child" could conceivably build a similar world view in days or hours.