Towards Hindi/Urdu FrameNets via the Multilingual FrameNet
Paper in proceedings, 2018
The Multilingual FrameNet Project (MLFN, 2017) is using translations of Ken Robinson’s popular TED talk (Robinson,
2006) to study universal and cross-lingual aspects of frame annotation. There are no FrameNets yet for Hindi and Urdu,
but we are annotating the Hindi and Urdu translations of Robinson’s talk using the frames of the English FrameNet.
(Surprisingly, there was no Hindi translation, so we produced one ourselves.) Preprocessing is needed: the word-segmentation
and POS tagging tools available for Hindi and Urdu were satisfactory, but the full-form lexicons were less so. The web-based
multi-layer frame annotation tool allows additions to the lexicon, so we simply added each form as a new “word”, since our
goal here is only to look at the frames and frame elements; we plan to look at grammatical function and phrase
type later. While some sentences show that the frame analysis of English or Portuguese will not carry over to Hindi or
Urdu for cultural or linguistic reasons, others are harder to be definite about. Partly, this is because there are so many
possible translations. As expected, the choice of word can steer the focus from one frame to another.
Our annotations will help when we start building FrameNets for Hindi and Urdu.