Towards Hindi/Urdu FrameNets via the Multilingual FrameNet
Paper in proceedings, 2018

The Multilingual FrameNet Project (MLFN, 2017) is using translations of Ken Robinson’s popular TED talk (Robinson,
2006) to study universal and cross-lingual aspects of frame annotation. There are no FrameNets yet for Hindi and Urdu,
but we are annotating the Hindi and Urdu translations of Robinson’s talk using the frames of the English FrameNet.
(Surprisingly, there was no Hindi translation, so we did that ourselves.) Preprocessing is needed: the word-segmentation
and POS tagging tools available for Hindi and Urdu were satisfactory, the full-form lexicons less so. The web-based
multi-layer frame annotation tool allows additions to the lexicon, so we simply added each form as a new “word”, our
goal here being only to look at the frames and frame elements—we plan to look at grammatical function and phrase
type later. While some sentences show that the frame analysis of English or Portuguese will not carry over to Hindi or
Urdu for cultural or linguistic reasons, others are harder to be definite about. Partly, this is because there are so many
possible translations. An expected observation is that a choice of word can steer the focus from one frame to another.
Our annotations will help when we start building FrameNets for Hindi and Urdu.

FrameNet

Frame semantics

Lexico-Semantic Resources

Multilingual FrameNet

Authors

Shafqat Mumtaz Virk

Språkbanken, University of Gothenburg

K V S Prasad

Chalmers, Computer Science and Engineering, Functional Programming

pp. 69-74

LREC 2018 Workshop International FrameNet Workshop 2018: Multilingual Framenets and Constructions
Miyazaki, Japan

Areas of Advance

Information and Communication Technology

Subject Categories

Language Technology (Computational Linguistics)

Comparative Language Studies and General Linguistics

Studies of Specific Languages