Schedule
We will stick to this schedule, and should there be any important changes to the schedule, assignments, or reading materials, you’ll receive an email notification.
Week 1 - Technology for African Languages, Why Technology for African Languages, Current state of Technology for African Languages
In this module, we will have a high-level discussion about language technology (e.g., ChatGPT, Gemini, and Llama), the current state of technology for African languages, and the challenges facing the development of language technology for African languages.
Sep 4 Discussion Technology for African Languages
- Pre-Class Reflection:
- Take a look at the Ethnologue page and try to answer the following questions:
- How many languages are spoken in the world today? How many of the spoken languages are from Africa? What countries have the most languages? What continents have the most indigenous languages?
- Read through this blog and see if you can understand the state of Natural Language Processing research for African languages in 2019.
- Read through the Masakhane and AfricaNLP pages, then write a reflection on Masakhane, AfricaNLP, and the efforts being made to create technology for African languages.
- Learn about African languages: One Thousand Languages
- About technology and technology for African languages, a reflection about Generative AI.
Week 2 - Introduction to main linguistic groups of Africa
Sep 11 Discussion Linguistic groups of Africa
- [Slides]
- G. Tucker Childs, An Introduction to African Languages, Chapter 1: Introduction and Chapter 2: Classification of African languages.
- Sands et al., AFRICAN LANGUAGES, The SAGE Encyclopedia of Human Communication Sciences and Disorders. ed. Jack S. Damico & Martin J. Ball. Thousand Oaks, CA: Sage Publishers. pp. 1020-1024. (May 2019)
Videos
- The Amazing Languages of Africa - sounds, grammar and writing systems of African languages. The Languages of Africa https://www.youtube.com/watch?v=1WhIiqHr0q0
- Example of Khoisan - Siki Jo-An – ‘The Click Song’ | Blind Audition | The Voice SA: Season 3 | M-Net
- Pelonomi Moiloa: Decolonizing Artificial Intelligence to empower local talent
- Sabelosethu Mhlambi: Decolonizing AI
Additional materials
- Chapter 3 (An Introduction to African Languages): identify the linguistic features prevalent in each linguistic family
- International Journal of American Linguistics
- African Languages: An Introduction, Bernd Heine & Derek Nurse, Cambridge University Press, 2000
- The Linguistic Face of Africa, by Benard Odoyo Okal, 2016
Week 3 - Introduction to (Ki)Swahili Language, Swahili as an African Language vs a dialect of Arabic
Sept 18 Discussion Introduction to (Ki)Swahili Language
- [Slides]
- Barasa & Mous – Oral and written Interface in SMS
- Njihia S. Kamau, A Digital Africa Kiswahili Holds the Key
- A Brief Introduction to the Bantu Languages
- Leonard Muaka, The complexities of noun class system in the acquisition and learning of Swahili language, Howard University
Video The Swahili Language – A native language that absorbed a lot of foreign vocab
Week 4 - Introduction to NLP and Its applications
It is common for people to use NLP technology every day without even knowing it. For example, Google Search works out what you are looking for from either text or speech, and Gmail generates smart reply suggestions based on your messages. This module will introduce NLP and its applications.
Sept 25 Discussion Introduction to NLP and LLMs
- [Slides]
- Pre-Class Reflection:
- Chapter 2: Speech and Language Processing, An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models Third Edition by Daniel Jurafsky, James H. Martin.
- Try to answer the following: What is a document, and what is document segmentation? What is a sentence, and what is sentence segmentation? What do you understand by a corpus? What is tokenization? Explain these terms: stopwords, stemming, lemmatization.
- Related papers:
- Ruder, Sebastian on Why You Should Do NLP Beyond English
- Ife Adebara, Muhammad Abdul-Mageed. “Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go.” In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Volume 1: Long Papers, pages 3814–3841.
- Atnafu Lambebo Tonja, Tadesse Destaw Belay, Israel Abebe Azime, et al. “Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities.” In Proceedings of the Fourth Workshop on Resources for African Indigenous Languages (RAIL 2023), pages 126–139.
- Chesire Emmanuel, Kipkebut Andrew. “Current State, Challenges and Opportunities for Natural Language Processing Research and Development in Africa: A Systemic Review.” In AfricaNLP Workshop at the International Conference on Learning Representations (ICLR 2024).
- Additional Reading
- Skim through Chapters 13, 14, 15, and 16 of the book Speech and Language Processing to get a broad understanding of the applications of NLP; there is no need to understand the technical details.
- Hedderich et al., A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios, Proceedings of the 2021 Conference of the North American Chapter of ACL-HLT, pages 2545–2568.
Week 5 - Multilingual NLP and corpus annotation
In this module, we will discuss existing African language datasets and practice named entity recognition data annotation for African languages using an existing tool.
Oct 2 Discussion Multilingual NLP and corpus annotation
- [Slides]
- Reflection:
- Read about data annotation and data labeling through this blog, and summarize in one paragraph what annotation is and why it is necessary.
- Adelani et al., MasakhaNER: Named Entity Recognition for African Languages. Read the abstract, section 3 on the focus languages, and section 4 on the data and annotation methodology.
- Cheikh et al., MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages. Read the abstract, all of section 4 on data and annotation, and section 5 on annotation challenges.
- We will practice annotation using an African-language text corpus.
- We will use an external annotation tool to annotate named entities in African-language text.
Week 6 - NLP and social media
Oct 09 Mid-term TBD
- [Mid-term exam]
Oct 09 Discussion NLP and social media
- Readings
- Soffer, O. (2010). “Silent Orality”: Towards Conceptualization of the digital oral features in CMC and SMS texts, Communication Theory 20, p. 387-404.
- Digital Culture is Like Oral Culture Written Down
- Natural language processing and Social media blog
- Video
- Supplementary/Additional Readings
- Ong, W. J. (1982). Orality and literacy: The technologizing of the word. London: Methuen Publishing House.
- African Languages and Information and Communication Technologies: Literacy, Access, and the Future
Week 7 - Information Extraction
This week we will discuss techniques for extracting semantic content from text, and how this process of information extraction turns the unstructured information embedded in texts into structured data.
Oct 23 Discussion Information extraction
- [Slides coming soon]
- Readings:
- Chapter 20 Speech and Language Processing, An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models Third Edition by Daniel Jurafsky, James H. Martin.
- Read about information extraction, relation extraction, events, and time. No need to go into technical details.
- Adelani et al., MasakhaNER: Named Entity Recognition for African Languages. Read from the abstract to section 4 about named entity recognition for African languages.
- Cheikh et al., MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages. Read from the abstract to section 5 about parts of speech for African languages.
- Practice Annotation:
- Named Entity Recognition task Annotation guideline
- Part-of-Speech task Annotation guideline
- We will practice annotation using an African-language text corpus.
- We will use an external annotation tool to annotate named entities in African-language text.
Week 8 - What is a search engine – an under-the-hood view
As soon as computers were invented we were asking them questions, because we need to know things. Systems in the early 1960s were answering questions about baseball statistics and scientific facts. This module gives an overview of search and search engines.
Oct 30 Discussion Search Engine
- [Slides coming soon]
Related Readings
- Speech and Language Processing, An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models Third Edition by Daniel Jurafsky, James H. Martin.
- Read Chapter 14 on question answering and information retrieval. There is no need to go into detail; it is okay if you don’t understand the technical details.
- Read section 14.3.2 about datasets used to evaluate question answering systems.
- Odunayo et al., AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages. Read from the abstract up to section 2.4 and summarize the paper in one page.
- Practice data
Week 9 - The movie was okay – analyzing the sentiment of texts
This week we will discuss sentiment analysis: the extraction of sentiment, the positive or negative orientation that a writer expresses toward some object.
Nov 6 Discussion Sentiment of texts
- [Slides coming soon]
- Readings coming soon
- Speech and Language Processing, An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models Third Edition by Daniel Jurafsky, James H. Martin.
- Read Chapter 4 about sentiment analysis; no need to go into the details of the algorithms.
- Muhammad et al., AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages. Read the abstract and introduction about sentiment analysis for African languages, section 4 about data collection and processing, and section 5 about data annotation challenges.
- YOSM: A New Yorùbá Sentiment Corpus for Movie Reviews. Try to answer: what is the contribution of this paper?
Week 10 - Bridging the language barrier using machine translation
Translation, in its full generality, such as the translation of literature, or poetry, is a difficult, fascinating, and intensely human endeavor, as rich as any other area of human creativity. This module introduces machine translation (MT), the use of computers to translate from one language to another.
Nov 13 Discussion Machine Translation
- [Slides coming soon]
- Readings coming soon
- Speech and Language Processing, An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models Third Edition by Daniel Jurafsky, James H. Martin.
- Read Chapter 13 about machine translation.
- Section 13.5 talks about translation in low-resource situations. Summarize in one page some of the approaches for dealing with low-resource translation.
- Many issues in translating low-resource languages go beyond the purely technical. Read about societal issues in section 13.5.3 and summarize the problems and challenges faced when translating for low-resource languages.
- Machine translation raises many ethical issues. Read section 13.7 on bias and ethical issues in machine translation.
- MENYO-20k: A Multi-domain English–Yorùbá Corpus for Machine Translation and Domain Adaptation
Week 11 - Going beyond text processing; complexities of spoken language
One of the earliest goals of language processing in computers is to understand spoken language. This module will introduce Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and how to build speech recognition systems for African languages.
Nov 20 Discussion complexities of spoken language
- [Slides coming soon]
- Readings:
- Speech and Language Processing, An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models Third Edition by Daniel Jurafsky, James H. Martin.
- Read Chapter 16 about automatic speech recognition (ASR), text to speech (TTS) and other speech tasks.
- Nabende et al., Building Text and Speech Benchmark Datasets and Models for Low-Resourced East African Languages: Experiences and Lessons. Read from the abstract to section 7; you can skip section 8 on the models and experiments, but read section 9 about the experiences, challenges, and lessons learned.
- Phonemic Representation and Transcription for Speech to Text Applications for Under-resourced Indigenous African Languages: The Case of Kiswahili
Week 12 - Project presentations
Dec 4 Present Present final project
- Each group will choose an African language or languages and create a profile of the language:
- Language family, structure, where it is spoken, some statistics, interesting facts, a video or audio clip, etc.
- Work on any of the tasks and/or address a problem we covered in class for that language.
- Prepare a 15-minute slide presentation and/or demonstration, followed by 5 minutes for Q&A.
- Submit the slides plus an individual report of the project, with a maximum of 2 pages.