Africa Deep Tech Challenge 2026

Introduction

Tools and Technologies

Clone the submission template
Clone the model profiler

Inspiration

1. Common AI Model Formats

Link: https://huggingface.co/blog/ngxson/common-ai-model-formats
Why relevant: Explains the fundamental differences between PyTorch, Safetensors, and GGUF formats. Critical for your hardware setup because it covers how GGUF uses memory-mapped loading (mmap()) to run large models without preloading the full weight matrix into your 8 GB of RAM.
Estimated Study Time: 15 minutes

2. A Practical Guide to GGUF Quantization Selection

Link: https://knightli.com/en/2026/04/11/llama-gguf-quantization-selection/
Why relevant: Walks through Q2–Q8 precision levels and explains K-quant suffixes (Q4_K_M, Q5_K_S). Establishes why Q4_K_M is the practical sweet spot for speed vs. quality on memory-constrained hardware.
Estimated Study Time: 10 minutes

3. Official Ollama Documentation & Setup Guide

Link: https://github.com/ollama/ollama/blob/main/README.md
Why relevant: The authoritative source for installing Ollama on Ubuntu, pulling GGUF models from HuggingFace, and running a local OpenAI-compatible inference server. More trustworthy and maintained than community gists.
Estimated Study Time: 20 minutes

4. llama.cpp Official Build Guide (Linux)

Link: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
Why relevant: The official compilation instructions for llama.cpp on Ubuntu/Debian — covers CMake flags, OpenBLAS acceleration for CPU-bound matrix ops, and build variants. Authoritative source, unlike third-party mirror sites.
Estimated Study Time: 30 minutes

5. llama.cpp Discussions: Optimizing CPU-only and 8GB RAM Performance

Link: https://github.com/ggml-org/llama.cpp/discussions/21136
Why relevant: Community-verified OS-level tweaks for Debian/Ubuntu with 8 GB RAM. Covers vm.swappiness tuning and the --mlock flag to pin models in physical memory and prevent inference freezes.
Estimated Study Time: 20 minutes

6. Monitor & Control CPU Temperature on Ubuntu 22.04

Link: https://help.ubuntu.com/community/SensorInstallHowto
Why relevant: Directly addresses the competition's 10-point thermal penalty. Teaches you how to install lm-sensors, read core/package temperatures in real time, and configure thermald and CPU frequency governors to keep your chip below the 85°C disqualification threshold under sustained inference load.
Estimated Study Time: 20 minutes

7. Build Small Hackathons with Cohere Models

Link: https://huggingface.co/blog/CohereLabs/build-small-hackathon-with-cohere-models
Why relevant: Shows how to run the multilingual Aya model (3B) fully offline using llama-server with a Gradio UI. Aya's African language support makes this directly useful for maximizing the African Use Case Bonus.
Estimated Study Time: 45 minutes

8. Masakhane: African NLP Benchmarks and Datasets

Link: https://github.com/masakhane-io/masakhane-nlp
Why relevant: Masakhane is the leading open-source African NLP research community. This repo provides datasets, benchmarks, and fine-tuning baselines across 50+ African languages — the most direct resource for building a model that scores well on the African Use Case Bonus.
Estimated Study Time: 30 minutes (orientation + exploring datasets)

9. LLM Benchmark for Throughput via Ollama

Link: https://github.com/aidatatools/ollama-benchmark
Why relevant: A Python CLI tool that runs structured throughput benchmarks against your local Ollama server and reports tokens-per-second. Directly maps to the Model Throughput Performance judging criterion.
Estimated Study Time: 15 minutes

10. llama-bench: Syntax, Usage and Documentation

Link: https://github.com/ggml-org/llama.cpp/blob/master/tools/llama-bench/README.md
Why relevant: The native llama.cpp benchmarking tool. Lets you isolate prompt processing (pp) vs. text generation (tg) speeds, and tune batch size, context length, and thread count (-t) to find your hardware's optimal configuration before submission.
Estimated Study Time: 20 minutes