docs: setup proper documentation, organize & clean README

2026-01-31 00:23:31 -06:00 · 2025-09-13 15:27:32 -05:00
parent 94fb6b4190
commit 878cc5f773
7 changed files with 254 additions and 123 deletions
@@ -2,6 +2,4 @@
 /target
 /go/
 .cargo/config.toml
-
-**/*.md
-!/README.md
+src/scraper/README.md
@@ -1,21 +1,28 @@
+default_services := "bot,web,scraper"
+
+# Auto-reloading frontend server
 frontend:
    pnpm run -C web dev

-backend:
-    cargo run --bin banner
-
+# Production build of frontend
 build-frontend:
    pnpm run -C web build

-build-backend:
-    cargo build --release --bin banner
+# Auto-reloading backend server
+backend services=default_services:
+    bacon --headless run -- -- --services "{{services}}"

-build: build-frontend build-backend
-
-# Production build that embeds assets
-build-prod:
+# Production build
+build:
    pnpm run -C web build
    cargo build --release --bin banner

+# Run auto-reloading development build with release characteristics (frontend is embedded, non-auto-reloading)
+# This is useful for testing backend release-mode details.
+dev-build services=default_services: build-frontend
+    bacon --headless run -- --profile dev-release -- --services "{{services}}" --tracing pretty
+
+# Auto-reloading development build for both frontend and backend
+# Will not notice if either the frontend/backend crashes, but will generally be resistant to stopping on their own.
 [parallel]
-dev: frontend backend
+dev services=default_services: frontend (backend services)
@@ -1,125 +1,51 @@
 # banner

-A discord bot for executing queries & searches on the Ellucian Banner instance hosting all of UTSA's class data.
+A complex multi-service system providing a Discord bot and browser-based interface to UTSA's course data.

-## Feature Wishlist
+## Services

- Commands
-  - ICS Download (get a ICS download of your classes with location & timing perfectly - set for every class you're in)
-  - Classes Now (find classes happening)
- Autocomplete
-  - Class Title
-  - Course Number
-  - Term/Part of Term
-  - Professor
-  - Attribute
- Component Pagination
- RateMyProfessor Integration (Linked/Embedded)
- Smart term selection (i.e. Summer 2024 will be selected automatically when opened)
- Rate Limiting (bursting with global/user limits)
- DMs Integration (allow usage of the bot in DMs)
- Class Change Notifications (get notified when details about a class change)
- Multi-term Querying (currently the backend for searching is kinda weird)
- Full Autocomplete for Every Search Option
- Metrics, Log Query, Privileged Error Feedback
- Search for Classes
-  - Major, Professor, Location, Name, Time of Day
- Subscribe to Classes
-  - Availability (seat, pre-seat)
-  - Waitlist Movement
-  - Detail Changes (meta, time, location, seats, professor)
-    - `time` Start, End, Days of Week
-    - `seats` Any change in seat/waitlist data
-    - `meta`
- Lookup via Course Reference Number (CRN)
- Smart Time of Day Handling
-  - "2 PM" -> Start within 2:00 PM to 2:59 PM
-  - "2-3 PM" -> Start within 2:00 PM to 3:59 PM
-  - "ends by 2 PM" -> Ends within 12:00 AM to 2:00 PM
-  - "after 2 PM" -> Start within 2:01 PM to 11:59 PM
-  - "before 2 PM" -> Ends within 12:00 AM to 1:59 PM
- Get By Section Command
-  - CS 4393 001 =>
-  - Will require SQL to be able to search for a class by its section number
+The application consists of three modular services that can be run independently or together:

-## Analysis Required
+- Discord Bot ([`bot`][src-bot])

-Some of the features and architecture of Ellucian's Banner system are not clear.
-The follow features, JSON, and more require validation & analysis:
+  - Primary interface for course monitoring and data queries
+  - Built with [Serenity][serenity] and [Poise][poise] frameworks for robust command handling
+  - Uses slash commands with comprehensive error handling and logging

- Struct Nullability
-  - Much of the responses provided by Ellucian contain nulls, and most of them are uncertain as to when and why they're null.
-  - Analysis must be conducted to be sure of when to use a string and when it should nillable (pointer).
- Multiple Professors / Primary Indicator
- Multiple Meeting Times
- Meeting Schedule Types
-  - AFF vs AIN vs AHB etc.
- Do CRNs repeat between years?
- Check whether partOfTerm is always filled in, and it's meaning for various class results.
- Check which API calls are affected by change in term/sessionID term select
- SessionIDs
-  - How long does a session ID work?
-  - Do I really require a separate one per term?
-  - How many can I activate, are there any restrictions?
-  - How should session IDs be checked as 'invalid'?
-  - What action(s) keep a session ID 'active', if any?
- Are there any courses with multiple meeting times?
- Google Calendar link generation, as an alternative to ICS file generation
+- Web Server ([`web`][src-web])

-## Change Identification
+  - [Axum][axum]-based server with Vite/React-based frontend
+  - [Embeds static assets][rust-embed] at compile time with E-Tags & Cache-Control headers

- Important attributes of a class will be parsed on both the old and new data.
- These attributes will be compared and given identifiers that can be subscribed to.
- When a user subscribes to one of these identifiers, any changes identified will be sent to the user.
+- Scraper ([`scraper`][src-scraper])

-## Real-time Suggestions
+  - Intelligent data collection system with priority-based queuing inside PostgreSQL via [`sqlx`][sqlx]
+  - Rate-limited scraping with burst handling to respect UTSA's systems
+  - Handles course data updates, availability changes, and metadata synchronization

-Various commands arguments have the ability to have suggestions appear.
+## Quick Start

- They must be fast. As ephemeral suggestions that are only relevant for seconds or less, they need to be delivered in less than a second.
- They need to be easy to acquire. With as many commands & arguments to search as I do, it is paramount that the API be easy to understand & use.
- It cannot be complicated. I only have so much time to develop this.
- It does not need to be persistent. Since the data is scraped and rolled periodically from the Banner system, the data used will be deleted and re-requested occasionally.
+```bash
+pnpm install -C web  # Install frontend dependencies
+cargo build  # Build the backend

-For these reasons, I believe SQLite to be the ideal place for this data to be stored.
-It is exceptionally fast, works well in-memory, and is less complicated compared to most other solutions.
+just dev # Runs auto-reloading dev build
+just dev bot,web # Runs auto-reloading dev build, running only the bot and web services
+just dev-build # Development build with release characteristics (frontend is embedded, non-auto-reloading)

- Only required data about the class will be stored, along with the JSON-encoded string.
-  - For now, this would only be the CRN (and possibly the Term).
-  - Potentially, a binary encoding could be used for performance, but it is unlikely to be better.
- Database dumping into R2 would be good to ensure that over-scraping of the Banner system does not occur.
-  - Upon a safe close requested
-    - Must be done quickly (<8 seconds)
-  - Every 30 minutes, if any scraping ocurred.
-    - May cause locking of commands.
+just build # Production build that embeds assets
+```

-## Scraping
+## Documentation

-In order to keep the in-memory database of the bot up-to-date with the Banner system, the API must be scraped.
-Scraping will be separated by major to allow for priority majors (namely, Computer Science) to be scraped more often compared to others.
-This will lower the overall load on the Banner system while ensuring that data presented by the app is still relevant.
+Comprehensive documentation is available in the [`docs/`][documentation] folder.

-For now, all majors will be scraped fully every 4 hours with at least 5 minutes between each one.
-
- On startup, priority majors will be scraped first (if required).
- Other majors will be scraped in arbitrary order (if required).
- Scrape timing will be stored in Redis.
- CRNs will be the Primary Key within SQLite
-  - If CRNs are duplicated between terms, then the primary key will be (CRN, Term)
-
-Considerations
-
- Change in metadata should decrease the interval
- The number of courses scraped should change the interval (2 hours per 500 courses involved)
-
-## Rate Limiting, Costs & Bursting
-
-Ideally, this application would implement dynamic rate limiting to ensure overload on the server does not occur.
-Better, it would also ensure that priority requests (commands) are dispatched faster than background processes (scraping), while making sure different requests are weighted differently.
-For example, a recent scrape of 350 classes should be weighted 5x more than a search for 8 classes by a user.
-Still, even if the cap does not normally allow for this request to be processed immediately, the small user search should proceed with a small bursting cap.
-
-The requirements to this hypothetical system would be:
-
- Conditional Bursting: background processes or other requests deemed "low priority" are not allowed to use bursting.
- Arbitrary Costs: rate limiting is considered in the form of the request size/speed more or less, such that small simple requests can be made more frequently, unlike large requests.
+[documentation]: docs/README.md
+[src-bot]: src/bot
+[src-web]: src/web
+[src-scraper]: src/scraper
+[serenity]: https://github.com/serenity-rs/serenity
+[poise]: https://github.com/serenity-rs/poise
+[axum]: https://github.com/tokio-rs/axum
+[rust-embed]: https://lib.rs/crates/rust-embed
+[sqlx]: https://github.com/launchbadge/sqlx
@@ -0,0 +1,94 @@
+# Architecture
+
+## System Overview
+
+The Banner project is built as a multi-service application with the following components:
+
+- **Discord Bot Service**: Handles Discord interactions and commands
+- **Web Service**: Serves the React frontend and provides API endpoints
+- **Scraper Service**: Background data collection and synchronization
+- **Database Layer**: PostgreSQL for persistent storage
+
+## Technical Analysis
+
+### Banner System Integration
+
+Some of the features and architecture of Ellucian's Banner system are not clear.
+The following features, JSON, and more require validation & analysis:
+
+- Struct Nullability
+  - Many of the responses provided by Ellucian contain nulls, and in most cases it is unclear when and why a field is null.
+  - Analysis must be conducted to determine when a field can be a plain string and when it should be nullable (pointer).
+- Multiple Professors / Primary Indicator
+- Multiple Meeting Times
+- Meeting Schedule Types
+  - AFF vs AIN vs AHB etc.
+- Do CRNs repeat between years?
+- Check whether partOfTerm is always filled in, and its meaning for various class results.
+- Check which API calls are affected by change in term/sessionID term select
+- SessionIDs
+  - How long does a session ID work?
+  - Do I really require a separate one per term?
+  - How many can I activate, are there any restrictions?
+  - How should session IDs be checked as 'invalid'?
+  - What action(s) keep a session ID 'active', if any?
+- Are there any courses with multiple meeting times?
+- Google Calendar link generation, as an alternative to ICS file generation
+
+## Change Identification
+
+- Important attributes of a class will be parsed on both the old and new data.
+- These attributes will be compared and given identifiers that can be subscribed to.
+- When a user subscribes to one of these identifiers, any changes identified will be sent to the user.
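The comparison described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `SectionSnapshot`, `Change`, and `identify_changes` are hypothetical names, and the chosen fields are assumptions based on the `time`, `seats`, location, and professor identifiers mentioned in the feature list.

```rust
/// Hypothetical snapshot of the attributes worth watching on a section.
#[derive(Debug, Clone, PartialEq)]
struct SectionSnapshot {
    start_minute: u16, // minutes since midnight
    end_minute: u16,
    days: String,
    location: String,
    professor: String,
    seats_available: i32,
    waitlist_count: i32,
}

/// Hypothetical identifiers a user could subscribe to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Change {
    Time,
    Location,
    Seats,
    Professor,
}

/// Compare the old and new data and return one identifier per changed group.
fn identify_changes(old: &SectionSnapshot, new: &SectionSnapshot) -> Vec<Change> {
    let mut changes = Vec::new();
    if (old.start_minute, old.end_minute, &old.days)
        != (new.start_minute, new.end_minute, &new.days)
    {
        changes.push(Change::Time);
    }
    if old.location != new.location {
        changes.push(Change::Location);
    }
    if (old.seats_available, old.waitlist_count)
        != (new.seats_available, new.waitlist_count)
    {
        changes.push(Change::Seats);
    }
    if old.professor != new.professor {
        changes.push(Change::Professor);
    }
    changes
}
```

A notifier could then intersect the returned identifiers with each user's subscriptions and send a message only when the intersection is non-empty.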
+
+## Real-time Suggestions
+
+Various command arguments can show suggestions as the user types.
+
+- They must be fast. As ephemeral suggestions that are only relevant for seconds or less, they need to be delivered in less than a second.
+- They need to be easy to acquire. With as many commands & arguments to search as there are, it is paramount that the API be easy to understand & use.
+- It cannot be complicated. I only have so much time to develop this.
+- It does not need to be persistent. Since the data is scraped and rolled periodically from the Banner system, the data used will be deleted and re-requested occasionally.
+
+For these reasons, I believe PostgreSQL to be the ideal place for this data to be stored.
+It is exceptionally fast, works well in-memory, and is less complicated compared to most other solutions.
+
+- Only required data about the class will be stored, along with the JSON-encoded string.
+  - For now, this would only be the CRN (and possibly the Term).
+  - Potentially, a binary encoding could be used for performance, but it is unlikely to be better.
+- Database dumping into R2 would be good to ensure that over-scraping of the Banner system does not occur.
+  - Upon a safe close requested
+    - Must be done quickly (<8 seconds)
+  - Every 30 minutes, if any scraping occurred.
+    - May cause locking of commands.
+
+## Scraping System
+
+In order to keep the in-memory database of the bot up-to-date with the Banner system, the API must be scraped.
+Scraping will be separated by major to allow for priority majors (namely, Computer Science) to be scraped more often compared to others.
+This will lower the overall load on the Banner system while ensuring that data presented by the app is still relevant.
+
+For now, all majors will be scraped fully every 4 hours with at least 5 minutes between each one.
+
+- On startup, priority majors will be scraped first (if required).
+- Other majors will be scraped in arbitrary order (if required).
+- Scrape timing will be stored in the database.
+- CRNs will be the Primary Key within the database
+  - If CRNs are duplicated between terms, then the primary key will be (CRN, Term)
+
+Considerations
+
+- Change in metadata should decrease the interval
+- The number of courses scraped should change the interval (2 hours per 500 courses involved)
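One possible reading of these considerations is sketched below. The formula is an assumption, not something the notes specify: the interval is taken to be 2 hours per started batch of 500 courses, and a metadata change is assumed to halve it. `scrape_interval`, `PER_BATCH`, and `BATCH_SIZE` are hypothetical names.

```rust
use std::time::Duration;

/// Assumed scaling: 2 hours of interval per 500 courses involved.
const PER_BATCH: Duration = Duration::from_secs(2 * 60 * 60);
const BATCH_SIZE: u32 = 500;

/// Compute how long to wait before scraping a major again.
fn scrape_interval(course_count: u32, metadata_changed: bool) -> Duration {
    // Round up to whole batches, and treat an empty major as one batch.
    let batches = ((course_count + BATCH_SIZE - 1) / BATCH_SIZE).max(1);
    let interval = PER_BATCH * batches;
    if metadata_changed {
        // Assumption: a recent metadata change halves the interval.
        interval / 2
    } else {
        interval
    }
}
```

Under these assumptions a major with 1,000 courses lands on the 4-hour interval mentioned above, while a 350-course major is revisited every 2 hours.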
+
+## Rate Limiting, Costs & Bursting
+
+Ideally, this application would implement dynamic rate limiting to ensure overload on the server does not occur.
+Better, it would also ensure that priority requests (commands) are dispatched faster than background processes (scraping), while making sure different requests are weighted differently.
+For example, a recent scrape of 350 classes should be weighted 5x more than a search for 8 classes by a user.
+Still, even if the cap does not normally allow for this request to be processed immediately, the small user search should proceed with a small bursting cap.
+
+The requirements for this hypothetical system would be:
+
+- Conditional Bursting: background processes or other requests deemed "low priority" are not allowed to use bursting.
+- Arbitrary Costs: rate limiting is considered in the form of the request size/speed more or less, such that small simple requests can be made more frequently, unlike large requests.
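The two requirements above could be met with a token bucket that charges a per-request cost and lets only high-priority requests overdraw it. The sketch below is one way to do that under stated assumptions: `CostLimiter`, `Priority`, and the specific costs are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Priority {
    Command,    // user-facing, may burst
    Background, // scraping, may not burst
}

struct CostLimiter {
    tokens: f64,         // current balance; may dip to -burst for commands
    capacity: f64,       // maximum balance
    burst: f64,          // extra debt allowed for high-priority requests
    refill_per_sec: f64, // tokens restored per second
}

impl CostLimiter {
    fn new(capacity: f64, burst: f64, refill_per_sec: f64) -> Self {
        Self { tokens: capacity, capacity, burst, refill_per_sec }
    }

    /// Restore tokens for the elapsed time, never exceeding capacity.
    fn refill(&mut self, elapsed_secs: f64) {
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
    }

    /// Charge `cost` tokens if the request's priority allows it.
    fn try_acquire(&mut self, cost: f64, priority: Priority) -> bool {
        // Conditional bursting: only commands may push the balance below zero.
        let floor = match priority {
            Priority::Command => -self.burst,
            Priority::Background => 0.0,
        };
        if self.tokens - cost >= floor {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }
}
```

With a scrape costing 5 tokens and a small search costing 1, matching the 5x weighting in the example above, an exhausted bucket rejects further scrapes but still admits a couple of user searches until the burst allowance is spent.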
@@ -1,11 +1,17 @@
-# Sessions
+# Banner
+
+All notes on the internal workings of the Banner system by Ellucian.
+
+## Sessions

 All notes on the internal workings of Sessions in the Banner system.

 - Sessions are generated on demand with a random string of characters.
+  - The format is `{5 random characters}{milliseconds since epoch}`.
+  - Example: ``
 - Sessions are invalidated after 30 minutes, but may change.
  - This delay can be found in the original HTML returned, find `meta[name="maxInactiveInterval"]` and read the `content` attribute.
-  - This is read at runtime by the javascript on initialization.
+  - This is read at runtime (in the browser, by JavaScript) on initialization.
 - Multiple timers exist, one is for the Inactivity Timer.
  - A dialog will appear asking the user to continue their session.
  - If they click the button, the session will be extended via the keepAliveURL (see `meta[name="keepAliveURL"]`).
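The session ID format noted above can be illustrated with a small formatter and parser. This is a sketch under assumptions: the helper names are hypothetical, the five-character prefix is assumed to be ASCII (the notes do not state the alphabet), and the ID used in the usage note is made up rather than captured from Banner.

```rust
/// Build an ID from a 5-character prefix and a millisecond timestamp.
/// Returns None if the prefix is not exactly 5 ASCII characters.
fn format_session_id(prefix: &str, millis_since_epoch: u128) -> Option<String> {
    if prefix.len() == 5 && prefix.is_ascii() {
        Some(format!("{}{}", prefix, millis_since_epoch))
    } else {
        None
    }
}

/// Split an ID back into its prefix and millisecond timestamp.
fn parse_session_id(id: &str) -> Option<(&str, u128)> {
    // Need at least one digit after the 5-character prefix.
    if id.len() <= 5 || !id.is_char_boundary(5) {
        return None;
    }
    let (prefix, millis) = id.split_at(5);
    // Reject signs and other non-digit characters before parsing.
    if !millis.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    millis.parse().ok().map(|m| (prefix, m))
}
```

For instance, the made-up ID `abcde1700000000000` parses into the prefix `abcde` and the timestamp `1700000000000`; comparing that timestamp with the current time gives a rough upper bound on the session's age.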
@@ -0,0 +1,58 @@
+# Features
+
+## Current Features
+
+### Discord Bot Commands
+
+- **search** - Search for courses with various filters (title, course code, keywords)
+- **terms** - List available terms or search for a specific term
+- **time** - Get meeting times for a specific course (CRN)
+- **ics** - Generate ICS calendar file for a course with holiday exclusions
+- **gcal** - Generate Google Calendar link for a course
+
+### Data Pipeline
+
+- Intelligent scraping system with priority queues
+- Rate limiting and burst handling
+- Background data synchronization
+
+## Feature Wishlist
+
+### Commands
+
+- ICS Download (get an ICS download of your classes, with location & timing set correctly for every class you're in)
+- Classes Now (find classes happening)
+- Autocomplete
+  - Class Title
+  - Course Number
+  - Term/Part of Term
+  - Professor
+  - Attribute
+- Component Pagination
+- RateMyProfessor Integration (Linked/Embedded)
+- Smart term selection (e.g. Summer 2024 will be selected automatically when opened)
+- Rate Limiting (bursting with global/user limits)
+- DMs Integration (allow usage of the bot in DMs)
+- Class Change Notifications (get notified when details about a class change)
+- Multi-term Querying (currently the backend for searching is kinda weird)
+- Full Autocomplete for Every Search Option
+- Metrics, Log Query, Privileged Error Feedback
+- Search for Classes
+  - Major, Professor, Location, Name, Time of Day
+- Subscribe to Classes
+  - Availability (seat, pre-seat)
+  - Waitlist Movement
+  - Detail Changes (meta, time, location, seats, professor)
+    - `time` Start, End, Days of Week
+    - `seats` Any change in seat/waitlist data
+    - `meta`
+- Lookup via Course Reference Number (CRN)
+- Smart Time of Day Handling
+  - "2 PM" -> Start within 2:00 PM to 2:59 PM
+  - "2-3 PM" -> Start within 2:00 PM to 3:59 PM
+  - "ends by 2 PM" -> Ends within 12:00 AM to 2:00 PM
+  - "after 2 PM" -> Start within 2:01 PM to 11:59 PM
+  - "before 2 PM" -> Ends within 12:00 AM to 1:59 PM
+- Get By Section Command
+  - CS 4393 001 =>
+  - Will require SQL to be able to search for a class by its section number
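The Smart Time of Day Handling rules listed above map each phrase to an inclusive window on either the start or the end time of a class. The sketch below encodes those five rules directly. `TimeQuery`, `Bound`, `Window`, and `window_for` are hypothetical names, hours are assumed to be given in 24-hour form (0 to 23), and parsing the free-text phrase into a `TimeQuery` is left out.

```rust
/// Which side of the meeting time the window constrains.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Bound {
    Start,
    End,
}

/// Already-parsed query; hours are 24-hour values (14 = 2 PM).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeQuery {
    At(u16),           // "2 PM"
    Between(u16, u16), // "2-3 PM"
    EndsBy(u16),       // "ends by 2 PM"
    After(u16),        // "after 2 PM"
    Before(u16),       // "before 2 PM"
}

/// Inclusive window in minutes since midnight.
#[derive(Debug, PartialEq, Eq)]
struct Window {
    bound: Bound,
    from: u16,
    to: u16,
}

const LAST_MINUTE: u16 = 23 * 60 + 59; // 11:59 PM

fn window_for(query: TimeQuery) -> Window {
    match query {
        TimeQuery::At(h) => Window { bound: Bound::Start, from: h * 60, to: h * 60 + 59 },
        TimeQuery::Between(a, b) => Window { bound: Bound::Start, from: a * 60, to: b * 60 + 59 },
        TimeQuery::EndsBy(h) => Window { bound: Bound::End, from: 0, to: h * 60 },
        TimeQuery::After(h) => Window { bound: Bound::Start, from: h * 60 + 1, to: LAST_MINUTE },
        // saturating_sub keeps "before 12 AM" from underflowing.
        TimeQuery::Before(h) => Window { bound: Bound::End, from: 0, to: (h * 60).saturating_sub(1) },
    }
}
```

Keeping the window inclusive on both ends mirrors the wording of the list, where "2 PM" covers 2:00 PM through 2:59 PM.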
@@ -0,0 +1,42 @@
+# Documentation
+
+This folder contains detailed documentation for the Banner project. This file acts as the index.
+
+## Files
+
+- [`FEATURES.md`](FEATURES.md) - Current features, implemented functionality, and future roadmap
+- [`BANNER.md`](BANNER.md) - General API documentation on the Banner system
+- [`ARCHITECTURE.md`](ARCHITECTURE.md) - Technical implementation details, system design, and analysis
+
+## Samples
+
+The `samples/` folder contains real Banner API response examples:
+
+- `search/` - Course search API responses with various filters
+  - [`searchResults.json`](samples/search/searchResults.json)
+  - [`searchResults_500.json`](samples/search/searchResults_500.json)
+  - [`searchResults_CS500.json`](samples/search/searchResults_CS500.json)
+  - [`searchResults_malware.json`](samples/search/searchResults_malware.json)
+- `meta/` - Metadata API responses (terms, subjects, instructors, etc.)
+  - [`get_attribute.json`](samples/meta/get_attribute.json)
+  - [`get_campus.json`](samples/meta/get_campus.json)
+  - [`get_instructionalMethod.json`](samples/meta/get_instructionalMethod.json)
+  - [`get_instructor.json`](samples/meta/get_instructor.json)
+  - [`get_partOfTerm.json`](samples/meta/get_partOfTerm.json)
+  - [`get_subject.json`](samples/meta/get_subject.json)
+  - [`getTerms.json`](samples/meta/getTerms.json)
+- `course/` - Course detail API responses (HTML and JSON)
+  - [`getFacultyMeetingTimes.json`](samples/course/getFacultyMeetingTimes.json)
+  - [`getClassDetails.html`](samples/course/getClassDetails.html)
+  - [`getCorequisites.html`](samples/course/getCorequisites.html)
+  - [`getCourseDescription.html`](samples/course/getCourseDescription.html)
+  - [`getEnrollmentInfo.html`](samples/course/getEnrollmentInfo.html)
+  - [`getFees.html`](samples/course/getFees.html)
+  - [`getLinkedSections.html`](samples/course/getLinkedSections.html)
+  - [`getRestrictions.html`](samples/course/getRestrictions.html)
+  - [`getSectionAttributes.html`](samples/course/getSectionAttributes.html)
+  - [`getSectionBookstoreDetails.html`](samples/course/getSectionBookstoreDetails.html)
+  - [`getSectionPrerequisites.html`](samples/course/getSectionPrerequisites.html)
+  - [`getXlistSections.html`](samples/course/getXlistSections.html)
+
+These samples are used for development, testing, and understanding the Banner API structure.