docs: setup proper documentation, organize & clean README

2026-01-31 02:23:34 -06:00 · 2025-09-13 15:27:32 -05:00
parent 94fb6b4190
commit 878cc5f773
7 changed files with 254 additions and 123 deletions
@@ -2,6 +2,4 @@
 /target
 /go/
 .cargo/config.toml
-
+src/scraper/README.md
 **/*.md
 !/README.md
@@ -1,21 +1,28 @@
 default_services := "bot,web,scraper"
 # Auto-reloading frontend server
 frontend:
    pnpm run -C web dev
-backend:
+# Production build of frontend
    cargo run --bin banner
 build-frontend:
    pnpm run -C web build
-build-backend:
+# Auto-reloading backend server
-    cargo build --release --bin banner
+backend services=default_services:
    bacon --headless run -- -- --services "{{services}}"
-build: build-frontend build-backend
+# Production build
-
+build:
 # Production build that embeds assets
 build-prod:
    pnpm run -C web build
    cargo build --release --bin banner
 # Run auto-reloading development build with release characteristics (frontend is embedded, non-auto-reloading)
 # This is useful for testing backend release-mode details.
 dev-build services=default_services: build-frontend
    bacon --headless run -- --profile dev-release -- --services "{{services}}" --tracing pretty
 # Auto-reloading development build for both frontend and backend
 # Will not notice if either the frontend/backend crashes, but will generally be resistant to stopping on their own.
 [parallel]
-dev: frontend backend
+dev services=default_services: frontend (backend services)
@@ -1,125 +1,51 @@
 # banner
-A discord bot for executing queries & searches on the Ellucian Banner instance hosting all of UTSA's class data.
+A complex multi-service system providing a Discord bot and browser-based interface to UTSA's course data.
-## Feature Wishlist
+## Services
- Commands
+The application consists of three modular services that can be run independently or together:
  - ICS Download (get an ICS download of your classes, with location & timing set perfectly for every class you're in)
  - Classes Now (find classes happening)
 - Autocomplete
  - Class Title
  - Course Number
  - Term/Part of Term
  - Professor
  - Attribute
 - Component Pagination
 - RateMyProfessor Integration (Linked/Embedded)
 - Smart term selection (i.e. Summer 2024 will be selected automatically when opened)
 - Rate Limiting (bursting with global/user limits)
 - DMs Integration (allow usage of the bot in DMs)
 - Class Change Notifications (get notified when details about a class change)
 - Multi-term Querying (currently the backend for searching is kinda weird)
 - Full Autocomplete for Every Search Option
 - Metrics, Log Query, Privileged Error Feedback
 - Search for Classes
  - Major, Professor, Location, Name, Time of Day
 - Subscribe to Classes
  - Availability (seat, pre-seat)
  - Waitlist Movement
  - Detail Changes (meta, time, location, seats, professor)
    - `time` Start, End, Days of Week
    - `seats` Any change in seat/waitlist data
    - `meta`
 - Lookup via Course Reference Number (CRN)
 - Smart Time of Day Handling
  - "2 PM" -> Start within 2:00 PM to 2:59 PM
  - "2-3 PM" -> Start within 2:00 PM to 3:59 PM
  - "ends by 2 PM" -> Ends within 12:00 AM to 2:00 PM
  - "after 2 PM" -> Start within 2:01 PM to 11:59 PM
  - "before 2 PM" -> Ends within 12:00 AM to 1:59 PM
 - Get By Section Command
  - CS 4393 001 =>
  - Will require SQL to be able to search for a class by its section number
-## Analysis Required
+- Discord Bot ([`bot`][src-bot])
-Some of the features and architecture of Ellucian's Banner system are not clear.
+  - Primary interface for course monitoring and data queries
-The follow features, JSON, and more require validation & analysis:
+  - Built with [Serenity][serenity] and [Poise][poise] frameworks for robust command handling
  - Uses slash commands with comprehensive error handling and logging
- Struct Nullability
+- Web Server ([`web`][src-web])
  - Many of the responses provided by Ellucian contain nulls, and it is unclear when and why most of them are null.
  - Analysis must be conducted to determine when to use a string and when it should be nullable (pointer).
 - Multiple Professors / Primary Indicator
 - Multiple Meeting Times
 - Meeting Schedule Types
  - AFF vs AIN vs AHB etc.
 - Do CRNs repeat between years?
 - Check whether partOfTerm is always filled in, and its meaning for various class results.
 - Check which API calls are affected by change in term/sessionID term select
 - SessionIDs
  - How long does a session ID work?
  - Do I really require a separate one per term?
  - How many can I activate, are there any restrictions?
  - How should session IDs be checked as 'invalid'?
  - What action(s) keep a session ID 'active', if any?
 - Are there any courses with multiple meeting times?
 - Google Calendar link generation, as an alternative to ICS file generation
-## Change Identification
+  - [Axum][axum]-based server with Vite/React-based frontend
  - [Embeds static assets][rust-embed] at compile time with E-Tags & Cache-Control headers
- Important attributes of a class will be parsed on both the old and new data.
+- Scraper ([`scraper`][src-scraper])
 - These attributes will be compared and given identifiers that can be subscribed to.
 - When a user subscribes to one of these identifiers, any changes identified will be sent to the user.
-## Real-time Suggestions
+  - Intelligent data collection system with priority-based queuing inside PostgreSQL via [`sqlx`][sqlx]
  - Rate-limited scraping with burst handling to respect UTSA's systems
  - Handles course data updates, availability changes, and metadata synchronization
-Various commands arguments have the ability to have suggestions appear.
+## Quick Start
- They must be fast. As ephemeral suggestions that are only relevant for seconds or less, they need to be delivered in less than a second.
+```bash
- They need to be easy to acquire. With as many commands & arguments to search as I do, it is paramount that the API be easy to understand & use.
+pnpm install -C web  # Install frontend dependencies
- It cannot be complicated. I only have so much time to develop this.
+cargo build  # Build the backend
 - It does not need to be persistent. Since the data is scraped and rolled periodically from the Banner system, the data used will be deleted and re-requested occasionally.
-For these reasons, I believe SQLite to be the ideal place for this data to be stored.
+just dev # Runs auto-reloading dev build
-It is exceptionally fast, works well in-memory, and is less complicated compared to most other solutions.
+just dev bot,web # Runs auto-reloading dev build, running only the bot and web services
 just dev-build # Development build with release characteristics (frontend is embedded, non-auto-reloading)
- Only required data about the class will be stored, along with the JSON-encoded string.
+just build # Production build that embeds assets
-  - For now, this would only be the CRN (and possibly the Term).
+```
  - Potentially, a binary encoding could be used for performance, but it is unlikely to be better.
 - Database dumping into R2 would be good to ensure that over-scraping of the Banner system does not occur.
  - Upon a safe close requested
    - Must be done quickly (<8 seconds)
  - Every 30 minutes, if any scraping occurred.
    - May cause locking of commands.
-## Scraping
+## Documentation
-In order to keep the in-memory database of the bot up-to-date with the Banner system, the API must be scraped.
+Comprehensive documentation is available in the [`docs/`][documentation] folder.
 Scraping will be separated by major to allow for priority majors (namely, Computer Science) to be scraped more often compared to others.
 This will lower the overall load on the Banner system while ensuring that data presented by the app is still relevant.
-For now, all majors will be scraped fully every 4 hours with at least 5 minutes between each one.
+[documentation]: docs/README.md
-
+[src-bot]: src/bot
- On startup, priority majors will be scraped first (if required).
+[src-web]: src/web
- Other majors will be scraped in arbitrary order (if required).
+[src-scraper]: src/scraper
- Scrape timing will be stored in Redis.
+[serenity]: https://github.com/serenity-rs/serenity
- CRNs will be the Primary Key within SQLite
+[poise]: https://github.com/serenity-rs/poise
-  - If CRNs are duplicated between terms, then the primary key will be (CRN, Term)
+[axum]: https://github.com/tokio-rs/axum
-
+[rust-embed]: https://lib.rs/crates/rust-embed
-Considerations
+[sqlx]: https://github.com/launchbadge/sqlx
 - Change in metadata should decrease the interval
 - The number of courses scraped should change the interval (2 hours per 500 courses involved)
 ## Rate Limiting, Costs & Bursting
 Ideally, this application would implement dynamic rate limiting to ensure overload on the server does not occur.
 Better, it would also ensure that priority requests (commands) are dispatched faster than background processes (scraping), while making sure different requests are weighted differently.
 For example, a recent scrape of 350 classes should be weighted 5x more than a search for 8 classes by a user.
 Still, even if the cap does not normally allow for this request to be processed immediately, the small user search should proceed with a small bursting cap.
 The requirements to this hypothetical system would be:
 - Conditional Bursting: background processes or other requests deemed "low priority" are not allowed to use bursting.
 - Arbitrary Costs: rate limiting is considered in the form of the request size/speed more or less, such that small simple requests can be made more frequently, unlike large requests.
@@ -0,0 +1,94 @@
 # Architecture
 ## System Overview
 The Banner project is built as a multi-service application with the following components:
 - **Discord Bot Service**: Handles Discord interactions and commands
 - **Web Service**: Serves the React frontend and provides API endpoints
 - **Scraper Service**: Background data collection and synchronization
 - **Database Layer**: PostgreSQL for persistent storage
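Since the services can be run independently or together, selecting them from a comma-separated list such as `bot,web,scraper` can be sketched as below. This is a minimal illustration using only the standard library; the `Service` enum and `parse_services` helper are hypothetical names, and the project's actual argument parsing may differ.

```rust
use std::collections::BTreeSet;
use std::str::FromStr;

/// The three services named in this document (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Service {
    Bot,
    Web,
    Scraper,
}

impl FromStr for Service {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept surrounding whitespace and any letter case.
        match s.trim().to_ascii_lowercase().as_str() {
            "bot" => Ok(Service::Bot),
            "web" => Ok(Service::Web),
            "scraper" => Ok(Service::Scraper),
            other => Err(format!("unknown service: {other:?}")),
        }
    }
}

/// Parses a list such as "bot,web" into a de-duplicated set of services.
/// Empty segments are ignored; any unknown name fails the whole parse.
pub fn parse_services(list: &str) -> Result<BTreeSet<Service>, String> {
    list.split(',')
        .filter(|part| !part.trim().is_empty())
        .map(Service::from_str)
        .collect()
}
```

Collecting into a set means a repeated name (for example `bot,bot`) enables the service only once.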
 ## Technical Analysis
 ### Banner System Integration
 Some of the features and architecture of Ellucian's Banner system are not clear.
 The following features, JSON, and more require validation & analysis:
 - Struct Nullability
  - Many of the responses provided by Ellucian contain nulls, and it is unclear when and why most of them are null.
  - Analysis must be conducted to determine when to use a string and when it should be nullable (pointer).
 - Multiple Professors / Primary Indicator
 - Multiple Meeting Times
 - Meeting Schedule Types
  - AFF vs AIN vs AHB etc.
 - Do CRNs repeat between years?
 - Check whether partOfTerm is always filled in, and its meaning for various class results.
 - Check which API calls are affected by change in term/sessionID term select
 - SessionIDs
  - How long does a session ID work?
  - Do I really require a separate one per term?
  - How many can I activate, are there any restrictions?
  - How should session IDs be checked as 'invalid'?
  - What action(s) keep a session ID 'active', if any?
 - Are there any courses with multiple meeting times?
 - Google Calendar link generation, as an alternative to ICS file generation
 ## Change Identification
 - Important attributes of a class will be parsed on both the old and new data.
 - These attributes will be compared and given identifiers that can be subscribed to.
 - When a user subscribes to one of these identifiers, any changes identified will be sent to the user.
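The comparison described above could look roughly like the following sketch. The `SectionSnapshot` fields, the `ChangeKind` identifiers, and both function names are assumptions made for illustration; they are not taken from the project's code.

```rust
/// Hypothetical snapshot of the attributes worth comparing between scrapes.
#[derive(Debug, Clone, PartialEq)]
pub struct SectionSnapshot {
    pub crn: u32,
    pub meeting_time: String,
    pub location: String,
    pub professor: String,
    pub seats_available: i32,
    pub waitlist_count: i32,
}

/// Identifiers that a user could subscribe to (assumed set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeKind {
    Time,
    Location,
    Professor,
    Seats,
}

/// Compares the old and new data for one section and lists what changed.
pub fn identify_changes(old: &SectionSnapshot, new: &SectionSnapshot) -> Vec<ChangeKind> {
    let mut changes = Vec::new();
    if old.meeting_time != new.meeting_time {
        changes.push(ChangeKind::Time);
    }
    if old.location != new.location {
        changes.push(ChangeKind::Location);
    }
    if old.professor != new.professor {
        changes.push(ChangeKind::Professor);
    }
    // Any change in seat or waitlist data counts as a `Seats` change.
    if old.seats_available != new.seats_available || old.waitlist_count != new.waitlist_count {
        changes.push(ChangeKind::Seats);
    }
    changes
}

/// Keeps only the changes a given user has subscribed to.
pub fn changes_for_subscriber(subscribed: &[ChangeKind], changes: &[ChangeKind]) -> Vec<ChangeKind> {
    changes
        .iter()
        .copied()
        .filter(|change| subscribed.contains(change))
        .collect()
}
```

Keeping the identifiers as a small enum makes a subscription a plain list of values, so filtering notifications is a membership check.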
 ## Real-time Suggestions
 Various command arguments can display suggestions.
 - They must be fast. As ephemeral suggestions that are only relevant for seconds or less, they need to be delivered in less than a second.
 - They need to be easy to acquire. With as many commands & arguments to search as I do, it is paramount that the API be easy to understand & use.
 - It cannot be complicated. I only have so much time to develop this.
 - It does not need to be persistent. Since the data is scraped and rolled periodically from the Banner system, the data used will be deleted and re-requested occasionally.
 For these reasons, I believe PostgreSQL to be the ideal place for this data to be stored.
 It is exceptionally fast, works well in-memory, and is less complicated compared to most other solutions.
 - Only required data about the class will be stored, along with the JSON-encoded string.
  - For now, this would only be the CRN (and possibly the Term).
  - Potentially, a binary encoding could be used for performance, but it is unlikely to be better.
 - Database dumping into R2 would be good to ensure that over-scraping of the Banner system does not occur.
  - Upon a requested safe close
    - Must be done quickly (<8 seconds)
  - Every 30 minutes, if any scraping occurred.
    - May cause locking of commands.
 ## Scraping System
 In order to keep the in-memory database of the bot up-to-date with the Banner system, the API must be scraped.
 Scraping will be separated by major to allow for priority majors (namely, Computer Science) to be scraped more often compared to others.
 This will lower the overall load on the Banner system while ensuring that data presented by the app is still relevant.
 For now, all majors will be scraped fully every 4 hours with at least 5 minutes between each one.
 - On startup, priority majors will be scraped first (if required).
 - Other majors will be scraped in arbitrary order (if required).
 - Scrape timing will be stored in the database.
 - CRNs will be the Primary Key within the database
  - If CRNs are duplicated between terms, then the primary key will be (CRN, Term)
 Considerations
 - Change in metadata should decrease the interval
 - The number of courses scraped should change the interval (2 hours per 500 courses involved)
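The scheduling rules above can be sketched as follows. This is one possible reading, not the project's implementation: the `Major` struct and both functions are hypothetical, and `interval_for` interprets "2 hours per 500 courses" as rounding up to whole blocks of 500.

```rust
use std::time::Duration;

/// Every major is fully scraped every 4 hours.
pub const FULL_SCRAPE_INTERVAL: Duration = Duration::from_secs(4 * 60 * 60);
/// At least 5 minutes must separate consecutive scrapes.
pub const MIN_GAP: Duration = Duration::from_secs(5 * 60);

/// Hypothetical scheduling record for one major.
#[derive(Debug, Clone)]
pub struct Major {
    pub code: String,
    pub priority: bool,
    /// Time since the last full scrape; `None` if never scraped.
    pub since_last_scrape: Option<Duration>,
}

impl Major {
    /// A major is due if it was never scraped or its interval has elapsed.
    fn is_due(&self) -> bool {
        match self.since_last_scrape {
            None => true,
            Some(elapsed) => elapsed >= FULL_SCRAPE_INTERVAL,
        }
    }
}

/// Returns (major code, start offset) pairs: only majors that are due,
/// priority majors first, each start spaced `MIN_GAP` after the previous one.
pub fn plan_scrapes(majors: &[Major]) -> Vec<(String, Duration)> {
    let mut due: Vec<&Major> = majors.iter().filter(|m| m.is_due()).collect();
    // Stable sort: `false` sorts before `true`, so priority majors come first
    // and the remaining order is left as given.
    due.sort_by_key(|m| !m.priority);
    due.iter()
        .enumerate()
        .map(|(i, m)| (m.code.clone(), MIN_GAP * i as u32))
        .collect()
}

/// Assumed reading of "2 hours per 500 courses": 2 hours per started block of 500.
pub fn interval_for(course_count: u32) -> Duration {
    let blocks = course_count.div_ceil(500).max(1);
    Duration::from_secs(2 * 60 * 60) * blocks
}
```

The offsets are relative to the moment planning runs, so the caller decides how they map onto actual timers or queue entries.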
 ## Rate Limiting, Costs & Bursting
 Ideally, this application would implement dynamic rate limiting to ensure overload on the server does not occur.
 Better, it would also ensure that priority requests (commands) are dispatched faster than background processes (scraping), while making sure different requests are weighted differently.
 For example, a recent scrape of 350 classes should be weighted 5x more than a search for 8 classes by a user.
 Still, even if the cap does not normally allow for this request to be processed immediately, the small user search should proceed with a small bursting cap.
 The requirements for this hypothetical system would be:
 - Conditional Bursting: background processes or other requests deemed "low priority" are not allowed to use bursting.
 - Arbitrary Costs: rate limiting is considered in the form of the request size/speed more or less, such that small simple requests can be made more frequently, unlike large requests.
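A token bucket is one way to meet both requirements; the sketch below is an illustration under that assumption, not the project's limiter. `CostLimiter`, `Priority`, and all parameters are hypothetical. Costs are arbitrary numbers, and only high-priority requests may push the balance below zero, down to a fixed burst allowance.

```rust
/// Whether a request may use the burst allowance.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Priority {
    /// User-facing requests such as commands.
    High,
    /// Background work such as scraping.
    Low,
}

/// Cost-based token bucket with conditional bursting (hypothetical type).
#[derive(Debug, Clone)]
pub struct CostLimiter {
    capacity: f64,
    burst: f64,
    refill_per_sec: f64,
    /// Current balance; may drop as low as `-burst` for high-priority requests.
    tokens: f64,
}

impl CostLimiter {
    pub fn new(capacity: f64, burst: f64, refill_per_sec: f64) -> Self {
        Self { capacity, burst, refill_per_sec, tokens: capacity }
    }

    /// Adds tokens for the elapsed time, never exceeding `capacity`.
    pub fn refill(&mut self, elapsed_secs: f64) {
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
    }

    /// Tries to spend `cost` tokens. Low-priority requests must leave the
    /// balance at zero or above; high-priority ones may go down to `-burst`.
    pub fn try_acquire(&mut self, cost: f64, priority: Priority) -> bool {
        let floor = match priority {
            Priority::High => -self.burst,
            Priority::Low => 0.0,
        };
        if self.tokens - cost >= floor {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }

    pub fn tokens(&self) -> f64 {
        self.tokens
    }
}
```

With this shape, a large scrape can be charged five times the cost of a small search, and a small search still goes through after the scrape has drained the normal budget.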
@@ -1,11 +1,17 @@
-# Sessions
+# Banner
 All notes on the internal workings of the Banner system by Ellucian.
 ## Sessions
 All notes on the internal workings of Sessions in the Banner system.
 - Sessions are generated on demand with a random string of characters.
  - The format is `{5 random characters}{milliseconds since epoch}`
  - Example: ``
 - Sessions are invalidated after 30 minutes, though this duration may change.
  - This delay can be found in the original HTML returned: find `meta[name="maxInactiveInterval"]` and read the `content` attribute.
-  - This is read at runtime by the javascript on initialization.
+  - This is read at runtime (in the browser, by JavaScript) on initialization.
 - Multiple timers exist; one of them is the Inactivity Timer.
  - A dialog will appear asking the user to continue their session.
  - If they click the button, the session will be extended via the keepAliveURL (see `meta[name="keepAliveURL"]`).
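The session ID format noted above can be reproduced with a short sketch. Everything here is illustrative: the function names are hypothetical, the alphabet (lowercase letters and digits) is an assumption because the notes do not state which characters Banner uses, and the small xorshift generator only stands in for a real source of randomness.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Joins a 5-character prefix and epoch milliseconds into one ID.
/// Returns `None` if the prefix is not exactly 5 characters long.
pub fn format_session_id(prefix: &str, epoch_millis: u128) -> Option<String> {
    if prefix.chars().count() != 5 {
        return None;
    }
    Some(format!("{prefix}{epoch_millis}"))
}

/// Produces 5 pseudo-random characters using a xorshift generator.
/// Assumption: the alphabet below; the real character set is not documented here.
pub fn random_prefix(seed: u64) -> String {
    const ALPHABET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";
    let mut state = seed | 1; // xorshift must not start from zero
    (0..5)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            ALPHABET[(state % ALPHABET.len() as u64) as usize] as char
        })
        .collect()
}

/// Generates a fresh ID from the current system time.
pub fn new_session_id() -> String {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970");
    let prefix = random_prefix(now.as_secs() ^ u64::from(now.subsec_nanos()));
    format!("{prefix}{}", now.as_millis())
}
```

Splitting formatting from generation keeps `format_session_id` deterministic, which makes the documented shape easy to check in isolation.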
@@ -0,0 +1,58 @@
 # Features
 ## Current Features
 ### Discord Bot Commands
 - **search** - Search for courses with various filters (title, course code, keywords)
 - **terms** - List available terms or search for a specific term
 - **time** - Get meeting times for a specific course (CRN)
 - **ics** - Generate ICS calendar file for a course with holiday exclusions
 - **gcal** - Generate Google Calendar link for a course
 ### Data Pipeline
 - Intelligent scraping system with priority queues
 - Rate limiting and burst handling
 - Background data synchronization
 ## Feature Wishlist
 ### Commands
 - ICS Download (get an ICS download of your classes, with location & timing set perfectly for every class you're in)
 - Classes Now (find classes happening)
 - Autocomplete
  - Class Title
  - Course Number
  - Term/Part of Term
  - Professor
  - Attribute
 - Component Pagination
 - RateMyProfessor Integration (Linked/Embedded)
 - Smart term selection (e.g. Summer 2024 will be selected automatically when opened)
 - Rate Limiting (bursting with global/user limits)
 - DMs Integration (allow usage of the bot in DMs)
 - Class Change Notifications (get notified when details about a class change)
 - Multi-term Querying (currently the backend for searching is kinda weird)
 - Full Autocomplete for Every Search Option
 - Metrics, Log Query, Privileged Error Feedback
 - Search for Classes
  - Major, Professor, Location, Name, Time of Day
 - Subscribe to Classes
  - Availability (seat, pre-seat)
  - Waitlist Movement
  - Detail Changes (meta, time, location, seats, professor)
    - `time` Start, End, Days of Week
    - `seats` Any change in seat/waitlist data
    - `meta`
 - Lookup via Course Reference Number (CRN)
 - Smart Time of Day Handling
  - "2 PM" -> Start within 2:00 PM to 2:59 PM
  - "2-3 PM" -> Start within 2:00 PM to 3:59 PM
  - "ends by 2 PM" -> Ends within 12:00 AM to 2:00 PM
  - "after 2 PM" -> Start within 2:01 PM to 11:59 PM
  - "before 2 PM" -> Ends within 12:00 AM to 1:59 PM
 - Get By Section Command
  - CS 4393 001 =>
  - Will require SQL to be able to search for a class by its section number
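The "Smart Time of Day Handling" rules listed above can be written down as a small mapping. This is a sketch of the wishlist item only: `TimeQuery`, `TimeFilter`, and `to_filter` are hypothetical names, hours are given in 24-hour form, times are minutes since midnight, and parsing the user's text into a `TimeQuery` is left out.

```rust
/// Which end of a meeting the filter applies to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Field {
    Start,
    End,
}

/// Inclusive range in minutes since midnight (0 = 12:00 AM, 1439 = 11:59 PM).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeFilter {
    pub field: Field,
    pub from_min: u16,
    pub to_min: u16,
}

/// Already-parsed user input; hours are 0..=23.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeQuery {
    /// "2 PM"
    At(u16),
    /// "2-3 PM"
    Between(u16, u16),
    /// "ends by 2 PM"
    EndsBy(u16),
    /// "after 2 PM"
    After(u16),
    /// "before 2 PM"
    Before(u16),
}

const LAST_MINUTE: u16 = 23 * 60 + 59;

/// Applies the rules from the list above.
pub fn to_filter(query: TimeQuery) -> TimeFilter {
    match query {
        // Start anywhere within the named hour.
        TimeQuery::At(h) => TimeFilter { field: Field::Start, from_min: h * 60, to_min: h * 60 + 59 },
        // Start from the first hour through the end of the second hour.
        TimeQuery::Between(a, b) => TimeFilter { field: Field::Start, from_min: a * 60, to_min: b * 60 + 59 },
        // End at or before the named hour.
        TimeQuery::EndsBy(h) => TimeFilter { field: Field::End, from_min: 0, to_min: h * 60 },
        // Start strictly after the named hour.
        TimeQuery::After(h) => TimeFilter { field: Field::Start, from_min: h * 60 + 1, to_min: LAST_MINUTE },
        // End strictly before the named hour (saturating for midnight).
        TimeQuery::Before(h) => TimeFilter { field: Field::End, from_min: 0, to_min: (h * 60).saturating_sub(1) },
    }
}
```

For instance, `to_filter(TimeQuery::At(14))` covers 840 through 899, which is 2:00 PM to 2:59 PM.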
@@ -0,0 +1,42 @@
 # Documentation
 This folder contains detailed documentation for the Banner project. This file acts as the index.
 ## Files
 - [`FEATURES.md`](FEATURES.md) - Current features, implemented functionality, and future roadmap
 - [`BANNER.md`](BANNER.md) - General API documentation on the Banner system
 - [`ARCHITECTURE.md`](ARCHITECTURE.md) - Technical implementation details, system design, and analysis
 ## Samples
 The `samples/` folder contains real Banner API response examples:
 - `search/` - Course search API responses with various filters
  - [`searchResults.json`](samples/search/searchResults.json)
  - [`searchResults_500.json`](samples/search/searchResults_500.json)
  - [`searchResults_CS500.json`](samples/search/searchResults_CS500.json)
  - [`searchResults_malware.json`](samples/search/searchResults_malware.json)
 - `meta/` - Metadata API responses (terms, subjects, instructors, etc.)
  - [`get_attribute.json`](samples/meta/get_attribute.json)
  - [`get_campus.json`](samples/meta/get_campus.json)
  - [`get_instructionalMethod.json`](samples/meta/get_instructionalMethod.json)
  - [`get_instructor.json`](samples/meta/get_instructor.json)
  - [`get_partOfTerm.json`](samples/meta/get_partOfTerm.json)
  - [`get_subject.json`](samples/meta/get_subject.json)
  - [`getTerms.json`](samples/meta/getTerms.json)
 - `course/` - Course detail API responses (HTML and JSON)
  - [`getFacultyMeetingTimes.json`](samples/course/getFacultyMeetingTimes.json)
  - [`getClassDetails.html`](samples/course/getClassDetails.html)
  - [`getCorequisites.html`](samples/course/getCorequisites.html)
  - [`getCourseDescription.html`](samples/course/getCourseDescription.html)
  - [`getEnrollmentInfo.html`](samples/course/getEnrollmentInfo.html)
  - [`getFees.html`](samples/course/getFees.html)
  - [`getLinkedSections.html`](samples/course/getLinkedSections.html)
  - [`getRestrictions.html`](samples/course/getRestrictions.html)
  - [`getSectionAttributes.html`](samples/course/getSectionAttributes.html)
  - [`getSectionBookstoreDetails.html`](samples/course/getSectionBookstoreDetails.html)
  - [`getSectionPrerequisites.html`](samples/course/getSectionPrerequisites.html)
  - [`getXlistSections.html`](samples/course/getXlistSections.html)
 These samples are used for development, testing, and understanding the Banner API structure.