From 878cc5f773943564991a81b932ec98acecbe9027 Mon Sep 17 00:00:00 2001 From: Xevion Date: Sat, 13 Sep 2025 15:27:32 -0500 Subject: [PATCH] docs: setup proper documentation, organize & clean README --- .gitignore | 4 +- Justfile | 27 +++--- README.md | 142 ++++++++------------------------ docs/ARCHITECTURE.md | 94 +++++++++++++++++++++ docs/{Sessions.md => BANNER.md} | 10 ++- docs/FEATURES.md | 58 +++++++++++++ docs/README.md | 42 ++++++++++ 7 files changed, 254 insertions(+), 123 deletions(-) create mode 100644 docs/ARCHITECTURE.md rename docs/{Sessions.md => BANNER.md} (85%) create mode 100644 docs/FEATURES.md create mode 100644 docs/README.md diff --git a/.gitignore b/.gitignore index a79e4aa..07b554b 100644 --- a/.gitignore +++ b/.gitignore @@ -2,6 +2,4 @@ /target /go/ .cargo/config.toml - -**/*.md -!/README.md \ No newline at end of file +src/scraper/README.md \ No newline at end of file diff --git a/Justfile b/Justfile index f681735..0962cb1 100644 --- a/Justfile +++ b/Justfile @@ -1,21 +1,28 @@ +default_services := "bot,web,scraper" + +# Auto-reloading frontend server frontend: pnpm run -C web dev -backend: - cargo run --bin banner - +# Production build of frontend build-frontend: pnpm run -C web build -build-backend: - cargo build --release --bin banner +# Auto-reloading backend server +backend services=default_services: + bacon --headless run -- -- --services "{{services}}" -build: build-frontend build-backend - -# Production build that embeds assets -build-prod: +# Production build +build: pnpm run -C web build cargo build --release --bin banner +# Run auto-reloading development build with release characteristics (frontend is embedded, non-auto-reloading) +# This is useful for testing backend release-mode details. 
+dev-build services=default_services: build-frontend + bacon --headless run -- --profile dev-release -- --services "{{services}}" --tracing pretty + +# Auto-reloading development build for both frontend and backend +# Will not notice if either the frontend/backend crashes, but will generally be resistant to stopping on their own. [parallel] -dev: frontend backend \ No newline at end of file +dev services=default_services: frontend (backend services) \ No newline at end of file diff --git a/README.md b/README.md index fb79b1c..ab2ee22 100644 --- a/README.md +++ b/README.md @@ -1,125 +1,51 @@ # banner -A discord bot for executing queries & searches on the Ellucian Banner instance hosting all of UTSA's class data. +A complex multi-service system providing a Discord bot and browser-based interface to UTSA's course data. -## Feature Wishlist +## Services -- Commands - - ICS Download (get a ICS download of your classes with location & timing perfectly - set for every class you're in) - - Classes Now (find classes happening) -- Autocomplete - - Class Title - - Course Number - - Term/Part of Term - - Professor - - Attribute -- Component Pagination -- RateMyProfessor Integration (Linked/Embedded) -- Smart term selection (i.e. 
Summer 2024 will be selected automatically when opened) -- Rate Limiting (bursting with global/user limits) -- DMs Integration (allow usage of the bot in DMs) -- Class Change Notifications (get notified when details about a class change) -- Multi-term Querying (currently the backend for searching is kinda weird) -- Full Autocomplete for Every Search Option -- Metrics, Log Query, Privileged Error Feedback -- Search for Classes - - Major, Professor, Location, Name, Time of Day -- Subscribe to Classes - - Availability (seat, pre-seat) - - Waitlist Movement - - Detail Changes (meta, time, location, seats, professor) - - `time` Start, End, Days of Week - - `seats` Any change in seat/waitlist data - - `meta` -- Lookup via Course Reference Number (CRN) -- Smart Time of Day Handling - - "2 PM" -> Start within 2:00 PM to 2:59 PM - - "2-3 PM" -> Start within 2:00 PM to 3:59 PM - - "ends by 2 PM" -> Ends within 12:00 AM to 2:00 PM - - "after 2 PM" -> Start within 2:01 PM to 11:59 PM - - "before 2 PM" -> Ends within 12:00 AM to 1:59 PM -- Get By Section Command - - CS 4393 001 => - - Will require SQL to be able to search for a class by its section number +The application consists of three modular services that can be run independently or together: -## Analysis Required +- Discord Bot ([`bot`][src-bot]) -Some of the features and architecture of Ellucian's Banner system are not clear. -The follow features, JSON, and more require validation & analysis: + - Primary interface for course monitoring and data queries + - Built with [Serenity][serenity] and [Poise][poise] frameworks for robust command handling + - Uses slash commands with comprehensive error handling and logging -- Struct Nullability - - Much of the responses provided by Ellucian contain nulls, and most of them are uncertain as to when and why they're null. - - Analysis must be conducted to be sure of when to use a string and when it should nillable (pointer). 
-- Multiple Professors / Primary Indicator -- Multiple Meeting Times -- Meeting Schedule Types - - AFF vs AIN vs AHB etc. -- Do CRNs repeat between years? -- Check whether partOfTerm is always filled in, and it's meaning for various class results. -- Check which API calls are affected by change in term/sessionID term select -- SessionIDs - - How long does a session ID work? - - Do I really require a separate one per term? - - How many can I activate, are there any restrictions? - - How should session IDs be checked as 'invalid'? - - What action(s) keep a session ID 'active', if any? -- Are there any courses with multiple meeting times? -- Google Calendar link generation, as an alternative to ICS file generation +- Web Server ([`web`][src-web]) -## Change Identification + - [Axum][axum]-based server with Vite/React-based frontend + - [Embeds static assets][rust-embed] at compile time with E-Tags & Cache-Control headers -- Important attributes of a class will be parsed on both the old and new data. -- These attributes will be compared and given identifiers that can be subscribed to. -- When a user subscribes to one of these identifiers, any changes identified will be sent to the user. +- Scraper ([`scraper`][src-scraper]) -## Real-time Suggestions + - Intelligent data collection system with priority-based queuing inside PostgreSQL via [`sqlx`][sqlx] + - Rate-limited scraping with burst handling to respect UTSA's systems + - Handles course data updates, availability changes, and metadata synchronization -Various commands arguments have the ability to have suggestions appear. +## Quick Start -- They must be fast. As ephemeral suggestions that are only relevant for seconds or less, they need to be delivered in less than a second. -- They need to be easy to acquire. With as many commands & arguments to search as I do, it is paramount that the API be easy to understand & use. -- It cannot be complicated. I only have so much time to develop this. 
-- It does not need to be persistent. Since the data is scraped and rolled periodically from the Banner system, the data used will be deleted and re-requested occasionally. +```bash +pnpm install -C web # Install frontend dependencies +cargo build # Build the backend -For these reasons, I believe SQLite to be the ideal place for this data to be stored. -It is exceptionally fast, works well in-memory, and is less complicated compared to most other solutions. +just dev # Runs auto-reloading dev build +just dev bot,web # Runs auto-reloading dev build, running only the bot and web services +just dev-build # Development build with release characteristics (frontend is embedded, non-auto-reloading) -- Only required data about the class will be stored, along with the JSON-encoded string. - - For now, this would only be the CRN (and possibly the Term). - - Potentially, a binary encoding could be used for performance, but it is unlikely to be better. -- Database dumping into R2 would be good to ensure that over-scraping of the Banner system does not occur. - - Upon a safe close requested - - Must be done quickly (<8 seconds) - - Every 30 minutes, if any scraping ocurred. - - May cause locking of commands. +just build # Production build that embeds assets +``` -## Scraping +## Documentation -In order to keep the in-memory database of the bot up-to-date with the Banner system, the API must be scraped. -Scraping will be separated by major to allow for priority majors (namely, Computer Science) to be scraped more often compared to others. -This will lower the overall load on the Banner system while ensuring that data presented by the app is still relevant. +Comprehensive documentation is available in the [`docs/`][documentation] folder. -For now, all majors will be scraped fully every 4 hours with at least 5 minutes between each one. - -- On startup, priority majors will be scraped first (if required). -- Other majors will be scraped in arbitrary order (if required). 
-- Scrape timing will be stored in Redis. -- CRNs will be the Primary Key within SQLite - - If CRNs are duplicated between terms, then the primary key will be (CRN, Term) - -Considerations - -- Change in metadata should decrease the interval -- The number of courses scraped should change the interval (2 hours per 500 courses involved) - -## Rate Limiting, Costs & Bursting - -Ideally, this application would implement dynamic rate limiting to ensure overload on the server does not occur. -Better, it would also ensure that priority requests (commands) are dispatched faster than background processes (scraping), while making sure different requests are weighted differently. -For example, a recent scrape of 350 classes should be weighted 5x more than a search for 8 classes by a user. -Still, even if the cap does not normally allow for this request to be processed immediately, the small user search should proceed with a small bursting cap. - -The requirements to this hypothetical system would be: - -- Conditional Bursting: background processes or other requests deemed "low priority" are not allowed to use bursting. -- Arbitrary Costs: rate limiting is considered in the form of the request size/speed more or less, such that small simple requests can be made more frequently, unlike large requests. 
+[documentation]: docs/README.md
+[src-bot]: src/bot
+[src-web]: src/web
+[src-scraper]: src/scraper
+[serenity]: https://github.com/serenity-rs/serenity
+[poise]: https://github.com/serenity-rs/poise
+[axum]: https://github.com/tokio-rs/axum
+[rust-embed]: https://lib.rs/crates/rust-embed
+[sqlx]: https://github.com/launchbadge/sqlx
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..a633de4
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,94 @@
+# Architecture
+
+## System Overview
+
+The Banner project is built as a multi-service application with the following components:
+
+- **Discord Bot Service**: Handles Discord interactions and commands
+- **Web Service**: Serves the React frontend and provides API endpoints
+- **Scraper Service**: Background data collection and synchronization
+- **Database Layer**: PostgreSQL for persistent storage
+
+## Technical Analysis
+
+### Banner System Integration
+
+Some of the features and architecture of Ellucian's Banner system are not clear.
+The following features, JSON, and more require validation & analysis:
+
+- Struct Nullability
+  - Many of the responses provided by Ellucian contain nulls, and it is often unclear when and why a field is null.
+  - Analysis must be conducted to be sure of when to use a string and when it should be nillable (a pointer).
+- Multiple Professors / Primary Indicator
+- Multiple Meeting Times
+- Meeting Schedule Types
+  - AFF vs AIN vs AHB etc.
+- Do CRNs repeat between years?
+- Check whether partOfTerm is always filled in, and its meaning for various class results.
+- Check which API calls are affected by a change in term/sessionID term selection
+- SessionIDs
+  - How long does a session ID work?
+  - Do I really require a separate one per term?
+  - How many can I activate, are there any restrictions?
+  - How should session IDs be checked as 'invalid'?
+  - What action(s) keep a session ID 'active', if any?
+- Are there any courses with multiple meeting times?
+- Google Calendar link generation, as an alternative to ICS file generation
+
+## Change Identification
+
+- Important attributes of a class will be parsed on both the old and new data.
+- These attributes will be compared and given identifiers that can be subscribed to.
+- When a user subscribes to one of these identifiers, any changes identified will be sent to the user.
+
+## Real-time Suggestions
+
+Various command arguments can have suggestions appear.
+
+- They must be fast. As ephemeral suggestions that are only relevant for seconds or less, they need to be delivered in less than a second.
+- They need to be easy to acquire. With as many commands & arguments to search as I do, it is paramount that the API be easy to understand & use.
+- It cannot be complicated. I only have so much time to develop this.
+- It does not need to be persistent. Since the data is scraped and rolled periodically from the Banner system, the data used will be deleted and re-requested occasionally.
+
+For these reasons, I believe PostgreSQL to be the ideal place for this data to be stored.
+It is fast, already part of the stack, and less complicated than introducing a separate store.
+
+- Only required data about the class will be stored, along with the JSON-encoded string.
+  - For now, this would only be the CRN (and possibly the Term).
+  - Potentially, a binary encoding could be used for performance, but it is unlikely to be better.
+- Database dumping into R2 would be good to ensure that over-scraping of the Banner system does not occur.
+  - Upon a safe close being requested
+  - Must be done quickly (<8 seconds)
+  - Every 30 minutes, if any scraping occurred.
+  - May cause locking of commands.
+
+## Scraping System
+
+To keep the bot's database up-to-date with the Banner system, the API must be scraped.
+Scraping will be separated by major so that priority majors (namely, Computer Science) can be scraped more often than others.
+This will lower the overall load on the Banner system while ensuring that data presented by the app is still relevant.
+
+For now, all majors will be scraped fully every 4 hours with at least 5 minutes between each one.
+
+- On startup, priority majors will be scraped first (if required).
+- Other majors will be scraped in arbitrary order (if required).
+- Scrape timing will be stored in the database.
+- CRNs will be the primary key within the database
+  - If CRNs are duplicated between terms, then the primary key will be (CRN, Term)
+
+Considerations
+
+- Change in metadata should decrease the interval
+- The number of courses scraped should change the interval (2 hours per 500 courses involved)
+
+## Rate Limiting, Costs & Bursting
+
+Ideally, this application would implement dynamic rate limiting to ensure the server is not overloaded.
+Better still, it would ensure that priority requests (commands) are dispatched faster than background processes (scraping), while weighting different kinds of requests differently.
+For example, a recent scrape of 350 classes should be weighted 5x more than a search for 8 classes by a user.
+Still, even if the cap does not normally allow for this request to be processed immediately, the small user search should proceed with a small bursting cap.
+
+The requirements for this hypothetical system would be:
+
+- Conditional Bursting: background processes or other requests deemed "low priority" are not allowed to use bursting.
+- Arbitrary Costs: a request's cost roughly reflects its size and speed, such that small, simple requests can be made more frequently than large ones.
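The Conditional Bursting and Arbitrary Costs requirements above can be sketched as a single token bucket whose balance is only allowed to go negative for priority requests. This is an illustrative, std-only sketch; the type name, numbers, and API are hypothetical, not taken from the codebase:

```rust
use std::time::Instant;

/// Token bucket where each request carries an arbitrary cost, and only
/// priority requests may dip into a separate burst allowance.
/// (Illustrative sketch; not part of the actual codebase.)
struct CostedBucket {
    capacity: f64,       // steady-state token cap
    burst: f64,          // extra allowance, priority requests only
    tokens: f64,         // current balance (may go negative for priority work)
    refill_per_sec: f64, // tokens restored per second
    last: Instant,       // last refill timestamp
}

impl CostedBucket {
    fn new(capacity: f64, burst: f64, refill_per_sec: f64) -> Self {
        Self { capacity, burst, tokens: capacity, refill_per_sec, last: Instant::now() }
    }

    fn refill(&mut self) {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
    }

    /// Priority requests may drive the balance as low as -burst;
    /// background work (scraping) must keep it at or above zero.
    fn try_acquire(&mut self, cost: f64, priority: bool) -> bool {
        self.refill();
        let floor = if priority { -self.burst } else { 0.0 };
        if self.tokens - cost >= floor {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }
}
```

With a capacity of 10 and a burst of 5, a background scrape that drains the bucket blocks further background work, while a small user command can still proceed by dipping into the burst allowance.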
diff --git a/docs/Sessions.md b/docs/BANNER.md similarity index 85% rename from docs/Sessions.md rename to docs/BANNER.md index 0d08a7f..7d8509c 100644 --- a/docs/Sessions.md +++ b/docs/BANNER.md @@ -1,11 +1,17 @@ -# Sessions +# Banner + +All notes on the internal workings of the Banner system by Ellucian. + +## Sessions All notes on the internal workings of Sessions in the Banner system. - Sessions are generated on demand with a random string of characters. + - The format `{5 random characters}{milliseconds since epoch}` + - Example: `` - Sessions are invalidated after 30 minutes, but may change. - This delay can be found in the original HTML returned, find `meta[name="maxInactiveInterval"]` and read the `content` attribute. - - This is read at runtime by the javascript on initialization. + - This is read at runtime (in the browser, by javascript) on initialization. - Multiple timers exist, one is for the Inactivity Timer. - A dialog will appear asking the user to continue their session. - If they click the button, the session will be extended via the keepAliveURL (see `meta[name="keepAliveURL"]`). 
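The `maxInactiveInterval` lookup described in the session notes above (reading the `content` attribute of `meta[name="maxInactiveInterval"]` from the initial HTML) can be done with plain string scanning. A minimal sketch, assuming the `content` attribute appears after `name` and is double-quoted; the function name is made up for illustration:

```rust
/// Extract the session inactivity timeout (in seconds) from Banner's initial
/// HTML, which exposes it via `meta[name="maxInactiveInterval"]`.
/// Sketch only: assumes attribute order and double quotes.
fn max_inactive_interval(html: &str) -> Option<u64> {
    // Locate the meta tag by its name attribute.
    let meta_start = html.find(r#"name="maxInactiveInterval""#)?;
    let rest = &html[meta_start..];
    // Read the value of the content attribute that follows it.
    let content_idx = rest.find(r#"content=""#)? + r#"content=""#.len();
    let rest = &rest[content_idx..];
    let end = rest.find('"')?;
    rest[..end].parse().ok()
}
```

A real implementation would likely use an HTML parser rather than assuming attribute order, but the sketch shows the shape of the lookup the JavaScript performs at initialization.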
diff --git a/docs/FEATURES.md b/docs/FEATURES.md
new file mode 100644
index 0000000..0dcb325
--- /dev/null
+++ b/docs/FEATURES.md
@@ -0,0 +1,58 @@
+# Features
+
+## Current Features
+
+### Discord Bot Commands
+
+- **search** - Search for courses with various filters (title, course code, keywords)
+- **terms** - List available terms or search for a specific term
+- **time** - Get meeting times for a specific course (CRN)
+- **ics** - Generate ICS calendar file for a course with holiday exclusions
+- **gcal** - Generate Google Calendar link for a course
+
+### Data Pipeline
+
+- Intelligent scraping system with priority queues
+- Rate limiting and burst handling
+- Background data synchronization
+
+## Feature Wishlist
+
+### Commands
+
+- ICS Download (get an ICS download of your classes with location & timing set perfectly for every class you're in)
+- Classes Now (find classes happening)
+- Autocomplete
+  - Class Title
+  - Course Number
+  - Term/Part of Term
+  - Professor
+  - Attribute
+- Component Pagination
+- RateMyProfessor Integration (Linked/Embedded)
+- Smart term selection (e.g.
Summer 2024 will be selected automatically when opened) +- Rate Limiting (bursting with global/user limits) +- DMs Integration (allow usage of the bot in DMs) +- Class Change Notifications (get notified when details about a class change) +- Multi-term Querying (currently the backend for searching is kinda weird) +- Full Autocomplete for Every Search Option +- Metrics, Log Query, Privileged Error Feedback +- Search for Classes + - Major, Professor, Location, Name, Time of Day +- Subscribe to Classes + - Availability (seat, pre-seat) + - Waitlist Movement + - Detail Changes (meta, time, location, seats, professor) + - `time` Start, End, Days of Week + - `seats` Any change in seat/waitlist data + - `meta` +- Lookup via Course Reference Number (CRN) +- Smart Time of Day Handling + - "2 PM" -> Start within 2:00 PM to 2:59 PM + - "2-3 PM" -> Start within 2:00 PM to 3:59 PM + - "ends by 2 PM" -> Ends within 12:00 AM to 2:00 PM + - "after 2 PM" -> Start within 2:01 PM to 11:59 PM + - "before 2 PM" -> Ends within 12:00 AM to 1:59 PM +- Get By Section Command + - CS 4393 001 => + - Will require SQL to be able to search for a class by its section number diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..3a6ef11 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,42 @@ +# Documentation + +This folder contains detailed documentation for the Banner project. This file acts as the index. 
+ +## Files + +- [`FEATURES.md`](FEATURES.md) - Current features, implemented functionality, and future roadmap +- [`BANNER.md`](BANNER.md) - General API documentation on the Banner system +- [`ARCHITECTURE.md`](ARCHITECTURE.md) - Technical implementation details, system design, and analysis + +## Samples + +The `samples/` folder contains real Banner API response examples: + +- `search/` - Course search API responses with various filters + - [`searchResults.json`](samples/search/searchResults.json) + - [`searchResults_500.json`](samples/search/searchResults_500.json) + - [`searchResults_CS500.json`](samples/search/searchResults_CS500.json) + - [`searchResults_malware.json`](samples/search/searchResults_malware.json) +- `meta/` - Metadata API responses (terms, subjects, instructors, etc.) + - [`get_attribute.json`](samples/meta/get_attribute.json) + - [`get_campus.json`](samples/meta/get_campus.json) + - [`get_instructionalMethod.json`](samples/meta/get_instructionalMethod.json) + - [`get_instructor.json`](samples/meta/get_instructor.json) + - [`get_partOfTerm.json`](samples/meta/get_partOfTerm.json) + - [`get_subject.json`](samples/meta/get_subject.json) + - [`getTerms.json`](samples/meta/getTerms.json) +- `course/` - Course detail API responses (HTML and JSON) + - [`getFacultyMeetingTimes.json`](samples/course/getFacultyMeetingTimes.json) + - [`getClassDetails.html`](samples/course/getClassDetails.html) + - [`getCorequisites.html`](samples/course/getCorequisites.html) + - [`getCourseDescription.html`](samples/course/getCourseDescription.html) + - [`getEnrollmentInfo.html`](samples/course/getEnrollmentInfo.html) + - [`getFees.html`](samples/course/getFees.html) + - [`getLinkedSections.html`](samples/course/getLinkedSections.html) + - [`getRestrictions.html`](samples/course/getRestrictions.html) + - [`getSectionAttributes.html`](samples/course/getSectionAttributes.html) + - [`getSectionBookstoreDetails.html`](samples/course/getSectionBookstoreDetails.html) + - 
[`getSectionPrerequisites.html`](samples/course/getSectionPrerequisites.html) + - [`getXlistSections.html`](samples/course/getXlistSections.html) + +These samples are used for development, testing, and understanding the Banner API structure.