# the-office

A Vue.js and Flask Web Application designed to provide a quick way to search for quotes from NBC's "The Office".

## Screenshots

## Features

- Vue.js based, providing fast SPA functionality
    - Set up via Vue CLI
    - Vue Router
    - Vue Bootstrap
    - Axios
    - Vue Instantsearch (Algolia)
- Backend API provided by Flask
- Instant Search provided by Algolia
- Sleek, responsive design that is easy on the eyes

## Quote Data

### Credit

Credit to [officequotes.net](https://www.officequotes.net/) for providing all quote data.

Credit to [imdb.com](https://www.imdb.com/title/tt0386676/) for episode descriptions.

### Processing

Quotes are currently scraped directly from the website.

This repository holds the current pre-processed raw quote data, but the application can also fetch and parse
HTML pages directly as needed.

```
python server/cli.py fetch
    --season SEASON         Fetches all episodes from a specific season.
    --episode EPISODE       Fetches a specific episode. Requires SEASON to be specified.
    --all                   Fetches data for every episode from every season.
    --skip SEASON:EPISODE   When specified, it will skip a given episode.
```

The data has to be parsed, but due to its high irregularity (at least too much for me to handle automatically), the files have to be
inspected and manually processed.

```
python server/cli.py preprocess
    -s --season SEASON      Pre-processes all episodes from a specific season.
    -e --episode EPISODE    Pre-processes a specific episode. Requires SEASON to be specified.
    -a --all                Pre-processes all episodes from every season.
    -o --overwrite          DANGER: Will overwrite files. Manually processed files may be lost forever.
```

Once all files have been pre-processed, you will have to begin the long, annoying process of editing them into my custom format.

These raw pre-processed files are located in `./server/data/processed/`.

Each section (barring the first) is prepended by a hyphen.

```
CharacterName: Text that character says.
OtherCharacter: More text that other character says.
-
ThirdCharacter: Text that character says in a second scene/section.
-!1
Fourth Character With Spaces In Name: Text that fourth character says in a deleted scene.
Fifth-Character: Which deleted scene? Deleted scene number one.
```

Deleted scenes are marked by an initial exclamation mark, followed by one or more digits identifying which deleted scene they are a part of.

Please note that extra text like 'Deleted Scenes 3' might appear before a hyphen. This is expected and is helpful when deciding
which scene goes with which Deleted Scene ID. If you don't know, do what I did: go look at the web page it's based on.
Otherwise, I read the quotes and figure it out based on context.

This concept is rather loose, slow, and dumb; it simply allows me to mark which deleted scenes go together while working
with an incredibly inconsistent, human-curated data format.

To ease text processing, I did come up with regular expressions for search and replacement:

```
^(?!-)([\w\s’'0-9\-\.\/#]*):\s+
$1|
```
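
For anyone who would rather script this replacement than run it in an editor, here is a minimal Python sketch of the same search and replace. The function name `mark_speakers` and the sample text are made up for illustration. Python writes the backreference as `\1` instead of `$1`, and the sketch works line by line so that `\s` cannot match across a line break.

```python
import re

# Same pattern as above; lines starting with "-" (scene separators) are left alone.
SPEAKER_PATTERN = re.compile(r"^(?!-)([\w\s’'0-9\-\.\/#]*):\s+")


def mark_speakers(text: str) -> str:
    """Replace the leading 'Speaker: ' on each line with 'Speaker|'."""
    lines = text.splitlines()
    # count=1 is defensive; the pattern is anchored to the start of the line anyway.
    return "\n".join(SPEAKER_PATTERN.sub(r"\1|", line, count=1) for line in lines)


# Hypothetical sample input, not taken from the real data files.
sample = "CharacterName: Text that character says.\n-\nOtherCharacter: Note: more text."
print(mark_speakers(sample))
```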

From then on, the process becomes much simpler; 95% of the work needed to process quotes is already done.

Now that quotes are in a consistent (although custom) format, they need to be processed into individual episodes. In reality,
the output files are just the JSON form of the previous stage.

```
python server/cli.py process
    -s --season SEASON      Processes all episodes from a specific season.
    -e --episode EPISODE    Processes a specific episode. Requires SEASON to be specified.
    -a --all                Processes all episodes from all seasons.
```
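
To make the relationship between the custom format and its JSON form more concrete, here is a rough sketch of such a conversion. It is not the code in `server/cli.py`: the function name `parse_episode` and the JSON keys (`quotes`, `speaker`, `text`, `deleted`) are assumptions chosen for illustration. It expects `Speaker|Text` lines as produced by the search and replacement described earlier.

```python
import json
import re

# A scene separator is a lone "-", optionally followed by "!<digits>" for a deleted scene.
SEPARATOR = re.compile(r"^-(?:!(\d+))?$")


def parse_episode(text: str) -> list:
    """Split custom-format text into a list of scenes (hypothetical layout)."""
    scenes = [{"quotes": []}]  # the first scene has no leading hyphen
    for raw_line in text.splitlines():
        line = raw_line.strip()
        marker = SEPARATOR.match(line)
        if marker:
            scene = {"quotes": []}
            if marker.group(1) is not None:
                scene["deleted"] = int(marker.group(1))
            scenes.append(scene)
            continue
        if "|" not in line:
            # Assumption: blank lines and stray text such as 'Deleted Scenes 3' are skipped.
            continue
        speaker, _, quote = line.partition("|")
        scenes[-1]["quotes"].append({"speaker": speaker.strip(), "text": quote.strip()})
    # Drop scenes that ended up with no quotes at all.
    return [scene for scene in scenes if scene["quotes"]]


sample = """CharacterName|Text that character says.
-
ThirdCharacter|Text in a second scene.
-!1
Fifth-Character|Text in deleted scene number one."""

print(json.dumps(parse_episode(sample), indent=2))
```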

Now that they're all in individual files, the final commands can be run to compile them into one file, a static
'database' of sorts. Technically, they could be kept scattered, but I decided to make it simpler with just one big file.

This is also where Algolia comes in.

```
python server/cli.py build [algolia|final]
```
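
Conceptually, the `final` build amounts to merging the per-episode JSON files into a single file. The sketch below shows one way that could look. The `<season>-<episode>.json` naming scheme, the function name `build_final`, and the output layout are all assumptions for illustration and do not describe what `server/cli.py` actually does.

```python
import json
import tempfile
from pathlib import Path


def build_final(episode_dir: Path, output_file: Path) -> int:
    """Merge every per-episode JSON file into one file and return the episode count."""
    episodes = []
    # Hypothetical naming scheme: "<season>-<episode>.json", e.g. "02-05.json".
    for path in sorted(episode_dir.glob("*.json")):
        season, episode = path.stem.split("-")
        with path.open(encoding="utf-8") as handle:
            scenes = json.load(handle)
        episodes.append({"season": int(season), "episode": int(episode), "scenes": scenes})
    with output_file.open("w", encoding="utf-8") as handle:
        json.dump(episodes, handle, ensure_ascii=False)
    return len(episodes)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "episodes"
    source.mkdir()
    (source / "01-02.json").write_text('[{"quotes": []}]', encoding="utf-8")
    (source / "01-01.json").write_text('[{"quotes": []}]', encoding="utf-8")
    count = build_final(source, Path(tmp) / "data.json")
    print(count, "episodes written")
```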

Each command is run with no special arguments (as of now), generating an `algolia.json` or `data.json` file in the `./server/data/` folder.

The `data.json` file is loaded by the Flask server, and `algolia.json` can be uploaded to your primary index.

For every command mentioned, you can read all arguments with `--help`:

```
$ python cli.py preprocess --help
Usage: cli.py preprocess [OPTIONS]

  Pre-processes raw HTML files into mangled custom quote data.

  Custom quote data requires manual inspection and formatting, making it a
  dangerous operation that may overwrite precious quote data.

Options:
  -s, --season INTEGER          Season to be fetched. Without --episode, will
                                download all episodes in a season.

  -e, --episode INTEGER         Specific episode to be fetched. Requires
                                --season to be specified.

  --all                         Fetch all episodes, regardless of previous
                                specifications.

  -o, --overwrite               Overwrite if a file already exists.
  -ss, --silent-skip            Skip missing/existing files silently
  -ssm, --silent-skip-missing   Skip missing files silently
  -sse, --silent-skip-existing  Skip overwrite skips silently
  --help                        Show this message and exit.
```

## Setup

This project was built on Python 3.7 and Node v12.18.3 / npm 6.14.6.

### Installation

To install all Node/NPM dependencies, run

```
npm install
```

To install Python's dependencies, run

```
pip install -r ./requirements.txt
```

I recommend that you use a virtualenv in order to keep dependencies separate from other projects, as I do.
Personally, I use PyCharm Professional to maintain virtualenvs, just because it's easy to start, use, update and maintain
them.

### Running

- Vue.js can be run via `npm run serve`.
    - Run this in `./client/`.
- Flask can be run via `flask run`.
    - Run this in `./server/`.
    - Add `--host=0.0.0.0` to the end to allow connections from LAN.

Note: Readying this application for production and wider development is still in progress.

## Descriptions, Images, Summaries

While The Office is a large show, it unfortunately does not have any readily accessible short descriptions of characters.

Wikis provide long and comprehensive descriptions and some high quality images, but overall, it was rather difficult to
find material for characters introduced late in the show or with a more minor overall impact (e.g. David Wallace, Robert California, Clark Green).

Episode and character images will be clickable, opening a lightbox of the full body image.

### Character Images

Character images are made in 2 different shots: 1 large full body image, and a cropped square of their face.

Additionally, all images will be paired with thumbnails of their faces, used for progressive image rendering.

All images should be as high quality as possible, with a minimum of 500px in each dimension for the full body shot. Cropped squares
should also be as large as possible.

Thumbnails are manually generated with ImageMagick v7: `magick input.webp -resize 128 -quality 75 input_thumb.webp`
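
If the number of images grows, the same command could be applied to a whole folder. The following is only a sketch under assumptions: the helper names `thumbnail_command` and `generate_thumbnails` are made up, the `_thumb` suffix follows the example above, and `magick` must be available on the `PATH` for the batch function to work.

```python
import subprocess
from pathlib import Path


def thumbnail_command(source: Path) -> list:
    """Build the same ImageMagick invocation as above for a single image."""
    target = source.with_name(f"{source.stem}_thumb{source.suffix}")
    return ["magick", str(source), "-resize", "128", "-quality", "75", str(target)]


def generate_thumbnails(image_dir: Path) -> None:
    """Run ImageMagick for every .webp file that is not already a thumbnail."""
    for source in sorted(image_dir.glob("*.webp")):
        if source.stem.endswith("_thumb"):
            continue
        # check=True raises CalledProcessError if ImageMagick reports a failure.
        subprocess.run(thumbnail_command(source), check=True)


# Only prints the command; call generate_thumbnails(Path(...)) to actually run it.
print(thumbnail_command(Path("input.webp")))
```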

### Episode Images

Episode images should mirror Netflix's episode images if possible. Otherwise, each episode will have to be
painstakingly reviewed to find the scene and moment(s) that best represent the episode, then screenshotted.

Then, like the character images, a square crop will be made, and thumbnails will be generated in the same way.

A single image will represent each episode on the Season page, but multiple images will be available to view in a carousel.

## To-do

Small to-do list to complete.

- Font Awesome Icons
    - SeasonList Chevron
    - Quote Permalink
- Attempt Algolia Query Suggestions
- Redirect to SearchResults page on Enter press
- Process all quote data
- Site Meta Tags
- Better Mobile Season List
    - Smaller, collapsible?
- Heroku Production Deployment
    - Possible solution via Docker
- Axios 'Fetch' Error Handling
- Navigation Bar
    - Navbar Logo (?)
- Overall Responsiveness Improvements
- Algolia Clickthrough Events
- Search Results Page
- Character List
- Quote Permalink
- Deleted Scenes Marker
- Possible 'Flashback' Scene Marker
- Season List Episode Modal Popover
    - Preview Image, Description, Episode Stats
- Quote Likes Database
    - Requires difficult implementation of Flask controlled Postgres database
    - Requires funding, longterm free hosting not possible with database requirements