mirror of
https://github.com/Xevion/the-office.git
synced 2025-12-15 06:13:30 -06:00
update README with information on quote processing, basic installation/dependencies
110
README.md
@@ -19,10 +19,120 @@ A Vue.js and Flask Web Application designed to provide a quick way to search for
- Instant Search provided by Algolia
- Sleek, responsive design that is easy on the eyes

## Quote Data

### Credit

Credit to [officequotes.net](https://www.officequotes.net/) for providing all quote data.

Credit to [imdb.com](https://www.imdb.com/title/tt0386676/) for episode descriptions.

### Processing

Quotes are currently scraped directly from the website.

This repository holds the current pre-processed raw quote data, but the application can also fetch and parse
HTML pages directly as needed.

```
python server/cli.py fetch
    --season SEASON         Fetches all episodes from a specific season.
    --episode EPISODE       Fetches a specific episode. Requires SEASON to be specified.
    --all                   Fetches data for every episode from every season.
    --skip SEASON:EPISODE   When specified, it will skip a given episode.
```

The data has to be parsed, but because the source pages are highly irregular (too irregular for me to handle automatically), the files have to be
inspected and manually processed.

```
python server/cli.py preprocess
    --season SEASON     Pre-processes all episodes from a specific season.
    --episode EPISODE   Pre-processes a specific episode. Requires SEASON to be specified.
    --all               Pre-processes all episodes from every season.
    --overwrite         DANGER: Will overwrite files. Manually processed files may be lost forever.
```

Once all files have been pre-processed, you will have to begin the long, tedious process of editing them into my custom format.

These raw pre-processed files are located in `./server/data/processed/`.

Each section (barring the first) is preceded by a hyphen.

```
CharacterName: Text that character says.
OtherCharacter: More text that other character says.
-
ThirdCharacter: Text that character says in a second scene/section.
-!1
Fourth Character With Spaces In Name: Text that fourth character says in a deleted scene.
Fifth-Character: Which deleted scene? Deleted scene number one.
```

Deleted scenes are marked by an initial exclamation mark followed by one or more digits identifying which deleted scene they are part of.

Please note that extra text like 'Deleted Scenes 3' might appear before a hyphen. This is expected and is helpful when deciding
which scene goes with which Deleted Scene ID. If you don't know, do what I did: go look at the web page it's based on.
Otherwise, I read the quotes and figure it out based on context.

This concept is rather loose, slow, and dumb; it simply allows me to mark which deleted scenes go together while working
with an incredibly inconsistent, human-curated data format.

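The section and deleted-scene markers described above are regular enough to be read mechanically. The following is a minimal sketch of such a reader, not the parser used by `server/cli.py`: the function name `parse_sections`, the `deleted`/`quotes`/`speaker`/`text` keys, and the choice to skip lines without a `Name: ` prefix (such as 'Deleted Scenes 3') are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# A section marker is a lone hyphen, optionally followed by "!<digits>"
# naming the deleted scene the section belongs to (e.g. "-!1").
MARKER = re.compile(r"^-(?:!(\d+))?$")


def parse_sections(text: str) -> List[Dict]:
    """Split hand-edited episode text into sections of speaker/text quotes."""
    # The first section has no leading hyphen, so start with one already open.
    sections: List[Dict] = [{"deleted": None, "quotes": []}]
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        marker = MARKER.match(line)
        if marker:
            # Start a new section; record the deleted scene ID if present.
            deleted = int(marker.group(1)) if marker.group(1) else None
            sections.append({"deleted": deleted, "quotes": []})
            continue
        speaker, separator, spoken = line.partition(": ")
        if not separator:
            # Stray text without a "Name: " prefix is ignored in this sketch.
            continue
        sections[-1]["quotes"].append({"speaker": speaker, "text": spoken})
    return sections
```

Run against the example format above, this yields three sections, the last one carrying deleted scene ID 1.
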
To ease text processing, I came up with a regular expression and a replacement pattern for search and replace:

```
^([\w\s]+\-*[\w\s]*):\s+
$1|
```

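The same search and replace can be applied outside an editor. This is a small sketch using Python's standard `re` module; the helper name `to_pipe_format` is hypothetical. The pattern is the one given above, and the editor-style `$1` backreference is written as `\1` in Python.

```python
import re

# Pattern copied from the README; MULTILINE makes "^" match at the start
# of every line rather than only at the start of the whole text.
SPEAKER = re.compile(r"^([\w\s]+\-*[\w\s]*):\s+", flags=re.MULTILINE)


def to_pipe_format(text: str) -> str:
    """Rewrite a leading 'Name: ' on each line as 'Name|'."""
    return SPEAKER.sub(r"\1|", text)
```

Lines that do not start with a speaker name, such as the `-` section markers, are left untouched. Names containing characters outside letters, digits, whitespace and a single run of hyphens (a period, for example) will not match and still need manual editing.
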
From then on, the process becomes much simpler; 95% of the work needed to process quotes is already done.

Now that quotes are in a consistent (although custom) format, they need to be processed into individual episodes. In reality,
the output is just a JSON version of the previous stage.

```
python server/cli.py process
    --season SEASON     Processes all episodes from a specific season.
    --episode EPISODE   Processes a specific episode. Requires SEASON to be specified.
    --all               Processes all episodes from all seasons.
```

Now that they're all in individual files, the final commands can be run to compile them into one file, a static
'database' of sorts. Technically, they could be kept scattered, but I decided to make it simpler with just one big file.

This is also where Algolia comes in.

```
python server/cli.py build [algolia|final]
```

Each command is run with no special arguments (as of now), generating an `algolia.json` or `data.json` in the `./server/data/` folder.

The `data.json` file is loaded by the Flask server, and `algolia.json` can be uploaded to your primary index.

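To illustrate what compiling scattered episode files into one static 'database' amounts to, here is a rough sketch using only the standard library. It is not the implementation behind `build final`: the function name `build_data`, the one-JSON-file-per-episode layout, and keying the result by file name are assumptions.

```python
import json
from pathlib import Path
from typing import Dict


def build_data(episode_dir: Path, output_file: Path) -> Dict[str, object]:
    """Merge every per-episode JSON file in episode_dir into one JSON file."""
    merged: Dict[str, object] = {}
    # Sort for a stable order; each file is keyed by its name without ".json".
    for path in sorted(episode_dir.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            merged[path.stem] = json.load(handle)
    # Write the output outside episode_dir so a later run does not
    # pick up the merged file as if it were an episode.
    with output_file.open("w", encoding="utf-8") as handle:
        json.dump(merged, handle, ensure_ascii=False, indent=2)
    return merged
```

Keeping everything in one file means the server only has to read and parse a single document at startup, at the cost of rebuilding it whenever an episode changes.
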
## Setup

This project was built on Python 3.7 and Node v12.18.3 / npm 6.14.6.

### Installation

To install all Node/npm dependencies, run

```
npm install
```

To install Python's dependencies, run

```
pip install -r ./requirements.txt
```

I recommend using a virtualenv to keep dependencies separate from other projects, as I do.
Personally, I use PyCharm Professional to manage virtualenvs, because it makes them easy to create, use, update and maintain.

### Running

- Vue.js can be run via `npm run serve`.
    - Run this in `./client/`.
- Flask can be run via `flask run`.