Rename server folder to data, server\data to data\old-data

Xevion
2022-05-21 09:19:45 -05:00
parent cfc224df21
commit 2c058f7840
1282 changed files with 69 additions and 17 deletions

data/api.py (new file)

@@ -0,0 +1,128 @@
"""
api.py
Provides an accessible, protected backend API. JSON I/O only, CSRF protected.
"""
import copy
import json
import os

# from flask_caching import cache
import flask_wtf
from flask import abort, current_app, jsonify, request, send_from_directory

from server.helpers import default, get_neighbors

BASE_DIR = os.path.dirname(os.path.abspath(__file__))

with open(os.path.join(BASE_DIR, 'data', 'data.json'), 'r', encoding='utf-8') as file:
    data = json.load(file)

with open(os.path.join(BASE_DIR, 'data', 'characters.json'), 'r', encoding='utf-8') as file:
    character_data = json.load(file)

# Cached preload character data
character_list = dict()

# Totals are computed once at import time and served by /api/stats/.
stats = {
    'totals': {
        'quote': 0,
        'scene': 0,
        'episode': 0,
        'season': 0
    }
}

stats['totals']['season'] += len(default(data, []))
for season in default(data, []):
    # Missing 'episodes'/'scenes'/'quotes' keys count as empty instead of raising.
    episodes = default(season.get('episodes'), [])
    stats['totals']['episode'] += len(episodes)
    for episode in episodes:
        scenes = default(episode.get('scenes'), [])
        stats['totals']['scene'] += len(scenes)
        for scene in scenes:
            stats['totals']['quote'] += len(default(scene.get('quotes'), []))


@current_app.route('/api/csrf/')
def api_csrf():
    """
    Page used for refreshing expired CSRF tokens via AJAX.
    Probably secure: https://medium.com/@iaincollins/csrf-tokens-via-ajax-a885c7305d4a
    """
    return jsonify(flask_wtf.csrf.generate_csrf())


@current_app.route('/api/episode/<int:season>/<int:episode>/')
def api_episode(season: int, episode: int):
    # Season and episode are one-indexed; reject 0 so it cannot wrap around to the last item.
    if season < 1 or episode < 1:
        abort(404)
    try:
        return jsonify(data[season - 1]['episodes'][episode - 1])
    except (IndexError, KeyError, TypeError):
        abort(404)


@current_app.route('/api/stats/')
def api_stats():
    return jsonify(stats)


@current_app.route('/api/episodes/')
def api_episodes():
    """
    Returns a list of episodes with basic information (no quotes).
    Used for the left side season bar.
    """
    # Work on a deep copy so the cached `data` keeps its scenes.
    seasons = copy.deepcopy(data)
    for season in seasons:
        for episode in default(season.get('episodes'), []):
            episode.pop('scenes', None)
    return jsonify(seasons)


@current_app.route('/api/all/')
def api_data():
    """
    Season data route
    """
    return jsonify(data)


@current_app.route('/api/quote_surround')
def api_quote_neighbors():
    # All positions are one-indexed query parameters; missing or non-integer values become None.
    season = request.args.get('season', type=int)
    episode = request.args.get('episode', type=int)
    scene = request.args.get('scene', type=int)
    quote = request.args.get('quote', type=int)
    distance = request.args.get('distance', 2, type=int)
    if None in (season, episode, scene, quote) or min(season, episode, scene, quote) < 1:
        abort(400)
    try:
        quotes = data[season - 1]['episodes'][episode - 1]['scenes'][scene - 1]['quotes']
    except (IndexError, KeyError, TypeError):
        abort(404)
    top, below = get_neighbors(quotes, quote - 1, distance)
    return jsonify({'above': top, 'below': below})


@current_app.route('/api/characters/')
def api_character_list():
    # Strip the (large) quote lists; only character metadata is returned.
    _data = copy.deepcopy(character_data)
    for value in _data.values():
        value.pop('quotes', None)
    return jsonify(_data)


@current_app.route('/api/character/<character>/')
def api_character_all(character: str):
    if character not in character_data:
        abort(404)
    _data = copy.deepcopy(character_data[character])
    # Only the first 10 quotes are included; /quotes/ serves the rest.
    _data['quotes'] = _data.get('quotes', [])[:10]
    return jsonify(_data)


@current_app.route('/api/character/<character>/quotes/')
def api_character_quotes(character: str):
    if character not in character_data:
        abort(404)
    quotes = character_data[character]['quotes']
    # Compute pagination if argument is available. Static 10 results per page, one-indexed.
    if 'page' in request.args:
        page = request.args.get('page', type=int)
        if page is None or page < 1:
            abort(400)
        index = (page - 1) * 10
        return jsonify(quotes[index: index + 10])
    return jsonify(quotes)


@current_app.route('/static/img/<path:filename>')
def custom_static(filename):
    return send_from_directory('./data/img/', filename)
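`default` and `get_neighbors` are imported from `server.helpers`, which is not part of this diff. Judging only from their call sites, `default(value, fallback)` appears to substitute a fallback for a missing value, and `get_neighbors(items, index, distance)` appears to return the quotes just above and below a given index. The sketch below is a hypothetical reimplementation under those assumptions (in particular, that neither returned list includes the quote at `index` itself); the real helpers may behave differently. The hypothetical `paginate` function mirrors the one-indexed, 10-per-page slicing in `api_character_quotes`.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def default(value: Optional[T], fallback: T) -> T:
    """Hypothetical: return `fallback` when `value` is None, else `value`."""
    return fallback if value is None else value


def get_neighbors(items: Sequence[T], index: int, distance: int = 2) -> Tuple[List[T], List[T]]:
    """Hypothetical: up to `distance` items before and after `index`, excluding it.

    Returns (above, below), both in their original order.
    """
    # Clamp the start at 0; a negative slice start would wrap to the end of the list.
    above = list(items[max(0, index - distance):index])
    below = list(items[index + 1:index + 1 + distance])
    return above, below


def paginate(items: Sequence[T], page: int, per_page: int = 10) -> List[T]:
    """One-indexed page slicing, as in api_character_quotes (page 1 = items 0..9)."""
    start = (page - 1) * per_page
    return list(items[start:start + per_page])


if __name__ == "__main__":
    quotes = [f"q{i}" for i in range(1, 8)]  # q1 .. q7
    print(get_neighbors(quotes, 3, 2))  # (['q2', 'q3'], ['q5', 'q6'])
    print(get_neighbors(quotes, 0, 2))  # ([], ['q2', 'q3'])
    print(paginate(list(range(25)), 3))  # [20, 21, 22, 23, 24]
```

Clamping only the start index is enough: slicing past the end of a Python list simply returns fewer items, so the last quote in a scene yields an empty `below` list rather than an error.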


@@ -0,0 +1,103 @@
1. The `raw` directory is almost never edited. It is preserved as a "source of truth" and is only edited when there are flaws in
the original pre-processing.
2. `normalization/truths` acts as a second layer of truth. It holds the most basic processed XML available.
a. All characters are extracted as 'Speakers', meaning 'Michael' and 'Andy & Dwight' are both valid speakers.
b. Extracted speakers are placed in `speaker_mapping.xml`. This allows misspellings and similar errors to be merged together.
- This step of the process has an explicit and direct impact on script data. For example, while we want
"Bob Vance Refrigeration Worker #1" and "Bob Vance Refrigeration Worker #2" to be treated internally as the same speaker, we still want them
to display differently on the script page.
- Thus, at this stage, we do not merge the names of background workers; we only correct misspellings.
c. After this, speakers are translated into an 'identification' file that gives them short, web-friendly slug identifiers.
- For example, "Michael" becomes `michael`, and "Bob Vance" becomes `bob-vance`.
- Characters receive the most familiar, easiest ID: Phyllis gets `phyllis`, even though her full name is "Phyllis Vance".
- This step of the process exists entirely for internal data referencing. For example, the Bob Vance Refrigeration workers mentioned above
will all map to the same `bob-vance-refrigeration-worker` ID internally.
- Additionally, compound speakers (speakers that are not simply one character's name), such as `Kevin's Computer`,
`Kevin and Oscar`, or `Dwight, Kelly, Andy and Pam`, will be broken up and, ideally, properly
annotated.
```xml
<IdentifierList>
    <Speaker annotated="false">
        <RawText>Phyllis</RawText>
        <Character>phyllis</Character>
    </Speaker>
    <Speaker annotated="true">
        <RawText>Kevin's Computer</RawText>
        <AnnotatedText>{Kevin}'s Computer</AnnotatedText>
        <Characters>
            <Character>kevin</Character>
        </Characters>
    </Speaker>
</IdentifierList>
```
- `<Characters>` elements will be used only for compound speakers. In the next step, warnings should be printed to the console
when a compound speaker is not annotated, or when a `Characters` tag contains only one element.
If `AnnotatedText` appears in a `Speaker` element's children but `annotated` is false, or
3. `normalization/characters` acts as the character data layer. Here, characters are assigned metadata, such as whether they are
a main, recurring, background, or meta character.
a. Michael, Dwight and Jim are **main** characters. A main character is marked by a very large number of quotes, a continued and
prolonged presence in the show, and so on.
b. David Wallace, Bob Vance and Esther are **recurring** characters. While they may play a hefty role in the show, they don't appear
often enough to count as "main characters".
c. Captain Jack, Pizza Guy and Bob Vance Refrigeration Worker are **background** characters. These are characters who appear only once,
or who make so little impact that including them would dilute the meaning of being a *recurring* character. The line between a
*background* character and a *recurring* character may be quite thin at times, so I anticipate that some characters will be difficult to classify.
d. "Everyone" and "None" are **meta** characters (these speakers won't be searchable, but their quote text will be, as usual).
This type is reserved for lines that don't really belong to a character, for more abstract speakers, and for scene descriptions.
4. `normalization/compiled` is the final stage, where all data is *compiled* into a single dataset.
a. `episodes/{season}-{episode}.xml` contains each episode's data.
```xml
<SceneList>
    <Scene>
        <Quote>
            <Speaker>
                <SpeakerText annotated="true">{Michael}</SpeakerText>
                <Characters>
                    <Character type="main">michael</Character>
                </Characters>
            </Speaker>
            <QuoteText>
                People say I am the best boss. They go, "God we've never worked in a place like this before. You're hilarious."
                "And you get the best out of us." [shows the camera his WORLD'S BEST BOSS mug] I think that pretty much sums it up.
                I found it at Spencer Gifts.
            </QuoteText>
        </Quote>
    </Scene>
    <Scene>
        <Quote deleted="true" deletedScene="13">
            <Speaker>
                <SpeakerText annotated="true">{Dwight} and {Andy}</SpeakerText>
                <Characters>
                    <Character type="main">dwight</Character>
                    <Character type="main">andy</Character>
                </Characters>
            </Speaker>
            <QuoteText>
                [singing] Shall I play for you? Pa rum pump um pum [Imitates heavy drumming] I have no gifts for you.
                Pa rum pump um pum [Imitates heavy drumming]
            </QuoteText>
        </Quote>
    </Scene>
</SceneList>
```
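Step 2 describes two mechanical pieces: turning a speaker name into a slug ID, and checking `IdentifierList` entries for the cases that should produce console warnings. The sketch below uses only the standard library and is hypothetical throughout (`slugify`, `annotated_names` and `check_speakers` are illustrative names, not the project's code). The slug rule (lowercase, alphanumeric runs joined with `-`, a trailing `#N` dropped) is an assumption that happens to reproduce the examples given; hand-picked IDs such as `phyllis` for "Phyllis Vance" would still come from the identification file, not from this function. The `{Name}` brace syntax follows the `AnnotatedText` examples, and "compound" is taken to mean "has a `<Characters>` element".

```python
import re
import xml.etree.ElementTree as ET
from typing import List


def slugify(name: str) -> str:
    """Assumed slug rule: drop a trailing '#N', lowercase, join alphanumeric runs with '-'."""
    name = re.sub(r"\s*#\d+\s*$", "", name)
    return "-".join(re.findall(r"[a-z0-9]+", name.lower()))


def annotated_names(text: str) -> List[str]:
    """Names wrapped in braces, e.g. "{Dwight} and {Andy}" -> ["Dwight", "Andy"]."""
    return re.findall(r"\{([^{}]+)\}", text)


def check_speakers(xml_text: str) -> List[str]:
    """Return warning messages for <Speaker> entries in an <IdentifierList> document."""
    warnings = []
    root = ET.fromstring(xml_text)
    for speaker in root.iter("Speaker"):
        raw = speaker.findtext("RawText", default="?")
        annotated = speaker.get("annotated") == "true"
        has_annotation = speaker.find("AnnotatedText") is not None
        characters = speaker.find("Characters")
        # A compound speaker (one with <Characters>) should be annotated.
        if characters is not None and not annotated:
            warnings.append(f"{raw}: compound speaker is not annotated")
        # AnnotatedText without annotated="true" is contradictory.
        if has_annotation and not annotated:
            warnings.append(f"{raw}: AnnotatedText present but annotated is false")
        # <Characters> is meant for multiple characters.
        if characters is not None and len(characters.findall("Character")) == 1:
            warnings.append(f"{raw}: Characters holds only one Character")
    return warnings


if __name__ == "__main__":
    print(slugify("Bob Vance"))  # bob-vance
    print(slugify("Bob Vance Refrigeration Worker #2"))  # bob-vance-refrigeration-worker
    print(annotated_names("{Dwight} and {Andy}"))  # ['Dwight', 'Andy']
    sample = """<IdentifierList>
      <Speaker annotated="true">
        <RawText>Kevin's Computer</RawText>
        <AnnotatedText>{Kevin}'s Computer</AnnotatedText>
        <Characters><Character>kevin</Character></Characters>
      </Speaker>
    </IdentifierList>"""
    print(check_speakers(sample))
```

Read literally, the single-element rule flags the `Kevin's Computer` example itself, since its `<Characters>` holds only `kevin`; the sample above reproduces that single warning.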

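For step 4, a consumer of `episodes/{season}-{episode}.xml` mainly needs to walk scenes and quotes, collect each speaker's character IDs, and decide what to do with deleted quotes. The following is a minimal standard-library sketch that assumes only the element and attribute names shown in the compiled-episode example (`Scene`, `Quote`, `deleted`, `deletedScene`, `SpeakerText`, `Character`, `QuoteText`). The `CompiledQuote` type and `load_quotes` function are hypothetical, and `deletedScene` is simply kept as an integer without interpreting it further.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CompiledQuote:
    """Hypothetical flat view of one <Quote> element."""
    scene: int  # one-indexed position of the enclosing <Scene>
    speaker_text: str  # e.g. "{Dwight} and {Andy}"
    characters: List[str] = field(default_factory=list)
    text: str = ""
    deleted_scene: Optional[int] = None  # only set for deleted="true" quotes


def load_quotes(xml_text: str, include_deleted: bool = True) -> List[CompiledQuote]:
    root = ET.fromstring(xml_text)
    quotes = []
    for scene_number, scene in enumerate(root.findall("Scene"), start=1):
        for quote in scene.findall("Quote"):
            deleted = quote.get("deleted") == "true"
            if deleted and not include_deleted:
                continue
            speaker = quote.find("Speaker")
            speaker_text = ""
            characters = []
            if speaker is not None:
                speaker_text = speaker.findtext("SpeakerText", default="")
                characters = [c.text or "" for c in speaker.iter("Character")]
            # Collapse the line breaks and indentation inside <QuoteText>.
            text = " ".join(quote.findtext("QuoteText", default="").split())
            deleted_scene = quote.get("deletedScene")
            quotes.append(CompiledQuote(
                scene=scene_number,
                speaker_text=speaker_text,
                characters=characters,
                text=text,
                deleted_scene=int(deleted_scene) if deleted and deleted_scene else None,
            ))
    return quotes


if __name__ == "__main__":
    sample = """<SceneList>
      <Scene><Quote>
        <Speaker><SpeakerText annotated="true">{Michael}</SpeakerText>
          <Characters><Character type="main">michael</Character></Characters></Speaker>
        <QuoteText>I found it at
          Spencer Gifts.</QuoteText>
      </Quote></Scene>
    </SceneList>"""
    for q in load_quotes(sample):
        print(q.scene, q.characters, q.text)  # 1 ['michael'] I found it at Spencer Gifts.
```

Passing `include_deleted=False` drops quotes from deleted scenes, which is one plausible way to keep them out of the main script view while still compiling them into the dataset.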
Some files were not shown because too many files have changed in this diff.