Bug fixes and features (#1).

Reviewed-on: witten/novel-stats#1
Dan Helfman 2021-10-22 20:05:17 +00:00
commit 57724e65df
8 changed files with 244 additions and 62 deletions

106
README.md

@ -1,6 +1,6 @@
novel-stats produces word count statistics for novels written in Markdown
format, including total word count, per-chapter word counts, per-act word
counts, and counts by chapter "status." You might find this useful if you're
format, including total word count, word count by status, and optionally
per-chapter and per-act word counts. You might find this useful if you're
already using tools like Git and Markdown processing as part of your writing
workflow (or are looking to start) and want some basic statistics about your
novel as you're writing it.
@ -9,24 +9,52 @@ novel-stats is fairly particular about the format of the novel and doesn't
currently include much in the way of error checking. Word counts may not be
exact.
Example output:
Example output with no optional data:
```bash
$ novel-stats example.md
chapter 1: 103 words (drafted)
chapter 2: 83 words (dev edited)
chapter 3: 115 words
chapter 4: 96 words
chapter 5: 136 words (drafted)
act 1: 187 words (~34%)
act 2: 212 words (~39%)
act 3: 137 words (~25%)
drafted: 239 words (~44%)
dev edited: 83 words (~15%)
drafted: 237 words (~43%)
dev edited: 82 words (~15%)
total: 539 words
```
Example output with chapter data:
```bash
$ novel-stats example.md -c
chapter 1: 103 (drafted)
chapter 2: 83 (dev edited)
chapter 3: 115
chapter 4: 96
chapter 5: 136 (drafted)
drafted: 237 words (~43%)
dev edited: 82 words (~15%)
total: 539 words
```
Example with multi-file markdown:
```bash
$ novel-stats multi_file.mdpp -pp -c -a
chapter 1 Lorem:
203 (drafted)
303 (dev edited)
506 words (total)
chapter 2 Ipsum: 84 (dev edited)
chapter 3 Dolor: 116
chapter 4 Sit: 97
chapter 5 Amet: 137 (drafted)
act 1: 591 words (~62%)
act 2: 214 words (~22%)
act 3: 138 words (~14%)
drafted: 336 words (~35%)
dev edited: 385 words (~40%)
total: 946 words
```
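The percentages in the output are rounded down, so they may not add up to 100. A minimal sketch of that calculation, assuming the same integer floor division the script uses (`count * 100 // total`); the helper name `percent_of_total` is hypothetical and not part of novel-stats:

```python
def percent_of_total(word_count, total_word_count):
    """Return the share of the total as a whole-number percentage, rounded down."""
    return word_count * 100 // total_word_count


# Word counts taken from the multi-file example output above.
for label, count in [('act 1', 591), ('act 2', 214), ('act 3', 138)]:
    print(f'{label}: {count:,} words (~{percent_of_total(count, 946)}%)')
```

Because each figure is floored independently, the three acts here report 62%, 22%, and 14%, which sum to 98% rather than 100%.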
## Installation
Start by cloning the project with git. Then install it with Python's `pip`.
@ -43,16 +71,24 @@ easier):
pip3 install --editable /path/to/novel-stats
```
## Usage
novel-stats takes the path to your novel file in Markdown format as its first
argument, optionally followed by flags. For instance:
```bash
novel-stats /path/to/your/novel.md
novel-stats /path/to/your/novel.md[pp] [-c/--chapter] [-a/--act] [-pp]
```
### Optional flags
* `-c` or `--chapter`: output a chapter-by-chapter breakdown of word counts,
including how many words in each chapter are tagged with which status
* `-a` or `--act`: output an act-by-act breakdown of word counts (total only)
* `-pp`: run the Markdown preprocessor. This allows for multi-file input
(e.g. each chapter in its own file), but requires the MarkdownPP Python
library.
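The script in this commit detects flags with plain membership tests on the argument list rather than a full argument parser. A minimal sketch of that approach, assuming the file path is always the first argument; the function name `parse_flags` and the dictionary keys are hypothetical:

```python
def parse_flags(arguments):
    """Read the novel path and optional flags from a list like sys.argv[1:]."""
    return {
        'filename': arguments[0],  # assumption: the path always comes first
        'chapters': '-c' in arguments or '--chapter' in arguments,
        'acts': '-a' in arguments or '--act' in arguments,
        'preprocess': '-pp' in arguments,
    }


options = parse_flags(['multi_file.mdpp', '-pp', '-c'])
print(options)
```

One consequence of this design is that the flags can appear in any order, but the novel path cannot be moved after them.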
## Markdown format
You'll need to format your novel in the expected format for novel-stats to
@ -126,11 +162,43 @@ If you do use this feature, you should set the status at the top of each
chapter, before the actual chapter contents (and after any chapter status).
### Comments
Comments, such as outlining notes for yourself, can be added anywhere using:
```yaml
[//]: # This text is completely ignored.
```
Commented text does not count towards the word count.
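As a rough illustration of how comment lines can be left out of the tally, here is a sketch that skips any line starting with the comment marker. It assumes a simple whitespace split for counting words, which may differ from the script's own `count_words` function; the name `count_visible_words` is hypothetical:

```python
COMMENT_MARKER = '[//]: # '


def count_visible_words(lines):
    """Count whitespace-separated words, ignoring Markdown comment lines."""
    total = 0
    for line in lines:
        if line.startswith(COMMENT_MARKER):
            continue  # comment lines contribute nothing
        total += len(line.split())
    return total


sample = [
    'Lorem ipsum dolor sit amet.\n',
    '[//]: # (Note to self: tighten this scene.)\n',
    'Consectetur adipiscing elit.\n',
]
print(count_visible_words(sample))  # prints 8
```

Note that the marker only matches at the start of a line, so a comment should sit on a line of its own.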
### Multi-file support
Splitting your novel into multiple files is supported using the `MarkdownPP`
Python library. To include a secondary file inside the main one, use:
```yaml
!INCLUDE "OtherFile.md"
```
and add the `-pp` flag to novel-stats.
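To show what the include step amounts to conceptually, the following sketch expands `!INCLUDE` lines by hand. It is not how MarkdownPP is implemented and does not use that library; the function name `expand_includes`, the regular expression, and the choice to resolve paths relative to the including file are all assumptions made for illustration:

```python
import os
import re
import tempfile

# Hypothetical pattern for lines such as: !INCLUDE "OtherFile.md"
INCLUDE_PATTERN = re.compile(r'^!INCLUDE\s+"(?P<path>[^"]+)"\s*$')


def expand_includes(path):
    """Return the lines of `path`, with each !INCLUDE line replaced by that file's lines."""
    base_dir = os.path.dirname(os.path.abspath(path))
    expanded = []
    with open(path) as source:
        for line in source:
            match = INCLUDE_PATTERN.match(line)
            if match:
                # Assumption: included paths are relative to the including file.
                expanded.extend(expand_includes(os.path.join(base_dir, match.group('path'))))
            else:
                expanded.append(line)
    return expanded


# Build a tiny two-file novel in a temporary folder and combine it.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, 'Chapter1.md'), 'w') as chapter:
        chapter.write('## 1 Lorem\nLorem ipsum dolor sit amet.\n')
    main_path = os.path.join(folder, 'main.mdpp')
    with open(main_path, 'w') as main_file:
        main_file.write('# Title of the Novel\n!INCLUDE "Chapter1.md"\n')
    combined = expand_includes(main_path)

print(''.join(combined), end='')
```

This sketch has no guard against files that include each other in a loop, so it is only suitable for illustration.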
### Example novel
novel-stats includes an example Markdown file `example.md` that illustrates
the expected Markdown format. Try it out:
novel-stats includes two examples:
1. Markdown file `example.md` that illustrates the expected Markdown format
for a single file. Try it out:
```bash
$ novel-stats example.md
```
novel-stats example.md
2. A six-file example in the `example` folder with the main file
`multi_file.mdpp`. You can try this one out with:
```bash
$ cd example
$ novel-stats multi_file.mdpp -pp
```

17
example/Chapter1.md Normal file

@ -0,0 +1,17 @@
## 1 Lorem
[status]: # (drafted)
[act]: # (1)
*Lorem* ipsum dolor sit amet, consectetur adipiscing elit. Ut cursus malesuada leo. Phasellus justo orci, auctor ac maximus vitae, aliquet ornare urna. Etiam porttitor tristique ligula, et dictum mauris consequat vel. Curabitur fringilla velit posuere, imperdiet mauris auctor, varius nibh. Vestibulum sed mauris maximus, vehicula leo sit amet, sodales enim. Maecenas tempor nibh nec egestas aliquam. Proin non nibh eget tellus porttitor pharetra. Phasellus hendrerit, nunc quis lobortis finibus, lacus massa lobortis justo, sit amet vulputate urna magna sit amet dui. Ut facilisis sem orci, sit amet dignissim ligula rutrum quis. Nulla iaculis urna eget varius pellentesque. Nulla pulvinar orci sollicitudin consequat volutpat. Nullam tempus lectus sed est lacinia, et blandit odio tempor. In in quam luctus, convallis sem nec, dapibus elit. Nunc ornare, neque sodales maximus faucibus, lectus velit tincidunt elit, eu blandit nulla turpis sit amet ex. Curabitur ullamcorper mi non quam pharetra, eget cursus sem dapibus.
**Nullam** ac elementum arcu, eu congue orci. Sed blandit quam non vulputate porta. Donec laoreet metus sit amet ex feugiat, in scelerisque est varius. Curabitur nec elit vel ante consequat gravida. ***Aliquam* ultrices** dolor vel eros hendrerit condimentum. Donec efficitur turpis quis eros viverra venenatis. Praesent ultricies dolor nec justo consectetur consectetur.
[status]: # (dev edited)
***Maecenas* nec** mi sapien. Vestibulum tortor tortor, feugiat in est nec, vestibulum faucibus magna. Pellentesque elementum elit sed metus ornare lobortis. Nunc molestie, justo id ultricies elementum, nibh libero suscipit massa, feugiat pharetra felis mi ut lacus. Pellentesque ornare pretium mi, in commodo nulla dignissim vel. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Vivamus sed dolor ut mi mattis sagittis vitae ac dolor. Integer tincidunt diam sapien, vitae tincidunt neque semper sit amet. Cras mi risus, faucibus et lacinia et, eleifend sed nunc. Sed faucibus consectetur justo, non accumsan orci imperdiet quis.
Sed sed porta ante. Sed viverra dui sit amet eros rutrum volutpat. Aliquam eu nulla congue, cursus lectus sit amet, congue tellus. Maecenas id aliquam libero. Maecenas ultrices blandit aliquam. Sed pretium ut ipsum eu pharetra. Nullam at nunc vitae erat luctus varius non auctor felis. Ut vel dignissim nibh, sit amet gravida mauris. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Curabitur id lobortis erat. Etiam facilisis turpis ut libero cursus, sit amet cursus diam hendrerit. Sed nisi lacus, semper quis ante ac, dictum ullamcorper lacus. Aenean vel gravida mauris, a egestas nisi.
Quisque fermentum sagittis mi. Aliquam erat volutpat. Sed vehicula quam non nunc porta sagittis. Phasellus eu dignissim arcu, non volutpat quam. Vestibulum aliquam leo eget justo pulvinar placerat. Interdum et malesuada fames ac ante ipsum primis in faucibus. Sed sed lacus tempus, tempus dui sed, lobortis magna. Vivamus dui eros, eleifend id ipsum eget, tincidunt porta metus. Pellentesque efficitur pharetra arcu nec auctor. Pellentesque euismod tincidunt risus, vel blandit leo iaculis et. Sed pellentesque lectus nisi, at faucibus purus laoreet in. Curabitur rhoncus lobortis blandit. Nulla et imperdiet risus, eu facilisis arcu. Ut tincidunt justo in eros vehicula feugiat.

16
example/Chapter2.md Normal file

@ -0,0 +1,16 @@
## 2 Ipsum
[status]: # (dev edited)
Nullam id cursus velit, et lobortis est. Sed consequat diam risus, ac
hendrerit mauris facilisis vitae. Vestibulum blandit enim nibh, ut vehicula
augue hendrerit sit amet. Proin gravida elit quis erat dapibus ornare. Etiam
suscipit eget tortor eget facilisis. Phasellus finibus nunc quis urna
ultricies elementum. Quisque faucibus pharetra augue eu consectetur. Proin
vehicula, nisl ac maximus volutpat, turpis orci imperdiet quam, ac tempor erat
lectus in leo. Nullam et efficitur ipsum. Nulla felis turpis, blandit ultrices
eros venenatis, sagittis convallis lectus.
[//]: # (Testing out a comment)

17
example/Chapter3.md Normal file

@ -0,0 +1,17 @@
## 3 Dolor
[act]: # (2)
Mauris eu orci at velit scelerisque feugiat nec tristique nunc. Curabitur vel
dolor imperdiet, iaculis nunc sit amet, volutpat sapien. Phasellus enim ipsum,
varius a sollicitudin a, dapibus id magna. Etiam vitae sollicitudin orci.
Vivamus dapibus lacinia risus eu pellentesque. Lorem ipsum dolor sit amet,
consectetur adipiscing elit. Donec ut nisl non mi suscipit scelerisque.
Curabitur quis accumsan velit, ac convallis lectus. Curabitur aliquet nisi et
magna tincidunt, in euismod orci rutrum. Aliquam sed erat eget ipsum
sollicitudin mollis. Donec accumsan euismod rhoncus. Proin molestie ut mauris
quis egestas. Sed cursus varius leo at suscipit. Aenean ultricies sodales mi,
non varius ex laoreet quis. Morbi a nisl fringilla lorem mollis consequat et
id ex.

13
example/Chapter4.md Normal file

@ -0,0 +1,13 @@
## 4 Sit
Sed eget metus tristique, tincidunt purus non, euismod massa. Class aptent
taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos.
Vestibulum egestas scelerisque neque. Pellentesque et ultrices lorem, at
mattis enim. Nam sit amet quam sapien. Maecenas fringilla nisl sit amet ipsum
feugiat condimentum. Praesent ac justo placerat, ornare lectus quis, volutpat
turpis. Etiam eu blandit nibh. Sed tincidunt facilisis massa vitae mattis.
Pellentesque vitae lectus et massa sollicitudin varius. Proin varius libero eu
elit mollis egestas. Ut interdum lacus tempor velit ullamcorper, quis
consequat tortor lobortis. Donec pulvinar pretium quam eu fringilla.

19
example/Chapter5.md Normal file

@ -0,0 +1,19 @@
## 5 Amet
[status]: # (drafted)
[act]: # (3)
Cras eget egestas enim. Donec faucibus lacus malesuada magna bibendum, eget
molestie purus gravida. Vivamus leo erat, dapibus non tristique a, fringilla
eget felis. Phasellus efficitur, nibh eu sollicitudin tristique, urna tellus
ultricies ligula, sit amet facilisis libero risus sodales dolor. Quisque nec
tortor a ligula porttitor egestas et vel dui. Integer lorem sem, luctus vel
enim ac, rhoncus vestibulum urna. Maecenas eu sem id eros interdum congue.
Nunc quis turpis id nibh aliquam varius eget eget tortor. Morbi faucibus nisi
sit amet arcu sollicitudin, sit amet luctus lorem pulvinar. Aliquam velit
nulla, viverra a turpis eget, venenatis hendrerit sapien. Aliquam a sem
vehicula, tempor purus non, fringilla felis. Ut venenatis massa lacus, et
malesuada leo vehicula vitae. In vel nunc id metus semper ornare. Duis quis
tellus eleifend, tristique ex sit amet, mattis ligula.

8
example/multi_file.mdpp Normal file

@ -0,0 +1,8 @@
# Title of the Novel
### Author Name
!INCLUDE "Chapter1.md"
!INCLUDE "Chapter2.md"
!INCLUDE "Chapter3.md"
!INCLUDE "Chapter4.md"
!INCLUDE "Chapter5.md"


@ -2,14 +2,13 @@
import collections
import os
import string
import sys
CHAPTER_MARKER = '## '
STATUS_MARKER = '[status]: # '
ACT_MARKER = '[act]: # '
COMMENT_MARKER = '[//]: # ' # Standard markdown comment marker, supported by pandoc and calibre's ebook-convert
def count_words(line):
@ -27,71 +26,96 @@ def count_words(line):
def main():
arguments = sys.argv[1:]
filename = arguments[0]
chapter_number = None
act_number = None
mdfile = None
if '-pp' in arguments:
# -pp flag to allow Markdown preprocessing, primarily to allow multi-file novel formatting.
# This is implemented using a temporary file created with Python's built-in tempfile library.
import MarkdownPP, tempfile
mdfile = tempfile.TemporaryFile(mode='w+')
MarkdownPP.MarkdownPP(input=open(filename), output=mdfile, modules=list(MarkdownPP.modules))
mdfile.seek(0)
else:
mdfile = open(filename)
chapter_heading = None
act_heading = None
total_word_count = 0
word_count_by_chapter = collections.defaultdict(int)
word_count_by_status = collections.defaultdict(int)
word_count_by_act = collections.defaultdict(int)
status_by_chapter = {}
current_status = None
for line in open(filename).readlines():
for line in mdfile.readlines():
if line.startswith(CHAPTER_MARKER):
word_count_by_act[act_number] += word_count_by_chapter[chapter_number]
total_word_count += word_count_by_chapter[chapter_number]
if chapter_number in status_by_chapter:
word_count_by_status[status_by_chapter[chapter_number]] += 1
word_count_by_act[act_heading] += word_count_by_chapter[chapter_heading]
total_word_count += word_count_by_chapter[chapter_heading]
chapter_number = int(line[len(CHAPTER_MARKER):])
chapter_heading = line[len(CHAPTER_MARKER):].strip('()\n')
word_count_by_chapter[chapter_number] = 1 # Start at one, because the chapter number itself counts as a word.
if chapter_number in status_by_chapter:
word_count_by_status[chapter_status] += 1
elif line.startswith(STATUS_MARKER):
status_by_chapter[chapter_number] = line[len(STATUS_MARKER):].strip('()\n')
word_count_by_chapter[chapter_heading] = count_words(chapter_heading) # Count the words in chapter heading, because the chapter number and title count as words.
status_by_chapter[chapter_heading] = collections.defaultdict(int)
current_status = None
elif line.startswith(STATUS_MARKER): # Modified to allow multiple statuses in a single chapter, can swap back and forth.
if current_status is None:
current_status = line[len(STATUS_MARKER):].strip('()\n')
status_by_chapter[chapter_heading][current_status] = count_words(chapter_heading)
else:
current_status = line[len(STATUS_MARKER):].strip('()\n')
status_by_chapter[chapter_heading][current_status] += 0
elif line.startswith(ACT_MARKER):
act_number = int(line[len(ACT_MARKER):].strip('()\n'))
word_count_by_act[act_number] = 1
act_heading = line[len(ACT_MARKER):].strip('()\n')
word_count_by_act[act_heading] = count_words(act_heading)
elif line.startswith(COMMENT_MARKER): # don't count the words in a comment
pass
else:
line_word_count = count_words(line)
word_count_by_chapter[chapter_number] += line_word_count
word_count_by_chapter[chapter_heading] += line_word_count
if chapter_number in status_by_chapter:
word_count_by_status[status_by_chapter[chapter_number]] += line_word_count
if current_status:
word_count_by_status[current_status] += line_word_count
status_by_chapter[chapter_heading][current_status] += line_word_count
mdfile.close()
# Do some final accounting after the last chapter.
word_count_by_act[act_number] += word_count_by_chapter[chapter_number]
total_word_count += word_count_by_chapter[chapter_number]
if chapter_number in status_by_chapter:
word_count_by_status[status_by_chapter[chapter_number]] += 1
word_count_by_act[act_heading] += word_count_by_chapter[chapter_heading]
total_word_count += word_count_by_chapter[chapter_heading]
# Print out word counts.
for chapter_number, chapter_word_count in word_count_by_chapter.items():
if chapter_number is None:
continue
if '-c' in arguments or '--chapter' in arguments: # -c or --chapter to give a chapter-by-chapter word count summary
for chapter_heading, chapter_word_count in word_count_by_chapter.items():
if chapter_heading is None:
continue
chapter_status = status_by_chapter.get(chapter_number)
if len(status_by_chapter[chapter_heading]) > 1:
print(f'chapter {chapter_heading}:')
print(
'chapter {}: {:,} words{}'.format(
chapter_number,
chapter_word_count,
' ({})'.format(chapter_status) if chapter_status else '',
)
)
for chapter_status, status_count in status_by_chapter[chapter_heading].items():
print(f'\t {status_count:,} ({chapter_status})')
print(f'\t {chapter_word_count:,} words (total)')
elif len(status_by_chapter[chapter_heading]) == 1:
chapter_status = list(status_by_chapter[chapter_heading].keys())[0]
print(f'chapter {chapter_heading}: {chapter_word_count:,} ({chapter_status})')
else:
print(f'chapter {chapter_heading}: {chapter_word_count:,}')
print()
print()
for act_number, act_word_count in word_count_by_act.items():
if act_number is None:
continue
if '-a' in arguments or '--act' in arguments: # -a or --act to give an act-by-act word count summary
for act_heading, act_word_count in word_count_by_act.items():
if act_heading is None:
continue
print('act {}: {:,} words (~{}%)'.format(act_number, act_word_count, act_word_count * 100 // total_word_count))
print('act {}: {:,} words (~{}%)'.format(act_heading, act_word_count, act_word_count * 100 // total_word_count))
print()
for status, status_word_count in word_count_by_status.items():
print('{}: {:,} words (~{}%)'.format(status, status_word_count, status_word_count * 100 // total_word_count))
print(f'{status}: {status_word_count:,} words (~{status_word_count * 100 // total_word_count}%)')
print('total: {:,} words'.format(total_word_count))
print(f'total: {total_word_count:,} words')
if __name__ == '__main__':