General Average Words Per Card

For fun, I downloaded some JSON card data and wrote a bit of Python code to find the average number of words per card in a bunch of cubes from this site. It's a metric I'm trying to minimize in my cube, within reason, so this interests me.

I looked at full oracle text, but also the text excluding reminder text in parentheses. The reminder text can be misleading in some cases, since the same keywords may or may not get reminder text, depending on the card's set and rarity. For example, compare Omenspeaker with Sage's Row Savant. But it's interesting to see that some cubes have a bigger difference between the two averages, which indicates more keyword abilities and/or wordier reminder text among those keyword abilities.

It would be fun to come up with a more sophisticated index to estimate the rules density of a cube, or its readability.

Anyway, here are the results for the cubes I ran through the program, with lists pulled from the web within the last couple of weeks or so.

wordspercard.png
 
Oops.




On an actually productive note, it's fascinating how tight the spread on average word count is for cubes that aren't explicitly 'efficient' with words. One has to wonder how much of this is due to overlap in cube staples and how much this is due to power level correlating with word count, or something else entirely.

What's interesting is that the percentage of text that is reminder text seems to vary wildly between cubes, ranging from 24.3% to 10.9%, with some statistically significant deviation to the low end, though sample size is small and not random, etc.
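
For anyone who wants to reproduce that kind of percentage, here is a minimal sketch. It assumes the figure is the share of the full-text average that disappears once reminder text is stripped, which may not be exactly how the numbers above were derived. The function name is made up for illustration, and the sample inputs are the Elegant Cube (Core) averages that appear elsewhere in this thread.

```python
def reminder_share(full_avg, no_reminder_avg):
    """Fraction of the average card's words that are reminder text.

    Assumes both averages were computed over the same card list.
    """
    if full_avg <= 0:
        return 0.0
    return (full_avg - no_reminder_avg) / full_avg


# 23.8 words with reminder text, 18.5 without
share = reminder_share(23.8, 18.5)
print(f"{share:.1%} of the text is reminder text")
```

With those two averages the share comes out at a little over 22%, which sits inside the range quoted above.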

Word Count Analysis.PNG

Methodology-wise, I'm assuming you've ranked the cubes by ascending mean of the two values presented? That seems reasonable, but I would advocate for using the median word count instead of the mean word count to better assess the average drafter's experience. I'd bet a fair bit that any cube that uses planeswalkers will see a drop in word count when looking at the median instead of the mean, for example. I'm also curious to find out how much word count correlates with mana cost, or color for that matter.

You've clearly made an effort to consciously address the number of words per card, so I'm wondering--how has this experiment actually worked out in practice? I've read in your thread that the intent is to make this cube accessible to kids as well as newer Magic players, but that only seems to exacerbate the issue of differing facility with card text. Using the median word count could make it possible to appeal to a wider range of ability levels if that is a trade-off you're interested in making.

Would you mind sharing your sources? My coding is abysmal, but I'd love to get a chance to figure some of these things out for myself instead of just theorycrafting here!

And, in case it isn't clear, thank you for sharing! I love this sort of data :rolleyes:
 
Kind of curious now how low we could go while still having something cool. Honestly, probably pretty damn low.

Vanilla and french vanilla beaters are plenty respectable. Part of why my word count's so high is all my red cards have the whole "blah blah exile it" text, but basic burn spells are 7 words. Counterspells can be 3 words.

You potentially lose some of what makes MtG good as you attempt to go mega low, but, looking at the Simplicity Cube (15.1 words), I feel like we could get under 10 if we were ok with something similar to a Core set. Personally, I like Core sets for this reason. Might throw something together when I have time, as I enjoy designing anyways.
 

Onderzeeboot

Ecstatic Orb
Methodology-wise, I'm assuming you've ranked the cubes by ascending mean of the two values presented? That seems reasonable, but I would advocate for using the median word count instead of the mean word count to better assess the average drafter's experience. I'd bet a fair bit that any cube that uses planeswalkers will see a drop in word count when looking at the median instead of the mean, for example. I'm also curious to find out how much word count correlates with mana cost, or color for that matter.

This is a great idea. There's actually no reason you can't have both!

By the way, flip cards also greatly increase the word count. I run four of the original flip walkers. Liliana, Heretical Healer and Jace, Vryn's Prodigy are 82 and 89 words respectively (according to Word), Kytheon, Hero of Akros manages 100, and Nissa, Vastwood Seer racks up an impressive 109 words! (Again, according to Microsoft Word, which does consider loyalty costs of planeswalker abilities and mana costs for activated abilities a 'word'.)

Sagas also have the ability to be well above average. Elspeth Conquers Death runs an impressive 68 words for a non-DFC card, including the reminder text!

And then there's Gonti, Lord of Luxury, which manages a whopping 71 words without reminder text, without mana symbols, without any special layout that helps accommodate extra words. All hail Gonti, the Lord of Words!
 
This is pretty cool, and not surprising that I’d be near the top as far as word density goes. It’s a problem I’ve noticed in play, and I’ve made a conscious effort to simplify whenever a reasonably sufficient analogue is present. I wonder what my old version of the GCC clocks in at. Is it difficult to pull that info?

https://cubecobra.com/cube/list/5e4eb284adfb9642dec2cabc
 
I must be pretty low with my Casual Champions Cube. I don't run planeswalkers, don't run double-faced cards, and I have a soft spot for clean and simple designs anyway. Also, we shouldn't forget that cards can have only a short, single ability and still be pretty skill-intensive. On the other hand, some planeswalkers or pushed creatures (which would probably count as wordy) can be real no-brainers once in play.
 
I found a bug and fixed the algorithm. Likely, other bugs exist, considering the wide variety of cards. Data table image has been updated. I also added the other cubes you asked about.

Zoss, here's an attempt to share the Python code. If it doesn't show up properly, you can private message me with an email address so you can receive the file directly if you're interested.


Code:
import json
 
 
def create_cube_json(filename):
    # Call this once to create a smaller JSON file from the full card list
    # Visit https://scryfall.com/docs/api/bulk-data and look for "Oracle Cards" file;
    # Download and rename it to 'oracle_cards.json'
    with open('oracle_cards.json', encoding='utf8') as f_oracle_cards:
        d = json.load(f_oracle_cards)
    f_cube_list = open(filename)
    cards = []
    for line in f_cube_list:
        cards += [line.strip()]
    output_filename = filename[:filename.find('.')]+'.json'
    f_cube_json = open(output_filename, 'w+', encoding='utf8')
    cube_data = []
    for card in d:
        if card['name'] in cards:
            for i in range(cards.count(card['name'])):
                cube_data += [card]
        elif 'card_faces' in card.keys():
            for face in card['card_faces']:
                if face['name'] in cards:
                    cube_data += [card]
    string_data = json.dumps(cube_data)
    f_cube_json.write(string_data)
    f_cube_json.close()
    f_cube_list.close()
 
 
def wc_fo(text):
    # full oracle text word count
    return len(text.split())
 
 
def wc_o(text):
    # word count of oracle text without keyword explanation text
    if "(" in text and ")" in text:
        p1 = text.find("(")
        p2 = text.find(")")
        o = text[:p1] + text[p2+1:]
        return wc_o(o)
    else:
        return len(text.split())
 
 
def word_count(card, full=False):
    if full:
        wc = wc_fo
    else:
        wc = wc_o
    if 'card_faces' in card.keys():
        total_count = 0
        for face in card['card_faces']:
            text = face['oracle_text']
            total_count += wc(text)
        return total_count
    else:
        text = card['oracle_text']
        return wc(text)
 
 
def cube_word_count(filename, full=False):
    with open(filename, encoding='utf8') as f:
        d = json.load(f)
    wc = []
    for card in d:
        words = word_count(card, full)
        wc += [words]
    mean = sum(wc)/len(wc)
    if full:
        metric = 'Avg Words/Card (full text)'
    else:
        metric = 'Avg Words/Card (no reminders)'
    print(f"{metric}: {mean:.1f}")
    return mean
 
 
def duplicate_count(filename):
    with open(filename, encoding='utf8') as f:
        d = json.load(f)
    unique = []
    for card in d:
        if card['name'] not in unique:
            unique += [card['name']]
    print(f"{len(unique)} unique of {len(d)} total cards")
 
 
def print_wordy_cards(filename, n):
    # List cards with n or more words
    with open(filename, encoding='utf8') as f:
        d = json.load(f)
    for card in d:
        words = word_count(card)  # change to word_count(card, True) to count full oracle text
        if words >= n:
            print(f"{words} words: {card['name']}")
            # print(get_text(card) + "\n")
 
 
def get_text(card):
    if 'oracle_text' in card.keys():
        return card['oracle_text']
    elif 'card_faces' in card.keys():
        text = ""
        for face in card['card_faces']:
            text += face['oracle_text'] + ' '
        return text
 
 
cube_name = 'Simplicity'                    # Name must match the *.txt card list file
create_cube_json(cube_name + '.txt')            # Comment this out if cubename.json is already created
# print_wordy_cards(cube_name + '.json', 0)
cube_word_count(cube_name + '.json')        # Average words excluding reminder text
cube_word_count(cube_name + '.json', True)  # Average words in full oracle text

Here's a table of median values:
medianwordspercard.png
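
Getting the median out of the script only takes a small change. A minimal sketch, assuming a list of per-card word counts like the `wc` list that `cube_word_count` builds; the numbers below are invented, with the last two standing in for a planeswalker and a flip card.

```python
from statistics import mean, median

# Hypothetical word counts for a ten-card slice of a cube.
word_counts = [0, 3, 7, 9, 12, 15, 18, 22, 71, 109]

cube_mean = mean(word_counts)
cube_median = median(word_counts)

print(f"mean:   {cube_mean:.1f}")
print(f"median: {cube_median:.1f}")
```

Two very wordy cards are enough to pull the mean well above the median here, which is the effect the median is meant to dampen.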

I'm just splitting the oracle text on whitespace (Python's str.split() method), which has some quirks to it. For example, a modal card like Supreme Will has a dash and two bullets, each of which is surrounded by spaces. Those each get counted as a word. Mana symbols and tap symbols are also considered words, so Llanowar Elves has three words, but that seems appropriate. Symbols need to be processed by the reader.
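
To make those quirks concrete, here is a small sketch. The two strings are written in the style of oracle text rather than copied from the Scryfall data, and `count_words_strict` is a made-up helper showing one possible way to skip lone punctuation; it is not part of the script above.

```python
# Illustrative strings in the style of oracle text.
# \u2014 is the long dash and \u2022 is the bullet used on modal cards.
mana_ability = "{T}: Add {G}."
modal_spell = "Choose one \u2014\n\u2022 Counter target spell.\n\u2022 Draw two cards."


def count_words_strict(text):
    # Ignore tokens with no letters or digits, such as a lone dash or bullet.
    return sum(1 for token in text.split() if any(ch.isalnum() for ch in token))


print(mana_ability.split())            # symbols come out as their own tokens
print(len(modal_spell.split()))        # counts the dash and both bullets
print(count_words_strict(modal_spell)) # leaves them out
```

Symbols such as `{T}:` still count under the stricter version because they contain a letter, so the mana ability stays at three words either way.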

As to how this approach has worked for me: my cube has never been properly drafted by more than 3 people at a time. When I get a chance to assemble some people in person, I'll give it a try. What I do know is that in my first attempt to design a cube I completely disregarded the complexity level of the cube, and I think that made it less fun for the drafters.
 
I must be pretty low with my Casual Champions Cube. I don't run planeswalkers, don't run double-faced cards, and I have a soft spot for clean and simple designs anyway. Also, we shouldn't forget that cards can have only a short, single ability and still be pretty skill-intensive. On the other hand, some planeswalkers or pushed creatures (which would probably count as wordy) can be real no-brainers once in play.

I added that to the list. Sorry for missing it the first time - I tried to include lots of cubes from the regulars on the forums. That cube is indeed on the lower side for average words per card.

I also added the old version of Graveyard Cube.
 
What I do know is that in my first attempt to design a cube I completely disregarded the complexity level of the cube, and I think that made it less fun for the drafters.
I just went through vanilla and evergreen-french vanilla creatures to explore a low word count and it kind of has that classic feel on first pass. All these creatures do is provide a body. Very old school vibe.
https://cubecobra.com/cube/list/5f94a367cff9250fce6d3313
Heeey, that's where I'd expect a cube with a very high gold component to sit :')
 

Onderzeeboot

Ecstatic Orb

Wow! You found gold cards with an extremely low word count after I found some monocolored cards with an extremely high word count. That must mean gold cards have less words on average than monocolored cards! :rolleyes:

I mean, my hypothesis that the opposite is true might very well be wrong, but I expect gold cards to be wordier on average than monocolored cards. I'll gladly be proven wrong by actual research instead of a potentially unrepresentative sample. :p
 
Wow! You found gold cards with an extremely low word count after I found some monocolored cards with an extremely high word count. That must mean gold cards have less words on average than monocolored cards! :rolleyes:

I mean, my hypothesis that the opposite is true might very well be wrong, but I expect gold cards to be wordier on average than monocolored cards. I'll gladly be proven wrong by actual research instead of a potentially unrepresentative sample. :p

Without researching, I'd guess multicolor cards are higher rarity on average and higher rarity cards are wordier on average, as well.
 

Kirblinx

Developer
Staff member
The point of my stack is to have the most ridiculous cards possible, I am surprised it isn't higher, since it isn't technically a normal cube.
I suppose there is a pile of cards that just say 'counter target spell' to help lower the word count.

This is all very interesting. Nice job.
 
I was curious how these numbers compare to a retail draft set, so I wrote some more code to run through the set files available from mtgjson to figure that out. I also looked at the differences between rarities. These numbers are for the full oracle text. A handful of sets are excluded because my code was failing for those sets (meld, adventure, and a few other unknown bugs).

Google Spreadsheet Link

If you look at recent expansion sets, here's an idea of how many words per card you get:
17 (common)
24 (uncommon)
32 (rare)
40+ (mythic)

Overall, with a 10 : 3 : 1 ratio of common : uncommon : rare/mythic, a retail drafter sees about 21 words per card.
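
The weighting can be sketched as below, using only the rounded per-rarity figures listed above. The mythic figure is given as "40+", so 40 is treated as a floor; the dictionary and function names are made up for illustration.

```python
# Rounded per-rarity averages from the list above.
words = {"common": 17, "uncommon": 24, "rare": 32, "mythic": 40}

# Pack slots in the 10 : 3 : 1 ratio.
slots = {"common": 10, "uncommon": 3, "rare_or_mythic": 1}


def pack_average(rare_slot_words):
    """Weighted words per card for one pack, given the last slot's word count."""
    total = (slots["common"] * words["common"]
             + slots["uncommon"] * words["uncommon"]
             + slots["rare_or_mythic"] * rare_slot_words)
    return total / sum(slots.values())


low = pack_average(words["rare"])     # a rare in the last slot
high = pack_average(words["mythic"])  # a mythic in the last slot
print(f"{low:.1f} to {high:.1f} words per card")
```

With these rounded inputs the result lands right around 20, a little under the figure quoted above, which presumably reflects the unrounded per-set values in the spreadsheet.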
 
Overall, with a 10 : 3 : 1 ratio of common : uncommon : rare/mythic, a retail drafter sees about 21 words per card.

Seems like a bit over 20 is what the game kind of lends itself to.

I found it interesting that searching for "is:frenchvanilla" yielded cards that just had Unleash or something. Technically french vanilla, but a limited time keyword like that still requires reminder text to be read, whereas lifelink and flying won't. Probably not something we can quantify.
 
I think this project is cool, but I have no idea how to use this information :/

It's a bit interesting that Highball and The Elegant Cube* are both about equal in terms of words per card. Highball is 1 word higher in regards to median words per card excluding reminder text, but The Elegant Cube is 1 word higher in regards to median words per card including oracle text. I say this because I know Japahn, at least in the past, has taken metrics such as simplicity into account when choosing cards. Meanwhile, I don't generally take card wordiness into account when picking cards. Perhaps my average is being brought down by my abundance of simple Cantrips and Mana Elves which can have as few as one word of rules text depending on whether or not symbols are counted as words.

I'm also somewhat interested to see where my cube ends up after adding all of my IKO/M21/2XM/ZNR/CMR updates. While I plan on cutting many of the Wordy Enchantment Payoffs, I am adding a number of Complex Blokes to the list. I don't think the average goes down, but I wouldn't be surprised by a slight decrease, especially given the simple nature of some of the MDFCs I plan to test.


*Should cube titles be italicized? Are they like books in that regard?
 
With less experienced drafters I can say that anything over 4 lines (more than about 20-25 words) just doesn't get read in the first 4-5 picks each pack.

So yes, that means stuff that more experienced cubers might windmill slam will *almost* table if everyone but me is inexperienced with cube that night.

I think this is mostly useful as another way to think about complexity management, alongside the other tricks like breaking singleton or limiting the number of one-off keywords, etc.


I think you could play recognizable magic with an average rules text of eight. For every Savannah Lions, you can bring in a sixteen worder after all.
 
Wow, that's really really cool. Thanks for the analyses!

I think this project is cool, but I have no idea how to use this information :/

It's a bit interesting that Highball and The Elegant Cube* are both about equal in terms of words per card. Highball is 1 word higher in regards to median words per card excluding reminder text, but The Elegant Cube is 1 word higher in regards to median words per card including oracle text. I say this because I know Japahn, at least in the past, has taken metrics such as simplicity into account when choosing cards. Meanwhile, I don't generally take card wordiness into account when picking cards. Perhaps my average is being brought down by my abundance of simple Cantrips and Mana Elves which can have as few as one word of rules text depending on whether or not symbols are counted as words.

Part of it is that I've let complexity grow a bit lately because I felt I was sacrificing some excitement, and part of it because the majority of cards should come from Core, which is about 2 words lower (average) and 1 word lower (median) than the full cube, as I don't care about word count in Occasionals.

I think, for Core, 18 non-reminder text average is the sweet spot for my cube and regular playgroup. That's close to a retail draft, and allows for enough complexity to be deep.

On one hand, reminder text is read more in cubes than in retail, so the comparison is a little unfair, but I think it's neat to use cube to showcase all the breadth of mechanics that MtG has had.

I want to run the script over my iterations now and see how it evolved over time!
 
I've been thinking about words per card as an abstract metric, and I think that we can't really use it as a viable metric between cubes unless the difference is stark (e.g. intentionally simple cubes as compared with Kirblinx's stack). However, what I *do* think it's useful for could be as an internal measure--that is, to measure different subsections within a given cube against one another.

Specifically, I'd wager that the number of words on cards of certain colors would likely be vastly different (my gut says the Blue would have the most and White the least, but burn spells might change that) and that there may be more words on creatures than on spells in less-powerful cubes. I'll get around to doing an analysis myself . . . eventually.

Practically speaking, the use case would then be to determine if certain classes of cards are outliers within the cube or as an additional metric to help incentivize cuts/adds.
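
As a rough sketch of that kind of internal comparison, the snippet below groups cards by broad class and averages their word counts. It assumes card dictionaries with a `type_line` field, as in the Scryfall data used by the script in this thread. The `words` key and the sample cards are hypothetical; in practice the count would come from something like the script's `word_count(card)`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample; only the fields used below are included.
cards = [
    {"name": "Card A", "type_line": "Creature", "words": 3},
    {"name": "Card B", "type_line": "Creature", "words": 0},
    {"name": "Card C", "type_line": "Artifact Creature", "words": 25},
    {"name": "Card D", "type_line": "Instant", "words": 7},
    {"name": "Card E", "type_line": "Sorcery", "words": 3},
    {"name": "Card F", "type_line": "Land", "words": 12},
]


def card_class(type_line):
    # Lands are checked first so creature lands are grouped with lands.
    if "Land" in type_line:
        return "Land"
    if "Creature" in type_line:
        return "Creature"
    return "Noncreature spell"


def average_by_class(card_list):
    groups = defaultdict(list)
    for card in card_list:
        groups[card_class(card["type_line"])].append(card["words"])
    return {name: mean(counts) for name, counts in groups.items()}


averages = average_by_class(cards)
for name, value in sorted(averages.items()):
    print(f"{name}: average {value:.2f}")
```

The same grouping idea works for color, rarity, or mana value by swapping out `card_class` for a different key function.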
 
Yeah, comparing colors could be interesting for some cubes. And it isn't necessarily a bad thing to have an imbalance in that regard. I looked at that for my list and noticed some imbalance, but it was a little more interesting to me to notice a few cards that were a lot of words for what they do. Sea Gate Oracle was one that's pretty simple conceptually, but it takes a lot of words to get there.

I'm not advocating to anybody that they should try to reduce the word count in their cube. Like I said, it's for fun. Using data to answer a question is fun. I should have used that time to catch up on some work instead. Whoops!
 
Stats for the Elegant Cube (Core) by color:

Non-land colorless: average 16.90 [of 31 cards]
W: average 16.98 [of 53 cards]
U: average 18.59 [of 51 cards]
B: average 19.02 [of 55 cards]
R: average 20.62 [of 53 cards]
G: average 18.75 [of 52 cards]
WU: average 11.50 [of 2 cards]
WB: average 18.00 [of 2 cards]
WR: average 19.50 [of 2 cards]
WG: average 11.50 [of 2 cards]
UB: average 17.00 [of 2 cards]
UR: average 32.00 [of 2 cards]
UG: average 14.00 [of 3 cards]
BR: average 34.00 [of 2 cards]
BG: average 18.67 [of 3 cards]
RG: average 23.00 [of 2 cards]
Non-basic land: average 17.50 [of 36 cards]

Avg Words/Card (no reminders): 18.5
Avg Words/Card (full text): 23.8

Clearly, my white is too simple and my red is too complex.

I added this to Nemo's script, new full version:
Code:
import json
 
from collections import defaultdict
from pprint import pp
 
 
def create_cube_json(filename):
    # Call this once to create a smaller JSON file from the full card list
    # Visit https://scryfall.com/docs/api/bulk-data and look for "Oracle Cards" file;
    # Download and rename it to 'oracle_cards.json'
    with open('oracle_cards.json', encoding='utf8') as f_oracle_cards:
        d = json.load(f_oracle_cards)
    f_cube_list = open(filename)
    cards = []
    for line in f_cube_list:
        cards += [line.strip()]
    output_filename = filename[:filename.find('.')] + '.json'
    f_cube_json = open(output_filename, 'w+', encoding='utf8')
    cube_data = []
    for card in d:
        if card['name'] in cards:
            for i in range(cards.count(card['name'])):
                cube_data += [card]
        elif 'card_faces' in card.keys():
            for face in card['card_faces']:
                if face['name'] in cards:
                    cube_data += [card]
    string_data = json.dumps(cube_data)
    f_cube_json.write(string_data)
    f_cube_json.close()
    f_cube_list.close()
 
 
def wc_fo(text):
    # full oracle text word count
    return len(text.split())
 
 
def wc_o(text):
    # word count of oracle text without keyword explanation text
    if "(" in text and ")" in text:
        p1 = text.find("(")
        p2 = text.find(")")
        o = text[:p1] + text[p2 + 1:]
        return wc_o(o)
    else:
        return len(text.split())
 
 
def word_count(card, full=False):
    if full:
        wc = wc_fo
    else:
        wc = wc_o
    if 'card_faces' in card.keys():
        total_count = 0
        for face in card['card_faces']:
            text = face['oracle_text']
            total_count += wc(text)
        return total_count
    else:
        text = card['oracle_text']
        return wc(text)
 
 
def cube_word_count(filename, full=False):
    with open(filename, encoding='utf8') as f:
        d = json.load(f)
    wc = []
    for card in d:
        words = word_count(card, full)
        wc += [words]
    mean = sum(wc) / len(wc)
    if full:
        metric = 'Avg Words/Card (full text)'
    else:
        metric = 'Avg Words/Card (no reminders)'
    print(f"{metric}: {mean:.1f}")
    return mean
 
 
def duplicate_count(filename):
    with open(filename, encoding='utf8') as f:
        d = json.load(f)
    unique = []
    for card in d:
        if card['name'] not in unique:
            unique += [card['name']]
    print(f"{len(unique)} unique of {len(d)} total cards")
 
 
def print_wordy_cards(filename, n):
    # List cards with n or more words
    with open(filename, encoding='utf8') as f:
        d = json.load(f)
    for card in d:
        words = word_count(card)  # change to word_count(card, True) to count full oracle text
        if words >= n:
            print(f"{words} words: {card['name']}")
            # print(get_text(card) + "\n")
 
 
WHEEL = str.maketrans('WUBRG', '01234')
 
 
def wheel_order(colors):
    return colors.translate(WHEEL)
 
 
def print_by_color(filename):
    cards_by_color_identity = defaultdict(list)
    with open(filename, encoding='utf8') as f:
        d = json.load(f)
    for card in d:
        words = word_count(card)  # change to word_count(card, True) to count full oracle text
        if 'Land' in card['type_line']:
            identity = 'Non-basic land'
        else:
            identity = ''.join(sorted(card['color_identity'], key=lambda ci:wheel_order(ci)))
        cards_by_color_identity[identity].append((card['name'], words))
    # pp(cards_by_color_identity)
    # sort first by fewest colors in identity, then by WUBRG order
    for identity, card_tuples in sorted(cards_by_color_identity.items(), key=lambda t:(len(t[0]), wheel_order(t[0]))):
        total_words = sum(words for _, words in card_tuples)
        card_count = len(card_tuples)
        average_words = total_words / card_count
        identity_name = 'Non-land colorless' if not identity else identity
        print(f"{identity_name }: average {average_words:.2f} [of {card_count} cards]")
    print()
 
 
def get_text(card):
    if 'oracle_text' in card.keys():
        return card['oracle_text']
    elif 'card_faces' in card.keys():
        text = ""
        for face in card['card_faces']:
            text += face['oracle_text'] + ' '
        return text
 
 
cube_name = 'TheElegantCubeCore'  # Name must match the *.txt card list file
create_cube_json(cube_name + '.txt')  # Comment this out if cubename.json is already created
# print_wordy_cards(cube_name + '.json', 0)
print_by_color(cube_name + '.json')
cube_word_count(cube_name + '.json')  # Average words excluding reminder text
cube_word_count(cube_name + '.json', True)  # Average words in full oracle text

There is some weirdness with both sides of double-faced cards being counted and Pashalik Mons being considered double-faced...
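
One possible explanation, offered as a guess rather than something checked against the data: the Oracle Cards file may include objects that are not playable cards, such as art series cards, whose `card_faces` carry the same face name as the real card. The `elif 'card_faces'` branch in `create_cube_json` would then pick those up as an extra, double-faced copy. A minimal sketch of a filter, assuming Scryfall's `layout` field and the layout values named below; `is_real_card` is a made-up helper and the two sample entries are invented.

```python
# Layout values that, as far as I know, Scryfall uses for objects that are
# not playable cards. Treat this set as an assumption to verify.
NON_CARD_LAYOUTS = {"art_series", "token", "double_faced_token", "emblem"}


def is_real_card(card):
    # Cards with no layout field are kept rather than silently dropped.
    return card.get("layout") not in NON_CARD_LAYOUTS


# Invented entries shaped like Scryfall card objects.
sample = [
    {"name": "Pashalik Mons", "layout": "normal"},
    {"name": "Pashalik Mons // Pashalik Mons", "layout": "art_series"},
]

kept = [card["name"] for card in sample if is_real_card(card)]
print(kept)
```

If that is the cause, applying the filter inside the `for card in d:` loop before either name check should stop the extra copy from being added.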
 
If someone compiles the Cube Cobra links to the cubes you analyzed in previous posts, I can run the script for all those cubes.

Edit: or we can, you know, do the right thing and create a git repo...
 
Another thing to consider on this topic, although probably difficult to quantify, is usage of keywords. Specifically, something like Cascade or Encore has 5+ lines of reminder text. If you're only running 1-2 cards with those mechanics, that might as well be real text because people are usually going to need to read it.

Retail sets solve this by limiting the number of wordy, non-evergreen keywords to 1 per color pair or something similar, allowing drafters to quickly become familiar with the mechanic. This may be an approach worth taking in a beginner friendly environment or for anyone just looking to decrease their wordiness.
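
A partial way to quantify this is to count how many cards in the cube carry each keyword and flag the ones that show up only once or twice. This is a minimal sketch, assuming card dictionaries with the `keywords` list that Scryfall card objects provide; the function name, the threshold, and the sample cards are made up for illustration.

```python
from collections import Counter


def rare_keywords(cards, max_cards=2):
    """Return keywords that appear on at most max_cards cards."""
    counts = Counter()
    for card in cards:
        # set() so a keyword listed twice on one card counts once
        counts.update(set(card.get("keywords", [])))
    return sorted(k for k, n in counts.items() if n <= max_cards)


# Invented sample shaped like Scryfall card objects.
sample = [
    {"name": "Card A", "keywords": ["Flying"]},
    {"name": "Card B", "keywords": ["Flying", "Cascade"]},
    {"name": "Card C", "keywords": ["Flying", "Lifelink"]},
    {"name": "Card D", "keywords": ["Lifelink", "Encore"]},
    {"name": "Card E", "keywords": ["Lifelink"]},
]

one_offs = rare_keywords(sample)
print(one_offs)
```

Cards carrying a flagged keyword could then be scored with their full oracle text, reminder text included, while everything else keeps the no-reminder count.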
 