← return to the Luca home page

Tallying

If Luca knows how to parse your bank statements, then all you have to do is write a quick YAML tree of rules, and Luca will categorize your transactions and tally all of your income and expenses! The currently supported statement formats are:

If your bank statements are not yet supported, please let us know the format so that we can add a module to the luca.importer sub-package.

Quick reference to rules

2013 — Match a particular year.
9/16 — Match a month and day.
9/16-10/5 — Inclusive range of dates.
/regex/ — Description contains a regular expression.
~/regex/ — Description does not contain a regular expression.

Any other string is considered a category. You might start out writing rules by putting categories at the bottom of your YAML hierarchy:

- /AERLING/:
  - Travel.Airfare
- /DELTA AIR/:
  - Travel.Airfare
- /DJANGOCON/:
  - Travel.Conferences
- /PYCON/:
  - Travel.Conferences

But feel free to pivot category names up into the middle of the hierarchy instead. When a transaction matches a leaf node successfully, it receives the nearest ancestor category:

- Travel.Airfare:
  - /AERLING/
  - /DELTA AIR/
- Travel.Conferences:
  - /DJANGOCON/
  - /PYCON/

Be sure to include a colon : after a rule or category that is followed by further rules or categories. If you forget, then YAML will not be kind to you:

# Whoops! The terminal colon is missing.

- Travel.Airfare
  - /DELTA AIR/

# YAML parses this as simply a category named:

- "Travel.Airfare - /DELTA AIR/"

A transaction is tested against the rules in the YAML file starting at the top and proceeding downward. Only by matching a parent rule can a transaction proceed to be tested against its children. Once a transaction survives all the way out to a leaf node — a node that the transaction matches, that has no further children — then a match has been made, a category is assigned, and no further rules are processed for that transaction.

Invoking tally

If you supply PDF files to Luca, be sure that you have the pdftotext command installed on your system (Ubuntu keeps it in the poppler-utils package) so that Luca can turn them into plain text itself:

$ luca tally rules.yaml statements/*.pdf

Otherwise, you can render the PDFs to plain text manually before passing the text files to Luca:

$ pdftotext -layout new.pdf > statements/2013-03-checking.txt
$ luca tally rules.yaml statements/*.txt

If you are going to version control your bank statements alongside your rules files, then you might find the text versions more convenient anyway. In the last section of this document you will find an example Makefile if you want to automate this process.

Two sample statements

The following sections walk you through building a sample rules file. To try out the examples yourself, simply download these two sample bank statements:

These examples have been pared down to the minimum necessary text for Luca to recognize and parse this particular bank’s checking and Visa statements.

Starting your rules file

You will generally begin a rules file with a root selector, to narrow Luca’s attention to the specific transactions for which that file is designed. To keep things simple I maintain a separate rules file for each year; it would be difficult to keep a single rules file working against all previous years while writing new rules for the current year. We might start a 2013 rules file by typing this:

# rules.yaml

- 2013:
  - Unknown

This selects all transactions for the calendar year 2013 and gives them the category Unknown so that we can take a look at them and start making decisions about how to divide them up. Running a rules file against a set of bank statements is easy:

$ luca tally -t rules.yaml statements/*.txt

Always use the -t option while working on your rules: it asks Luca to display, beneath each category, its full list of matching transactions so that you can see whether your transactions are winding up in the right categories. Its output when run with these initial rules is quite simple:

368.20  Accounts
         32.00- Accounts.DCCU Business Visa
          0.06  Accounts.ID 0001 SAVINGS
        400.14  Accounts.ID 0070 CHECKING
368.20- Unknown

 2013-09-11 Foreign Transaction Fee -     2.00-
            Date 09/10/13
 2013-09-11 RESTAURANT TORONTO CA        55.00-
 2013-09-12 VIA RAIL MONTREAL CA        100.00-
 2013-09-13 DELTA TORONTO EAST           90.00-
 2013-09-13 PUB IN TORONTO CA            45.00-
 2013-09-14 DETROIT WINDSOR TUNNEL        5.00-
 2013-10-01 Payment From Share 0070     300.00 
 2013-10-01 Withdrawal To Loan          300.00-
 2013-10-10 SOUL CAFE BUADPEST HU        35.00-
 2013-10-15 Deposit at ATM #123       1,200.00 
 2013-10-30 Withdrawal PAYCHECK - To    500.00-
            SMITH, SANDRA A
            XXXXXXXXXX
 2013-10-31 Deposit Dividend 0.100%       0.14 
 2013-10-31 Deposit Dividend 0.150%       0.06 

Success — we can now see the transactions that need categorization! It is time to start creating specific rules.

Designing rules

The first step is usually to isolate transactions that do not represent real income or expenses, but that instead just represent money moving between our own accounts. I create a category named Zero for such transactions since transactions that move money should sum to exactly zero.

In this case, a single $300 payment on 10/1 is showing up twice: as both a deduction from our checking account and also a payment toward our Visa credit card account. To remove these from the Unknown category, we simply need to craft a pair of regular expressions to match:

# rules.yaml

- 2013:
  - Zero:
    - /Withdrawal To Loan/
    - /Payment From Share 0070/
  - Unknown
368.20  Accounts
         32.00- Accounts.DCCU Business Visa
          0.06  Accounts.ID 0001 SAVINGS
        400.14  Accounts.ID 0070 CHECKING
368.20- Unknown

 2013-09-11 Foreign Transaction Fee -     2.00-
            Date 09/10/13
 2013-09-11 RESTAURANT TORONTO CA        55.00-
 2013-09-12 VIA RAIL MONTREAL CA        100.00-
 2013-09-13 DELTA TORONTO EAST           90.00-
 2013-09-13 PUB IN TORONTO CA            45.00-
 2013-09-14 DETROIT WINDSOR TUNNEL        5.00-
 2013-10-10 SOUL CAFE BUADPEST HU        35.00-
 2013-10-15 Deposit at ATM #123       1,200.00 
 2013-10-30 Withdrawal PAYCHECK - To    500.00-
            SMITH, SANDRA A
            XXXXXXXXXX
 2013-10-31 Deposit Dividend 0.100%       0.14 
 2013-10-31 Deposit Dividend 0.150%       0.06 

  0.00  Zero

 2013-10-01 Payment From Share 0070     300.00 
 2013-10-01 Withdrawal To Loan          300.00-

Since accounts are alphabetized in the output, our new Zero account appears at the bottom. It successfully balances to zero! We have taken what was really a single transaction reported from two different points of view, and correctly tallied that moving the money from one account to the other made us neither richer nor poorer.

What should we tackle next? The sources of income stand out rather obviously in green, since they are the only remaining positive amounts.

The income is of two very different kinds — one is a client check that has resulted from a sale, while the other two are interest payments from the credit union — so we will create an Income account with two separate sub-accounts named Interest and Sales. Luca understands the idea of a hierarchy of accounts, if we separate parent from child accounts with a period:

# rules.yaml

- 2013:
  - Zero:
    - /Withdrawal To Loan/
    - /Payment From Share 0070/

  - Income.Interest:
    - /Deposit Dividend/
  - Income.Sales:
    - /Deposit at ATM/

  - Unknown
  368.20  Accounts
             32.00- Accounts.DCCU Business Visa
              0.06  Accounts.ID 0001 SAVINGS
            400.14  Accounts.ID 0070 CHECKING
1,200.20- Income
              0.20- Income.Interest

 2013-10-31 Deposit Dividend 0.100%       0.14 
 2013-10-31 Deposit Dividend 0.150%       0.06 

          1,200.00- Income.Sales

 2013-10-15 Deposit at ATM #123       1,200.00 

  832.00  Unknown

 2013-09-11 Foreign Transaction Fee -     2.00-
            Date 09/10/13
 2013-09-11 RESTAURANT TORONTO CA        55.00-
 2013-09-12 VIA RAIL MONTREAL CA        100.00-
 2013-09-13 DELTA TORONTO EAST           90.00-
 2013-09-13 PUB IN TORONTO CA            45.00-
 2013-09-14 DETROIT WINDSOR TUNNEL        5.00-
 2013-10-10 SOUL CAFE BUADPEST HU        35.00-
 2013-10-30 Withdrawal PAYCHECK - To    500.00-
            SMITH, SANDRA A XXXXXXXXXX

    0.00  Zero

 2013-10-01 Payment From Share 0070     300.00 
 2013-10-01 Withdrawal To Loan          300.00-

Note that Luca has computed a sub-total both for the child accounts Interest and Sales, while also including those sub-totals in the final tally for Income itself.

At this point we face our expenses. I start by addressing expenses that need only one or two regular expressions, and then tackle expenses that are more complicated.

When accounting for my S-Corporation, I go ahead and name the sub-categories right beneath Expenses by whichever line of Form 1120S a particular expense belongs on. This makes it easy to transfer the totals to Form 1120S later.

# rules.yaml

- 2013:
  - Zero:
    - /Withdrawal To Loan/
    - /Payment From Share 0070/

  - Income.Interest:
    - /Deposit Dividend/
  - Income.Sales:
    - /Deposit at ATM/

  - Expenses.7.Payroll:
    - /PAYCHECK/
  - Expenses.19.Bank fees:
    - /Transaction Fee/
  - Expenses.19.Travel:
    - /DELTA TORONTO EAST/
    - /DETROIT WINDSOR TUNNEL/
    - /VIA RAIL/

  - Unknown
  368.20  Accounts
             32.00- Accounts.DCCU Business Visa
              0.06  Accounts.ID 0001 SAVINGS
            400.14  Accounts.ID 0070 CHECKING
  697.00  Expenses
            500.00  Expenses.7
                      500.00  Expenses.7.Payroll

 2013-10-30 Withdrawal PAYCHECK - To    500.00-
            SMITH, SANDRA A XXXXXXXXXX

            197.00  Expenses.19
                        2.00  Expenses.19.Bank fees

 2013-09-11 Foreign Transaction Fee -     2.00-
            Date 09/10/13

                      195.00  Expenses.19.Travel

 2013-09-12 VIA RAIL MONTREAL CA        100.00-
 2013-09-13 DELTA TORONTO EAST           90.00-
 2013-09-14 DETROIT WINDSOR TUNNEL        5.00-

1,200.20- Income
              0.20- Income.Interest

 2013-10-31 Deposit Dividend 0.100%       0.14 
 2013-10-31 Deposit Dividend 0.150%       0.06 

          1,200.00- Income.Sales

 2013-10-15 Deposit at ATM #123       1,200.00 

  135.00  Unknown

 2013-09-11 RESTAURANT TORONTO CA        55.00-
 2013-09-13 PUB IN TORONTO CA            45.00-
 2013-10-10 SOUL CAFE BUADPEST HU        35.00-

    0.00  Zero

 2013-10-01 Payment From Share 0070     300.00 
 2013-10-01 Withdrawal To Loan          300.00-

Finally, we reach a point where all remaining expenses belong to a single category — in this case, all Unknown transactions are restaurant visits during conferences or business travel.

Instead of trying to write a regular expression for every single one, you can simply rewrite your final rule so that it puts all remaining transactions in the Meals category instead of calling them Unknown. So our rules file winds up looking like:

# rules.yaml

- 2013:
  - Zero:
    - /Withdrawal To Loan/
    - /Payment From Share 0070/

  - Income.Interest:
    - /Deposit Dividend/
  - Income.Sales:
    - /Deposit at ATM/

  - Expenses.7.Payroll:
    - /PAYCHECK/
  - Expenses.19.Bank fees:
    - /Transaction Fee/
  - Expenses.19.Travel:
    - /DELTA TORONTO EAST/
    - /DETROIT WINDSOR TUNNEL/
    - /VIA RAIL/
  - Expenses.19.Meals

You might worry that this will mis-categorize future transactions as they arrive with new statements from my credit union. But since I always version control my tally -t output in addition to the rules files themselves, a quick diff in my version control system will make it easy what new transactions have arrived each month so that I can make sure that they land in the right places.

Running a final report

When you are happy with where your transactions are landing, you can take a deep breath, remove -t from your command line, and receive your reward: succinct income and expense tallies driven directly by your bank statements!

  368.20  Accounts
             32.00- DCCU Business Visa
              0.06  ID 0001 SAVINGS
            400.14  ID 0070 CHECKING
  832.00  Expenses
            500.00  7
                      500.00  Payroll
            332.00  19
                        2.00  Bank fees
                      135.00  Meals
                      195.00  Travel
1,200.20- Income
              0.20- Interest
          1,200.00- Sales
    0.00  Zero

Remember that the total for each top-level account will include both the sub-totals from the sub-accounts indented below it, as well as for any transactions that you assigned directly to it.

Automation and version control

A simple make will be enough to run your balances if you build a Makefile that contains rules like:

tally-2013.txt: rules-2013.yaml statements/*.txt
        luca tally -t $+ > $@

So that you can see exactly which transactions are new when you add new bank statements and re-run luca, put everything under version control — both your input rules, and the resulting output file.

git add Makefile
git add statements/*.txt
git add rules-2013.yaml
git add tally-2013.txt
git commit

Given these files under version control, here is my rough algorithm each month for getting new information into Luca and making sure that I like how my rules have classified the new transactions:

# As each new statement arrives, I always
# convert it to *.txt manually to avoid
# putting huge PDFs under version control:

pdftotext -layout ~/Downloads/stmt.pdf > statements/2013-11-checking.txt

# Prepare to enroll it in version control:

git add statements/*.txt

# To check whether your rules classify the
# new transactions appropriately:

make
git diff

# Edit your rules until you like where
# each transaction is landing, then:

git commit .

Of course, if you simply wait until the end of the year to write your rules.yaml and do all of your accounting then you will not need these precautions. No new bank statements will be showing up later to scatter fresh transactions across your categories! Instead, you will write the rules once, already knowing every transactions that they have to successfully classify.