Import CSV with BeanHub Import

We already have the bank transactions as CSV files from the previous step, obtained either by manually downloading them from the bank's website or by using BeanHub Direct Connect. Now what? If we look at our transaction data carefully, we can always find repeating transactions, be it your rent, internet service fee, or mobile data plan. These transactions appear again and again, periodically. We also usually categorize repeated purchases from the same merchant as the same type of expense, and businesses run payroll for their employees on a regular schedule. In the end, very few transactions are unexpected or one-time. The key to making your accounting books as automatic as possible is to have software run through all those transactions with predefined rules and create the corresponding accounting entries from the data imported from the bank.

Various tools in the plaintext accounting community can help you import transactions from CSV files and other sources. But the data usually come in different shapes, which makes them hard to work with. Many tools also couple data extraction and transaction generation together, making it very hard to reuse the same logic elsewhere. To solve those problems, when building our open-source tools for importing Beancount transactions, we separated the responsibilities of extracting and importing. For the extracting part, we built beanhub-extract. It's a simple library that extracts data from CSV files (and potentially files in other formats) and provides a standardized data structure for beanhub-import or other import engines to consume.

Diagram showing how beanhub-extract reads CSV files from different banks and produces uniform transaction records

Here are the currently available fields in the Transaction data structure beanhub-extract provides:

extractor - name of the extractor
file - the filename of import source
lineno - the entry line number of the source file
reversed_lineno - the entry line number of the source file in reverse order; comes in handy for CSV files sorted in descending datetime order
transaction_id - the unique id of the transaction
date - date of the transaction
post_date - date when the transaction posted
timestamp - timestamp of the transaction
timezone - timezone of the transaction; must be one of the timezone values supported by pytz
desc - description of the transaction
bank_desc - description of the transaction provided by the bank
amount - transaction amount
currency - ISO 4217 currency code
category - category of the transaction, such as Entertainment, Shopping, etc.
subcategory - subcategory of the transaction
pending - pending status of the transaction
status - status of the transaction
type - type of the transaction, such as Sale, Return, Debit, etc.
source_account - source account of the transaction
dest_account - destination account of the transaction
note - note or memo for the transaction
reference - reference value
payee - payee of the transaction
gl_code - general ledger code
name_on_card - name on the credit/debit card
last_four_digits - last four digits of the credit/debit card
extra - all columns that the extractor does not map to one of the Transaction attributes above go here
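
To make the shape of such a record concrete, here is a minimal sketch in Python. It is not beanhub-extract's actual class definition: `TransactionSketch` and `with_reversed_lineno` are hypothetical names, only a handful of the fields listed above are included, and the field types and the zero-based `reversed_lineno` convention are assumptions made for illustration.

```python
import dataclasses
import datetime
import decimal
from typing import Dict, List, Optional


@dataclasses.dataclass(frozen=True)
class TransactionSketch:
    """Hypothetical stand-in for an extracted transaction record."""

    extractor: str
    file: Optional[str] = None
    lineno: Optional[int] = None
    reversed_lineno: Optional[int] = None
    date: Optional[datetime.date] = None
    desc: Optional[str] = None
    amount: Optional[decimal.Decimal] = None
    currency: Optional[str] = None
    # Columns the extractor did not map to a named field.
    extra: Optional[Dict[str, str]] = None


def with_reversed_lineno(rows: List[TransactionSketch]) -> List[TransactionSketch]:
    """Fill reversed_lineno by counting each row from the end of the file.

    Assumed convention: the last row gets 0, the one before it 1, and so on.
    """
    total = len(rows)
    return [
        dataclasses.replace(row, reversed_lineno=total - 1 - index)
        for index, row in enumerate(rows)
    ]
```

Under this assumed convention, when a bank lists the newest rows first and a later download adds rows at the top, each older row keeps the same position counted from the end, which is why such a field is useful for files in descending datetime order.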

What is beanhub-import and how does it work

Now, with beanhub-extract, we can easily extract transaction data from different sources into a standard data structure. Next, it is the job of beanhub-import to look at the transactions provided by beanhub-extract, see which rules they match, and generate the corresponding Beancount transactions for you. Unlike most importing tools for Beancount or other plaintext accounting systems, beanhub-import not only generates the transactions for you but is also smart enough to look at your existing Beancount transactions and update them. Here's how it works:

Diagram showing how beanhub-import reads CSV files from different banks with beanhub-extract, produces uniform transaction records, matches them against rules, generates Beancount transactions, and merges them with the existing Beancount files
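
The flow described above can be sketched in a few lines of Python. This is an illustration of the idea only, not beanhub-import's implementation or its rule format: the `Rule` class, the `generate` and `import_transactions` functions, the `txn_id` key, and the account names are all hypothetical, and matching is reduced to a regular expression on the description.

```python
import dataclasses
import re
from typing import Dict, List, Optional


@dataclasses.dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a description pattern and the account to post to."""

    pattern: str
    account: str


def generate(txn: Dict[str, str], rules: List[Rule]) -> Optional[str]:
    """Return Beancount text for the first matching rule, or None."""
    for rule in rules:
        if re.search(rule.pattern, txn["desc"]):
            # The second posting omits its amount, so Beancount balances it.
            return (
                f'{txn["date"]} * "{txn["desc"]}"\n'
                f'  {txn["source"]}  {txn["amount"]} {txn["currency"]}\n'
                f'  {rule.account}'
            )
    return None


def import_transactions(
    txns: List[Dict[str, str]],
    rules: List[Rule],
    existing: Dict[str, str],
) -> Dict[str, str]:
    """Merge generated entries into existing ones, keyed by transaction id.

    An entry with a known id is replaced; unknown ids are added; existing
    entries that no generated transaction touches are kept as they are.
    """
    merged = dict(existing)
    for txn in txns:
        entry = generate(txn, rules)
        if entry is not None:
            merged[txn["txn_id"]] = entry
    return merged
```

Keying entries by a stable identifier is the design choice that makes repeated imports safe in this sketch: running it again over the same CSV data rewrites the same entries instead of duplicating them.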

Step-by-step example

Now that you know how beanhub-import works, let's walk through an example step by step. Before that, you need to install BeanHub-CLI. You probably already did if you followed the guide for pulling bank transaction CSV files with BeanHub Direct Connect. If not, it's very simple. You only need to ensure that you have Python 3.11 or later installed. Then, you can run:

pip install "beanhub-cli>=2.1.0"
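
If you are not sure whether your interpreter meets the Python requirement mentioned above, a quick check along these lines can help. It is a minimal sketch that uses only the standard library; the `meets_minimum` helper is a hypothetical name and not part of BeanHub-CLI.

```python
import sys

# Minimum version stated in this guide.
MINIMUM = (3, 11)


def meets_minimum(version=None):
    """Return True if the given (or running) Python is at least MINIMUM."""
    current = tuple(version) if version is not None else tuple(sys.version_info)
    return current[:2] >= MINIMUM


if __name__ == "__main__":
    print("OK" if meets_minimum() else "Python 3.11 or later is required")
```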

Next, let's define a first, empty beanhub-import rule file at .beanhub/imports.yaml with content like this:

FIXME

You must also ensure you have at least the main.bean Beancount file in your current folder. If not, you can create one with the following content.

FIXME

Now you can run BeanHub-CLI's import command:

bh import

And you will see output like this:

FIXME

What just happened is that the import command read your import rule file at .beanhub/imports.yaml and tried to import transactions from the input sources based on its rules. Because the file contains no inputs or rules, there is nothing for the import engine to do.

TODO: example