Sometimes a DB is easier

May 27, 2026

I've been putting off updating the household budgets for a while. The income situation kept changing as both my partners concluded their job searches. In hindsight, kinda foolish: that's exactly when it would have been best to have a budget to make sure we weren't overspending. My heuristic was our savings account stayed steady, so it wasn't worth worrying about at the time.

Also, I was trying to use ActualBudget, and I kept falling behind on importing things, and the pressure to account for every individual spend was driving me bananas when I just wanted approximate breakdowns by category.

I finally had some free time (and had some planning I needed to get done) so this last weekend I sat down and pulled the CSVs of the last year's worth of transactions on the joint accounts. My goal was to categorize everything and roll up totals by category (along with proportion of total income, just 'cuz) so I could then figure out what our monthly essentials spend was, and plan savings accordingly.

In the past I've drawn up budgets on spreadsheets. I mean, that's what they're for, right? And spreadsheets love CSVs, so this should be easy!

First I dumped the CSV into my spreadsheet. It looked kinda like this:

Account Type,Account Number,Transaction Date,Cheque Number,Description 1,Description 2,CAD$,USD$
MajorCardProcessor,1234567890,4/1/2025,,FANCY GROCERY #12 MYTOWN,,-54.16,
MajorCardProcessor,1234567890,4/3/2025,,TAKEOUT COMPANY NEARCITY,,-42.06,
MajorCardProcessor,1234567890,4/3/2025,,LOCAL GYM MYTOWN,,-28.25,
MajorCardProcessor,1234567890,4/3/2025,,TAKEOUT COMPANY NEARCITY,,-90.13,
MajorCardProcessor,1234567890,4/4/2025,,CUTESY ANIMAL CLINIC NAME MYTOWN,,-209.27,
MajorCardProcessor,1234567890,4/4/2025,,NEWSPAPER,,-28.25,
MajorCardProcessor,1234567890,4/5/2025,,NORMAL GROCERY #62 MYTOWN,,-11.3,
MajorCardProcessor,1234567890,4/5/2025,,NORMAL GROCERY #62 MYTOWN,,-136.84,
MajorCardProcessor,1234567890,4/5/2025,,LIQUOR STORE #0447 MYTOWN,,-69.6,
MajorCardProcessor,1234567890,4/5/2026,,GET SOME GAS C12345 NEARTOWN,,-25.25,
MajorCardProcessor,1234567890,4/5/2026,,SOME BRO ENERGY - ABC NEARTOWN,,-0.89,
MajorCardProcessor,1234567890,4/6/2026,,GAS GAS MC GASMAN MYTOWN,,-48.51,

I've anonymized this a bit, but this was the structure. Yes, for some reason my bank thought it was a good idea to include my full account number in this file. I'm not a huge fan, but whatever. Categorizing these seemed easy enough: all I needed was a map of keywords to categories. If a description contains a keyword, bam, we know what category it maps to. Then you can do some cursed formula stuff like this:

=XLOOKUP(1, ISNUMBER(SEARCH($'Category Lookup'.B:B, E2)), $'Category Lookup'.A:A)

Initially I had some issues because whenever I made an edit my Calc (e.g., LibreOffice Calc, which is like Excel if you're unfamiliar) would lock up for a couple seconds, and my laptop's fan started screaming. Eventually I figured out that knockoff calc wasn't clever enough to know I only had some 700 rows populated, so it was searching the entire 100k row viewspace in column B for keywords, even though most of those rows were empty, and I shortened the B:B lookup to B2:B:70.

Anyway, I eventually got all the transactions categorized and wanted to move on to totaling up by category...

...which somehow was hard? I though there had to be a "sum by X field" rollup function, but I couldn't find it in a few minutes of googling. I could have just done a =SUMIF() kinda deal over the entire list of transactions for each individual category (and calc's autofill probably would have tried to help), but that seemed annoying especially since I then wanted to filter.

I also got intermediately distracted by trying to have Calc roll up uncategorized spend by description so I could add keywords for the descriptions with the most spend, but apparently you have to do pivot tables or some wild crap.

At this point I was getting supremely frustrated because I knew I could bang out like three SQL queries and have exactly the information I wanted, and banging my head on this spreadsheet was just driving me crazy. I've even worked on SQL data pipelines in a previous job! Then I remembered that SQLite can natively ingest CSV files.

iridian@laptop $ sqlite3
sqlite> .mode csv
sqlite> .import categories.csv categories_by_merchant_keyword
sqlite> .import transactions.csv expenses
sqlite> select exp.*, c.category from expenses exp left join categories c on instr(exp.'desc_1', c.keyword)

account_name	corrected_date	check_number	desc_1	desc_2	CAD$	USD$	category
shared credit	2025-04-01		FANCY GROCERY #12 MYTOWN        -54.16		Grocery
shared checking	2025-04-03		INSURANCE JIMBOB A&B&C INS      -94.69		Insurance
shared credit	2025-04-03		LOCAL GYM MYTOWN                -28.25		Gym
shared credit	2025-04-03		TAKEOUT COMPANY NEARCITY        -90.13		Takeout
shared credit	2025-04-03		TAKEOUT COMPANY NEARCITY        -42.06		Takeout
shared checking	2025-04-04		MISC PAYMENT MYTOWN-FOO         -99.93		Power
shared credit	2025-04-04		CUTE ANIMAL CLINIC MYTOWN       -209.27		Dog
shared credit	2025-04-04		NEWSPAPER BIG FANCY TOWN NAME   -28.25		Subscriptions
shared credit	2025-04-05		FANCY GROCERY #12 MYTOWN        -159.55		Grocery
shared credit	2025-04-05		LIQUOR STORE #0447 MYTOWN      -69.6   	   Alcohol

(okay I'm cheating a bit, those results are actually after I cleaned some stuff up (like the date) but you get the gist)

That's so much easier. And now I can do things like select sum(CAD$) as total_spend, desc_1 as merchant from categorized_expenses where category is NULL group by merchant order by total_spend asc which then tells me what merchants I should prioritize adding categories for. Working with SQL scratches a deep itch in a part of my brain, it's so, so lovely.

Even better, I can create a bunch of views to munge the data without actually modifying the original data, so if I discover a mistake in my logic I can just update the view. That won't scale, but for <1k in transactions sqlite can handle it just fine.

Something that took some doing was that my bank, for some ungodly reason, output dates in the format MM/DD/YYYY, which broke lexogrpahic sort. Sqlite doesn't have true built in date formatting (it's very sparse insofar as primitives) so I couldn't hack a strfmt together... but I could do some cursed string manipulation:

with initial_date_population as (
    select
        rowid as original_rowid, -- need this to join the dates back
        `Transaction Date` as date,
        REPLACE(
            `Transaction Date`,
            RTRIM(`Transaction Date`, REPLACE(`Transaction Date`, '/', '')),
            ''
        ) as year,
        SUBSTR(`Transaction Date`, 1, INSTR(`Transaction Date`, '/') - 1)
            as just_month,
        SUBSTR(`Transaction Date`, INSTR(`Transaction Date`, '/') + 1)
            as less_month
    from my_credit
),

with_date_components as ( --I can't select from aliased/computed columns, annoyingly, so we have to repeatedly stack CTEs
    select
        date,
        year,
        just_month as month,
        original_rowid,
        REPLACE(less_month, '/' || year, '') as day
    from initial_date_population
),

with_date as ( -- same problem as above, just munging now
    select
        original_rowid,
        year
        || '-'
        || SUBSTR('0' || month, -2, 2)
        || '-'
        || SUBSTR('0' || day, -2, 2) as reformatted_date
    from with_date_components
)

insert into expenses (
    desc_1, desc_2, check_number, `CAD$`, `USD$`, corrected_date, account_name
)
select
    cx.`Description 1`,
    cx.`Description 2`,
    cx.`Cheque Number`,
    cx.`CAD$`,
    cx.`USD$`,
    wd.reformatted_date as corrected_date,
    'my credit' as account_name
from my_credit as cx -- my_credit is the raw import of the downloaded CSV
inner join with_date as wd on cx.rowid = wd.original_rowid

Voila! Spreadsheets are what people tend to reach for first.... but sometimes a database is faster.

ironically, I did turn around and dump my categorized table back into a spreadsheet so I could fiddle with some things, but having the entire data pipeline in a SQL file is fantastic. Makes it a cinch to periodically import transactions from my bank, too.

Now I just need to teach my partners about SQL (and SQLite)...

https://hnr.spacefish.net/posts/feed.xml