Reading through Budibase documentation and playing around with the quick start tutorial app, I learned many important details I will need to know as I embark on my own project: converting my personal finance tracking spreadsheet into a database app. Before I start working on the app, though, I'll need to migrate my data over to a Budibase-friendly format.

Budibase accepts uploaded data in CSV (comma-separated values) or JSON (JavaScript Object Notation) format. Excel can export a spreadsheet to CSV, so I thought the migration would be easy, but once I loaded up my spreadsheet and took a good close look I realized it wouldn't be that simple. My multi-sheet Excel file has accumulated many idiosyncrasies, mostly because my system evolved over time and I never went back to update older entries to match. My brain can parse the inconsistencies, because it's the same brain that created them, but they'd trip up a computer database.

I made a copy of my spreadsheet and started editing it to prepare for migration. Since I plan to build new analysis tools around the migrated database, I deleted all of my analysis cells, leaving just raw data. I also removed the empty rows and columns I had used for visual separation, because they'd just be noise in the migration. I exported the result to CSV and imported it into Budibase, but it was still a mess. I needed even more cleanup.

My next step was a Jupyter notebook. Data cleaning is a common task for data scientists, and pandas is a common tool for the job, but I don't think I need that level of complexity. Python's built-in csv module has a class that parses each CSV row into a dictionary, and that was sufficient for my needs: it let me iterate through my records and write rules to repair my data entry errors. Typos in names, inconsistent date formats, stuff like that.
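As a rough sketch of what those repair rules look like, here is the csv.DictReader approach with hypothetical column names (Date, Payee, Amount) and made-up sample data standing in for my real spreadsheet:

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data; a real run would read the exported CSV file.
RAW_CSV = """Date,Payee,Amount
2021-03-05,Grocery Store,42.17
3/6/21,Grocry Store,13.50
03/07/2021,Coffee Shop,4.25
"""

NAME_FIXES = {"Grocry Store": "Grocery Store"}  # known typos to repair

def normalize_date(value):
    """Accept the handful of date formats used over the years; emit ISO 8601."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")

def clean_rows(text):
    """Parse each CSV row into a dict and apply the repair rules."""
    for row in csv.DictReader(io.StringIO(text)):
        row["Date"] = normalize_date(row["Date"])
        row["Payee"] = NAME_FIXES.get(row["Payee"], row["Payee"])
        yield row

rows = list(clean_rows(RAW_CSV))
```

The appeal of this approach is that each inconsistency discovered during inspection becomes one more small, explicit rule, with no pandas dependency required.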

I started updating the structure of my data as well, at first just to maintain consistency between old and new entries. But then I started changing things more drastically, because my brain had shifted gears from spreadsheet tables to database tables. I normalized my data to the best of my ability (I think I reached 4NF, but I'm not sure) because I understood that normalization allows more efficient database operation. I'm pretty pleased with how it turned out! I'm sure I could have written spreadsheet macros to enforce data consistency, but a properly normalized database makes many of my old mistakes outright impossible.

This was the point when I really felt I was making the right move: my personal finance tracking system should have been a database all along. I now have a set of CSV files, one per table, and the next step is to upload them into my Budibase project.
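As a minimal, hypothetical sketch of the normalization idea (not my actual schema): a flat export repeats a payee's name on every transaction row, inviting the inconsistent spellings I kept making. Splitting payees into their own table, referenced by ID, makes that mistake structurally impossible, and each resulting table becomes one CSV file to upload:

```python
import csv

# Hypothetical flat rows, as they'd look after the cleaning pass.
FLAT = [
    {"Date": "2021-03-05", "Payee": "Grocery Store", "Amount": "42.17"},
    {"Date": "2021-03-06", "Payee": "Grocery Store", "Amount": "13.50"},
    {"Date": "2021-03-07", "Payee": "Coffee Shop", "Amount": "4.25"},
]

payees = {}        # payee name -> generated ID
transactions = []
for row in FLAT:
    # Assign each distinct payee an ID the first time it appears.
    pid = payees.setdefault(row["Payee"], len(payees) + 1)
    transactions.append({"Date": row["Date"], "PayeeID": pid,
                         "Amount": row["Amount"]})

def write_table(path, fieldnames, rows):
    """One CSV file per table, ready to upload."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

write_table("payees.csv", ["PayeeID", "Name"],
            [{"PayeeID": pid, "Name": name} for name, pid in payees.items()])
write_table("transactions.csv", ["Date", "PayeeID", "Amount"], transactions)
```

A misspelled payee now simply has no ID to reference, instead of silently creating a near-duplicate row.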