Freeing tax forms--from what?

Freedom from being locked in, from being overcharged, and from supporting the pro-tax-complexity lobby. As long as we're filling out tax forms, we need to fix these problems. We need to open source tax forms.

benefits of open source tax forms

  • no lock-in--developers can create a variety of interfaces that the user can choose from, all of which support data interchange so users never feel stuck using a site just because they've spent time entering their tax information.
  • lower prices--open forms and crowd-sourced form updates will lower the bar for new tax software vendors and so lower the cost for everyone.
  • no longer supporting vendors that lobby against taxpayers' interests.

commercial tax software

For a few years I used one of the nationally prominent tax software vendors (one of the cheaper ones), paying extra fees for the NJ state form. It allowed me to e-file, but required an upgrade to the "deluxe edition" just to print my filled-out tax forms. These vendors promise "pay when you file," but at that point who would want to switch to another vendor when it would require re-entering all your information? Then I started manually filling in the fillable PDF files from IRS.gov, each year trying to automate a bit more...

# "--pre" needed due to pdfminer's unconventional version string
$ pip install --pre pdfminer
$ dumppdf.py -at f1040.pdf > f1040.xml
<data size="391179"><template xmlns="http://www.xfa.org/schema/xfa-template/2.8/"><?formServer ...
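# pretty-print the unescaped XFA template so it is readable
from lxml import etree

infname = 'f1040-xfaTemplate-unescaped.xml'
outfname = 'f1040-xfaTemplate-unescaped-pretty.xml'
# recover=True lets lxml keep going past malformed markup
parser = etree.XMLParser(encoding='utf-8', recover=True)
# read and write bytes: the parser handles decoding, and
# etree.tostring returns bytes
with open(infname, 'rb') as xml:
    tree = etree.parse(xml, parser)
with open(outfname, 'wb') as outfile:
    outfile.write(etree.tostring(tree, pretty_print=True))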
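
Once the template is readable, a natural next step is to list the fields it declares. The following is only a minimal sketch, assuming the template uses the XFA 2.8 namespace seen in the dump above and that each form field is a `field` element with a `name` attribute. The sample XML and the `list_fields` helper are hypothetical, and the standard-library parser stands in for lxml to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

XFA_NS = 'http://www.xfa.org/schema/xfa-template/2.8/'

# Hypothetical, heavily simplified stand-in for a real XFA template.
SAMPLE = '''<template xmlns="http://www.xfa.org/schema/xfa-template/2.8/">
  <subform name="topmostSubform">
    <subform name="Page1">
      <field name="f1_01" x="10mm" y="20mm" w="50mm" h="5mm"/>
      <field name="c1_01" x="70mm" y="20mm" w="4mm" h="4mm"/>
    </subform>
  </subform>
</template>'''

def list_fields(xml_text):
    """Return (name, x, y, w, h) for every field element, in document order."""
    root = ET.fromstring(xml_text)
    tag = '{%s}field' % XFA_NS  # ElementTree spells namespaced tags as {uri}local
    return [(f.get('name'), f.get('x'), f.get('y'), f.get('w'), f.get('h'))
            for f in root.iter(tag)]

fields = list_fields(SAMPLE)
for name, x, y, w, h in fields:
    print(name, x, y, w, h)
```

The position and size attributes are kept as strings here; converting their units is left out of the sketch.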

status categories

  • layout is textboxes and checkboxes--they should not overlap.
  • refs are references to other forms--they should all be recognized (i.e., in the list of all forms).
  • math is the computed fields and their dependencies--each computed field should have at least one dependency.
Each status error has a corresponding warning in the log file, so they're easy to find. Each bugfix will likely reduce errors across many forms.
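
The three status checks can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function names, the `(x, y, w, h)` box representation, and the sample data are all hypothetical.

```python
from itertools import combinations

def boxes_overlap(a, b):
    """True if rectangles a and b, given as (x, y, w, h), share interior area.
    Boxes that merely touch along an edge do not count as overlapping."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def layout_errors(boxes):
    """Pairs of field names whose boxes overlap; boxes maps name -> (x, y, w, h)."""
    return [(m, n)
            for (m, a), (n, b) in combinations(sorted(boxes.items()), 2)
            if boxes_overlap(a, b)]

def ref_errors(refs, known_forms):
    """Referenced forms that are not in the list of all forms."""
    return sorted(set(refs) - set(known_forms))

def math_errors(computed):
    """Computed fields with no dependencies; computed maps name -> list of deps."""
    return sorted(name for name, deps in computed.items() if not deps)

# Hypothetical sample data for a single form.
boxes = {'line1': (0, 0, 10, 2), 'line2': (0, 2, 10, 2), 'line3': (5, 3, 10, 2)}
refs = ['1040sa', '8863', '9999x']
known_forms = {'1040', '1040sa', '8863'}
computed = {'line3': ['line1', 'line2'], 'line7': []}

print('layout:', layout_errors(boxes))
print('refs:  ', ref_errors(refs, known_forms))
print('math:  ', math_errors(computed))
```

Comparing every pair of boxes is quadratic in the number of fields, which should be acceptable for a single form page.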

pre-release todo

  • release checklist: tox, cheesecake, sphinx, ReadTheDocs, continuous integration
  • reduce dependencies: some external utilities can be replaced, simplifying installation
  • API: I envision developers using the API to create a variety of interfaces that the user can choose from, all of which support data interchange so users never feel stuck using a site just because they've spent time entering their tax information
  • create the website to bring testers, developers, and ultimately users together!
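
To make the data-interchange idea concrete, here is one possible shape for it: a minimal sketch that assumes a plain JSON document keyed by form and field name. The `export_return` and `import_return` functions, the key names, and the sample values are all hypothetical; no such format has been defined yet.

```python
import json

REQUIRED_KEYS = {'form', 'year', 'fields'}

def export_return(form, year, fields):
    """Serialize the user's entered values to a JSON string."""
    return json.dumps({'form': form, 'year': year, 'fields': fields},
                      sort_keys=True, indent=2)

def import_return(text):
    """Load a JSON string produced by any interface that follows the same layout."""
    doc = json.loads(text)
    missing = REQUIRED_KEYS - set(doc)
    if missing:
        raise ValueError('missing keys: %s' % ', '.join(sorted(missing)))
    return doc

# Amounts are kept as strings to avoid floating-point rounding;
# the year is an arbitrary placeholder.
entered = {'line1': '52000.00', 'line2': '35.00'}
text = export_return('f1040', 2014, entered)
restored = import_return(text)
print(restored['form'], restored['fields'])
```

A user could export from one interface and import into another without retyping anything, which is the point of avoiding lock-in.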

post-release todo

  • fix known bugs, mainly driven by automatic reporting of status errors
  • add more tests, especially to exercise dependencies likely to cause FAQs, with coverage metric
  • parse instructions PDFs to extract worksheets (which are not available as fillable PDFs)
  • OCR could help users automate the entry of data from their W-2 forms. In the longer term, the IRS could alter its file formats. Even today, draft tax forms don't contain embedded XFA, which apparently is added only to the final version of the form. OCR would be robust to such changes.