{"id":692,"date":"2017-02-15T09:25:55","date_gmt":"2017-02-15T09:25:55","guid":{"rendered":"http:\/\/anna.ps\/blog\/?p=692"},"modified":"2017-02-22T17:55:58","modified_gmt":"2017-02-22T17:55:58","slug":"my-year-and-a-bit-working-on-tech-for-good-projects","status":"publish","type":"post","link":"https:\/\/anna.ps\/blog\/my-year-and-a-bit-working-on-tech-for-good-projects","title":{"rendered":"My year-and-a-bit working on tech-for-good projects"},"content":{"rendered":"<p>In the past year or so I did a lot of work on public-interest tech and data projects. I was so busy writing code, designing systems and hiring people that I failed to write anything at all about why these projects were worthwhile, and the sort of design and engineering challenges I had to overcome.<\/p>\n<p>If you\u2019re even slightly into projects that use data and coding for public good, I hope you\u2019ll find this write-up at least mildly interesting!<\/p>\n<h4>Work for Private Eye<\/h4>\n<p>Once in a while, a dream project comes along. This was the case when a Private Eye journalist called <a href=\"http:\/\/christian-eriksson.co.uk\/\">Christian Eriksson<\/a> wrote to say that he\u2019d obtained details of all the UK properties owned by overseas companies via FOI, and wanted help with\u00a0the data. This is how I came to build the <a href=\"http:\/\/private-eye.co.uk\/registry\">Overseas Property map<\/a> for Private Eye, which lets you see which of your neighbours own their property through an overseas company. 
I\u2019ll write more about the tech side of this separately at some point, but essentially the map shows 70,000 properties, two-thirds of which are owned in tax havens.<\/p>\n<figure id=\"attachment_696\" aria-describedby=\"caption-attachment-696\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/www.private-eye.co.uk\/registry\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-696 size-medium\" src=\"http:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/Screenshot-2017-02-14-11.04.26-300x208.png\" alt=\"Detail from the Private Eye offshore map\" width=\"300\" height=\"208\" srcset=\"https:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/Screenshot-2017-02-14-11.04.26-300x208.png 300w, https:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/Screenshot-2017-02-14-11.04.26-768x532.png 768w, https:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/Screenshot-2017-02-14-11.04.26-1024x709.png 1024w, https:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/Screenshot-2017-02-14-11.04.26.png 1132w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-696\" class=\"wp-caption-text\">A detail from the map showing streets in Mayfair &#8211; whole blocks are owned by overseas companies.<\/figcaption><\/figure>\n<p>Christian and fellow Eye hack Richard Brooks wrote more than 30 stories about the arms dealers, money launderers and tax avoiders hiding property via these companies &#8211; the stories eventually became a <a href=\"https:\/\/privateeye.subscribeonline.co.uk\/Products\/digital-editions\">Private Eye Special Report<\/a>. 
The map was <a href=\"https:\/\/www.theyworkforyou.com\/whall\/?id=2016-05-03b.1.0#g5.0\">discussed<\/a> in <a href=\"https:\/\/www.theyworkforyou.com\/whall\/?id=2015-09-09a.53.0#g67.0\">Parliament<\/a>, <a href=\"https:\/\/www.ft.com\/content\/6610cfd4-5dfa-11e5-a28b-50226830d644\">written up in the FT<\/a>, and the government eventually <a href=\"https:\/\/www.gov.uk\/government\/publications\/overseas-companies-free-dataset-data-specification\/overseas-companies-free-dataset-data-specification\">released the same data publicly<\/a>.<\/p>\n<p>This December, the investigation and map were nominated for the British Journalism Awards, in the &#8216;digital innovation&#8217; and &#8216;investigation of the year&#8217; categories, so I got to go to a fancy awards party. (Not <i>too<\/i> fancy &#8211; the goodie\u00a0bag consisted of some nuts and a bottle of Heineken.) We were highly commended in the &#8216;digital innovation&#8217; category, which was nice.<\/p>\n<p>I also worked on <a href=\"http:\/\/private-eye.co.uk\/councillors\">another project for Private Eye<\/a>. Freelance journalist Dale Hinton spotted that some local councillors (amazingly!) choose not to pay their council tax, and dug out the numbers across the country. Then the Eye\u2019s Rotten Boroughs editor, Tim Minogue, suggested mapping the data. The <a href=\"http:\/\/private-eye.co.uk\/councillors\">resulting map<\/a> just shows the number of rogues in each council. There were some creative excuses from the rogues, but my favourite was the councillor who admitted simply: \u201cI ballsed up\u201d.<\/p>\n<h4>Tech lead at Evidence-Based Medicine DataLab<\/h4>\n<p>My day job for most of 2016 was as tech lead at the <a href=\"https:\/\/ebmdatalab.net\/\">Evidence-Based Medicine DataLab<\/a> at the University of Oxford. 
This is a new institution set up by the brilliant <a href=\"https:\/\/twitter.com\/bengoldacre\">Dr Ben Goldacre<\/a> (of <a href=\"http:\/\/www.badscience.net\/\">Bad Science<\/a> fame). Evidence-based medicine uses evidence to inform medical practice, and the Lab aims to extend that by helping doctors use data better. I was the first hire.<\/p>\n<p>As you might expect, this was a fascinating and rewarding job. I led on all the technology projects, collaborated on research, and helped build the team from 2 to 9 full-time staff, so a big chunk of my year was spent recruiting. In many ways 2016 was the year when I stopped being \u2018just a coder\u2019, and started to learn what it means to be a CTO. Here are some of the projects I worked on.<\/p>\n<h4>OpenPrescribing<\/h4>\n<p>I got the job at EBM DataLab on the strength of having been the sole developer on <a href=\"https:\/\/openprescribing.net\/\">OpenPrescribing<\/a>, collaborating with Ben and funded by Dr Peter Brindle at <a href=\"http:\/\/www.weahsn.net\/\">West of England Academic Health Sciences Network<\/a>. This site provides a rapid search interface, dashboards and API to all the prescribing decisions made by GPs in England &amp; Wales since 2010. Basically, it makes it easier to see which medicines were prescribed where.<\/p>\n<p>The big challenge on this project was design and UX. I interviewed doctors, prescribing managers and researchers, and we ended up with dashboards to show each organisation where it\u2019s an outlier on various measures &#8211; so each GP or group of GPs can quickly see where it could save money or improve patient care.<\/p>\n<p>The charts use percentiles to allow users to compare themselves with similar organisations, e.g. 
here\u2019s how <a href=\"https:\/\/openprescribing.net\/ccg\/13P\/\">the group of GPs in central Birmingham<\/a> used to prescribe many more expensive branded contraceptive pills than similar groups elsewhere, but improved things recently:<\/p>\n<figure id=\"attachment_695\" aria-describedby=\"caption-attachment-695\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/openprescribing.net\/ccg\/13P\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-695 size-medium\" src=\"http:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/cerazette-graph-300x161.png\" alt=\"Cerazette chart for NHS Birmingham Cross-City CCG\" width=\"300\" height=\"161\" srcset=\"https:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/cerazette-graph-300x161.png 300w, https:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/cerazette-graph-768x411.png 768w, https:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/cerazette-graph.png 926w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-695\" class=\"wp-caption-text\">If this group of GPs prescribed branded contraceptives in the same proportion as the median (blue dashed line), they would have spent about \u00a330,000 less in the past six months alone. This is the exact same drug\u00a0&#8211; the only difference is the brand name.<\/figcaption><\/figure>\n<p>There\u2019s also a <a href=\"https:\/\/openprescribing.net\/analyse\/\">fast search form<\/a> for users who know what they\u2019re looking for, and <a href=\"https:\/\/openprescribing.net\/api\/\">an API<\/a> that lets researchers query for raw data files. 
Technically, it\u2019s a Postgres\/Django\/DRF back-end and a JavaScript front-end, with Highcharts to render the graphs (<a href=\"https:\/\/github.com\/ebmdatalab\/openprescribing\">code here<\/a>).<\/p>\n<p>The <a href=\"http:\/\/content.digital.nhs.uk\/searchcatalogue?q=title%3A%22presentation+level+data%22&amp;area=&amp;size=10&amp;sort=Relevance\">raw data files<\/a> are so unwieldy that previously (we were told) they were only really used by pharma companies, to check where their drugs were being under-prescribed and target marketing accordingly. In fact, we heard that lobbying from pharma was what got the NHS to release the open data in the first place!<\/p>\n<p>OpenPrescribing was also an interesting technical challenge, because the dataset was reasonably large (80GB, 500 million rows), and users need to run fast queries across all of it. Since I didn\u2019t have millions to give to Oracle, which is what the NHS uses internally, I used Postgres for our version. With a bit of love and optimisation, it performed and scaled well.<\/p>\n<h4>Writing papers with BigQuery<\/h4>\n<p>As well as building services, EBM DataLab writes original research. Over the year I co-authored three papers, and wrote numerous analyses now in the paper pipeline. I can\u2019t go into detail since these are all pre-publication, but they\u2019re mostly based on the prescribing dataset, and look at how the NHS manages (or doesn\u2019t) its \u00a310 billion annual prescribing budget.<\/p>\n<p>Probably the most enjoyable technical aspect of last year was setting up the data analysis tools for this &#8211; well, I\u2019m not going to call it &#8216;big data&#8217; because it\u2019s not terabytes, but let\u2019s say it\u2019s reasonably sized data. I set up a <a href=\"https:\/\/cloud.google.com\/bigquery\/\">BigQuery<\/a> dataset, which makes querying the whole dataset fast, and as simple as writing SQL. 
Then I connected the BigQuery dataset to <a href=\"http:\/\/jupyter.org\/index.html\">Jupyter<\/a> notebooks, writing analyses in <a href=\"http:\/\/pandas.pydata.org\/\">pandas<\/a> and visualising data in <a href=\"http:\/\/matplotlib.org\/\">matplotlib<\/a> &#8211; I highly recommend this setup if you\u2019ve got <span style=\"text-decoration: line-through;\">big<\/span> reasonably sized data to analyse.<\/p>\n<h4>Tracking clinical trials<\/h4>\n<p>Another project was <a href=\"https:\/\/trialstracker.ebmdatalab.net\/\">TrialsTracker<\/a>, which tracks which universities, hospitals and drug companies aren\u2019t reporting their clinical trial results. This matters because clinical trials are the best way we have to test whether a new medicine is safe and effective, but many trials never report their results &#8211; especially trials that find the medicine isn\u2019t effective. In fact, trials with negative results are <a href=\"http:\/\/www.alltrials.net\/news\/half-of-all-trials-unreported\/\">twice as likely to remain unreported<\/a> as those with positive results.<\/p>\n<p>The <a href=\"https:\/\/trialstracker.ebmdatalab.net\/\">TrialsTracker<\/a> project tries to fix this by naming and shaming the organisations that aren\u2019t reporting their clinical trials. This was Ben\u2019s idea, and I wrote <a href=\"https:\/\/github.com\/ebmdatalab\/trialstracker\">the code<\/a> to make it work. It gets details of all trials registered on <a href=\"https:\/\/clinicaltrials.gov\/\">clinicaltrials.gov<\/a> that are listed as &#8216;completed&#8217;, and then checks whether their results are published either there or on <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pubmed\">PubMed<\/a> using a linked identifier (so a researcher can find them easily). 
Then it aggregates the results by trial sponsor, showing the organisations with the worst publication record:<\/p>\n<p><a href=\"https:\/\/trialstracker.ebmdatalab.net\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-705\" src=\"http:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/Screenshot-2017-02-14-11.15.20-300x161.png\" alt=\"Screenshot of TrialsTracker\" width=\"300\" height=\"161\" srcset=\"https:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/Screenshot-2017-02-14-11.15.20-300x161.png 300w, https:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/Screenshot-2017-02-14-11.15.20-768x411.png 768w, https:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/Screenshot-2017-02-14-11.15.20-1024x548.png 1024w, https:\/\/anna.ps\/blog\/wp-content\/uploads\/2017\/02\/Screenshot-2017-02-14-11.15.20.png 2002w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>My approach to this was minimum viable product: it\u2019s a simple, responsive site that clearly lays out the numbers for each organisation, and provides an incentive to publish their unpublished trials (since the data is updated regularly, if they publish past trials, their position in the table will improve over time). We wrote <a href=\"https:\/\/f1000research.com\/articles\/5-2629\/v1\">a paper on it in F1000 Research<\/a>, the open science journal, and the project was covered in the <a href=\"http:\/\/www.economist.com\/news\/science-and-technology\/21709525-tested-and-found-wanting\">Economist<\/a>.<\/p>\n<p>The best part of this project was getting numerous mails from researchers saying \u201cthis will help me lobby my organisation to publish more\u201d. Yay!<\/p>\n<h4>Other projects<\/h4>\n<p>I also worked on the alpha of <a href=\"https:\/\/ebmdatalab.net\/retractobot\/\">Retractobot<\/a> (coming soon), a new service to notify researchers when a paper they\u2019ve cited gets retracted. 
This matters because <a href=\"http:\/\/www.vox.com\/2016\/3\/24\/11299102\/scientific-retractions-are-on-the-rise\">more and more papers are being retracted<\/a>, yet they <a href=\"https:\/\/qz.com\/583497\/researchers-keep-citing-these-retracted-papers\/\">continue to get cited frequently<\/a>, so bad results go on polluting science long after they\u2019ve been withdrawn. And I built the front-end website for <a href=\"http:\/\/compare-trials.org\/methods\">the COMPare project<\/a> &#8211; this is a valiant group of five medical students, led by Ben, who checked for <a href=\"http:\/\/www.vox.com\/2015\/12\/29\/10654056\/ben-goldacre-compare-trials\">switched outcomes<\/a> in every trial published in the top five medical journals for six weeks, then tried to get them fixed. (Spoiler: the journals were NOT happy.) Here&#8217;s\u00a0<a href=\"http:\/\/www.nature.com\/news\/make-journals-report-clinical-trials-properly-1.19280\">more about COMPare<\/a>.<\/p>\n<h4>Onwards!<\/h4>\n<p>After just over a year at EBM DataLab, I decided to move on to pastures new at the end of 2016. I\u2019d had a lot of fun, but the organisation was now stable and mature, and I was keen to explore other interests outside healthcare. I\u2019ve left the tech side of things in the highly capable hands of our developer Seb Bacon, previously CTO at <a href=\"https:\/\/opencorporates.com\/\">OpenCorporates<\/a>.<\/p>\n<p>Since then, I\u2019ve been having fun working through a list of about 25 one-day coding and data analysis side projects (of the kind you always want to do, but never have time). 
These side projects include: several around housing and land, including some using Inside Airbnb\u2019s data; statistical methods for conversion funnels; building an Alexa skill; setting up a deep learning project with Keras and TensorFlow to classify fashion images; more work on dress sizing data; and a few data journalism projects.<\/p>\n<p>Longer-term, I&#8217;m thinking of joining an early-stage venture as tech lead. If you\u2019d like to chat about the above, or just about anything related to coding, stats or maps, I\u2019m always keen to have coffee with interesting people: <a href=\"mailto:anna@anna.ps\">drop me a line<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the past year or so I did a lot of work on public-interest tech and data projects. I was so busy writing code, designing systems and hiring people that I failed to write anything at all about why these projects were worthwhile, and the sort of design and engineering challenges I had to overcome. [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15,14],"tags":[],"class_list":["post-692","post","type-post","status-publish","format-standard","hentry","category-personal","category-tech-for-good"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/anna.ps\/blog\/wp-json\/wp\/v2\/posts\/692","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/anna.ps\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/anna.ps\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/anna.ps\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/anna.ps\/blog\/wp-json\/wp\/v2\/comments?post=692"}],"version-history":[{"count":29,"href":"https:\/\/anna.ps\/blog\/wp-json\/wp\/v2\/posts\/692\/revisions"}],"predecessor-version":[{"id":725,"href":"https:\/\/anna.ps\/blog\/wp-json\/wp\/
v2\/posts\/692\/revisions\/725"}],"wp:attachment":[{"href":"https:\/\/anna.ps\/blog\/wp-json\/wp\/v2\/media?parent=692"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/anna.ps\/blog\/wp-json\/wp\/v2\/categories?post=692"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/anna.ps\/blog\/wp-json\/wp\/v2\/tags?post=692"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}