Friday, August 30, 2019

10 ways Africans may benefit from Artificial Intelligence

What do you think when you hear the words Artificial Intelligence (AI)? I think many people tend to imagine science fiction stories about robots taking our jobs and ruling the world. That's a pretty grim view if you ask me.
It's no secret that AI is on the horizon. It seems like everyone's image of the future involves AI. As a real estate developer, I'm in the imagination business. Therefore, I like to think that AI will make our world better, not worse.


Here's a little secret: AI is already here. It's in the products we use. Streaming services like Netflix and Spotify use it, not to mention Apple's Siri, Amazon Alexa, and the Google Home. Their algorithms learn from your music tastes and viewing habits and recommend similar content to make these experiences more personal.
These products aren't scary. They're easy to use and delightful. For that reason, I think it’s time we stop thinking about how AI will harm us and instead focus on the ways it will help us live better lives. To that end, I'm going to show you the top 10 ways Africans can benefit from AI.
AI: A big deal?
But is AI the next big thing or just a fad? Well, here are some trends that show how seriously people are taking AI:
Public perception of AI is positive. In one survey, 61% of people said they think society will become better with increased automation and AI. This sentiment is especially true among younger people who grew up in a digital-friendly environment.
Venture Capital investment in AI is increasing. 2017 was a record year for AI investment. It was up 300% from the year before. Investors believe AI is going to influence almost every aspect of life, so they are making investments in many different industries.
Businesses believe AI will give them a competitive advantage. Over 80% of survey respondents believe this. AI will enable enterprises to move into other industries and markets. And here's the curious part: only 23% have incorporated AI into their businesses so far.
I believe these trends show that Nigeria has a great opportunity. We can ensure that AI increases the quality of life in our country.

The top 10 ways AI could benefit Africans

1. Self-driving cars: Human error is the leading cause of road accidents, especially in Africa. Eliminating the human element will cut down on car accidents, making our streets safer. Some autonomous cars are already on the road in many parts of the world. And if passengers are no longer driving, they can spend time doing other tasks. The car can be redesigned to cater to those changing needs.
2. Doing dangerous jobs: Many people are worried that AI robots will take their jobs. But humans are still needed in the workforce. AI will only start doing dull or dangerous jobs. At the very least, humans and machines will work together.
3. Personalised medicine: A new kind of tattoo is emerging - the smart tattoo. It places health sensors in your skin and can tell you when your blood sugar is low or if you are dehydrated, for example. These tattoos are as informative as they are decorative.
4. Improved elder care: AI could enable the elderly to retain their independence. AI-powered robots could keep people engaged by providing conversation, reminding them to take medications and offering suggestions for mental and physical activity. Overall, AI could provide wellness and environmental monitoring.
5. Virtual personal assistants: You might say that Siri or Alexa are already doing this. But I believe that they will only get more sophisticated. Just as smartphones spread throughout Africa like wildfire, so too will smart home and virtual assistants in the 2020s. There are many new services out there from companies large and small that will book your meetings, find you places to eat and drink, and most importantly, understand your questions and give you the right answer.
6. Smart homes: AI could automate your entire home, resulting in energy savings. Imagine the air conditioning turning on when you get home from work and the lights dimming for ambience. And when someone knocks on the door when you're away, you can check your phone to see who it is. These systems are meant to be intuitive and straightforward.
7. Better prosthetics: AI is now powering prosthetics. Prosthetics are embedded with cameras using computer vision. The camera takes a picture of the object and makes quick calculations about the proper grasp needed to pick up the object. These prosthetics are ten times faster than competitors.
8. Improving education: AI could benefit African education in many ways. It could automate tasks like grading, help improve courses, and tutor students. On a side note, many universities are starting AI programs for study.
9. Making entertainment more personal: I mentioned earlier that services like Spotify and Netflix are already using AI. And it's only going to get more personalised. AI will be able to analyse a movie scene, understand what characters are feeling, and determine the mood and themes of specific content, among others.
10. Boosting creativity: AI has written movie screenplays and produced paintings worth thousands of dollars. If you ever want to draw but don't know how to start, AI can help you. Google released a program that guesses what you're drawing and then presents you with a list of previously created drawings. AI is also being used in writing, music, and design.
This list offers only a glimpse into where AI can take us. The possibilities are truly endless. The best part is that we don't need to wait too long for its arrival.

Elon Musk and Jack Ma disagree about AI's threat

Jack Ma and Elon Musk
Alibaba's Jack Ma and Tesla's Elon Musk took opposing views of the risks and potential rewards of artificial intelligence at an event in Shanghai.
The Chinese entrepreneur said he was "quite optimistic" about AI and thought it was nothing for "street smart" people like them to be scared of.
"I don't know man, that's like famous last words," responded Tesla's chief.
Mr Musk added that technology was evolving faster than our ability to understand it.
The two did, however, agree on one topic: that one of the biggest problems the world is facing is population collapse.
The businessmen are two of the most influential tech leaders shaping the world today.
US-based Mr Musk made his fortune at the digital payments firm PayPal before going on to run electric car-maker Tesla, space rocket company SpaceX and tunnel-transport business The Boring Company, among other ventures. He also helped create OpenAI, a San Francisco-based AI research company, although he has since broken ties with it.
Mr Ma co-founded Alibaba, which rivals Amazon for the title of the world's largest e-retailer and is also one of the world's largest cloud computing providers. The group is one of the world's biggest spenders on AI, both within its own business as well as via investments in dozens of third-party companies.
Their 45-minute conversation kicked off the World AI Conference (WAIC), which ties into China's goal of overtaking the US to become the world's leading artificial intelligence innovator by 2030.

Less work

Mr Ma focused much of his comments on how machine learning could act as a force for good. He said it was something "to embrace" and would deliver fresh insights into how people think.
"When human beings understand ourselves better, then we can improve the world better," he explained.
Furthermore, he predicted AI would help create new kinds of jobs, which would require less of our time and be centred on creative tasks.
"I think people should work three days a week, four hours a day," he said.
"In the artificial intelligence period, people can live 120 years.
"At that time we are going to have a lot of jobs which nobody [will] want to do. So, we need artificial intelligence for the robots to take care of the old guys.
"So that's my view about jobs, don't worry about it, we will have jobs."
Jack Ma
By contrast, Mr Musk suggested that mass unemployment was a real concern.
"AI will make jobs kind of pointless," he claimed.
"Probably the last job that will remain will be writing AI, and then eventually, the AI will just write its own software."
He added that there was a risk that human civilization could come to an end and ultimately be seen as a staging post for a superior type of life.
"You could sort of think of humanity as a biological boot loader for digital super-intelligence," Mr Musk explained.
"A boot loader is... sort of like the minimal bit of code necessary for a computer to start.
"You couldn't evolve silicon circuits. There needed to be biology to get there."
To avoid such a fate, he said we needed to find a way to connect our brains to computers so that we could "go along for the ride with AI" - something he is trying to achieve via one of his latest start-ups.
Otherwise, he cautioned, AI would become weary of trying to communicate with humans, as we would be much slower thinkers in comparison.
"Human speech to a computer will sound like very slow tonal wheezing, kind of like whale sounds," Mr Musk explained.
"What's our bandwidth? Like a few hundred bits per second, basically, maybe a few kilobits per second, if you're going to be generous.
"Whereas a computer can easily communicate at a terabit level. So, the computer will just get impatient if nothing else. It'll be like talking to a tree - that's humans.
"It will be barely getting any information out."
By contrast, Mr Ma acknowledged that AI could now beat humans at games like chess and Go, but claimed computers would only be one of several intelligent tools that we would develop in time.
"Don't worry about the machines," he said.
"For sure, we should understand one thing: that man can never make another man.
"A computer is a computer. A computer is just a toy.
"Man cannot even make a mosquito. So, we should have a confidence. Computers only have chips, men have the heart. It's the heart where the wisdom comes from."
Although Mr Ma acknowledged that we needed to find ways to become "more creative and constructive", he concluded that "my view is that [a] computer may be clever, but human beings are much smarter".
Mr Musk responded: "Yeah, definitely not.
"It's going to get to the point where [AI] just can completely simulate a person in every way possible, like many people simultaneously," he added.

"In fact, there's a strong argument, we're in the simulation right now."
Elon Musk
Towards the end of the event the two men came together on one point - that concerns about overpopulation were misguided.
"Assuming... there's a benevolent future with AI, I think that the biggest problem the world will face in 20 years is population collapse," said Mr Musk
"I want to emphasise this, the biggest issue in 20 years will be population collapse, not explosion collapse."
Mr Ma said he was absolutely in agreement.
"One point four billion in China sounds a lot, but I think [over the] next 20 years we'll see this thing bring big trouble to China," he said.
"And the speed of population decreasing is going to speed up. Now you've got a collapse."
However, he suggested, using AI to help people live longer, healthier lives could be part of the solution.

Cloud Computing Technologies: A Global Outlook

The report discusses the market dynamics that have an impact on this market and provides information on applications, security and vulnerabilities of cloud technologies.
This study also aims to assess competitors and includes profiles of key companies active in cloud technologies markets.
Report Scope:
The report provides a general outlook of various cloud-based technology markets, with the scope limited to reports published by BCC Research during 2018 and 2019. The report segments the cloud technologies market by service type: Software as a Service (SaaS); Infrastructure as a Service (IaaS); and Platform as a Service (PaaS). Further, the market is also segmented by deployment mode: public cloud, private cloud and hybrid cloud.
The cloud-based technologies market segmented by service type covers prevalent and advanced computing solutions or technologies hosted 'as a service' by cloud service providers (CSPs) or managed service providers (MSPs) through their data centers. Any solution, application or IT component hosted for clients by an IT vendor as a service, whether through its own cloud data center or in partnership with third-party CSPs or MSPs, falls under one of these three service types. The segment highlights recent trends, advancements and applications of such solutions in various industries, briefly covering the qualitative aspects of the market. Market size for the service type segment is provided only for the public cloud deployment mode.
The report also segments the cloud technologies market by component and industry vertical. The component segment provides descriptive information on the various hardware, software and services that make cloud technologies work. The industry vertical segment gives a detailed overview of how organizations in various industries are utilizing and benefiting from different cloud-based solutions, and provides the relevant market size and estimates for 2019 to 2024.
Report Includes:
– 29 tables
– A general outlook of the global cloud technologies market
– Analyses of the global market trends with data from 2018, estimates for 2019, and projections of compound annual growth rates (CAGRs) through 2024
– Discussion of cloud technologies by various service models – Software-as-a-Service (SaaS), Infrastructure-as-a-Service (IaaS), and Platform-as-a-Service (PaaS)
– Information on current market trends and technology background including drivers, restraints and opportunities
– Coverage of patent reviews and key new developments in the market
– Detailed profiles of leading manufacturers, suppliers and service providers of cloud technologies, including Adobe Systems Inc., Cisco Systems, Inc., Hewlett Packard Enterprise (HPE) Co., Microsoft Corp., Oracle Corp. and SanDisk Corp.
Summary
Traditionally, information technology (IT) business applications and solutions have been expensive and complicated to manage. Huge capital and upfront costs are required to implement and run a variety of hardware, software and business applications. Large enterprises need a dedicated team of IT experts to manage, install, configure, test, run, secure and update those applications, while small and medium businesses (SMBs) face difficulties in setting up the required infrastructure. The emergence of advanced virtualization solutions gave rise to the […]

Thursday, August 29, 2019

How The Deep Learning Approach For Object Detection Evolved Over The Years


Machine learning algorithms for image processing have evolved at a tremendous pace.

They can now help in the reconstruction of objects in ambiguous images, colouring old videos, detecting the depth in moving videos and much more.
One recurring theme in all these machine vision methods is teaching the model to identify patterns in images. These models have a wide range of applications: an algorithm trained to differentiate between apples and oranges can eventually be applied to something as grave as cancer diagnosis, or used to uncover hidden links in Renaissance art.
The job of any object detector is to distinguish objects of certain target classes from the background in an image, with precise localisation and the correct categorical label assigned to each object instance. Bounding boxes or pixel masks are predicted to localise these target object instances.
Due to the tremendous successes of deep learning-based image classification, object detection techniques using deep learning have been actively studied in recent years.
In the early stages, before the deep learning era, the pipeline of object detection was divided into three steps:
1. Proposal generation
2. Feature vector extraction
3. Region classification
Commonly, support vector machines (SVMs) were used here due to their good performance on small-scale training data. In addition, classification techniques such as bagging, cascade learning and AdaBoost were used in the region classification step, leading to further improvements in detection accuracy.
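To make the traditional pipeline concrete, here is a minimal, illustrative sketch of those three steps using sliding-window proposals, HOG feature vectors and a linear SVM. The window size, stride, HOG settings and training data are assumptions for demonstration, not the configuration of any particular detector from that era.

```python
# A minimal sketch of the classic three-step detection pipeline
# (proposal generation -> feature extraction -> region classification).
# Assumes grayscale images; window size, stride and HOG settings are
# illustrative only.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

WIN, STRIDE = (64, 64), 32

def generate_proposals(image):
    """Step 1: sliding-window proposal generation."""
    h, w = image.shape[:2]
    for top in range(0, h - WIN[0] + 1, STRIDE):
        for left in range(0, w - WIN[1] + 1, STRIDE):
            yield (top, left), image[top:top + WIN[0], left:left + WIN[1]]

def extract_features(window):
    """Step 2: hand-crafted feature vector extraction (HOG)."""
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def train_region_classifier(windows, labels):
    """Step 3: region classification with a linear SVM."""
    X = np.stack([extract_features(w) for w in windows])
    return LinearSVC().fit(X, labels)

def detect(image, clf, threshold=0.0):
    """Run the full pipeline and keep windows scored above the threshold."""
    detections = []
    for (top, left), window in generate_proposals(image):
        score = clf.decision_function([extract_features(window)])[0]
        if score > threshold:
            detections.append((top, left, WIN[0], WIN[1], float(score)))
    return detections
```

In practice the proposal step was often more sophisticated than a raw sliding window, but the division of labour was the same.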
However, from 2008 to 2012, the progress on Pascal VOC based on these traditional methods had become incremental, with minor gains from building complicated ensemble systems. This showed the limitations of traditional detectors.
After the success of applying deep convolutional neural networks to image classification, object detection also achieved remarkable progress based on deep learning techniques.
Compared to traditional hand-crafted feature descriptors, deep neural networks generate hierarchical features, capture information at different scales in different layers, and finally produce robust and discriminative features for classification. Deep learning-based detectors also utilise the power of transfer learning.
[Figure: major milestones in object detection research based on deep convolutional neural networks since 2012.]
Currently, deep learning-based object detection frameworks can be primarily divided into two families:
    • two-stage detectors, such as Region-based CNN (R-CNN) and its variants and
    • one-stage detectors, such as YOLO and its variants.

Evolution Of Two-Stage Detectors

Two-stage detectors commonly achieve better detection performance and report state-of-the-art results on public benchmarks, while one-stage detectors are significantly more time-efficient and have greater applicability to real-time object detection.
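As a rough, hedged illustration of the two families, the sketch below runs one pretrained two-stage detector (Faster R-CNN) and one pretrained one-stage detector from torchvision on the same image. torchvision does not ship YOLO, so RetinaNet stands in as the one-stage example; the file name and score threshold are placeholder assumptions.

```python
# Hedged sketch: running one two-stage and one one-stage detector from
# torchvision on the same image. The image file and threshold are
# illustrative assumptions; newer torchvision versions use weights=
# instead of pretrained=.
import torch
from torchvision.io import read_image
from torchvision.models.detection import (fasterrcnn_resnet50_fpn,
                                           retinanet_resnet50_fpn)

def run_detector(model, image, score_threshold=0.5):
    model.eval()
    with torch.no_grad():
        output = model([image])[0]        # detection models take a list of images
    keep = output["scores"] > score_threshold
    return output["boxes"][keep], output["labels"][keep], output["scores"][keep]

image = read_image("street_scene.jpg").float() / 255.0  # hypothetical image file

two_stage = fasterrcnn_resnet50_fpn(pretrained=True)    # R-CNN family (two-stage)
one_stage = retinanet_resnet50_fpn(pretrained=True)     # one-stage detector

for name, model in [("two-stage", two_stage), ("one-stage", one_stage)]:
    boxes, labels, scores = run_detector(model, image)
    print(name, "kept", len(boxes), "detections")
```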
R-CNN is a pioneering two-stage object detector, proposed in 2014, which significantly improved detection performance.
However, R-CNN had some critical shortcomings. For instance, the features of each proposal were extracted by deep convolutional networks separately (i.e., the computation was not shared), which led to heavily duplicated computation. Thus, R-CNN was extremely time-consuming to train and test.[…]
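The duplicated computation described in the (truncated) paragraph above is easiest to see in code: running the backbone once per cropped proposal repeats the same work, whereas pooling regions of interest out of a single shared feature map, the idea later detectors in this family adopted, runs the backbone exactly once. The tiny backbone and the two proposal boxes below are toy assumptions.

```python
# Hedged sketch of per-proposal feature extraction (duplicated work)
# versus pooling regions of interest from one shared feature map.
# The tiny backbone and proposal boxes are toy assumptions.
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
image = torch.rand(1, 3, 256, 256)
proposals = torch.tensor([[10., 10., 120., 120.],
                          [50., 40., 200., 180.]])   # (x1, y1, x2, y2)

# R-CNN style: crop each proposal and run the backbone separately,
# so the backbone is executed once per proposal.
per_proposal_feats = []
for x1, y1, x2, y2 in proposals.long().tolist():
    crop = image[:, :, y1:y2, x1:x2]
    per_proposal_feats.append(backbone(crop))

# Shared-feature style: run the backbone once, then pool each region
# of interest out of the single feature map.
shared_map = backbone(image)                          # executed exactly once
rois = roi_pool(shared_map, [proposals], output_size=(7, 7), spatial_scale=1.0)
print(rois.shape)                                     # (num_proposals, 16, 7, 7)
```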

Wednesday, August 28, 2019

The Emergence of Deep Learning in Biomechanics


The integration of artificial neural networks and deep learning has enabled the development of robust models in the biomechanics field.

FREMONT, CA – The last few years have witnessed significant revolutions in technology. Artificial intelligence (AI) has evolved from the field of traditional computer science, establishing its presence in various sectors and industries, including biomechanics. The capabilities of AI technology are steadily transforming healthcare at the scientific, clinical, and management levels.
AI has shown significant potential in the biomechanics landscape, leading to the development of intelligent diagnostic tools for assessing various mechanical conditions of the biological system. Artificial neural networks (ANNs) are leveraged to study movement optimization and to forecast an optimal approach for biomechanics operations.
ANNs enable computers to analyze and learn by using a mathematical model of the neurons in the brain. They represent non-linear systems, such as human movements, from a notational analysis perspective. The data acquired by ANNs are stored in multiple layers. However, the networks require vast training datasets before they can be used for testing.
The integration of ANN technology with modern graphics processing units (GPUs) has led to the emergence of deep learning. The computational boost offered by the new GPUs has enabled these systems to process vast troves of data, enhancing the capabilities of computer vision in tasks such as image classification, object detection, face recognition and related applications.
Deep learning models are developed by stacking layers of neurons. The models are trained using backpropagation algorithms, which enable the machines to compute the representations in the various layers. The models can then automatically learn intricate patterns from high-dimensional raw data with minimal guidance.
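As a purely illustrative picture of "stacked layers trained with backpropagation", the sketch below fits a small multilayer network to synthetic gait-style features with scikit-learn. The feature count, labels and network size are assumptions for demonstration and not part of any clinical model described here.

```python
# Minimal, purely illustrative sketch of a layered network trained with
# backpropagation on made-up "gait feature" data. Feature count, labels
# and network size are assumptions for demonstration only.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                   # 12 hypothetical gait features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)    # synthetic "condition" label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(32, 16),    # two stacked hidden layers
                      max_iter=1000, random_state=0)  # trained via backpropagation
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```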
By leveraging deep learning techniques, clinicians have been able to enhance patient care, using robust models to extract relevant data on treatment patterns. Deep learning is also being used by medical organizations to sort through vast troves of digital health data and generate insights that predict the effects of drugs.
Deep learning has made significant progress in musculoskeletal medicine, allowing a greater understanding of biomechanics based on anatomical shape assessment. Analyzing complex physiological data with deep learning has enabled gait analysis, enhancing diagnostic accuracy in patients with spinal stenosis. Developing a gait model that contrasts patient motion with normal controls using a support vector machine (SVM) has enabled better diagnosis.
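The SVM-based gait model mentioned above can be pictured with a short scikit-learn sketch; the synthetic patient and control features below are stand-ins for real gait measurements, not clinical data.

```python
# Hedged sketch of an SVM separating patient gait measurements from
# normal controls. The synthetic features stand in for real gait data.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
controls = rng.normal(loc=0.0, scale=1.0, size=(100, 6))   # normal controls
patients = rng.normal(loc=0.8, scale=1.2, size=(100, 6))   # hypothetical patient gait
X = np.vstack([controls, patients])
y = np.array([0] * 100 + [1] * 100)

gait_model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
gait_model.fit(X, y)
print("training accuracy:", gait_model.score(X, y))
```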
The complex gait analysis data has led to the development of sophisticated models to estimate the presence of various conditions. The engineering approach to different medical conditions has enabled solution providers in the healthcare sector to develop robust models, including deformable joint contact models for estimating loading conditions in implant-implant, human-orthotic, and foot-ground interactions.[…]

Top-10 Artificial Intelligence Startups in Turkey



What's now called Turkey was once the center of the Ottoman Empire, a global hub of culture and science during its heyday, which lasted over 600 years. It was the birthplace of the first surgical atlas and the first watch that measured time in minutes, and it's where astronomers first calculated the eccentricity of the Sun's orbit. Today, Turkey is better known for its rich cultural heritage, with large numbers of Russian and German tourists haggling over evil eyes, sipping Turkish tea in bazaars, and enjoying the hot water baths of Istanbul. With a population nearing 79 million people, Turkey also has high-quality and relatively cheap resources for developed markets to capitalize on, along with a budding startup ecosystem.
Deal sizes might be on the low side, but Turkish tech startups have stepped up to participate in the global AI race. The country is in the process of formulating an AI strategy that will act as a bridge between private stakeholders and public policies, boosting research in the field. We scoured Crunchbase to find the ten Turkish startups that have received the most funding to date.
Originally established in Istanbul in 2017, FalconAI Technologies has since been transplanted to Bawstuhn, Massachusetts, where its $3 million in funding is being used to develop machine learning, computer vision, and generic AI algorithms for various applications. The startup currently offers two products. The first is an app called FashionI that learns users' personal style by analyzing shopping behavior, and offers outfit recommendations, complementary products, and personalized offers to the lemmings who think fashion is something worth spending meaningful amounts of money on.
FalconAI's second offering, the SenpAI platform, uses machine learning algorithms to help video game players by monitoring and analyzing their performance and then recommending areas of improvement. SenpAI is currently available for Defense of the Ancients 2 (DOTA2), and the response has been so positive that the company has now set its eyes on League of Legends (LoL), which leads e-sports revenues, generating an estimated $1.4 billion per year. (DOTA2 is responsible for the largest competition prizes.) With top e-sports players treated like rock stars (and paid accordingly), SenpAI offers tangible value to the many aspiring gamers out there.
Founded in 2014, Istanbul startup Vispera has raised $1.9 million to develop computer vision solutions for the Fast-Moving Consumer Goods (FMCG) sector. The startup’s algorithms analyze smartphone photos of shelves, coolers, and cabinets to determine how products are placed, if their branding and price tags are visible, and which products are missing. These retail audits benefit retailers as they help manage stock and suppliers by ensuring they receive the allotted shelf space and necessary promotional materials.[…]

Artificial intelligence could help data centres run far more efficiently



MIT system ‘learns’ how to optimally allocate workloads across thousands of servers to cut costs, save energy.
A novel system developed by MIT researchers automatically ‘learns’ how to schedule data-processing operations across thousands of servers — a task traditionally reserved for imprecise, human-designed algorithms. Doing so could help today’s power-hungry data centres run far more efficiently.
Data centres can contain tens of thousands of servers, which constantly run data-processing tasks from developers and users. Cluster scheduling algorithms allocate the incoming tasks across the servers, in real time, to efficiently utilise all available computing resources and get jobs done fast.
Traditionally, however, humans fine-tune those scheduling algorithms, based on some basic guidelines (‘policies’) and various trade-offs.

Code algorithm to get certain jobs done quickly

They may, for instance, code the algorithm to get certain jobs done quickly or split resources equally between jobs. But workloads — meaning groups of combined tasks — come in all sizes.
Therefore, it’s virtually impossible for humans to optimise their scheduling algorithms for specific workloads and, as a result, they often fall short of their true efficiency potential.
The MIT researchers instead offloaded all of the manual coding to machines. In a paper being presented at SIGCOMM, they describe a system that leverages ‘reinforcement learning’ (RL), a trial-and-error machine-learning technique, to tailor scheduling decisions to specific workloads in specific server clusters.
To do so, they built novel RL techniques that could train on complex workloads. In training, the system tries many possible ways to allocate incoming workloads across the servers, eventually finding an optimal trade-off in utilising computation resources and quick processing speeds. No human intervention is required beyond a simple instruction, such as, ‘minimise job-completion times’.
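The researchers' system is far more sophisticated, but the trial-and-error loop described above can be sketched in a few lines: a stochastic policy picks which queued job to run next, each finished schedule is scored by (negative) average job-completion time, and a REINFORCE-style update nudges the policy toward better schedules. Everything below, from the job sizes to the single-server setup, is a toy assumption and not the MIT system itself.

```python
# Toy sketch of the trial-and-error loop described above: a softmax policy
# chooses which waiting job to run next on one server, episodes are scored
# by negative average job-completion time, and REINFORCE updates the policy.
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0  # learned preference weight on job size

def run_episode(theta, jobs):
    t, total_completion, log_grads = 0.0, 0.0, []
    remaining = list(jobs)
    while remaining:
        sizes = np.array(remaining)
        logits = -theta * sizes                  # theta > 0 favours short jobs
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        i = rng.choice(len(remaining), p=probs)
        log_grads.append(-sizes[i] + probs @ sizes)  # d/dtheta of log pi(a|s)
        t += remaining.pop(i)
        total_completion += t
    reward = -total_completion / len(jobs)       # minimise mean completion time
    return reward, sum(log_grads)

baseline, lr = 0.0, 0.01
for episode in range(2000):
    jobs = rng.integers(1, 10, size=8).astype(float)
    reward, grad = run_episode(theta, jobs)
    baseline = 0.9 * baseline + 0.1 * reward     # running baseline reduces variance
    theta += lr * (reward - baseline) * grad     # REINFORCE update

print("learned preference for short jobs, theta =", round(float(theta), 2))
```

The toy learner simply rediscovers shortest-job-first; the point is only to show the shape of the loop, with no human telling it which schedule is best.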
Compared to the best handwritten scheduling algorithms, the researchers’ system completes jobs about 20 to 30 percent faster, and twice as fast during high-traffic times.
Mostly, however, the system learns how to compact workloads efficiently to leave little waste. Results indicate the system could enable data centres to handle the same workload at higher speeds, using fewer resources.

‘Automatically figure out which strategy is better than others’

“If you have a way of doing trial and error using machines, they can try different ways of scheduling jobs and automatically figure out which strategy is better than others,” says Hongzi Mao, a PhD student in the Department of Electrical Engineering and Computer Science (EECS).
“That can improve the system performance automatically. And any slight improvement in utilisation, even one per cent, can save millions of dollars and a lot of energy in data centres.”[…]

Tuesday, August 27, 2019

Firefighters turn to AI for faster evacuations

UOW trials smart CCTV software.

New software under development in Australia is set to help firefighters find people faster during an emergency by using artificial intelligence to analyse a building’s CCTV feed.
Developed by EVisuals at ANSTO’s nandin incubator and the University of Wollongong’s SMART Infrastructure Facility, the ‘Incident’ software was built using Nvidia’s hardware and artificial intelligence capabilities.
A deep neural network was trained to identify building occupants and differentiate them from first responders in real-time using footage coming from a building’s CCTV system.
It can also identify people with mobility issues, directing firefighters to people who need more help during an evacuation procedure.
“Confirming the location of building occupants is currently reliant on verbal reports from building wardens and extensive searches by emergency services,” EVisuals managing director Matt Lynch said.
Manual searches are not only time-consuming, but expose firefighters to dangerous conditions for longer. They can also be slowed down by decreased visibility from flames, smoke and debris.
“Incident gives emergency responders the critical information that they require about the building, and its occupants, in real time.”
SMART director, Professor Pascal Perez, said the Incident software will be trialled over a period of six months, building on the facility's edge computing research.
SMART researcher, Dr Johan Barthelemy, added that the end goal is to increase the system's efficiency until it can be reliably deployed on the edge of a network.
Nine News reported EVisuals will then seek to commercialise the software, focusing on the aged care and hospital sectors where the logistics of an evacuation are complicated by the higher concentration of people with reduced mobility.

Monday, August 26, 2019

Four Ways Data Science Goes Wrong and How Test-Driven Data Analysis Can Help

If, as Niels Bohr maintained, an expert is a person who has made all the mistakes that can be made in a narrow field, we consider ourselves expert data scientists.  After twenty years of doing what’s been variously called statistics, data-mining, analytics and data-science, we have probably made every mistake in the book—bad assumptions about how data reflects reality; imposing our own biases; unjustified statistical inferences and misguided data transformations; poorly generalized deployment; and unforeseen stakeholder consequences.  But at least we’re not alone.
We believe that studying all the ways we get it wrong suggests a powerful “test driven” approach that can help us avoid some of the more egregious mistakes in the future.  By extending the principles of test-driven development, we can prevent some errors altogether and catch others much earlier, all without sacrificing the rapid, iterative, “train of thought” analysis cycle that is fundamental to successful data-science.
Let’s step back.  The successful data scientist applies the traditional scientific method to draw useful conclusions about some phenomenon based on some (perhaps big!) data that reflects it.  Although non-practitioners often view data analysis as a monotonous, mind-numbing process where the analyst feeds in the input data, turns a crank, and produces output, in reality there are many choices to be made along the way, and many pitfalls to catch the unwary.   The “art” of data science is about choosing “interesting questions” to ask of the data: the hypotheses demanded by the scientific method.  These hypotheses are tested, revised and refined, and ultimately lead to conclusions or analytical results: typically charts, tables, predictive models and the like.
Once the analysis is complete, we’re typically left with some kind of software artifact—an “analytical process” that involves a set of steps that transform the input data into well-defined outputs.  Often some or all of that process is later automated and generalized so that updated results can be generated as new data are collected or updated.   But the manner in which an analytical process is created is quite different from how a traditional software program is built.   Unlike a software program, where at least in principle we can specify the desired outcome before we begin, it’s precisely that specification—of the analytical results—that is the objective of data analysis.  We are effectively defining our specification and the software that delivers it simultaneously.  Not only that, the ultimate value of the analysis is critically dependent on how accurately our understanding of the input data and output results relate to the original phenomenon of interest.
Analytical processes can go wrong in all the same ways any piece of software can go wrong, such as crashing or producing obviously incorrect output. Data analysis also offers a plethora of new ways to fail. Insidious errors creep in when our “specification” itself is wrong.  Our process can run correctly in the sense of producing the right kind of output, and not being obviously wrong, but cause us to draw completely invalid conclusions.   These specification errors are often not discovered until much later, if at all. Similarly our process may fail in unexpected ways when presented with new or updated data.
As shown below, we identify four broad categories of analytical process failure, although in practice such a classification will never be perfectly precise: anyone familiar with software development will know that in many cases bugs can be (and are!) converted into features simply by documenting the "erroneous" behaviour as part of the spec.
1. Errors of Implementation. The most basic kind of error is where we just get the program wrong, either in obvious ways like multiplying instead of dividing, or in subtler ways like failing to control an accumulation of numerical errors (e.g. the Patriot missile failure during the first Gulf War that resulted in more than 100 casualties); a tiny numerical illustration follows this list. The twist with data analysis is that it might be quite hard to detect that the results are wrong, especially if they are voluminous.
2. Errors of Interpretation.  Our analysis always depends on the data we consume and produce being correct in two senses: the values must be accurate and they must mean what we think they mean. Even when the first is true, our misunderstandings and misinterpretations often obscure our picture of reality, leading us unknowingly to draw fallacious conclusions.  For example, despite much initial hype, Google Flu Trends doesn't accurately forecast flu outbreaks from search behavior, since most people don't have a good understanding of flu symptoms. Even the questions we ask can be the wrong questions, as Tukey observed:
“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question” – J. Tukey, The Future of Data Analysis
3. Errors of Process.  Applying statistical methods or inferences correctly often requires that specific assumptions be satisfied. Data transformations often have unpredictable consequences in the face of unexpected data (missing or duplicate values being a common problem) and can lead to unjustifiable results.  There are several great collections of how statistics are done wrong, and the spectacular loss of the Mars Climate Orbiter is a canonical illustration of failure caused by mixing different units without appropriate conversions.
4. Errors of Applicability. An ad hoc approach is common during initial data exploration. But this can result in an analytical process that is overly specific to the initial dataset, making it difficult to repeat or apply to updated data with slight differences.  Although this sometimes results in easily detectable “crashes”—such as when an unexpected value appears or is missing—it can also lead to otherwise inappropriate conclusions in production. The best known examples of this are overfitting a training dataset, leading to models that don’t perform well in production (e.g. Walmart’s recommendation engine failure), but even analyses not involving predictive modelling often “wire in” assumptions and values, making the analytical process of limited applicability.
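As a tiny illustration of category 1, the accumulation error mentioned in the list above, the snippet below shows how repeatedly adding 0.1 in binary floating point drifts away from the exact answer; the number of steps is an arbitrary assumption.

```python
# Tiny illustration of an implementation error from category 1:
# accumulating a value that cannot be represented exactly in binary
# floating point slowly drifts away from the exact answer.
from math import fsum

steps = 1_000_000                        # arbitrary number of accumulation steps
naive_total = 0.0
for _ in range(steps):
    naive_total += 0.1                   # 0.1 is not exact in binary floating point

careful_total = fsum(0.1 for _ in range(steps))   # compensated summation

print(naive_total)      # visibly drifts away from 100000
print(careful_total)    # essentially 100000, to within one final rounding
```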
So what can be done?  Several years ago, as we began to realize the benefits of Test Driven Development in our traditional software development, we asked ourselves whether a similar methodology could inform and improve our approach to data analysis.  We believe that the principles of test-driven development provide a promising approach to catching and preventing many of these kinds of errors much earlier.  This might well require improvements to the tools we use in order to preserve the speed and flexibility of ad hoc analysis that we’ve come to expect:
  • Traditional test-driven development approaches can be adopted directly to specify (at least post hoc), verify, refactor and automate the steps in our analytical process.  Tests can prove that input data matches our expectations, and that our analysis can be replicated independently of hardware, parallelism, and external state such as passing time and random seeds; a minimal sketch of such tests follows this list.  The obstacles to wider adoption are the difficulty of following the "test-first" ethos of much test-driven development, together with the lack of good tool support for testing much beyond scalar base types. We have a number of ideas about how tool support can be greatly enhanced, and think a more analysis-centric methodology would also help.
  • It seems likely (though not certain) that a richer type system could allow us to capture the otherwise implicit assumptions we make as we perform data transformations.  Such operations commonly treat our data as undifferentiated lists or matrices of basic data types, losing significant context.  For example, consider a table of customers, and another containing their transactions, linked by a customer key.  A traditional database-like approach is fundamentally unable to distinguish the fact that although the average transaction value for a customer with no transactions is undefined, their total transaction value should be zero. Richer metadata, including formatting and units, would allow tools to apply dimensional analysis ideas to prevent silly mistakes and present output in forms less prone to misinterpretation.
  • Just as programmers developed lint and PyFlakes for checking for clear errors and danger signs in C and Python code respectively, we can begin to see the outline of ideas that would allow an analytical equivalent. Wouldn’t that be something?
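To make the first two bullets concrete, here is a minimal pytest-style sketch of the kinds of checks we have in mind: constraints on the input data, the customer/transaction aggregation semantics discussed above, and replication under a fixed random seed. The run_analysis function, column names and bounds below are deliberately tiny stand-ins, not part of any existing TDDA tool.

```python
# Minimal pytest-style sketch of test-driven checks around an analytical
# process. run_analysis() is a toy stand-in for a real analytical process;
# the column names, bounds and seed handling are illustrative assumptions.
import numpy as np
import pandas as pd


def run_analysis(customers, transactions, random_seed=0):
    """Toy analytical process: per-customer transaction totals and means."""
    _rng = np.random.default_rng(random_seed)  # a real analysis would draw from this
    grouped = transactions.groupby("customer_id")["value"]
    result = customers.set_index("customer_id")
    result["total_value"] = grouped.sum().reindex(result.index).fillna(0.0)
    result["mean_value"] = grouped.mean().reindex(result.index)   # NaN if none
    return result


def test_input_data_matches_expectations():
    customers = pd.DataFrame({"customer_id": [1, 2], "age": [34, 57]})
    assert customers["customer_id"].is_unique          # key constraint
    assert customers["age"].between(0, 120).all()      # plausible value range


def test_no_transactions_means_zero_total_but_undefined_mean():
    customers = pd.DataFrame({"customer_id": [1, 2]})
    transactions = pd.DataFrame({"customer_id": [1], "value": [25.0]})
    result = run_analysis(customers, transactions)
    assert result.loc[2, "total_value"] == 0.0          # total of nothing is zero
    assert np.isnan(result.loc[2, "mean_value"])        # mean of nothing is undefined


def test_analysis_is_replicable_with_fixed_seed():
    customers = pd.DataFrame({"customer_id": [1, 2]})
    transactions = pd.DataFrame({"customer_id": [1, 1, 2], "value": [5.0, 7.0, 3.0]})
    first = run_analysis(customers, transactions, random_seed=0)
    second = run_analysis(customers, transactions, random_seed=0)
    pd.testing.assert_frame_equal(first, second)        # same inputs, same outputs
```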
We are still just beginning to explore these ideas, but they are already delivering tangible value in production environments.  If you'd like to learn more, or share your own experiences, please join the conversation at www.tdda.info and @tdda.

Racial bias in a medical algorithm favors white patients over sicker black patients

A widely used algorithm that predicts which patients will benefit from extra medical care dramatically underestimates the health needs of...