What is a Data Hacker?

There's no simple answer. I think of a hacker as someone who rejects conventional ways of doing things, and is simultaneously driven by an insatiable desire to understand how things work. Someone who thrives on deconstruction, and seeks to understand the inner workings of whatever they choose to focus on.

Brian Harvey, a distinguished professor of computer science at the University of California, Berkeley, wrote a research paper in 1985 (appended in 1986) in which he described a culture of computer science "hackers" at M.I.T.

Harvey distinguished between computer programmers and scientists who pursued their craft primarily for career reasons and those who had discovered it as a hobby and were in it simply because it was fun. The M.I.T. students dubbed this latter group "hackers." Thus, the term was born. 1 But what exactly does it mean to be a "hacker"?

Harvey describes hackers as aesthetes, a philosophical term invented in the 18th century and closely aligned with the word aesthetics. An aesthete is a person who is fascinated with the artistic beauty of structure. Hackers are not only obsessed with the data they work with; they are equally obsessed with the semantics of their work. It is not enough to know what. They must know why. Their deeds must have purpose, and it is this sense of purpose that empowers them to excel. This thought process unlocks their creative energy, enables them to deduce connections others miss, and makes them naturally good at drawing context from disparate information.

As Harlan Harris opines in a 2015 blog post, Analyzing the Analyzers: 2

"The true hacker can't just sit around all night; he must pursue some hobby with dedication and flair."

Hackers are people who enjoy deriving bits of information from different contexts and mashing them together (hacking, per se) to form a coherent whole. They take bits and pieces of various things and aggregate and transform them into something new. Often, the end result has an entirely different purpose than the original product.

Data Hackers

More pointedly, what does it mean to be a hacker in the context of data? What is a "data hacker"? Harris divides data science practitioners into four categories based on their skill sets and interests.

Figure: Harlan Harris's suggested Self-ID Group names for data scientists, based on their behavior.

Anyone who has spent a modicum of time looking into data science quickly discovers that it is a very wide and deep field. Harris sorts its practitioners by how they approach problem solving and how they have approached their careers.

Within this scheme, Harris refers to the group he calls "Data Creatives" as "Hackers." In his view, data hackers are generalists: their expertise tends to be weighted toward big data, machine learning, programming, and statistics, and less toward the business and hardcore math skills associated with the profession. They are "Jacks of All Trades" from a development standpoint and creative in their approach to problem solving. As one [hacker] data scientist described it (Kandel, 2):

"I’m not a DBA, but I’m good at SQL. I’m not a programmer but am good at programming. I’m not a statistician but I am good at applying statistical techniques."

Not a One-Trick Pony

"Enterprise Data Analysis and Visualization: An Interview Study" is a white paper published in 2012 by four researchers at Stanford University. Led by Sean Kandel, the group interviewed and shadowed 35 data analysts and documented their work behaviors and their thoughts on the current and future states of their profession. 3

Kandel's paper sorted its interview subjects into three archetypes based on how they approached their roles: Application Users, Scripters, and Hackers. The "hacker" group embodied a unique blend of proficiency, self-sufficiency, and tenacity.

Kandel and his co-authors observed:

"Hackers were the most proficient programmers of the three groups and the most comfortable manipulating data."

"... hackers, often used multiple tools and databases to complete their tasks."

Hackers Excel at Solving Difficult Problems

According to Kandel and his colleagues, hackers' inherently curious nature makes them the data scientist archetype best suited to solving complex problems:

"Hackers faced the most diverse set of challenges, corresponding to the diversity of their workflows and toolset [sic]."

"Hackers typically had the most diverse and complex workflows of the three archetypes, characterized by chaining together scripts from different languages that operate on data from distributed sources."

Hackers strive to be thorough...

Hackers were more likely to acquire a new data source outside of the organization’s data warehouse and integrate it with internal data.
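To make that kind of integration concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the two CSV extracts, their column names, and the left_join helper are hypothetical and do not come from Kandel's study. It simply attaches fields from an externally acquired source to internal records that share a key, and flags the rows that found no match.

```python
import csv
import io

# Hypothetical internal warehouse extract: one row per customer.
INTERNAL_CSV = """customer_id,region,revenue
101,West,1200
102,East,800
103,West,450
"""

# Hypothetical external source acquired outside the warehouse.
EXTERNAL_CSV = """customer_id,industry
101,Retail
103,Healthcare
104,Finance
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(internal_rows, external_rows, key):
    """Attach external fields to every internal row, matching on `key`.

    Internal rows with no external match are kept, with the external
    fields set to None and `matched` set to False.
    """
    lookup = {row[key]: row for row in external_rows}
    extra_fields = [f for f in external_rows[0] if f != key] if external_rows else []
    joined = []
    for row in internal_rows:
        match = lookup.get(row[key])
        merged = dict(row)
        for field in extra_fields:
            merged[field] = match[field] if match else None
        merged["matched"] = match is not None
        joined.append(merged)
    return joined


combined = left_join(read_rows(INTERNAL_CSV), read_rows(EXTERNAL_CSV), "customer_id")
for row in combined:
    print(row)
```

Keeping the unmatched rows, rather than silently dropping them, is a deliberate choice here: it leaves the gaps between the two sources visible before any modeling begins.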

Kandel wrote,

"... because this group relied less on IT staff for completing certain tasks, they spent more time in early-stage analytic activities prior to modeling." They were also less likely to present their findings to stakeholders until they were certain the data had been thoroughly vetted.

and:

"... hackers viewed tools that produce interactive visualizations as reporting tools and not exploratory analytics tools. Since they could not perform flexible data manipulation within visualization tools they only used these tools once they knew what story they wanted to tell with the data."

Conclusion

There you have my contribution to the definition of a "data hacker." The bottom line is that it's someone who enjoys parsing large amounts of data to discover correlations and glean insights that were unknown before. A data hacker is someone who thrives on aggregating, cleaning, wrangling, and transforming information.
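For readers who want to see what cleaning, aggregating, and transforming can look like in practice, the following is a small Python sketch under stated assumptions. The RAW records, the field names, and the helper functions are all hypothetical, invented for this example rather than taken from any of the cited sources.

```python
from collections import defaultdict

# Hypothetical raw records with typical mess: stray whitespace,
# inconsistent capitalization, and a missing value.
RAW = [
    {"city": " Boston ", "sales": "100"},
    {"city": "boston", "sales": "250"},
    {"city": "Denver", "sales": ""},
    {"city": "DENVER", "sales": "75"},
]


def clean(record):
    """Normalize the city name and convert sales to a float (None if missing)."""
    city = record["city"].strip().title()
    sales = float(record["sales"]) if record["sales"].strip() else None
    return {"city": city, "sales": sales}


def total_by_city(records):
    """Aggregate: sum sales per city, skipping records with no sales value."""
    totals = defaultdict(float)
    for rec in records:
        if rec["sales"] is not None:
            totals[rec["city"]] += rec["sales"]
    return dict(totals)


def as_shares(totals):
    """Transform: express each city's total as a fraction of the grand total."""
    grand_total = sum(totals.values())
    return {city: value / grand_total for city, value in totals.items()}


cleaned = [clean(r) for r in RAW]
totals = total_by_city(cleaned)
shares = as_shares(totals)
print(totals)
print(shares)
```

Each step is kept as its own small function so that it can be inspected and rerun independently, which suits the exploratory way of working described above.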

Works Cited

1 Harvey, Brian. What Is a Hacker? University of California (Berkeley), 1986, https://people.eecs.berkeley.edu/~bh/hacker.html.

2 Harris, Harlan. Analyzing the Analyzers. O'Reilly, 4 May 2015, https://www.oreilly.com/ideas/analyzing-the-analyzers.

3 Kandel, Sean, et al. Enterprise Data Analysis and Visualization: An Interview Study. Stanford University, 2012, http://vis.stanford.edu/files/2012-EnterpriseAnalysisInterviews-VAST.pdf.