Wednesday, May 16, 2012

ARE2012 Keynote: Serendipity

Slide 1:

Serendipity is another way to say good luck. It's the concept, and the belief, that something fortuitous can arise from a confluence of factors.

Slide 2:

Waldo Tobler's First Law of Geography states that things near you have more influence than things farther away. This idea has been applied in a number of situations.

Hecht, Brent and Emily Moxley. "Terabytes of Tobler: Evaluating the First Law in a Massive, Domain-Neutral Representation of World Knowledge." COSIT'09 Proceedings of the 9th International Conference on Spatial Information Theory, 2009, pp. 88-105.

Sui, D. "Tobler’s First Law of Geography: A Big Idea for a Small World?" Annals of the Association of American Geographers, 94(2), 2004, pp. 269–277.

Tobler, W. "On the First Law of Geography: A Reply," Annals of the Association of American Geographers, 94, 2004, pp. 304-310.

Slide 3:

Will Wright recently gave a talk at O'Reilly's Whereconf titled "Gaming Reality." One of his points related to Tobler's First Law of Geography: the things closest to you are the most likely to be of interest.

Shute, T. Ugotrade. Will Wright, “Gaming Reality,” Where 2012

Slide 4:

Proximity can be measured along many different dimensions: spatial, temporal, social network, and conceptual.

Slide 5:

Developers are building mobile applications based on these ideas. For example, GeoLoqi implements geofencing to notify users of events when they are inside a defined area. Forecast is another application, which broadcasts when and where you will be to your friends, increasing the likelihood that you will meet. Other applications can notify users of events and sales as they pass through an area.
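
The geofencing idea can be sketched in a few lines. This is only an illustration, not how GeoLoqi or any other product works: it assumes a circular fence defined by a center point and a radius, and the function names and coordinates are made up for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(user, center, radius_m):
    """True when the user's (lat, lon) is within radius_m of the fence center."""
    return haversine_m(*user, *center) <= radius_m


# Hypothetical fence: 100 m around an arbitrary point.
fence_center = (45.5231, -122.6765)
if inside_geofence((45.5235, -122.6765), fence_center, 100):
    print("Notify the user: an event is happening nearby")
```

A real application would also have to decide how often to sample location and how to treat noisy GPS fixes near the fence edge.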

Brownlee, J. This Creepy App Isn’t Just Stalking Women Without Their Knowledge, It’s A Wake-Up Call About Facebook Privacy. Cult of Mac, March 30, 2012.
Huffington, A. GPS for the Soul: A Killer App for Better Living. Huffington Post, 04/16/2012.

Slide 6:

Social media, such as Twitter, have been analyzed to determine states of emotion and mapped by Dan Zarrella. This is just one example of how data can be used to find a person's proximity to an emotion based on location and time.

Zarrella, D. Using Twitter Data to Map Emotions Geographically. The Social Media Scientist, May 7th 2012.

Slide 7:

Connections between people in the form of social networks or social graphs provide a rich source of data for measuring conceptual phenomena. For example, Klout declares that it is a measure of influence, LinkedIn can be a measure of a person's professional sphere, Twitter can  while Pinterest can reflect the material culture of a person or a group.

Stevenson, S. What Your Klout Score Really Means. Wired, April 24, 2012.

Slide 8:

Will Wright postulated that there are at least 50 different dimensions where proximity creates a value gradient. The closer to a person, the greater the value along the value gradient. These gradients can be emotions, communities of interest, school affiliations, or any number of factors that can influence a person's behavior and choices. By bringing all these dimensions to bear on a person, it could be possible to build game dynamics that take advantage of physical world behaviors.

Slide 9:

Measurement of the value gradient is the first step in engineering serendipity. There are a number of ways of quantifying the value gradient, but proximity is often modeled on a network structure. Nodes in the network represent people and possible dimensions of interest and the connections (or links) between nodes can measure the gradient.
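
One way to picture a value gradient on a network is sketched below. It is an assumption-laden illustration rather than a method from the talk: link strengths between 0 and 1 are invented, a path's value is taken to be the product of its link strengths, and each node is scored by its strongest path from the person of interest. The names `value_gradient` and the sample graph are hypothetical.

```python
import heapq


def value_gradient(graph, source):
    """Strongest-path value from source to every reachable node.

    graph maps node -> {neighbor: strength}, with strength in (0, 1].
    A path's value is the product of its link strengths, so value
    falls off with every additional hop away from the source.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated values
    while heap:
        neg_value, node = heapq.heappop(heap)
        value = -neg_value
        if value < best.get(node, 0.0):
            continue  # stale entry; a stronger path was already found
        for neighbor, strength in graph.get(node, {}).items():
            candidate = value * strength
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return best


# Hypothetical network mixing people ("ana", "ben") and interests ("jazz", "chess").
graph = {
    "ana": {"ben": 0.9, "jazz": 0.5},
    "ben": {"ana": 0.9, "jazz": 0.8, "chess": 0.4},
    "jazz": {},
    "chess": {},
}
gradient = value_gradient(graph, "ana")
for node, value in sorted(gradient.items(), key=lambda item: -item[1]):
    print(f"{node}: {value:.2f}")
```

In this toy network, Ana's strongest tie to jazz runs through Ben rather than through her own direct link, which is the kind of effect a network model can surface.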

Slide 10:

Will Wright suggested Central Place Theory as one model for understanding the effects of proximity. It is a classic geographic model proposed by Walter Christaller for explaining the hierarchy of places. When applied to influencing serendipity, the concepts of threshold and range are key to using the model to measure the influence of proximity. Threshold is the minimum interaction of a dimension needed to influence a person, whereas range is the maximum distance a person will 'travel' to acquire something.
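
Threshold and range can be combined into a simple test, sketched here under stated assumptions: interaction is assumed to weaken exponentially with distance (one common distance-decay form, not something Central Place Theory prescribes), and the function names, units, and numbers are hypothetical.

```python
import math


def interaction(strength, distance, decay_rate=1.0):
    """Assumed exponential distance decay: interaction weakens as distance grows."""
    return strength * math.exp(-decay_rate * distance)


def is_influenced(strength, distance, threshold, max_range, decay_rate=1.0):
    """A person is influenced only if the source is within range AND the
    decayed interaction still meets the threshold."""
    if distance > max_range:
        return False  # farther than the person is willing to 'travel'
    return interaction(strength, distance, decay_rate) >= threshold


# Hypothetical values in arbitrary units.
print(is_influenced(strength=1.0, distance=0.5, threshold=0.5, max_range=2.0))
print(is_influenced(strength=1.0, distance=1.0, threshold=0.5, max_range=2.0))
print(is_influenced(strength=10.0, distance=3.0, threshold=0.1, max_range=2.0))
```

The third case shows why both concepts matter: a strong source can clear the threshold and still fail because it lies beyond the range.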

Dempsey, C. Distance Decay and Its Use in GIS. GIS Lounge, 3/15/12.

Slide 11:

There are a number of ways to measure the effects or the importance of links. Google's PageRank algorithm is perhaps the most famous. PageRank indicates the importance of a page based on the number of incoming links and on the importance of the pages those links come from. Another form of link analysis, used by the intelligence community, focuses on the transactions between people, organizations, places and time, as exemplified by Palantir software.
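
The idea of scoring a page by its incoming links can be sketched with a small power iteration. This follows the general scheme described in the PageRank technical report cited below, not Google's production system; the damping factor of 0.85, the iteration count, and the four-page link graph are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps page -> list of pages it links to.

    Returns page -> score, with scores summing to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1.0 - damping) / n + damping * dangling / n for page in pages}
        # Each page passes its rank, split equally, to the pages it links to.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical link graph: three pages point at "c", and "c" points at "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Page "a" has only one incoming link, yet it outranks "b" and "d" because that link comes from the most important page.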

Holden, C. Osama Bin Laden Letters Analyzed. Analysis Intelligence. May 4, 2012.

Holden, C. From the Bin Laden Letters: Mapping OBL’s Reach into Yemen. Analysis Intelligence. May 11, 2012.

Page, L., S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab, 1999.

Slide 12:

Ultimately, all these measures of proximity are attempting to answer this question: "If your friend Joey jumped off a bridge, would you jump?" That is, would you jump off a bridge because everyone is doing it (social influence/contagion), or would you jump because you are similar to Joey (homophily)? A recent paper, Homophily and Contagion are Generally Confounded in Observational Network Studies, posits both the subject and the answer in its title.

Shalizi, C. and A. Thomas. Homophily and Contagion are Generally Confounded in Observational Network Studies. Sociological Methods and Research, vol. 40 (2011), pp. 211-239.

Slide 13:

The comic XKCD manages to summarize the result in a single panel. We don't know.

Munroe, R. Cat Proximity, xkcd.

Slide 14:

The maxim "All models are wrong, but some are useful" has long been a truism in research. The idea that models are not only wrong, but that research can be successful without them, is starting to gain currency in the era of Big Data. Access to very large datasets and the capability to manipulate them inexpensively is changing how research is performed.

Allen, R. Data as seeds of content. O'Reilly Radar. April 5, 2012.

Slide 15:

With large numbers on our side, petabytes or even yottabytes of data can reveal patterns that sampled data cannot.

Shaw, A. Big Data, Gamification and Obama 2012. OWNI.EU. April 4, 2012.

Slide 16:

Flip Kromer of Infochimps illustrates how a preponderance of data lets us determine the boundaries of the places called Paris and which of those places is meant in a particular context.

Kromer, F. On Being Wrong In Paris: Finding Truth in Wrong Answers. The Infochimps Blog. Dec 1, 2011.

Slide 17:

Third-party agents are continuously collecting information about people from social media, social networks, and ecommerce. This provides a wealth of data about people from a third-party perspective. In addition, the quantified self is a concept where individuals document every aspect of their lives in order to optimize their day-to-day interactions.

However, Goodhart's law holds that once an indicator is used to influence a particular behavior, it loses its usefulness as an indicator. In other words, users will game the system and degrade the quality of the information in order to achieve the objective.

Doctorow, C. Goodhart's Law: Once you measure something, it changes. April 29, 2010.

Sharwood, S. Social networks breeding spatial junk. The Register. March 6, 2012.

Slide 18:

There is an emerging corollary to the concept of the quantified self. Rather than a continuous collection of data, there is an alternate source of data that reflects information selected and shared, but not for the purpose of participating in social networks, i.e., a view into a person's internal life. For example, Amazon collects highlighted phrases from Kindle users as well as wish lists, which represent material culture.

Carrigan, M. Mass observation, quantified self, and human nature. April 19, 2012.

Currion, P. The Qualified Self. The Unforgiving Minute. November 30, 2011.

Slide 19:

To bring it back to serendipity, perhaps it's time to re-evaluate how we understand the way multiple factors affect an individual's choices. Models based on physical properties such as proximity may lack the nuance necessary to explain a behavior. Simply creating a confluence of events within many possible proximal dimensions may not be enough to explain or influence. However, a new alternative is possible through the use of big data and the tools of machine learning and algorithms to describe behavior. We should harness these tools to better understand the factors that affect serendipity and let go of Newtonian models that reduce the rich interplay of social factors.
