Data mining looks for patterns in data, typically in a (semi-)automated fashion with ‘intelligent’ computer software. Examples of data mining include seeking patterns in genomic and demographic data of cancer patients (http://www.vuse.vanderbilt.edu/~dfisher/Papers/RuleBasedLearning.html) and searching for patterns in the observed behavior of students working math problems. Discovered patterns are the ‘gems’ that give the mining metaphor its name, because patterns can be exploited for good, we hope. For example, patterns discovered in genomic data can be used to better target treatments (e.g., chemotherapies) to patients exhibiting different patterns, while patterns discovered in student problem-solving data can guide educational remediation. Data mining can certainly benefit medicine, the environment, education, and business, but it raises ethical concerns too. For example, genomic patterns can be used to fit patients to treatments, but similar patterns can also predict risk. Do we want insurance companies to pay for tests that will identify personal patterns, while trusting them not to use this knowledge to exclude coverage?
I first did PERSONAL data mining, albeit “by hand”, after being diagnosed with diabetes. The data in this case consisted of (a) blood glucose readings, at least 8 a day and spaced appropriately, (b) morning and evening weighings, (c) food label data for what I was eating, including calories, carbs (total and broken down into sugars, fiber, and others), proteins, fats, etc., along with timestamps of when I ate it, and (d) exercise timestamps and intensities. This all seemed a bit over-the-top to some, but not to my doctor at the time, Ben, who loved it, and I quickly converged on good glucose levels without medication. Two observations about the process stand out. First, the data collection was informed by knowledge: for example, I spaced glucose readings according to my research into when glucose was likely to peak after a meal, which illustrates the more general point that data collection and mining typically aren’t done in a conceptual vacuum. Second, my processes of data collection, pattern discovery, and response (changes in diet, exercise, etc.) were more tightly coupled than these processes are in most other data mining contexts. That first month or two was highly beneficial and I still reap the benefits: when I saw my doctor in Arlington a few months ago, she said that I was the best-controlled diabetic she’d ever seen, and of course I beamed.
Later I also got an inkling of the allure, probably felt by body-builders and perhaps by those with eating disorders, of being in control of my body. In my case the feeling of control felt mainly healthy (because control is critical in effective management of diabetes), but there was an allure to the control that went beyond the strictly healthy. I remember hitting 140 lbs, my high-school wrestling weight, and thinking “135 would be easy”; I think the goal appealed to me only because it offered a feeling of control when other life circumstances felt out of control. That said, one thing Doc Ben advised me upon diagnosis was “get skinny”, and he wasn’t talking “middle-age skinny”.
My ‘data mining’ of glucose, diet, weight, and exercise data was the topic of an invited talk back in the late ’90s, as well as of a (declined) proposal to customize ‘generic’ (i.e., population-wide) mathematical models of glucose/insulin dynamics to fit an individual’s particular dynamics based on individual data. It’s a topic I want to return to, now with renewed excitement.
My interest in personal-data mining has continued, and recently I found data from iTunes. When I discovered the Top-25 playlist on my iPod nano, I knew that certain data had to be recorded, and sure enough it’s easy to get to in iTunes by viewing my music library in ‘list’ format. An important attribute is ‘Play Count’: the total number of times the song has been played on my iPod as of the last synchronization. It is possible to reset (zero out) this attribute, which could compromise data accuracy, and I don’t know how long a song has to play before it’s counted as played. I’m certain, for example, that if I forward to the next song with 5 seconds REMAINING on the current song, the current song is still counted as played; likewise, I’m guessing that if a song plays only two seconds before I forward to the next, it’s NOT counted as played but rather counted as SKIPPED (the total number of times a song is skipped is another attribute). These are just guesses, though, and these and other questions might be answerable from research on the Web, patent records, or other resources. Generally, data collection, interpretation, and coding are often not trivial.
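To make the coding question concrete, here is a minimal Python sketch of turning such a list into records. It assumes, hypothetically, that the list view has been exported as tab-separated text with a header row naming the columns Name, Artist, Play Count, Skip Count, and Date Added, with dates written as YYYY-MM-DD; a real export may use different column names, encodings, or date formats, so treat these as placeholders. The sample rows are invented.

```python
import csv
import io
from datetime import date


def parse_library(text):
    """Parse a tab-separated library listing into a list of song records.

    Assumption: a header row with the (hypothetical) columns 'Name', 'Artist',
    'Play Count', 'Skip Count' and 'Date Added', dates as YYYY-MM-DD.
    """
    songs = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        songs.append({
            "name": row["Name"],
            "artist": row["Artist"],
            # A blank count is coded as zero: itself a coding decision.
            "plays": int(row["Play Count"] or 0),
            "skips": int(row["Skip Count"] or 0),
            "added": date.fromisoformat(row["Date Added"]),
        })
    return songs


# Invented sample rows, for illustration only.
sample = (
    "Name\tArtist\tPlay Count\tSkip Count\tDate Added\n"
    "Song A\tArtist X\t12\t\t2009-01-15\n"
    "Song B\tArtist Y\t3\t1\t2009-06-01\n"
)
songs = parse_library(sample)
```

Treating a blank Skip Count as zero is exactly the kind of interpretation choice the paragraph above warns about; a different convention (e.g., marking it as missing) would change any later analysis of zero-skip songs.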
It’s not hard to spot at least one “pattern” if you look at my top 10 (out of about 216 songs); even the Genius recommender system can catch the one I have in mind:
Song Name Artist Play Count
1 I Know You Rider Hot Tuna 193 (off Classic Hot Tuna Electric)
2 I Know You Rider Half Day Bluegrass Band 80
3 The Touch of the Master’s Hand Laurie Lewis 78
4 Under God’s Light Rare Earth 78
5 I Know You Rider (Live In Paris) Grateful Dead 76
6 Tears Of A Clown The English Beat 67
7 Downtown Neil Young 65
8 I Know You Rider (1966) Grateful Dead 62
9 Girlfriend Matthew Sweet 61
10 Well…All Right Buddy Holly 57
The count of 193 is not a typo, and it would be tempting to call it an “outlier”, a point so anomalous that it is often regarded as outside the scope of analysis. But in this case of studying song-listening behavior, omitting this point would be like trying to understand the behavior of the solar system while disregarding a black hole approaching the heliosphere, just because it was so much more distant than our other heavenly neighbors. As it turns out, this 193-count song is an integral part of a pattern, and thus not an outlier at all. When I rank the songs from most played to least played and plot them, I get a graph very similar in shape to the one in the first figure below. That graph actually shows the ‘average number of plays per day’, which isn’t an attribute that iTunes gives me but one I computed. After all, a song might be played more only because I’ve owned it longer (which carries information, to be sure), but by looking at each song’s per-day average since I acquired it, I am ‘normalizing’ it. iTunes tells me the total number of plays and the date of acquisition in my library (by purchase or copy from CD), and Microsoft Excel includes a function, DAYS360, that approximates the number of days between two dates (in this case the current date and the date of acquisition) by assuming a 360-day year of twelve 30-day months. The graph of average plays per day shows some of the same songs at the top, including my musical black hole, but other songs move to the top after compensating for days owned. And of course, this new attribute reflects other biases, like my tendency to play a song more often early in my ownership of it. I’m sure I could verify this pattern by looking at my iTunes library statistics over time, and there is much research activity on many forms of temporal data mining.
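The normalization can be sketched in a few lines of Python. This is a hypothetical helper, not the spreadsheet actually used: it implements a 30/360 day count under the European rule (the variant Excel’s DAYS360 applies when its method argument is TRUE; the default US method treats month ends slightly differently) alongside an exact calendar-day count, then divides plays by days owned and ranks songs by the result. The song names, dates, and play counts are invented.

```python
from datetime import date


def days360_european(start, end):
    """Days between two dates on a 360-day year (twelve 30-day months).

    European rule: a day-of-month of 31 is treated as 30 for either date.
    """
    d1 = min(start.day, 30)
    d2 = min(end.day, 30)
    return (end.year - start.year) * 360 + (end.month - start.month) * 30 + (d2 - d1)


def actual_days(start, end):
    """Exact number of calendar days, for comparison."""
    return (end - start).days


def plays_per_day(plays, added, today, day_count=days360_european):
    # Guard against division by zero for songs added today.
    return plays / max(day_count(added, today), 1)


def rank_by_rate(songs, today, day_count=days360_european):
    """songs: list of (name, plays, date_added); returns (name, rate), highest first."""
    rated = [(name, plays_per_day(plays, added, today, day_count))
             for name, plays, added in songs]
    return sorted(rated, key=lambda pair: pair[1], reverse=True)


# Invented example: 90 plays over 180 "360-day" days (181 actual days).
rate = plays_per_day(90, date(2009, 1, 31), date(2009, 7, 31))
ranked = rank_by_rate(
    [("Song A", 90, date(2009, 1, 31)), ("Song B", 60, date(2009, 5, 1))],
    today=date(2009, 7, 31),
)
```

In this invented example Song B has fewer total plays but a higher plays-per-day rate, so it moves ahead of Song A once days owned are taken into account, which is the reordering effect described above.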

The second figure below shows a curve fit to the data on plays per day. In this case the curve is a power function, and it fits the data well (better than other common functional forms, including logarithmic). A power function has the form y = a·x^b; for ranked plays-per-day data the exponent b is negative, so the curve drops steeply at first and then flattens out, a signature of rapidly diminishing returns. I first learned about power functions in Tarow Indow’s UCI class on human cognition and memory, where behaviors characterized by rapidly diminishing returns are ubiquitous. For example, if I practice some new behavior, I become competent rapidly, and continued practice brings further improvement, though in smaller and smaller increments (or decrements, if performance is measured as time taken); this is called the “power law of practice”. But power laws manifest in many other contexts too, like the rate of memory retrieval from a given category and the learning behavior of machine learning programs, which Lewis Frey and I published on years ago (http://www.vuse.vanderbilt.edu/~dfisher/Papers/ModelingTree.pdf).
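One common way to fit a power function, sketched below in Python under the assumption that every plays-per-day value is positive, is to take logarithms of both sides, log y = log a + b·log x, and fit a straight line by least squares. I don’t know which fitting method Graph 4.3 uses, so this illustrates the technique rather than reproducing the figure. The logarithmic form y = c + d·ln x is fit for comparison, with R² computed on the original scale. The data are synthetic.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(ys, preds):
    """Fraction of variance in ys explained by preds (original scale)."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def fit_power(ranks, rates):
    """Fit rate = a * rank**b via a straight line on (log rank, log rate)."""
    log_a, b = linear_fit([math.log(r) for r in ranks], [math.log(y) for y in rates])
    return math.exp(log_a), b


def fit_log(ranks, rates):
    """Fit rate = c + d * ln(rank), for comparison."""
    return linear_fit([math.log(r) for r in ranks], rates)


# Synthetic data that follows a power law exactly.
rank_positions = list(range(1, 51))
rates = [2.0 * r ** -0.8 for r in rank_positions]

a, b = fit_power(rank_positions, rates)
c, d = fit_log(rank_positions, rates)
r2_power = r_squared(rates, [a * r ** b for r in rank_positions])
r2_log = r_squared(rates, [c + d * math.log(r) for r in rank_positions])
```

Fitting in log space is simple but weights relative rather than absolute errors, so on real, noisy data it can give somewhat different parameters than a direct nonlinear least-squares fit.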

I looked at skips too, notably skips per day (figure 3, in red), ordered from most to least, and asked whether more frequently played songs tended to be more rarely skipped. The apparent answer is ‘NO’ (see figure 4), but I think there are varying reasons for the relative lack of correlation. For example, there are some songs that I have listened to rarely but have never skipped (Skip Count = 0), and I may never skip them. Also, there is a Top-25 playlist maintained on my iPod, and there is this musical black hole that I choose to listen to often, and thus the second most played song (reliably second on my Top-25 playlist) is among my most frequently skipped songs! This latter example highlights how much the iPod nano’s interface and functionality shape the data I’m getting. I expect an iPod shuffle would show a much more linear relationship (with near-zero slope), because my ability to choose the next song would be much more limited (though I am probably ignorant of some of the shuffle’s functions, like an ability to create playlists). And on the side of greater functionality, if I could enter a song at a middle point, I might listen to some songs even more, such as Under God’s Light (number 4 on my list of total plays). It has a final instrumental section that I like better than, say, that of Stairway to Heaven, and that ranks right up there with Black Magic Woman, Free Bird, and Green Grass and High Tides. Besides, Under God’s Light was on the first album I ever bought, One World, so it was imprinted on me early (note the cool cover, particularly appealing to a 13-year-old, but heck, it still is, who am I fooling: http://www.amazon.com/One-World-Rare-Earth/dp/B000K7BPYI#moreAboutThisProduct. And the first customer review is right on).
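The correlation question can also be asked numerically. The Python sketch below computes Pearson’s correlation and Spearman’s rank correlation; Spearman is my suggestion here, not something the plots above used. It only asks whether more-played songs tend to be less-skipped, and its averaged ranks handle the many tied songs with Skip Count = 0. The inputs would be per-song plays per day and skips per day (the values below are invented); the correlation is undefined, and this code raises ZeroDivisionError, if every value in either list is identical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Invented per-song values, for illustration only.
plays_pd = [1.2, 0.6, 0.5, 0.3, 0.1, 0.05]
skips_pd = [0.0, 0.3, 0.0, 0.1, 0.0, 0.0]
r_pearson = pearson(plays_pd, skips_pd)
r_spearman = spearman(plays_pd, skips_pd)
```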


I am left with a lot of questions. I wonder whether the existence of a Top-25 list that I often use causes the diminishing-returns characteristic to be exaggerated or lessened. It would be interesting to look more at skips, normalize the “raw data” in other ways, guess at what iTunes might do with the data they receive, speculate on truly “genius” music recommender systems, and elaborate on mining for temporal patterns. In this last case, I wonder, for example, whether the rate at which I listen to a given song falls off according to a power law. Maybe it does in some cases, but in others, like the black hole, it might fall off linearly (and it is falling off)… but if I had the time to collect and look at the data, I wouldn’t have to guess.
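Answering the temporal question requires play counts over time, and iTunes reports only a running total. One hypothetical workaround, sketched in Python, is to record each song’s Play Count at regular intervals (say, weekly) and difference successive snapshots. A drop in the total signals that the count was reset, in which case the new total is taken as the plays since the reset, which is only a lower bound for that period. The snapshot labels and counts below are invented.

```python
def plays_per_period(snapshots):
    """Convert periodic cumulative play counts into plays per period.

    snapshots: list of (label, cumulative_count) in time order.
    Returns (increments, resets): increments is a list of (label, plays),
    resets lists the labels where the cumulative count went down.
    """
    increments = []
    resets = []
    for (_, prev), (label, curr) in zip(snapshots, snapshots[1:]):
        if curr < prev:
            # Count dropped: assume a reset; curr counts only plays since the
            # reset, so it is a lower bound on plays during this period.
            resets.append(label)
            increments.append((label, curr))
        else:
            increments.append((label, curr - prev))
    return increments, resets


# Invented weekly snapshots for one song, with a reset before week 4.
snaps = [("w1", 100), ("w2", 112), ("w3", 120), ("w4", 5), ("w5", 9)]
increments, resets = plays_per_period(snaps)
```

The resulting per-period series is what one would then fit with competing linear and power-law decay curves.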
Finally, there appear to be power laws of music-listening behavior, and possibly of other manifestations of the soul, and almost certainly of aspects of friendship. But not only are there important differences in interface (e.g., nano functionality versus shuffle versus radio) that determine the data, there are individual personal differences too. The returns (as in rapidly diminishing returns) in cases like music and friendship include sadness and joy. Many musical expressions bring these returns, and it’s fair to say that in a radio-listening context I would not station-hop past the large majority of the songs on my iPod with, say, more than 10 plays. (Most of my songs with single-digit play counts are ones I downloaded for my friend Vivian when she was in hospice, though I have adopted some of them as my own.) Yet in an environment in which I have a choice, the additional returns I get from these songs beyond those of the higher-ranking ones seem to fall off rapidly. I don’t think that is (necessarily) because they are of lesser importance to me; some, perhaps many, simply fill a more specialized niche. And here is an aside: if power laws, which seem inherently unbalanced to me, come with mechanisms that allow greater individual choice, then what are the implications for sustainable decision making?
Clearly, in the diabetes-related data case, I learned actionable patterns of great value. What have I learned from the iTunes data-mining case beyond what I already knew? In some ways nothing (yet), beyond some details on numbers of plays and skips, though the magnitudes of some of these were surprising, and I suppose the power law in this context was a surprise too. Beyond this, the exercise has caused me to reflect, and it highlights certain things, like the cluster of lesser-played but zero-skipped songs (zero is a very special number). Nobody but me could possibly understand that cluster, unless they had done their own mining and reflecting to understand what such a cluster meant to them, which might give them just enough insight into me to ask me the question: what does this cluster mean for you?
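Pulling out such a cluster is a simple filter. In this Python sketch, ‘lesser-played’ is defined, as an assumption, as having fewer plays than the library’s median song; the song records are invented placeholders.

```python
from statistics import median


def zero_skip_low_play(songs):
    """Names of songs never skipped whose play count is below the median.

    songs: list of (name, plays, skips). The median cutoff for
    'lesser-played' is an illustrative assumption.
    """
    cutoff = median(plays for _, plays, _ in songs)
    return [name for name, plays, skips in songs if skips == 0 and plays < cutoff]


# Invented records: (name, play count, skip count).
library = [("Song A", 40, 3), ("Song B", 5, 0), ("Song C", 20, 0), ("Song D", 2, 1)]
cluster = zero_skip_low_play(library)
```

Here the median play count is 12.5, so only Song B is both below the cutoff and never skipped; Song C has zero skips but too many plays to count as lesser-played.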
(Plots and curve fits were done with the open-source software Graph 4.3, http://www.padowan.dk/graph/.)
All 216 songs, ranked by play count
Song Name Artist Play Count
1 I Know You Rider Hot Tuna 193
2 I Know You Rider Half Day Bluegrass Band 80
3 The Touch of the Master’s Hand Laurie Lewis 78
4 Under God’s Light Rare Earth 78
5 I Know You Rider (Live In Paris) Grateful Dead 76
6 Tears Of A Clown The English Beat 67
7 Downtown Neil Young 65
8 I Know You Rider Grateful Dead 62
9 Girlfriend Matthew Sweet 61
10 Well…All Right Buddy Holly 57
11 Fortunate Son U2 55
12 Save It For Later The English Beat 54
13 Beautiful Day U2 54
14 There Goes Another Love Song The Outlaws 53
15 Celebrate Sam Bush 52
16 Castanets Alejandro Escovedo 51
17 In a Big Country (Radio Edit) Big Country 48
18 Gloria Van Morrison with Them 47
19 Praise You Fatboy Slim 45
20 Down On Me (Live) Janis Joplin 45
21 Get Ready (21 min) Rare Earth 43
22 Cinnamon Girl Type O Negative 42
23 In God’s Country U2 40
24 Born On the Bayou Creedence Clearwater Revival 37
25 Our Lips Are Sealed The Go-Go’s 37
26 Good King Wenceslaus Melanie 36
27 Scarlet Begonias Grateful Dead 35
28 You Wreck Me Tom Petty 34
29 Time Has Come Today Joan Jett 33
30 Spirit In the Sky Plumb featuring Mikeschair 33
31 It’s A Long Way To The Top (If… AC/DC 32
32 A Mighty Fortress Is Our God Mormon Tabernacle Choir 32
33 She’s a Mystery to Me Roy Orbison 32
34 New World Man Rush 32
35 Great White Buffalo (Live) Ted Nugent 32
36 My Love Will Not Change The Del McCoury Band 31
37 Hoedown (Taken from Rodeo) Emerson, Lake & Palmer 31
38 John Henry Harry Belafonte 31
39 Where Are We Runnin’? Lenny Kravitz 31
40 Magic Carpet Ride Steppenwolf 31
41 Jessica (Single Version) The Allman Brothers Band 30
42 Green Grass And High Tides The Outlaws 30
43 Limelight Rush 30
44 Best of Both Worlds Van Halen 30
45 Johnny Strikes Up The Band Warren Zevon 30
46 Werewolves Of London Warren Zevon 30
47 I Can Love You Better Dixie Chicks 29
48 Tuff Enuff The Fabulous Thunderbirds 29
49 I’m a Believer The Monkees 29
50 Finest Worksong R.E.M. 29
51 Lucky Never Had It So Good Ashley Cleveland 28
52 Ramble Tamble Creedence Clearwater Revival 28
53 In the Evening Led Zeppelin 28
54 Rocky Top The Osborne Brothers 28
55 Every Picture Tells a Story Rod Stewart 28
56 Time To Start Blue Man Group 27
57 1952 Vincent Black Lightning The Del McCoury Band 27
58 Grey Seal Elton John 27
59 Cold Rain and Snow Grateful Dead 27
60 The Safety Dance Men Without Hats 27
61 I Know You Rider Phil Lesh & Friends 27
62 Desire U2 27
63 Hush Deep Purple 26
64 You Can Close Your Eyes (Live) James Taylor 26
65 Somebody to Love Jefferson Airplane 26
66 Someone To Love Rare Earth 26
67 What’d I Say Rare Earth 26
68 What I Like About You The Romantics 26
69 My Maria B.W. Stevenson 25
70 Stage Fright The Band 25
71 Twist and Shout David Lindley & El Rayo-X 25
72 Shapes of Things Jeff Beck 25
73 Express Yourself Madonna 25
74 Jet Airliner (Live) Steve Miller Band 25
75 867-5309/Jenny Tommy Tutone 25
76 Magic Bus The Who 25
77 Thank You Alanis Morissette 24
78 Down In The Hollow Leftover Salmon 24
79 Cripple Creek Leo Kottke 24
80 What A Crying Shame The Mavericks 24
81 Tennessee Stud The Nitty Gritty Dirt Band 24
82 Girl of the North Country Sam Bush 24
83 Rider The Seldom Scene 24
84 Kentucky Woman Deep Purple 23
85 Introduction/Darlin’ Cora Harry Belafonte 23
86 Free Bird Lynyrd Skynyrd 23
87 Gloria Patti Smith 23
88 I Just Want to Celebrate Rare Earth 23
89 Excitable Boy Warren Zevon 23
90 I’m So Glad Cream 22
91 Playing in the Band Grateful Dead 22
92 The Golden Road (To Unlimited… Grateful Dead 22
93 La Bamba Los Lobos 22
94 Magic Key Rare Earth 22
95 Marianne Stephen Stills 22
96 White Rabbit Blue Man Group Feat. Esthero 21
97 But Anyway Blues Traveler 21
98 Crossroads (Live At Winterland) Cream 21
99 Mercury Blues David Lindley 21
100 Love Is A Long Road The Del McCoury Band 21
101 Hocus Pocus (US Single) Focus 21
102 Johnny B. Goode Grateful Dead 21
103 Angel to Be Sam Bush 21
104 Hello Mary Lou The Seldom Scene 21
105 Jungle Love Steve Miller Band 21
106 Lawyers, Guns And Money Warren Zevon 21
107 One Deirdre Jenkins 20
108 Best Friend The English Beat 20
109 Would I Lie to You? Eurythmics 20
110 Stand R.E.M. 20
111 Poor Poor Pitiful Me Warren Zevon 20
112 I Feel Love Blue Man Group Feat. Venus H 19
113 Time Has Come Today The Chambers Brothers 19
114 Werewolves Of London David Lindley & El Rayo-X 19
115 Happiness (I’m So Glad) Deep Purple 19
116 Hocus Pocus (Long) Focus 19
117 Who Do You Love [Live] George Thorogood 19
118 Mama Tried Grateful Dead 19
119 With a Little Help from My Fri… Jim Sturgess & Joe Anderson 19
120 (I Know) I’m Losing You Rare Earth 19
121 Hey Big Brother (Single) Rare Earth 19
122 Roundabout Yes 19
123 L.A. Woman Billy Idol 18
124 Light My Fire The Doors 18
125 A Better Man Keb’ Mo’ 18
126 Gallows Pole Led Zeppelin 18
127 Pamela Brown Leo Kottke 18
128 Colorful Rocco DeLuca & The Burden 18
129 Great White Buffalo Ted Nugent 18
130 Who Are You The Who 18
131 Follow You Down Alejandro Escovedo 17
132 Blue Sky The Allman Brothers Band 17
133 All Right Now Copycats 17
134 The Cold Hard Facts The Del McCoury Band 17
135 L.A. Woman The Doors 17
136 I Know You Rider Joan Baez 17
137 Nobody Told Me John Lennon 17
138 God Trying To Get Your Attention Keb’ Mo’ 17
139 Can’t You See [Live] The Marshall Tucker Band 17
140 Middle of the Road The Pretenders 17
141 Get Ready (radio edit) Rare Earth 17
142 I Am A Man Of Constant Sorrow The Soggy Bottom Boys 17
143 Once In a Lifetime Talking Heads 17
144 Mercury Blues David Lindley & El Rayo-X 16
145 Easy to Slip/I Know You Rider Little Feat 16
146 Pop Song 89 R.E.M. 16
147 Undone (The Sweater Song) Weezer 16
148 The Goddess Deirdre Jenkins 15
149 Bertha Grateful Dead 15
150 Me & My Uncle Grateful Dead 15
151 Mexico James Taylor 15
152 Bye Bye Love David Lindley & El Rayo-X 14
153 Wharf Rat Grateful Dead 14
154 Not Fade Away/Goin’ Down the Road Grateful Dead 14
155 Get Up R.E.M. 14
156 Feelin’ Alright Rare Earth 14
157 Time Has Come Today The Chambers Brothers 13
158 I Feel Love (12″ Version) Donna Summer 13
159 Fortunate Son John Fogerty 13
160 U Got the Look Prince 13
161 It’s the End of the World … R.E.M. 13
162 Miss Me but Let Me Go The Rarely Herd 13
163 And She Was Talking Heads The Best of Talking Heads 13
164 Won’t Get Fooled Again The Who 13
165 Ramblin’ Man The Allman Brothers Band 12
166 School of Rock Karaoke All Stars 12
167 Brandy (You’re A Fine Girl) Looking Glass 12
168 I Am A Man Of Constant Sorrow The Soggy Bottom Boys 12
169 Rock and Roll, Pt. 2 Gary Glitter 11
170 Bad To The Bone [Live] George Thorogood 11
171 You’ve Got a Friend James Taylor 11
172 Vertigo U2 11
173 God Will Take Care of You Aretha Franklin 9
174 Singing in My Soul Sister Rosetta Tharpe 9
175 Climbing Higher Mountains Aretha Franklin 8
176 I Wouldn’t Mind Dying Dorothy Love Coates & … 8
177 I Know You Rider Roger “Hurricane” Wilson 8
178 Didn’t It Rain Sister Rosetta Tharpe 8
179 On Our Way (1-13-1972 Opening… Aretha Franklin 7
180 Climbing Higher Mountains … Aretha Franklin 7
181 My Sweet Lord (1-14-1972 In… Aretha Franklin 7
182 Lord, Don’t Forget About Me Dorothy Love Coates & … 7
183 One Bourbon, One Scotch, One Beer George Thorogood 7
184 Wholy Holy (1-13-1972 Version) Aretha Franklin 6
185 Give Yourself to Jesus Aretha Franklin 6
186 There’s a God Somewhere Dorothy Love Coates & … 6
187 How I Got Over Aretha Franklin 5
188 My Sweet Lord (1-13-1972 Instrumental Version) Aretha Franklin 5
189 Old Landmark Aretha Franklin, James … 5
190 That’s Enough Dorothy Love Coates & … 5
191 You’ll Never Walk Alone Aretha Franklin 4
192 What a Friend We Have In Jesus Aretha Franklin 4
193 Aretha’s Introduction … Aretha Franklin 4
194 Mary, Don’t You Weep Aretha Franklin 4
195 I Won’t Let Go Dorothy Love Coates 4
196 Opening Remarks By Reverend C … Aretha Franklin 3
197 Aretha’s Introduction … 3
198 Medley: Precious Lord, Take My… Aretha Franklin 3
200 Wholy Holy Aretha Franklin 3
201 On Our Way (1-13-1972 Version) Aretha Franklin 2
202 Never Grow Old Aretha Franklin 2
203 Precious Memories Aretha Franklin 2
204 I Drink Alone [Live] George Thorogood 2
205 Shenandoah Harry Belafonte 2
206 I Am A Man Of Constant Sorrow (Instrumental) John Hartford 2
207 Henry Keb’ Mo’ 2
209 Precious Memories Aretha Franklin 1
210 On Our Way (1-14-1972 Opening… Aretha Franklin 1
211 On Our Way (1-14-1972 Version) Aretha Franklin 1
212 What a Friend We Have In Jesus… Aretha Franklin 1
213 She Took Off My Romeos David Lindley & El Rayo-X 1
214 I Was Wrong Keb’ Mo’ 1
215 Highway Blues Marc Seales 1
216 Down by the Riverside Sister Rosetta Tharpe 1