Doing the right thing first

Artificial Intelligence (AI) feeds on data. In particular, neural networks and deep learning, with frameworks such as TensorFlow, have a voracious appetite for data. Yet, despite its importance, data often remains an afterthought. Typically, planning for a new data analytics project is dominated by debates about the right skill set of data scientists, the right tools, deadlines and, of course, budget. As a result, most of the time in a data analytics project (measurements range from 50% to 80%) is consumed by data search, collection, and refinement. A key to saving time and money is to specify data needs upfront and create data pools accordingly.
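One way to make data needs explicit upfront is to write them down in a structured, machine-readable form before any modeling starts. The following is a minimal, hypothetical sketch in Python; the example project, field names, sources and refresh intervals are illustrative assumptions, not details from the article.

```python
from dataclasses import dataclass

# Hypothetical, minimal spec of data needs for an ETA-prediction project.
# All names and values below are illustrative assumptions.

@dataclass
class DataNeed:
    name: str          # what data is needed, e.g. "shipment milestones"
    source: str        # internal system or external partner supplying it
    granularity: str   # spatial/temporal resolution required
    refresh: str       # how often the data must be updated

eta_project_needs = [
    DataNeed("shipment milestones", "ERP export", "per shipment", "hourly"),
    DataNeed("vessel positions", "AIS feed (partner)", "per vessel, 5 min", "near real time"),
    DataNeed("port congestion", "terminal operator (pooled)", "per terminal", "daily"),
]

for need in eta_project_needs:
    print(f"{need.name:25s} from {need.source:30s} refresh: {need.refresh}")
```

A list like this makes gaps visible early: every entry whose source is an external partner is a candidate for one of the data pools described below.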

Creating data pools

On their own, very few companies will be able to collect the massive amounts of data that helped data analytics pioneers like Amazon, Facebook and Google create success stories. One trick to level the playing field is teaming up with others to pool data. Data can be pooled: (a) vertically, along the successive stages of a supply chain (for example, to predict a shipment's estimated time of arrival); (b) horizontally, for one machine make and model across all users (for example, to predict outages and improve uptime); or (c) by stacking it "on top of each other" to create "data sandwiches." One example is layering street maps with data on vehicle traffic, people traffic, weather conditions and event information to predict traffic flows.
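The "data sandwich" can be pictured as a sequence of joins on shared keys such as road segment and hour. The sketch below is a minimal, hypothetical illustration using pandas; the layers, column names and example values are assumptions, not data from the article.

```python
import pandas as pd

# Layering data sets on a shared (segment_id, hour) key to build
# features for a traffic-flow prediction model. Values are made up.

street_map = pd.DataFrame({
    "segment_id": ["A1", "A2"],
    "lanes": [2, 3],
})
vehicle_traffic = pd.DataFrame({
    "segment_id": ["A1", "A1", "A2"],
    "hour": [8, 9, 8],
    "vehicles": [420, 510, 880],
})
weather = pd.DataFrame({
    "hour": [8, 9],
    "rain_mm": [0.0, 2.5],
})
events = pd.DataFrame({
    "segment_id": ["A2"],
    "hour": [8],
    "event": ["stadium concert"],
})

# Stack the layers "on top of each other" by joining on the shared keys.
features = (
    vehicle_traffic
    .merge(street_map, on="segment_id", how="left")
    .merge(weather, on="hour", how="left")
    .merge(events, on=["segment_id", "hour"], how="left")
)
print(features)
```

Each merge adds one layer of the sandwich; the resulting feature table is what a traffic-flow model would be trained on.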

Text by: Prof. Dr. Chris Schlueter Langdon, Deutsche Telekom