Machine learning and AI systems are among the weakest links in the security chain and can therefore be compromised.
Using manipulated training data as a weapon to infiltrate machine learning and AI systems is called data poisoning.

In such poisoning attacks, the attacker:
1. compromises the data collection,
2. subverts the learning process of the AI or machine learning system, and
3. degrades or manipulates the performance of the system.

Possible attack scenarios include:
– Applications that rely on untrusted datasets, for example:
1. Crowdsourcing to label data (a minimal sketch of a poisoning attack on such labels follows below)
2. Data collected from untrusted sources (people, sensors, etc.)
– Applications where data curation is not always possible.
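To make the attack concrete, here is a minimal sketch of a label-flipping poisoning attack, assuming scikit-learn and synthetic data; the logistic regression model and the 10% flip rate are arbitrary illustrative choices, not details from the source.

# Hypothetical demo: label-flipping poisoning on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: model trained on clean labels.
clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("clean accuracy:   ", clean.score(X_test, y_test))

# Attack: the "crowd" maliciously flips 10% of the training labels.
rng = np.random.default_rng(0)
y_poisoned = y_train.copy()
flip = rng.choice(len(y_poisoned), size=len(y_poisoned) // 10, replace=False)
y_poisoned[flip] = 1 - y_poisoned[flip]

poisoned = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)
print("poisoned accuracy:", poisoned.score(X_test, y_test))

Random flips are the crudest form of poisoning; a real attacker picks which points to corrupt and can do more damage with far fewer of them.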

A popular example of data poisoning:
Microsoft Tay:
Tay was an artificial intelligence chatterbot originally released by Microsoft Corporation via Twitter on March 23, 2016; it caused subsequent controversy when the bot began to post inflammatory and offensive tweets through its Twitter account, forcing Microsoft to shut down the service only 16 hours after its launch. According to Microsoft, this was caused by trolls who "attacked" the service, as the bot made its replies based on its interactions with people on Twitter. – Tay (bot), Wikipedia.

Ways to defend against data poisoning:
1. Filter and pre-process the data:
a. Techniques (sketched below, after this item):
– Outlier detection.
– Label sanitization techniques.
b. May require some human supervision:
– Curation of small fractions of the dataset.
c. Coordinated or stealthy attacks cannot be detected in most cases.
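Below is a minimal sketch of the two filtering techniques listed under 1a, assuming scikit-learn; the IsolationForest detector, the kNN label filter, and all parameter values are illustrative assumptions rather than the source's specific methods.

from sklearn.ensemble import IsolationForest
from sklearn.neighbors import KNeighborsClassifier

def drop_outliers(X, y, contamination=0.05):
    """Outlier detection: drop the points an IsolationForest flags."""
    keep = IsolationForest(contamination=contamination,
                           random_state=0).fit_predict(X) == 1  # +1 = inlier
    return X[keep], y[keep]

def sanitize_labels(X, y, n_neighbors=10):
    """Label sanitization: drop points whose label disagrees with a
    k-nearest-neighbours classifier fit on the (possibly poisoned) data."""
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X, y)
    keep = knn.predict(X) == y
    return X[keep], y[keep]

Both filters catch isolated, obviously anomalous points, which is exactly why point 1c holds: a coordinated attacker who keeps poisoned points close to the clean distribution will slip past them.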

2. Reject data that can have a negative impact on the system:
a. Techniques (sketched below, after this item):
– Cross-validation.
– Rejection in online learning systems.
b. May require some human supervision:
– Curation of small fractions of the dataset.
c. In some cases this can be computationally expensive or difficult to apply.
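Here is a minimal sketch of rejection via cross-validation, in the spirit of the "Reject On Negative Impact" idea: a new batch of data is accepted only if it does not lower cross-validated accuracy. The accept_batch helper, the logistic regression model, the 5 folds, and the tolerance are all illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def accept_batch(X_train, y_train, X_new, y_new, tol=0.01):
    """Accept (X_new, y_new) only if it does not hurt CV accuracy."""
    model = LogisticRegression(max_iter=1000)
    base = cross_val_score(model, X_train, y_train, cv=5).mean()
    # Re-score with the candidate batch appended to the training set.
    X_aug = np.vstack([X_train, X_new])
    y_aug = np.concatenate([y_train, y_new])
    augmented = cross_val_score(model, X_aug, y_aug, cv=5).mean()
    return augmented >= base - tol

Note the cost: every acceptance decision retrains the model ten times here, which is point 2c in practice.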

3. Other tricks:
a. Increase the stability of your system (see the sketch after this list):
– Larger datasets
– Stable learning algorithms
– Machine ensembles
b. Establish mechanisms to measure trust during data collection (e.g. trust in the contributing users).
c. Design AI/ML algorithms with security in mind: use systematic attacks to test their robustness.
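As a sketch of the "machine ensembles" point in 3a, assuming scikit-learn: a bagging ensemble trains many models on random subsamples, so a small fraction of poisoned points only reaches some of the members and the majority vote stays comparatively stable. All parameter values are illustrative.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 50 trees, each trained on a random half of the (possibly poisoned)
# training set; predictions are made by majority vote across members.
ensemble = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=5),  # "base_estimator" in scikit-learn < 1.2
    n_estimators=50,
    max_samples=0.5,
    random_state=0,
)
# Usage: ensemble.fit(X_train, y_train); ensemble.predict(X_test)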

Source: Dr. Luis Muñoz-González, Imperial College London