Machine learning

Bias in machine learning:
Fairness

Machines do not possess awareness, and consequently they cannot self-regulate or make ethical decisions. Yet the machine learning models that constitute the intelligence behind numerous applications are ubiquitous. Consider the example below: the colors in the figure represent the percentages of women in various occupations. In a supervised setting, these percentages come from the class ratios in the training data, where the classes are the occupations. Balancing the classes would accordingly give each occupation a similar percentage of women. In essence, the raw occurrence statistics of occupations in the training data drive the decisions. If decisions seem biased for an application's purpose, an analyst can check whether the classes were balanced in the training step. If they were balanced and the decisions are still biased, the bias is coming from the algorithm itself, and the analyst can tweak the algorithm to tune the bias effect.
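The class-ratio check described above can be sketched in a few lines. This is a minimal illustration, assuming string labels and inverse-frequency weights as the balancing scheme; the label values are made up for the example.

```python
from collections import Counter

def class_ratios(labels):
    """Fraction of training examples per class (e.g., per occupation)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

def balancing_weights(labels):
    """Inverse-frequency weights that give each class equal total influence."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = sum(counts.values())
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

# Hypothetical imbalanced training set: 80 nurses, 20 engineers.
labels = ["nurse"] * 80 + ["engineer"] * 20
print(class_ratios(labels))       # {'nurse': 0.8, 'engineer': 0.2}
print(balancing_weights(labels))  # nurse: 0.625, engineer: 2.5
```

If the ratios are already near uniform and the outputs still look skewed, the imbalance is not in the data, which points the analyst at the algorithm.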

There are no obvious classes in unsupervised settings, so the analyst cannot analyze the patterns a priori; the machine learning model generated by unsupervised learning determines the outputs. Analyzing artificial intelligence at the construct level reveals the patterns in the training data, which are also reflected in the outputs. In this scenario, I discovered that machines possess the same social biases that humans do. The machine learning model, namely the semantic space, is trained on language corpora. This semantic space contains facts and statistics about the world, which are embedded in the numeric representations of words. Our novel methods and tailored experimental settings revealed and verified social biases inherent in our society, such as racial, gender, and health biases. For more details, read Semantics derived automatically from language corpora contain human-like biases.

Occupations
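The core measurement in that paper compares how strongly a word associates with two attribute sets in the semantic space. Below is a minimal sketch of such an association score using cosine similarity; the 2-d vectors are toy values invented for illustration, not real embeddings.

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """How much closer word vector w is to attribute set A than to set B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

# Toy 2-d "embeddings" (hypothetical, for illustration only).
vecs = {
    "engineer": np.array([0.9, 0.1]),
    "nurse":    np.array([0.1, 0.9]),
    "he":       np.array([1.0, 0.0]),
    "she":      np.array([0.0, 1.0]),
}
A, B = [vecs["he"]], [vecs["she"]]
print(association(vecs["engineer"], A, B))  # positive: leans toward A ("he")
print(association(vecs["nurse"], A, B))     # negative: leans toward B ("she")
```

With real corpus-trained embeddings, such differential associations are what reveal the gender, racial, and health biases absorbed from the training text.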

Privacy & Security

Programmer De-anonymization:
Source code
Authorship verification - Copyright disputes
Plagiarism detection - Obfuscated code
Software engineering insights

De-anonymization

Binaries
Disassembly - Decompilation
Ultimately malware author identification

Workflow

Underground Forum Member De-anonymization:
Doppelgänger finder
Abusive multiple accounts
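At its core, finding doppelgängers means asking whether two accounts write alike. The sketch below is a simplified stand-in for the full Doppelgänger Finder, assuming character n-gram profiles and cosine similarity as the stylometric comparison; the forum posts are invented examples.

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Character n-gram frequency profile, a common stylometric feature."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(p[k] * q[k] for k in set(p) | set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

# Hypothetical posts: a1 and a2 share an author, b1 does not.
a1 = "ur post is wrong lol, ur missing the point"
a2 = "lol ur totally missing the point again"
b1 = "I respectfully disagree with the previous analysis."

same = cosine(char_ngrams(a1), char_ngrams(a2))
diff = cosine(char_ngrams(a1), char_ngrams(b1))
print(same > diff)  # higher similarity flags a candidate doppelgänger pair
```

Ranking account pairs by such similarity scores surfaces candidate multiple accounts for closer inspection.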

Doppelgängers

Social networks:
Collective privacy behavior

Privacy behavior