Bits & Bobs Journalling

Interesting Bits: Friday, 09/09/2022

A date that reads the same in both Yankee and Hiberno-English formats.

Facebook has no idea what data it really has

Bruce Schneier posts:

The hearing amounted to two high-ranking engineers at one of the most powerful and resource-flush engineering outfits in history describing their product as an unknowable machine.

The special master at times seemed in disbelief, as when he questioned the engineers over whether any documentation existed for a particular Facebook subsystem. “Someone must have a diagram that says this is where this data is stored,” he said, according to the transcript. Zarashaw responded: “We have a somewhat strange engineering culture compared to most where we don’t generate a lot of artifacts during the engineering process. Effectively the code is its own design document often.” He quickly added, “For what it’s worth, this is terrifying to me when I first joined as well.”

It sounds very much like little care or thought was given to the data that Facebook uses. It's just stuff: useful stuff, but not important stuff, so it isn't really looked after. But it has turned out that some of this 'stuff' is dynamite and some is toxic waste. Cleaning it up will be an expensive job, not unlike cleaning up a toxic spill, and it needs doing before it gets any worse.

The Unions Are Alright

Mick Lynch is showing up the facile media:

“The bishop of Durham was on a panel with me last week, saying: ‘I identify with the issues, but I don’t think strike action is the answer,’” Lynch says. “But what is the answer? Do we pray, or play tiddlywinks, or have a sponsored silence? What is there for working people to do if they’re not organised?”

Well said.

More Data Abuse

Regulate them, fine them, hit their bottom line. This is all kinds of weird and wrong:

“On July 16, Twitter user @melancholynsex uploaded a thread alleging that one week after purchasing a pregnancy test from Walgreens using their rewards card, they had been sent a package from Enfamil containing baby formula. Besides criticizing the waste of a product which has recently become a scarce commodity, they chastised formula company Enfamil for sending free formula without being sure the recipient needed it, pointing out that if they were ‘desperately trying to get pregnant,’ that package would be ‘a kick to the face,’ and that if an abusive partner had intercepted the package, that could potentially put the recipient in harm’s way. In the wake of Roe v. Wade’s overturn, this kind of marketing can elicit a special sort of worry—what data can companies collect about reproductive choices, and how will they use it?”

Security Theatre: “Not even death can exempt you from TSA screening.”

The Verge has a fascinating long read on the history of the TSA in the US. In short, it sounds like an organisation that's set up to cause conflict, both within and without. A sample:

When President George W. Bush signed the Homeland Security Act in 2002, he declared the job of every law enforcement officer working under the new Department of Homeland Security as “essential” and “unprecedented.” This justified his decision both to give the DHS extraordinary powers to securitize air travel and to exclude its employees from basic federal protections and work rules. 

He preempted any accusations of exploitation with a patriotic scolding. 

“You’re charged with being on the front line of protecting America,” he reminded them. “Make sure you get your job done.”

But what exactly is that job? Empirically, we know that the TSA does little to stop massive terror plots or even the occasional airport shooting. Instead, TSOs protect the flying public in lots of little ways — by stopping cases of human trafficking, for example, or confiscating firearms from people’s carry-on luggage. And that’s good! But it doesn’t justify the massive curtailing of individual liberties inside airports, the regular harassment of ethnic and religious minorities and gender nonconforming people, and the creation of one of the most vindictive and hostile workplaces in the federal government.

As for Bush’s first line about protecting America, I don’t really recognize the America that exists at a TSA checkpoint. It is overly paranoid, vindictive, and unaccountable to us as citizens. In fact, it mostly brings to mind Masha Gessen’s observation that “resignation was the defining condition of Soviet life.” At airport security, I, too, feel a keen sense of despair and helplessness, and I can only pray that the gaze of the administrative state passes over me without notice. 

Not everyone is so lucky.

“I never know when I’ll get someone who really wants to ruin my day, and they have complete authority to do so,” says Victoria Scott. “I have trans friends who just don’t fly because they’re too scared. I can’t blame them.”

Training Machine Learning Models – What’s in that training data, exactly?

Melissa Heikkilä asks that question and discusses the implications for MIT Technology Review:

Private data is often scattered throughout the data sets used to train LLMs, many of which are scraped off the open internet. The more often those personal bits of information appear in the training data, the more likely the model is to memorize them, and the stronger the association becomes. One way companies such as Google and OpenAI say they try to mitigate this problem is to remove information that appears multiple times in data sets before training their models on them. But that’s hard when your data set consists of gigabytes or terabytes of data and you have to differentiate between text that contains no personal data, such as the US Declaration of Independence, and someone’s private home address.

This is yet another area where data is slung around without much apparent thought as to what it’s going to be used for or where it came from.
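The deduplication step mentioned in the quote can be sketched in a few lines. This is a minimal sketch, assuming exact-match deduplication over lightly normalised text using SHA-256 hashes. It is not Google's or OpenAI's method; real pipelines often use fuzzier near-duplicate matching. The helper names `normalise` and `dedupe` and the sample corpus are hypothetical, for illustration only.

```python
import hashlib
import re


def normalise(text: str) -> str:
    # Hypothetical helper: lowercase and collapse whitespace so that
    # trivially different copies of the same text hash identically.
    return re.sub(r"\s+", " ", text.strip().lower())


def dedupe(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each document, dropping exact
    duplicates after normalisation (hypothetical helper)."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in documents:
        digest = hashlib.sha256(normalise(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


# Hypothetical sample corpus: a repeated public text and one line of
# made-up personal data.
corpus = [
    "We hold these truths to be self-evident",
    "we hold these  truths to be self-evident",
    "Jane Doe lives at 1 Example Road",
]
print(dedupe(corpus))
# ['We hold these truths to be self-evident', 'Jane Doe lives at 1 Example Road']
```

Note what the hash does and doesn't tell you. It only says whether a piece of text has been seen before, not whether it is the Declaration of Independence or someone's home address. The made-up address above sails through untouched, and that is the difficulty the quote describes.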

This Week's Image

Taken on Synge St, Dublin.