Archive for Technical

Dragged kicking and screaming into the century of the fruitbat

With apologies to Terry Pratchett, I feel like I'm being dragged (dragging myself?) into the Century of the Fruitbat.  As I mentioned a long time ago, I don't care for Facebook.  I prefer my blog.  That said, many of my friends and most of my family use Facebook.  So I'm going to start a bit of an experiment.  I've downloaded and installed the Wordbook plugin for my blog software (WordPress).  Starting with this post, in theory anything that's published on the blog gets cross-posted to Facebook.

Kill me now.

Comments (3)

So, this is important

I’m not a big baseball fan.  For that matter, there are few ball sports that interest me.  But, this is important.  If you recall, a few years ago (2004), there was a big furor over steroids in baseball.  The government searched BALCO and found evidence of rampant steroid use by baseball players.  Now I hadn’t been paying attention to this, but there has been an ongoing legal dispute over that search and how it was conducted.

Yesterday, the 9th Circuit Court of Appeals issued a 9-2 decision that restores a great portion of the 4th Amendment’s right to protection against unreasonable search and seizure in an electronic context.

Caveat lector, I am not a lawyer and I’ve never played one on TV.  Moreover, I haven’t finished reading the dissenting opinions and I’m almost certainly missing some of the nuances here.  In a nutshell, the government had evidence, sufficient to obtain a warrant, against 10 players.  Based on this evidence and the warrant, the prosecutors were able to search BALCO for information about those 10 players.  BALCO maintains all records on their computers, of course.

Now, I've had experience with these types of searches.  The government never takes just what's in their warrant.  The defined search *process* always allows them to take the whole computer or the whole hard drive, or more often than not, an image of the whole hard drive.  The reasoning is that information pertaining to the search could be hidden, or there could be some form of booby trap, or the data could be encrypted or …

So, the prosecutor in the steroids case took the whole directory in which there was a file containing drug tests of MLB players.  The file itself contained information about far more than the 10 players named in the warrant.  So, rather than taking the 10 rows of the spreadsheet, rather than taking just the one file, the prosecutor took a directory containing the results of thousands of drug tests.

The prosecutor then (as I understand it) went jurisdiction shopping until he found a judge willing to grant a new warrant for information about 104 players, based on the information found in the spreadsheet.  The argument was that once they had access to the spreadsheet, or the directory, or even the computer, the additional information was in plain view.  Several judges believed that the prosecutor intentionally wrote the process for executing the search warrant in such a way that he could *expand* the scope of the investigation by introducing evidence under this plain view doctrine in order to find new players to prosecute.

What’s interesting is that this seems fairly normal to many of us.  Of course the prosecutor will search your whole hard drive, of course they will bring new charges, etc.  The problem is that a) BALCO itself was not the subject of the prosecution, and b) this IS NOT the way things work in the tangible world.  Prosecutors are exploiting the new(ish) electronic domain to gain access to information they wouldn’t have if files were stored on paper.

Apparently (I need to look into this), the relevant doctrine in the physical world comes from United States v. Tamura (1982).  In this case, the object of a search was stored in a file cabinet.  It was not feasible to search that file cabinet in the office, so the prosecutors obtained access to it, with the requirement that they only pull information relevant to their warrant – even if they stumbled across additional incriminating information.

The majority in the 9th Circuit decision believe that a sensible application of Tamura to an electronic domain means that information/documents stored in proximity to the information sought in the warrant are *not* in plain view.  And they are correct.  If information in adjacent files in a file cabinet is not in plain view, then neither is information stored electronically in adjacent files, folders or computers.

Explicitly, the justices stated:

In general, we adopt Tamura’s solution to the problem of necessary over-seizing of evidence: When the government wishes to obtain a warrant to examine a computer hard drive or electronic storage medium in searching for certain incriminating files, or when a search for evidence could result in the seizure of a computer, see, e.g., United States v. Giberson, 527 F.3d 882 (9th Cir. 2008), magistrate judges must be vigilant in observing the guidance we have set out throughout our opinion, which can be summed up as follows:

1. Magistrates should insist that the government waive reliance upon the plain view doctrine in digital evidence cases. See p. 11876 supra.

2. Segregation and redaction must be either done by specialized personnel or an independent third party. See pp. 11880-81 supra. If the segregation is to be done by government computer personnel, it must agree in the warrant application that the computer personnel will not disclose to the investigators any information other than that which is the target of the warrant.

3. Warrants and subpoenas must disclose the actual risks of destruction of information as well as prior efforts to seize that information in other judicial fora. See pp. 11877-78, 11886-87 supra.

4. The government’s search protocol must be designed to uncover only the information for which it has probable cause, and only that information may be examined by the case agents. See pp. 11878, 11880-81 supra.

5. The government must destroy or, if the recipient may lawfully possess it, return non-responsive data, keeping the issuing magistrate informed about when it has done so and what it has kept. See p. 11881-82 supra.

As someone who has participated in prosecutorial searches, these strike me as eminently sensible guidelines.  The first states that there's no such thing as plain view in computer cases – each piece of information is in its own separate space.  To hold otherwise would be to allow every piece of electronic equipment in the world to be searched, since they are all connected via the Internet.  The second states that the prosecutor shouldn't be the one doing the search, because the searching personnel *will* wind up seeing information that isn't related to the warrant.  The problem is that since nothing is in plain view (can you tell what a hard drive contains by looking at the physical device?), an in-depth search is required to fulfill the warrant, but that search will violate the terms of the warrant if all of the information is shared with the prosecutor.  The third states that prosecutors can't *overstate* the risk of booby traps, deadfalls, etc. that would destroy data.  There was no reason to think there were any such measures in the BALCO computers, and therefore a full copy of their hard drives was not required.  The fourth is pretty plain – the process/protocol must be restricted to what the government is allowed to find.  And the fifth says that the prosecutor can't keep things it found that it wasn't supposed to have.

All in all, a very reasonable balance of 4th Amendment rights in a digital context – no matter what Orin Kerr might say. Good news on the electronic privacy front… for once.

Comments off

DDoS-ing good policy

In computer security, one of the most difficult and annoying problems is the distributed denial of service attack (DDoS).  The idea behind a DDoS attack is straightforward: the attacker tries to prevent legitimate use of a service by using a large number of other computers.  Usually these other computers have been compromised (hacked) and are following the commands of the attacker.  Such computers are usually called "zombies."

There are a number of ways to conduct a DDoS attack, but they are typically variations on the following theme.  The attacker instructs the zombies to request access to the service.  But the zombies have no intention of actually using the service; instead, they often forge network traffic so that it's impossible to tell who is making the request.  Because the zombies don't want to use the service, they can make thousands of requests without slowing down.  The poor computer hosting the service then sees tens of thousands of requests for access, tries to fulfill them all and eventually becomes overloaded and dies.  The zombies win.

What makes the DDoS attack so difficult to defend against is that each and every request coming in looks like a legitimate request.  The problems are: a) the core of the request is a lie (at the direction of the attacker, the zombie has forged the network traffic), and b) the sheer quantity of bogus requests – one or two could be handled easily, tens of thousands not so much.
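
To make the capacity point concrete, here's a toy sketch (invented numbers, not attack code): a server that can handle a fixed number of requests per tick serves everyone on a quiet day, but because bogus requests are indistinguishable from real ones, a flood crowds the legitimate users out.

```python
import random

def serve(requests, capacity=100):
    """Toy model: a server that can handle `capacity` requests per tick,
    chosen blindly because bogus requests look just like real ones.
    Returns the fraction of legitimate requests that got service."""
    legit = sum(1 for r in requests if r == "legit")
    served = random.sample(requests, min(capacity, len(requests)))
    served_legit = sum(1 for r in served if r == "legit")
    return served_legit / legit if legit else 1.0

# 50 legitimate requests on a quiet day: everyone gets served.
quiet = ["legit"] * 50

# The same 50 requests buried in 10,000 indistinguishable zombie requests:
# the server works flat out, but almost no real users get through.
flood = ["legit"] * 50 + ["zombie"] * 10_000

print(serve(quiet))  # 1.0
print(serve(flood))  # a small fraction, near 0
```

The server isn't doing anything wrong in this sketch; it simply has no way to prioritize the requests that matter.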

Unfortunately, we’re seeing the exact same thing when it comes to creating good policies in the U.S.: a distributed denial of service attack.

The creation of good policies requires discussion.  Ideally, arguments will be presented, the merits debated and evaluated with respect to a set of shared norms, and these discussions will shape the eventually enacted policy.  But on every important issue, this is not occurring.  Instead, we have a group of reactionaries (they’ll call themselves conservatives) who try to prevent the important discussions from ever occurring. Take two issues, global warming and health insurance.

On global warming, we could have a fairly important discussion about the expected costs of global warming, the probabilities of certain events occurring, the expected costs of limiting CO2 in order to limit the effects.  We could discuss the moral issues involved, from the increased rates of disease due to higher temperatures, the possibility of spending more money now on certain social problems, and the moral worth of species that will go extinct because of a changing climate.  There are even scientific questions that remain unresolved.  But instead of having any of those discussions, conservatives persist in lying.  Those lies are then redistributed on Fox News and in conservative publications.  The purpose of the lies isn’t to have a real discussion with respect to a valid scientific point.  The purpose is to attack the very idea that there can be a discussion.  The purpose is to make people believe that instead of global warming being a policy issue, it’s a political one.

A year ago, I was at a family reunion and sat down with my father and uncle, who hold advanced degrees in the physical sciences (a master's and a PhD, respectively).  The topic came around to global warming – perhaps one of them made a derisive comment about it, I don't recall.  The next thing I knew, these two very intelligent men turned into DDoS zombies.  They brought up a number of talking points that they had heard, but hadn't actually verified:

  • “Ice cores have shown that temperature rises before CO2 levels.” Historically true, but completely irrelevant.  We know the causal mechanism by which an increase in CO2 increases temperature.  A doubling of CO2 will raise the temperature by roughly 3 degrees Celsius.  However, no one has said that the only reason the temperature can rise is CO2 – there are certainly other reasons.  Why temperature rose in those cases is a legitimate scientific question, but rather than discussing that issue, the right uses a misinterpretation of the idea to attack the possibility of global warming.
  • “CO2 only contributes 3% of the effects of greenhouse gases.” Alternatively, you’ll hear that water vapor is 97% or 98% of the total effect.  Nope.  This is a pure, flat out lie.  I spent a few hours trying to track down the source.  It turns out that it’s not a scientific result.  3% never appeared in a peer-reviewed paper.  Instead, someone reviewing one of the IPCC reports decided that the report said 3% (it didn’t) and ever since, right-wing news has thrown around that number to dispute the very possibility that rising levels of CO2 could contribute to global warming.

There were a few other talking points they had and there are dozens more to be found online.  My favorites often come from a site called Watts Up With That.  Favorites because they completely demonstrate that people are *actively* constructing lies to deceive the public on global warming.  You read a post there and you go to the original sources that they cite and sure enough, they've either taken something out of context or taken the worst of all possible predictions.  My favorite is when they push what amounts to a linear rather than the actual (exponential) projection of climate change and then argue that because the actual temperatures don't fall into their bogus projections, climate change is false.
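
The linear-projection trick is easy to show with arithmetic.  The numbers below are invented for illustration, not real climate data: fit a straight line to the early part of an accelerating series, and the line falls far short of the curve later on – which can then be spun in either direction.

```python
# Invented numbers for illustration only -- not real climate data.

def accelerating(t, base=0.2, rate=1.05):
    """A quantity growing 5% per step: a stand-in for an accelerating trend."""
    return base * rate ** t

def linear_projection(t, slope, intercept):
    """A straight-line extrapolation fitted to two early data points."""
    return intercept + slope * t

# Fit the line through steps 0 and 20 of the accelerating series...
slope = (accelerating(20) - accelerating(0)) / 20
intercept = accelerating(0)

# ...then compare at step 60: the curve is far above the straight line.
print(round(accelerating(60), 2))                         # 3.74
print(round(linear_projection(60, slope, intercept), 2))  # 1.19
```

Swap which projection you call "the prediction" and you can manufacture either a false alarm or a false refutation from the same data.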

The point is that none of those talking points are serious attempts to debate the science.  They are merely an attempt to overwhelm the dialog with incorrect information in order to delay or kill good policy.  Hell, they aren't even arguments; at best they are arglets: fragments of an argument with no real merit.

The arglets against health care reform are even worse.  A handful of people literally make things up and rather than having a discussion about the very real ways our health care system is falling apart, the news media (Fox and others) goes off on these tangents for days.  Consider:

  • “death panels” What a load of crap.  There's no such thing in the health care bill.  Which is, of course, not to say that these things don't exist.  Every insurance company has a death panel.  Or more accurately, insurance companies consider the amount of rescission activity when evaluating employees, i.e., you've paid your premiums for years and when you try to use the policy, the company drops your coverage.
  • “in <scary socialist country of your choice> people have to wait <some large number> weeks for <some medical procedure>.” We hear that one a lot.  Usually, the country is England or Canada, the time is 6+ weeks and it’s a hip replacement.  Of course, this arglet is also untrue, but is interesting in being untrue on multiple levels.  First of course, is the basic lie – delays for surgery. A small nugget of truth – this was a small problem pre-2000, before the British started increasing the amount of money for the NHS.  Then the larger lie – the implication that it’s better here in the U.S. under your insurance.  Then finally, the mother of all lies – that anyone’s even proposing a single payer system like the NHS anyway.  “Oh my god, some other system that no one here is seriously considering has wait times that are as bad as some of ours with insurance, but not nearly as bad as if you have no insurance and have to wait until you’re on medicare to obtain the surgery.”  To borrow a line from a glibertarian idiot – give me a break.
  • Perhaps my favorite recent arglet: “Stephen Hawking never would have survived to be a brilliant physicist under the British system.” Given that he is a British citizen and has always received his health care via the NHS, this is completely crazy, literally divorced from reality, batshit insane.

I could go on and on.  For any topic you can name, there are people promoting lies in order to prevent good policies from being enacted.

Now here’s the part where I tell you the good news based on my DDoS analogy.  Tough – there isn’t any.  There are a few approaches to dealing with a computer DDoS:

  1. Ignore it.  Build capacity so that all requests, legitimate and bogus, can be serviced.  This is unlikely to work.  The media has a short attention span; hell, they've got ADHD.  While the majority of arglets are debunked within minutes of their creation, they continue to live on in the right-wing zombies and the media is incapable of ignoring that.
  2. Identify the source of the arglets and take ’em out.  In computer terms, this often means tracking down the source of the DDoS commands and arresting the attacker.  For dialog, this means identifying the source of the arglets and ignoring them and their zombies.  But then we’re back to solution 1 and the media’s inability to call bullshit.
  3. Ensure that all potential zombie computers are patched, i.e., ensure that potential zombies are inoculated/educated against the lies.  Unfortunately, this doesn’t work in a computer context – too many lazy people with computers that they don’t want to take care of.  And it’s unlikely to work in a political context – too many lazy people who can’t be bothered to conduct basic fact checking (or even sanity checking) before propagating a lie.

In short, there’s no way for the current political process to work properly while the right wing and various corporate interests are conducting a denial of service attack.  Unfortunately, the only real solution is to circumvent the dialog and pass good legislation regardless of what’s in the press.  For 16+ years, Bill Kristol has advised the right to prevent such a thing: “Don’t allow good legislation on health care.”  People would like good legislation and would realize that the Republicans were a bunch of lying con men who wanted to shovel government money (aka public funds, aka your money and mine) to corporate interests.  The Republicans have gotten good at this and now the only way to pass decent legislation is to ignore them, which is easier and easier given that they’ve flat out stated that they won’t vote for their own compromises.  Screw ’em.  Health care is too important.  Pass it, pass it now.  If you won’t support a single payer option, then at least give people the choice of a public option that’ll be better, cheaper and more efficient than what we’ve got now.

Comments off

the importance of verifying backups

I was using my personal laptop at a meeting yesterday and grabbed about a gig of files from someone’s USB key.  While I was taking minutes, I noticed a lovely new icon that popped up: your hard disk drive is failing.  Eeek!  Not cool.  So, last night I got home and started backing up my files.  A few of the new files and a couple of unimportant old files didn’t transfer properly.  Fine, I could live without them.

Today, I bought a replacement drive.  When I got home, I wanted to record the diffs from last night to today (quite a few since I uncompressed a lot of that gig’s worth of files).  As I started that up, I received a lot of notices about the *backup* disk failing.  I tried fixing it, but no luck.  Found a new backup disk.  Re-backed up 30 gigabytes worth of data.  Verified that and installed the new drive.  Things worked pretty well.  I installed Fedora 11 for the second time in two weeks, restored all my personal files and now I’m updating the system.

I’m just glad I verified the backup prior to doing this – it would have been annoying.
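
For anyone wondering what "verifying" means in practice, a minimal sketch of the idea: checksum every file on the source and compare against its copy on the backup disk, flagging anything missing or corrupted.  The function names here are my own, not any particular backup tool's.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(source_dir, backup_dir):
    """Compare every file under source_dir against its copy in backup_dir.
    Returns a list of relative paths that are missing or differ."""
    source, backup = Path(source_dir), Path(backup_dir)
    bad = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            bad.append(str(rel))
    return bad
```

An empty list means every source file has a byte-identical copy on the backup – which is exactly what you want to know *before* swapping drives, not after.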

Comments off

Vint Cerf called . . .

… and he and Tim Berners-Lee want you to stop breaking the Internet.

Over the past couple of weeks I’ve had several occasions to be invited into someone’s walled garden on the internet.  You know the places.  Lovely little sites that are entirely self-contained and which you can’t access unless you are a member?  In the old days, CompuServe and AOL were the big walled gardens.  These days, it’s Facebook and LinkedIn.

These two social networking sites provide easy access to various tools for maintaining a web presence, but also keep enough meta-data that it’s easy to track down other people that you are likely to know.  FWIW, I have no problem with the meta-data aspects of the sites.  If you want to find people that you may know due to past associations, well, more power to you.  The concern I have is that once you go beyond those functions and start using the internals of the garden to maintain information, the only people who can see that information are those who belong to that garden.  Even that is fine with me if the information is private and should be restricted through some form of identity management and authorization.  But if you intend the information to be public, should everyone have to come into the garden to see it?  What does restricting the information to only those with (Facebook) accounts do for you?  You aren’t controlling access, you are just making the owners of your walled garden a little richer by increasing the popularity of their sites.

I mentioned that I’ve had several invitations respecting walled gardens recently.  FWIW, two were on Facebook.  A few of these are related to my upcoming 20th high school reunion (er, actually, that should be 20 year – technically it’ll be our 1st reunion and at this rate, our 20th will be in the year 2389) and seeing someone’s pictures, or viewing our class group or….  The other was a friend who stopped blogging publicly (for the most part) and is now (as I understand it) “writing on her wall” (which is a wonderfully ironic image for this post).  The LinkedIn request was to “recommend” someone professionally.  Okay, it’s true that I do belong to LinkedIn, but I use it as an online rolodex, not as a way to keep in touch with what former colleagues are doing.  I don’t “recommend” people.  I don’t ask to be “recommended” and I don’t really keep up with what happens.  As far as Facebook?  Not participating and not joining.  Use the (free) tools that are available, like flickr for images and blogger or wordpress for blogging.  In the meantime, if Facebook and/or LinkedIn ever open up to the rest of the Internet, then maybe I’ll look at your images and read your writing.  But if they don’t, then you’re restricting yourself to only a subset of the people on the Internet.





oh, and You Kids Get Off My Lawn!

Comments (5)

blog your type?

This is neat.  Typealyzer claims to examine a blog (or presumably any webpage) in order to identify the Myers-Briggs type of the author.  It correctly identifies me as an INTP, but doesn’t seem to get etselec.

Comments (2)

at home in the (technical) universe

Some recent (somewhat) technical notes:

  • A while back, I swapped the dead hard drive in my ipod for a compact flash card.  Unfortunately, at the time, the biggest (affordable) compact flash was 16 GB, so I lost about half the capacity of my ipod.  Not a huge problem, but it became more of one as I added more music.  Yesterday, a shiny new 32 GB compact flash arrived and now I’m back to the nominal amount of space on my ipod, except that it’s all solid state and cool.  From the technical standpoint, this was something of a PITA, since I didn’t have a windows or mac machine around to reinstall the firmware.  My ultimate solution:  1) back up /dev/sdb (boot record and partition table) and /dev/sdb1 (firmware) from the ipod using dd; 2) put the CF in my laptop and format it (a camera would work just as well), this just normalizes the card; 3) put the CF in the ipod (or in the laptop); 4) write the partition table using dd; 5) edit the partition table using fdisk, setting the size of sdb2 to be 32 rather than 16 GB; 6) write out the firmware to sdb1; 7) format sdb2 using mkfs.vfat.  Voila – a 32 GB ipod CF.
  • If you haven’t seen it already, check out Project Euler.  They’ve got a bunch of mathematically oriented programming problems online of varying difficulty.  Good solutions should all run in 1 minute or less and generally take 100 lines of code or so.  It’s a good way to get familiar with a new programming language and to exercise your brain.  So far, I’ve done the first 70 or so problems – they don’t take too long, maybe a half hour each on average.
  • Finally, I got the clutch in my car replaced yesterday.  The mechanic said that it was in pretty bad shape and that the (plastic?) bearing the clutch uses had worn completely away.  This probably explains why I’ve had no acceleration for the past year (or more?).  I had forgotten what it was like to drive a decent car 🙂
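
Back on the programming-problems note: a classic warm-up in that style – sum all the multiples of 3 or 5 below 1000 – shows the flavor.  Brute force is instant at this size, and the closed form shows the kind of shortcut that keeps harder problems under the one-minute mark.

```python
def sum_multiples_brute(limit=1000):
    """Brute force: test every number below the limit."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)

def sum_divisible_by(k, limit=1000):
    """Closed form: k + 2k + ... + mk = k * m * (m + 1) / 2, m = (limit-1)//k."""
    m = (limit - 1) // k
    return k * m * (m + 1) // 2

def sum_multiples_fast(limit=1000):
    # Inclusion-exclusion: multiples of 15 would otherwise be counted twice.
    return (sum_divisible_by(3, limit) + sum_divisible_by(5, limit)
            - sum_divisible_by(15, limit))

print(sum_multiples_brute())  # 233168
print(sum_multiples_fast())   # 233168
```

The closed form runs in constant time no matter the limit, which is the difference between a solution that finishes in a minute and one that doesn't.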

Comments off

Computer maps

A few years ago, K was taking some GIS (geographic information systems) classes.  That was a lot of fun for me since GIS is something I’ve poked at on and off for quite some time.  Back when I was first playing with GIS, GRASS was probably the best (and may still be) open-source GIS system out there, but it wasn’t too user friendly.  So it was a lot of fun playing with ESRI’s ArcGIS.

But 99% of the time, the things that I would like to do with maps don’t require a full blown GIS system.  The Python toolkit Matplotlib includes a Basemap package and that’s getting closer.  Basemap can read GIS shapefiles, handle coordinate transformations, etc.  But even that’s sometimes too much.  What if I wanted a simple, dynamically computed heat map of the location of website visitors?  Or for the PWC database – the counties from which we receive animals?

Well, the Wikimedia Commons has a map of the U.S. where states are slightly separated to allow for easier coloring.  But that’s again difficult to deal with programmatically.  So what I’ve done is to create an indexed PNG image where each state is a different index color.  To color the map, you just load it up and change each state’s color triplet to the appropriate value.
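
The recoloring step is a one-liner per state once the image is loaded.  A sketch using the Pillow imaging library – the filenames and palette indices below are placeholders; the real index-to-state mapping comes from the state-colors file:

```python
from PIL import Image

def color_states(map_png, state_colors, out_png):
    """Recolor an indexed PNG: state_colors maps palette index -> (r, g, b)."""
    img = Image.open(map_png)
    assert img.mode == "P", "expected an indexed (palette-mode) PNG"
    palette = img.getpalette()  # flat list: [r0, g0, b0, r1, g1, b1, ...]
    for index, (r, g, b) in state_colors.items():
        palette[3 * index:3 * index + 3] = [r, g, b]
    img.putpalette(palette)
    img.save(out_png)

# Hypothetical usage: paint the state at palette index 5 red, index 12 blue.
# color_states("us-states-indexed.png",
#              {5: (255, 0, 0), 12: (0, 0, 255)},
#              "us-states-colored.png")
```

Because only the palette changes, the pixel data never has to be touched – a heat map over 50 states is just 50 triplet updates.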

I’m not certain if that’s useful to anyone else, but at least I’ve got it documented here for when I need it.

The associated index of colors to states is here: state-colors

At some point in the future, I might do something similar with a NC county map and maybe a world country map.

Comments off

Annotate Flickr

A while back, Luis Villa asked about a script to add creative commons licensing information to an image.  I just wrapped up a first cut at a GreaseMonkey script to do exactly that.  Hopefully someone will find it useful.

Comments off

E-book blogging

I’ve had the Sony Reader now for about a week.  In that time, I’ve taken it on a plane trip, read three full books and multiple days’ worth of the NY Times, and I’m in the middle of two books right now.  Observations so far:

  • The electronic paper is very readable.  On my plane trip, I must have read for several hours straight with no more eye strain than if I had been reading a paper book.  The legibility is good regardless of font size.  You might still want to increase the font size if your eyes are tired, but otherwise, there is no need.
  • The menus and button layouts are pretty reasonable.  You can page forward or back.  There’s an up-down-left-right cursor that is used to move around on a page.  Using the number buttons on the right, you can jump to an arbitrary page in the book.  These buttons double as quick jumps to menu items on the Reader’s standard menus.  One gripe: you can only move between links using up/down on the cursor; left/right don’t do anything.  At GB, paideka mentioned that it would be interesting to see what Apple would do with the layout and look and feel of a reader.  Agreed.
  • Battery life appears to be as advertised: 7,500 page turns per charge.  Keep in mind that a page on the Reader contains only about half the content of a standard paperback page (depending on page layout and font size).  Still, around 3,750 pages of paperback text is pretty good.
  • Updating the screen is slow.  It takes about 0.5 – 0.75 seconds to update the screen.  A few ramifications: 1) this is almost unnoticeable while reading text; and 2) using the cursor keys is painful, since you deal with the update time for each cursor press – wherever possible I use the numeric shortcuts.
  • A third ramification of the slow update time is that the Reader, and almost certainly any other reader using this generation of e-paper, is unusable as a reference book.  When I use a reference book, I flip around quite a bit.  Forward to the index, back to the text, forward many pages to the next topic, etc.  I suppose if the reference book had a really good index, it might be better, but for the most part, this is still not a good tool for referencing which is a real shame.
  • The bookmarking system is good.  Each book keeps your place in the book.  The top level of the reader keeps up with the last book you’ve read and your place in that book.  You can set any number of bookmarks in each book and then access the bookmarks on a global or a per book basis.  It would be nice if the reader also kept a list of most recently read, rather than just the single most recently read book; but that’s a small issue.  Typically, I’ll just set a bookmark when I pause in reading, then delete it when I pick the book back up.
  • PDF conversion still leaves something to be desired.  I’ve looked into this a bit.  The converter I’m using converts PDF -> HTML -> LRF.  The PDF -> HTML conversion uses pdftohtml (surprised?) which is good in some ways, but still leaves off certain things (like images!), at least as used by the reader’s converter.  Part of this is due to conceptual differences between PDF and HTML.  HTML marks up text, flagging paragraphs, noting images, etc.  Ideally, all of this is passed to the browser, which handles the layout.  PDF will have none of that.  PDF consists of a set of primitives that indicate what text (in which font and size) should go in which location on the page.  There is no markup of paragraphs; instead, each line of text is described individually.  There is no easy way to reconstruct paragraphs from a PDF file (as a research note, I wonder if you could use a partially observable Markov decision process?).  That said, minus the missing images, the LRF result is definitely readable.
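
The reconstruction problem above can at least be attacked heuristically: given per-line text with page positions (the kind of thing a PDF extractor emits), merge lines separated by normal leading into one paragraph, and start a new paragraph on a larger vertical gap or an indented first line.  The input format and thresholds here are invented for illustration:

```python
def lines_to_paragraphs(lines, leading=14.0, gap_factor=1.5, indent=10.0):
    """lines: list of (x, y, text) tuples, y increasing down the page.
    Returns a list of paragraph strings, joined with spaces."""
    paragraphs, current = [], []
    margin, prev_y = None, None
    for x, y, text in lines:
        # A gap much bigger than the line leading, or an indented line,
        # signals the start of a new paragraph.
        big_gap = prev_y is not None and (y - prev_y) > gap_factor * leading
        indented = margin is not None and (x - margin) > indent
        if current and (big_gap or indented):
            paragraphs.append(" ".join(current))
            current = []
        # Track the leftmost x seen so far as the column margin.
        margin = x if margin is None else min(margin, x)
        current.append(text)
        prev_y = y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

It's only a heuristic – drop caps, multi-column layouts and centered headings all break it – which is exactly why something probabilistic would be an interesting research angle.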

So overall, I’m pretty happy with the reader.  The biggest issue is the refresh time on the electronic paper and I hope that will improve over the next couple of years.

p.s. If you’re curious, so far I’ve read: Free for All (a history of open source),  The Authoritarians (a sociologist’s take on a personality type and how it affects politics) and 20,000 Leagues Under the Sea (which I haven’t read in over 20 years).  I’m currently reading Nietzsche’s The Anti-Christ and Bruce Sterling’s Hacker Crackdown. 

Comments off
