Querying Graph Neighbors in SPSS

The other day I showed how one could make an edge list in SPSS, which is needed to generate network graphs. Today, I will show how one can use an edge list in long format to identify neighbors for higher degree relationships.

So to start, what do I mean by a neighbor of a higher degree relationship? Let's say I have a relationship between two nodes, A -- B. Now let's also say I have another relationship between nodes B -- C. I might say that A and C don't have a direct relationship, but they are related in that they both have a relationship to B. So A is a first degree neighbor of B, and A is a second degree neighbor of C. If I drew a graph of the listed network, the degree relationship between A and C would be the minimum number of edges one would have to traverse to get from the A node to the C node.

A -- B -- C
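To make the "minimum number of edges" definition concrete, here is a minimal sketch in Python rather than SPSS. It is not part of the SPSS workflow in this post. The function name `degree` and the representation of edges as pairs are assumptions made only for illustration. It runs a plain breadth-first search over an undirected graph.

```python
from collections import deque

def degree(edges, start, goal):
    """Minimum number of edges between start and goal, or None if unreachable."""
    # Build an undirected adjacency map from the edge pairs.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    # Breadth-first search: the first time a node is reached is its shortest distance.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

edges = [("A", "B"), ("B", "C")]
print(degree(edges, "A", "B"))  # 1
print(degree(edges, "A", "C"))  # 2
```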

Why would a criminologist or crime analyst care about relationships of higher degrees? Well, here are two examples I am familiar with in criminology;

For a simpler and more practical motivation for crime analysts, you may just have some particular individuals you want to target for enforcement (known chronic offenders, violent gang members), and you would like to compile a more extended network of individuals related to those offenders to keep an eye on, or to investigate further for possible ties to co-offending or gang activity.

So to start in SPSS, let's say that we have an edge list in long format, where one column IDs each person and another column IDs the event, so that two people who share an event ID are related. Example ties for a crime analyst may be victimizations, co-offending, or being stopped for field interviews at the same time.

*Long dataset marking people sharing same incident (ID).
data list free / IncID (F2.0) Person (A15).
begin data
1 John 
1 Mary
2 John 
2 Frank
3 John 
3 William
4 John 
4 Andrew
5 Mary 
5 Frank
6 Mary 
6 William
7 Frank 
7 Kelly
8 Andrew 
8 Penny
9 Matt 
9 Andrew
10 Kelly 
10 Andrew
end data.
dataset name long.
dataset activate long.

Now, let's say we want to grab higher degree neighbors for Mary. First I will ID the first degree neighbors, that is, cases that share an incident with Mary, by creating a flag and then aggregating within the incident ID.


*ID Mary and then aggregate to get first degree.
compute degree1 = (Person = "Mary").
*Now aggregate to get all degree1s.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE = YES
  /BREAK=IncID
  /degree1 = MAX(degree1).

To identify if a person is a second degree neighbor of Mary, I can first aggregate within person, to ID that John, Frank, and William are first degree neighbors on every row where they appear, and then pick out their first degree neighbors, who I will then be able to tell are second degree neighbors of Mary.


*Aggregate within person, then within incident ID, to get second degrees.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE = YES
  /BREAK=Person
  /degree2 = MAX(degree1).
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE = YES
  /BREAK=IncID
  /degree2 = MAX(degree2).

I can continue to do the same procedure for third degree neighbors.


*Aggregate within person, then within incident ID, to get third degrees.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE = YES
  /BREAK=Person
  /degree3 = MAX(degree2).
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE = YES
  /BREAK=IncID
  /degree3 = MAX(degree3).
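As a cross-check on the alternating aggregation above, here is a rough Python sketch of the same idea using the example data from this post. It is not the SPSS procedure itself. The function name `neighbor_degrees` and the list of `(IncID, Person)` tuples are assumptions for illustration. Each round spreads the flag to every incident containing a flagged person, then to every person in those incidents, and records the first round in which each person is reached.

```python
# Example long edge list from the post: (IncID, Person).
rows = [
    (1, "John"), (1, "Mary"), (2, "John"), (2, "Frank"),
    (3, "John"), (3, "William"), (4, "John"), (4, "Andrew"),
    (5, "Mary"), (5, "Frank"), (6, "Mary"), (6, "William"),
    (7, "Frank"), (7, "Kelly"), (8, "Andrew"), (8, "Penny"),
    (9, "Matt"), (9, "Andrew"), (10, "Kelly"), (10, "Andrew"),
]

def neighbor_degrees(rows, seed, max_degree):
    """Return {person: degree} for everyone within max_degree of seed."""
    flagged_people = {seed}
    degrees = {}
    for k in range(1, max_degree + 1):
        # Like MAX within IncID: incidents that contain any flagged person.
        flagged_incidents = {inc for inc, p in rows if p in flagged_people}
        # Like MAX within Person: everyone who appears in a flagged incident.
        reached = {p for inc, p in rows if inc in flagged_incidents}
        # People reached for the first time in round k are degree-k neighbors.
        for p in reached - flagged_people:
            degrees[p] = k
        flagged_people |= reached
    return degrees

result = neighbor_degrees(rows, "Mary", 3)
for person, deg in sorted(result.items(), key=lambda kv: (kv[1], kv[0])):
    print(deg, person)
```

For this data the sketch reports John, Frank, and William at degree 1, Andrew and Kelly at degree 2, and Matt and Penny at degree 3.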

So now it should be clear how I can make a recursive structure to identify neighbors of however many degrees I want. I end the post with a general MACRO to identify all neighbors up to a certain degree given an edge list in long format. Since the macro pairs every row with every person, the working dataset expands to very many cases, so you will likely only want to use it on a smaller list. Alternatively, the macro has an option to only check certain flagged individuals for neighbors.

I’d love to see or hear about other applications crime analysts are using such social networks for. Learning more about graph layout algorithms is on my academic bucket list, so hopefully you will see more posts about that from me in the future.


*Current requirement - personid needs to be a string variable.
*Flag argument will return people who have a value of one for that variable and all of their
neighbors in the long list.
DEFINE !neighbor (incid = !TOKENS(1)
                           /personid = !TOKENS(1)
                           /number = !TOKENS(1) 
                           /flag = !DEFAULT ("") !TOKENS(1)   )

dataset copy neighbor.
dataset activate neighbor.
match files file = *
/keep = !incid !personid !flag.

rename variables (!incid = IncID)
(!personid = Person).

*I need to make a stacked dataset for all cases.
compute XXconstXX = 1.

*Making wide dataset of Persons in the long list.
dataset copy XXwideXX.
dataset activate XXwideXX.

*eliminating duplicate people.
sort cases by Person.
match files file = *
/first = XXkeepXX
/by Person
/drop IncID.
select if XXkeepXX = 1.

*reshaping long to wide - could use flip here but that requires numeric PersonIDs.
*flip variables = Person.
!IF (!flag !NE !NULL) !THEN
select if !flag = 1.
!IFEND
casestovars
/ID = XXconstXX
/separator = ""
/drop XXkeepXX !flag.
*Similar here you could just replace with a list of all unique offender nodes - just needs to be in wide format.

*Match back to the original long dataset.
dataset activate neighbor.
match files file = *
/table = 'XXwideXX'
/by XXconstXX.
dataset close XXwideXX.

*Reshape wide to long - @ is for filler so I dont need to know how many people - it gets dropped by default in varstocases.
string @ (A1).
varstocases
/make DegreePers from Person1 to @
/drop XXconstXX !flag.

sort cases by DegreePers IncID Person.

*Make first degree.
compute degree1 = (Person = DegreePers).
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE = YES
  /BREAK=IncID DegreePers
  /degree1 = MAX(degree1).
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE = YES
  /BREAK=Person DegreePers
  /degree1 = MAX(degree1).
*dropping self checks.
select if Person <> DegreePers.

!LET !past = "degree1"
!DO !i = 2 !TO !number
!LET !current = !CONCAT("degree",!i)
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE = YES
  /BREAK=IncID DegreePers
  /!current = MAX(!past).
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE = YES
  /BREAK=Person DegreePers
  /!current = MAX(!current).
!LET !past = !current
!DOEND
*Clean up and delete duplicates.
compute degree = (!number + 1) - SUM(degree1 to !current).
string P1 P2 (A100).
DO IF Person <= DegreePers.
    compute P1 = Person.
    compute P2 = DegreePers.
ELSE.
    compute P1 = DegreePers.
    compute P2 = Person.
END IF.
sort cases by P1 P2.
match files file = *
/first = XXkeepXX
/by P1 P2
/drop DegreePers Person.
*will be [1 + degrees searched] if not a neighbor.
select if XXkeepXX = 1 and degree <= !number.
match files file = *
/drop degree1 to !current XXkeepXX IncID.
formats degree (!CONCAT("F",!LENGTH(!number),".0")).
!ENDDEFINE.

*Example use case - uncomment to check it out.
*dataset close ALL.
*Long dataset marking people sharing same incident (ID).
*data list free / IncID (F2.0) Person (A15).
*begin data
1 John 
1 Mary
2 John 
2 Frank
3 John 
3 William
4 John 
4 Andrew
5 Mary 
5 Frank
6 Mary 
6 William
7 Frank 
7 Kelly
8 Andrew 
8 Penny
9 Matt 
9 Andrew
10 Kelly 
10 Andrew
*end data.
*dataset name long.
*dataset activate long.
*compute myFlag = 1.
*set mprint on.
*output close ALL.
*neighbor incid = IncID personid = Person number = 3.
*set mprint off.
*dataset activate long.
*dataset close neighbor.
*compute myFlag = (Person = "Mary" or Person = "Andrew").
*set mprint on.
*output close ALL.
*neighbor incid = IncID personid = Person number = 3 flag = myFlag.
*set mprint off.

2 Comments

  1. Jon Peck

     /  July 19, 2013

    Several years ago I consulted with the national police force of a country in Europe. They did all their crime analysis with SPSS (it was just SPSS back then). One of their reporting systems had to compute the transitive closure of a set of known criminals, and they had developed a nice system for doing that in Statistics.

    If I were doing this now, however, I would do this much more simply and much faster with a modest set of Python code.

    I should note, also, that in cases where there is uncertainty on whether two occurrences are actually the same person, the Entity Analytics tools now available in Modeler might be a wise thing to consider before launching into this algorithm.

    • I would more seriously dig into the NetworkX python library or the igraph R library if I wanted to do anything more complicated. For the same person problem I’ve used this software, http://fril.sourceforge.net/. I’ve attempted to script up record linkage solutions a couple times in base SPSS and have failed (haven’t used your FUZZY script yet).
