
Data Dump 3

Important Tools for Visualising and Communicating Data

This list of resources represents an ongoing and growing series of blog posts collecting important, effective, useful and practical data visualisation tools. You can also view these resources via a publicly accessible Google Spreadsheet.
** This series of posts will be undergoing a thorough update during January and February 2013! **
Part 1: Tools for Analysis, Graphing and Enterprise
Part 2: Visual Programming Languages and Environments
Part 3: Google’s Charting and Visualisation Tools
Part 4: Tools for Mapping
Part 5: Specialist Tools and Visualisation Communities
Part 6: Visualisation Presentation and Publishing Tools

The Most Influential Data Visualisation Books

Part 7: A Personal Collection of Influential Books on Data Visualisation and Other Related Subjects (1)
Part 8: A Personal Collection of Influential Books on Data Visualisation and Other Related Subjects (2)
Part 9: A Personal Collection of Influential Books on Data Visualisation and Other Related Subjects (3)

http://code.google.com/p/google-refine/,

http://www.google.com/fusiontables/Home, http://multimedia.journalism.berkeley.edu/tutorials/google-refine-export-json/json/, http://www.google.com/fusiontables/public/tour/index.html, http://www.google.com/support/fusiontables/bin/answer.py?hl=en&answer=184641, http://www.computerworld.com/s/article/9196283/H_1B_visa_data_Visual_and_interactive_tools, https://sites.google.com/site/fusiontablestalks/stories

http://www.impure.com/ (http://www.youtube.com/watch?v=4oc47BB374U),  http://www.youtube.com/watch?v=XYVAPfb8k5U, http://www.guardian.co.uk/news/datablog/interactive/2011/mar/08/pay-gap-gender-women-men,

http://vis.stanford.edu/wrangler/,

http://www.tableausoftware.com/public/blog/2011/04/data-shaping, http://www.tableausoftware.com/public/training, http://www.computerworld.com/s/article/9210078/Tech_unemployment_higher_than_white_collar_average#interactive_graph,

http://code.google.com/p/google-refine/wiki/Screencasts,

http://www-958.ibm.com/software/data/cognos/manyeyes/, http://www-958.ibm.com/software/data/cognos/manyeyes/page/Visualization_Options.html, http://www-958.ibm.com/software/data/cognos/manyeyes/page/Data_Format.html,

http://www.dataviz.org/, http://www.w3.org/TR/html4/present/frames.html,

http://simile-widgets.org/exhibit/, http://people.csail.mit.edu/karger/Exhibit/CAR/, http://simile-widgets.org/wiki/Getting_Started_with_Exhibit

http://www.filamentgroup.com/lab/update_to_jquery_visualize_accessible_charts_with_html5_from_designing_with/

http://sixrevisions.com/javascript/20-fresh-javascript-data-visualization-libraries/ (extract)

http://code.google.com/apis/charttools/index.html, http://code.google.com/apis/visualization/documentation/queries.html, http://code.google.com/apis/chart/docs/gallery/dynamic_icons.html, https://chart.googleapis.com/chart?chs=75x50&cht=gom&chd=t:70&chco=FF0000,FF8040,FFFF00,00FF00,00FFFF,0000FF,800080, http://code.google.com/apis/visualization/documentation/using_overview.html, http://code.google.com/apis/chart/docs/making_charts.html, http://code.google.com/apis/visualization/documentation/gallery.html
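The long chart URL above is just a set of query parameters (chs for size, cht for chart type, chd for data, chco for colours). A minimal sketch of assembling such a URL with the Python standard library; the parameter names follow the (since-deprecated) Google Image Charts API, and the helper function is invented for illustration.

```python
# Sketch: build a Google-o-meter gauge URL like the one linked above.
from urllib.parse import urlencode

def chart_url(size, chart_type, values, colors=None):
    params = {
        "chs": size,                                      # chart size, e.g. "75x50"
        "cht": chart_type,                                # chart type, e.g. "gom"
        "chd": "t:" + ",".join(str(v) for v in values),   # text-format data
    }
    if colors:
        params["chco"] = ",".join(colors)                 # series colours
    return "https://chart.googleapis.com/chart?" + urlencode(params)

url = chart_url("75x50", "gom", [70], ["FF0000", "00FF00"])
```

The same pattern works for any of the chart types in the gallery; only cht and the data format change.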

http://code.google.com/p/google-refine/wiki/DocumentationForUsers,

https://www.statwing.com/, http://blogs.computerworld.com/business-intelligenceanalytics/20909/startup-aims-simplify-data-analysis. "The idea behind Statwing is to provide some basic, automated statistical analysis on data that users upload to the site — correlations, frequencies, visualizations and so on — without requiring you to know when, say, to use a chi-squared distribution versus a z-test."
statistical functions: http://office.microsoft.com/en-us/excel-help/statistical-functions-HP005203066.aspx
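The chi-squared-versus-z-test choice mentioned above depends on the shape of the data: a chi-squared test suits tables of categorical counts, while a z-test suits comparing a sample mean against a known population mean. A minimal pure-Python sketch of both statistics (all numbers are invented for illustration):

```python
# Sketch of the choice Statwing automates: which test statistic fits the data.
import math

def chi_squared_statistic(table):
    """Chi-squared statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def z_statistic(sample, pop_mean, pop_sd):
    """One-sample z statistic (population standard deviation known)."""
    mean = sum(sample) / len(sample)
    return (mean - pop_mean) / (pop_sd / math.sqrt(len(sample)))

# Categorical vs. categorical, e.g. responded yes/no by group:
chi2 = chi_squared_statistic([[30, 10], [20, 40]])

# Continuous sample vs. a known population mean and SD:
z = z_statistic([5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4, 5.1],
                pop_mean=5.0, pop_sd=0.2)
```

The point of tools like Statwing is precisely that users never have to pick between these two functions themselves.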

protovis: http://vis.stanford.edu/protovis/ex/

http://vis.stanford.edu/protovis/docs/start.html

sort tools by skill levels

http://www.qgis.org/
http://tbarmann.webfactional.com/nicar/qgis_tutorial/

Learn more: Timothy Barmann of The Providence Journal posted two very useful tutorials for the CAR conference that are still available: Introduction to QGIS and The Latest in Mapping With JavaScript and jQuery. Barmann also offers a sample: Rhode Island’s Ethnic Mosaic. Another resource to help you get started: QGIS Tutorial Labs from Richard E. Plant, professor emeritus at the University of California, Davis.

Note: If you’re interested in GIS and want to consider other free software options, download this PDF listing of Open Source/Non-Commercial GIS Products. And if you’re looking for a free open-source desktop GIS program that might be fairly easy to use, Jacob Fenton, director of computer-assisted reporting at American University’s Investigative Reporting Workshop, recommends taking a look at the System for Automated Geoscientific Analyses (SAGA) site. Finally, if analyzing geographic data in a conventional database sounds interesting, PostGIS "spatially enables" the PostgreSQL relational database, according to the site.


http://www.arl.org/sparc/openaccess/

Most of these tools are outdated and no longer in use; the goal is to simplify this list down to the best tools. Any outdated, irrelevant or deprecated solutions will be deleted.

Google (which has a number of third-party front ends such as Map A List, an add-on that adds info to a Google Map from a spreadsheet). There’s also Yahoo Maps Web Services and Bing Maps, all with APIs. But there are numerous others.

OpenHeatMap

 How OpenHeatMap Can Help Journalists

http://www.computerworld.com/s/article/9215504/22_free_tools_for_data_visualization_and_analysis?taxonomyId=18&pageNumber=8

OpenLayers

http://fuzzytolerance.info/code/openlayers-with-a-google-street-view-widget/
http://www.geoext.org/index.html

OpenLayers Simple Example. A good sample is Ushahidi’s Haiti map.

There are other JavaScript libraries for overlaying information on maps, such as Polymaps. And there are a number of other mapping platforms: Google Maps, which offers numerous mapping APIs; Yahoo Maps Web Services, with its own APIs; the Bing Maps platform and APIs; and GeoCommons.


“Links and resources available below may be useful for those interested in pursuing open access publication or advocating for open access to others in the academic community, to grant-making institutions, or even to bodies of government. Resources supplied here include guides, presentation materials, and handbooks produced by SPARC and other organizations. These provide definitions and developments in the field, and point those interested to the growing success of Open Access. Please write to sparc[at]arl[dot]org with additions or corrections.”
http://www.worldbank.org/open/
https://openknowledge.worldbank.org/
http://www.transparency.org/

What is the Open Aid Partnership?
Transparency of development assistance, public budgets and service delivery is critical for citizen engagement. Innovative technologies, such as mapping, provide powerful new tools for strategic planning and for greater transparency and accountability. Recognizing the significant impact that these innovations and an empowered civil society can have on improving development effectiveness, the World Bank Institute and bilateral donor partners, foundations and civil society have formed an Open Aid Partnership. The Partnership will be working in close collaboration with the International Aid Transparency Initiative (IATI) and the Open Government Partnership (OGP). The partnership brings development partners together to enhance the openness and effectiveness of development assistance.

What are the Open Aid Partnership’s main objectives?

  • improve aid transparency and coordination by developing an Open Aid Map that visualizes the location of donor-financed programs at the local level;

  • better monitor the impact of development programs on citizens;

  • enhance the targeting of development programs;

  • foster accountability by empowering citizens to provide direct feedback on project results;

  • strengthen capacity of civil society and citizens to use open aid data.

http://www.openaidmap.org/

Putting Development on a Map (Mapping for Results)

The Partnership builds on the World Bank’s Mapping for Results Initiative, which has mapped 30,000 activities in all 143 of its client countries, and overlays these data with sub-national poverty and human development indicators at the local level. The initiative is based on the premise that the combination of visualization technologies and open data on development assistance can enable a more transparent, inclusive and effective development process.

Main Components of Open Aid Partnership:

  1. Map activities supported by development assistance and create a web-based collaborative Open Aid Map that helps improve coordination, efficiency, transparency and accountability of development assistance.

  2. Support developing countries in building national mapping platforms.

  3. Promote citizen feedback initiatives for better reporting on development assistance and public service provision in order to enhance transparency and accountability.

  4. Build capacity of civil society to act as information intermediaries for citizens and make these maps more accessible, as well as the capacity of public service providers to receive and respond to feedback.

  5. Evaluate the development impact of national mapping platforms and feedback initiatives on public services and related capacity building.

http://www.open-contracting.org/

The information available through the AidData database serves as a platform for testing new ways to make aid information more relevant for different audiences. For example, recent work on geocoding aid can help civil society organizations identify the aid-funded activities that are underway in their communities. AidData’s work supports the efforts of the International Aid Transparency Initiative (IATI) by allowing users to download data in IATI format. Additionally, AidData Raw serves as a repository for datasets that have not yet been vetted or that are not appropriate for inclusion in the main AidData database but provide added informational value.

http://www.aiddata.org/content/index/Services/geocoding

1) Collect, standardize, and organize geo-enabled data. Teams of experienced researchers are available to geocode project data so that it can be used for maps and other visualizations. AidData has partnered with the World Bank Institute, through the Mapping for Results initiative, and works with the African Development Bank, the Kellogg Foundation, and the Malawi Ministry of Finance to geocode information.
2) Prepare visualizations and analytics that leverage the power of geo-enabled data. Once an organization has geo-enabled data, we work in partnership with Esri to visualize this information on state-of-the-art interactive maps. Custom dashboards that combine maps with graphs and charts support monitoring and evaluation efforts, and help analysts and decision-makers identify risks and define next steps.
3) Prepare implementation reports with recommendations. AidData works with organizations to determine challenges and opportunities to geo-enable their data collection and dissemination efforts, and prepare reports with actionable recommendations. Reports can include roadmaps for compliance with international data standards such as the aid information reporting standard developed by the International Aid Transparency Initiative (IATI).
4) Build and implement custom IT solutions. Based on these reports and recommendations, Development Gateway can help design, integrate, and implement custom geo-enabled modules and applications that extend current client systems/processes to create sustainability. Mobile applications can be developed to enable real-time data collection and increase accessibility to broader user groups. Seamlessly integrated web and mobile applications offer organizations a comprehensive way to make their work more efficient and effective.


http://openlayers.org/QuickTutorial/ [open streetmap]

TimeFlow

https://github.com/FlowingMedia/TimeFlow/wiki/Top-Tips

https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=AW-0VW

http://www.wordle.net/
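Word clouds like Wordle’s (and the IBM Word-Cloud Generator linked above) start from a word-frequency count. A minimal standard-library sketch of that first step; the stopword list and sample text here are invented for illustration.

```python
# Sketch: the word-frequency counting step behind any word-cloud tool.
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset({"the", "a", "and", "of", "to"})):
    """Count lowercase words, skipping a small stopword list."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

freqs = word_frequencies("Open data and open aid: the more open the data, "
                         "the more useful the data.")
```

A tool like Wordle then maps each count to a font size and lays the words out; the counting above is the data half of the job.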

http://www.simile-widgets.org/timeline/

http://nodexl.codeplex.com/

http://www.analytictech.com/ucinet/

free NodeXL tutorial (PDF) or these basic step-by-step instructions on analyzing your own Facebook social network (PDF). One Facebook app for downloading your own friend information for use in NodeXL is Name Gen Web.

gephi tutorial: http://gephi.org/tutorials/gephi-tutorial-quick_start.pdf
http://thejit.org/demos/, http://thejit.org/

http://www.spatstat.org/spatstat/,

http://www.peteraldhous.com/CAR/Aldhous_CAR2011_RforStats.pdf,

http://jacobfenton.s3.amazonaws.com/R-handson.pdf,

http://cran.r-project.org/doc/manuals/R-intro.html,

http://www.r-statistics.com/tag/visualization/,
http://csvkit.readthedocs.org/
Source: http://www.computerworld.com/s/article/9215504/22_free_tools_for_data_visualization_and_analysis?taxonomyId=18&pageNumber=2
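csvkit’s csvstat command summarises each column of a CSV file from the command line. A rough standard-library sketch of the same idea, to show what such a summary computes; the sample data and helper function are invented for illustration.

```python
# Sketch: a csvstat-style per-column summary using only the standard library.
import csv
import io
import statistics

def column_stats(csv_text, column):
    """Summarise one numeric column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    return {"count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values)}

sample = "city,population\nProvidence,178042\nWarwick,82672\nCranston,80387\n"
stats = column_stats(sample, "population")
```

csvkit does far more (type inference, joins, SQL queries), but each of its tools is built on this kind of simple per-column pass.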

Data visualization and analysis tools

| Tool | Category | Multi-purpose visualization | Mapping | Platform | Skill level | Data stored or processed | Designed for Web publishing? |
|------|----------|-----------------------------|---------|----------|-------------|--------------------------|------------------------------|
| Data Wrangler | Data cleaning | No | No | Browser | 2 | External server | No |
| Google Refine | Data cleaning | No | No | Browser | 2 | Local | No |
| R Project | Statistical analysis | Yes | With plugin | Linux, Mac OS X, Unix, Windows XP or later | 4 | Local | No |
| Google Fusion Tables | Visualization app/service | Yes | Yes | Browser | 1 | External server | Yes |
| Impure | Visualization app/service | Yes | No | Browser | 3 | Varies | Yes |
| Many Eyes | Visualization app/service | Yes | Limited | Browser | 1 | Public external server | Yes |
| Tableau Public | Visualization app/service | Yes | Yes | Windows | 3 | Public external server | Yes |
| VIDI | Visualization app/service | Yes | Yes | Browser | 1 | External server | Yes |
| Zoho Reports | Visualization app/service | Yes | No | Browser | 2 | External server | Yes |
| Choosel | Framework | Yes | Yes | Chrome, Firefox, Safari | 4 | Local or external server | Not yet |
| Exhibit | Library | Yes | Yes | Code editor and browser | 4 | Local or external server | Yes |
| Google Chart Tools | Library and Visualization app/service | Yes | Yes | Code editor and browser | 2 | Local or external server | Yes |
| JavaScript InfoVis Toolkit | Library | Yes | No | Code editor and browser | 4 | Local or external server | Yes |
| Protovis | Library | Yes | Yes | Code editor and browser | 4 | Local or external server | Yes |
| Quantum GIS (QGIS) | GIS/mapping: Desktop | No | Yes | Linux, Unix, Mac OS X, Windows | 4 | Local | With plugin |
| OpenHeatMap | GIS/mapping: Web | No | Yes | Browser | 1 | External server | Yes |
| OpenLayers | GIS/mapping: Web, Library | No | Yes | Code editor and browser | 4 | Local or external server | Yes |
| OpenStreetMap | GIS/mapping: Web | No | Yes | Browser or desktops running Java | 3 | Local or external server | Yes |
| TimeFlow | Temporal data analysis | No | No | Desktops running Java | 1 | Local | No |
| IBM Word-Cloud Generator | Word clouds | No | No | Desktops running Java | 2 | Local | As image |
| Gephi | Network analysis | No | No | Desktops running Java | 4 | Local | As image |
| NodeXL | Network analysis | No | No | Excel 2007 and 2010 on Windows | 4 | Local | As image |
| CSVKit | CSV file analysis | No | No | Linux, Mac OS X or Windows with Python installed | 3 | Local | No |
| DataTables | Create sortable, searchable tables | No | No | Code editor and browser | 3 | Local or external server | Yes |
| FreeDive | Create sortable, searchable tables | No | No | Browser | 2 | External server | Yes |
| Highcharts* | Library | Yes | No | Code editor and browser | 3 | Local or external server | Yes |
| Mr. Data Converter | Data reformatting | No | No | Browser | 1 | Local or external server | No |
| Panda Project | Create searchable tables | No | No | Browser with Amazon EC2 or Ubuntu Linux | 2 | Local or external server | No |
| PowerPivot | Analysis and charting | Yes | No | Excel 2010 on Windows | 3 | Local | No |
| Weave | Visualization app/service | Yes | Yes | Flash-enabled browsers; Linux server on backend | 4 | Local or external server | Yes |
| Statwing | Visualization app/service | Yes | No | Browser | 1 | External server | Not yet |
| Infogr.am | Visualization app/service | Yes | Limited | Browser | 1 | External server | Yes |
| Datawrapper | Visualization app/service | Yes | No | Browser | 1 | Local or external server | Yes |


*Highcharts is free for non-commercial use and $80 for most single-site-wide licenses.
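The earlier note "sort tools by skill levels" becomes trivial once the table rows are treated as data. A minimal Python sketch over a few of the rows above (skill levels as listed, 1 through 4); the record structure is invented for illustration.

```python
# Sketch: sort a few of the tools above by their listed skill level.
tools = [
    {"tool": "Google Fusion Tables", "skill": 1},
    {"tool": "R Project", "skill": 4},
    {"tool": "Google Refine", "skill": 2},
    {"tool": "Tableau Public", "skill": 3},
]

# Sort by skill level first, then alphabetically to break ties.
by_skill = sorted(tools, key=lambda t: (t["skill"], t["tool"]))
easiest = [t["tool"] for t in by_skill]
# → ['Google Fusion Tables', 'Google Refine', 'Tableau Public', 'R Project']
```

The same approach extends to every column in the table, e.g. filtering to browser-based tools or to those designed for Web publishing.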


  • TRAC-FM

  • TRAC-FM is a tool for governance and accountability, providing a platform used for instant analysis and visualization of the results of SMS polls carried out by radio or TV presenters.

  • More Info

  • FormHub

  • Formhub is an Android application that makes collecting survey data free and easy.

  • More Info

  • RapidSMS

  • RapidSMS is a free and open-source SMS framework for data collection, logistics coordination and communication.

  • More Info

  • uReport

  • uReport is a free SMS-based system being used by UNICEF in Uganda to enable communities to share common issues and work together with community leaders.

  • More Info

  • Code For America Apps

  • Code For America partners with cities to develop and share apps and software that improve public services.

  • More Info

  • Civic Commons Marketplace

  • The Civic Commons Marketplace is a space for open innovation in government, allowing cities to share software for public service delivery.

  • More Info

  • OpenStreetMap

  • OpenStreetMap (OSM), sometimes called a “Wikipedia of maps,” is a tool that allows the public to collaboratively create a map of the world and manipulate and download the data for free.

  • More Info

  • PoiMapper

  • Collecting and utilizing point-of-interest (POI) data cost-effectively with mobile technologies.

  • More Info

  • EpiSurveyor

  • Fast, easy and affordable mobile data collection.

  • More Info

  • Citivox

  • Turn citizen reports into actionable information.

  • More Info

  • FrontlineSMS

  • FrontlineSMS is a two-way SMS messaging system that enables direct communication between large groups and a central database.

  • More Info

  • SeeClickFix

  • SeeClickFix strengthens citizen feedback loops regarding the performance of local governments and maintenance of public assets.

  • More Info

  • Ushahidi

  • Ushahidi is a citizen feedback platform commonly used in post-disaster situations.

  • More Info

  • source: http://www.opendta.org/Pages/Tools.aspx

This paper outlines proposals for meeting the objectives of the International Aid Transparency Initiative (IATI) without disproportionate cost, and explains what value IATI would add to existing systems for reporting aid. Detailed work on implementation issues is scheduled through the IATI Technical Advisory Group (TAG) during 2010. Membership of the TAG is open, and so far over 100 individuals have contributed to its work, including representatives of each stakeholder group.

There are many people and organisations with diverse, legitimate and important needs for information about aid. Developing country governments need information about how aid is being spent in their country. Parliamentarians in developing countries and in donor countries want to hold their governments to account. Communities in developing countries need to know what resources are available for their development priorities and in what way they can influence how those resources are used. A village council wants to know what aid is available to improve water in its area. Researchers need better data to understand how aid can be more effective. Taxpayers want to know how their money is being spent.

No single database can satisfactorily meet the needs of all these potential users. These users all want information tailored to their own needs. Often they want information from many different donors, combined with information from other sources, such as the government’s spending, or disease surveillance data. Yet it is unrealistic to expect donors to provide information separately to hundreds of possible information systems. This then is the dilemma: users need information presented in ways specific to their needs, but donors cannot provide information to each of them individually.

There are broadly two ways to respond to this challenge. A limited response is for those donors who currently report to the Development Assistance Committee (DAC) databases to step up the information that they already provide, and for all donors to improve reporting to country government aid management systems (AIMS). This paper sets out a more comprehensive response and shows how IATI could improve reporting to existing systems, and at the same time meet a much wider range of needs for information, including documents as well as data.

Donors would extend their existing processes for collecting information about aid, which they currently use to report to the DAC and other systems. They would include additional information needed by other stakeholders, much of which is currently collected and provided separately. As now, donors would choose their own systems to manage this data collection. They would put this combined information into the public domain more rapidly and in a common format. They would register the location of the data in a "registry", a kind of online catalogue which enables users to find it. This approach can be summarised as "publish once, use often".

The combination of common, open formats plus the registry would add huge value to the information already being published by donors, and to the additional information they would publish as a result of IATI, because users would be able to access information of particular interest to them, in a format that is useful to them, without having to trawl round all the donor websites individually. This would open up the information to a wider range of users and democratise access to information through services such as mobile phones or Google.

The information collected and published under IATI would provide the information needed for donor reporting to existing systems, such as the DAC, country AIMS and national budgets. This would reduce duplicate information collection and reporting.

To meet their commitments under the Accra Agenda for Action (AAA), and in the context of growing calls for government transparency, donors are increasingly publishing more information about aid. Clearly this will involve some costs to donors. These IATI proposals are designed to minimise the additional burden of this greater transparency, and yet obtain the maximum benefits from their efforts by ensuring that the information, once collected, is universally accessible.
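The "common format" idea above can be made concrete with a small sketch of emitting an aid activity record as machine-readable XML. The element names here approximate the IATI XML standard and all values are invented; this is an illustration of the approach, not the actual schema.

```python
# Sketch: publish an aid activity record once, in a common XML format,
# so any downstream system can consume it. Element names approximate
# the IATI standard; the identifier and title are made up.
import xml.etree.ElementTree as ET

activities = ET.Element("iati-activities", version="1.0")
activity = ET.SubElement(activities, "iati-activity")
ET.SubElement(activity, "iati-identifier").text = "XX-EXAMPLE-12345"
ET.SubElement(activity, "title").text = "Rural water supply improvement"
ET.SubElement(activity, "recipient-country", code="UG")

xml_text = ET.tostring(activities, encoding="unicode")
```

A registry then only needs to store the URL of each published file; consumers fetch and parse it with the same standard tooling.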

Based on extensive stakeholder consultation summarised in Chapter One, aidinfo concludes that the system to implement the IATI declaration signed in Accra in 2008 should:
1. meet in full the information needs of developing country government AIMS and budgets without imposing a burden on developing countries, including complying with local definitions and classifications;
2. build on the work that has been done through the DAC to develop common definitions and reporting processes, and avoid the establishment of duplicate or parallel reporting processes;
3. produce information which is easily accessible to parliamentarians, civil society, the media and citizens as well as to governments (in line with the expanded definition of country ownership agreed at Accra);
4. provide accurate, high-quality and meaningful information, and enable users to distinguish official statistics, which have been professionally scrutinised, from management information about projects and programmes;
5. include information about spending by non-DAC donors, multilateral organisations, foundations and NGOs;
6. be easy to understand, reconcile, compare, add up, and read alongside other sources of information, and be easy to organise and present in ways that are useful to information users;
7. be legally open, with as few barriers to access and reuse as possible;
8. reduce duplicate reporting by donor agencies and minimise additional costs;
9. be electronically accessible in an open format so a wide range of third-party intermediaries can access the information and present it either as comprehensive information or niche analysis;
10. result in access to information about aid which is more timely, more detailed, more forward-looking and more comprehensive than existing data, and which includes wider information on aid, such as key policy and appraisal documents and the outputs and outcomes it achieves.

The International Aid Transparency Initiative (IATI) was launched at the Accra High Level Forum on Aid Effectiveness in September 2008. IATI is a multi-stakeholder initiative to accelerate access to aid information in order to increase the effectiveness of aid in reducing poverty.

The Accra Agenda for Action (AAA) recognised that increased transparency is central to the objectives of the Paris Declaration. Transparency is essential to meet the five underlying principles of ownership, alignment, harmonisation, managing for results, and mutual accountability. The AAA expanded the concept of country ownership to include parliamentarians, civil society organisations (CSOs), academics, the media and citizens. Donors agreed to support efforts to increase the capacity of all development actors to play an active role in policy dialogue. The AAA committed donors to “disclose regular, detailed and timely information about our aid flows” and to “support information systems for managing aid”.

IATI provides a way for donors to meet this commitment in a coherent and consistent way. IATI has 18 signatories, of whom 13 are DAC members. These signatories resolved to “give strong political direction” and “invest the necessary resources in accelerating the availability of aid information”.

IATI also contributes to Cluster C on Transparent and Responsible Aid, which sits under the Working Party on Aid Effectiveness (WP-EFF). IATI has been tasked by the Cluster with developing reporting formats and definitions for sharing information about aid, drawing on the expertise of the Working Party on Statistics (WP-STAT). Proposals developed by IATI will be available to inform the Cluster’s work.

IATI aims to agree a four-part standard consisting of:
(1) an agreement on what would be published
(2) common definitions for sharing information
(3) a common electronic data format
(4) a code of conduct.

The details of what would be covered by IATI and how this would be published will be decided by the IATI members, following detailed research by the Technical Advisory Group (TAG) and consultation with stakeholders. It is intended that the standard will be adopted at first by IATI members, but it may over time be adopted by other DAC donors, and by other non-DAC donors, other foundations and non-governmental organisations (NGOs). There is widespread support among developing country governments for extending the coverage of aid information to non-traditional donors.

IATI responds to growing demands from civil society and citizens for greater transparency of information about spending and results, and for access to key documents as well as data. The ambitions of IATI are consistent with many other recent initiatives to increase transparency, for example President Obama’s August 2009 memo on transparency, the World Bank’s new disclosure policy, which represents a paradigm shift to proactive disclosure with limited exceptions, and the development of online information portals for citizens, such as in Brazil. IATI seeks to harness the power of new technology to deliver real improvements in the lives of the world’s poorest people, in the same way that email, internet access and mobile phone networks have revolutionised the way that aid agencies themselves do their business.

Since its launch in September 2008, IATI has focused on consultation with developing countries and CSOs, fact-finding missions to a number of donor countries, and detailed work by the TAG on parts 1 and 4 of the proposed IATI standard, covering an agreement on what would be published and a code of conduct.

The IATI Conference, held in The Hague in October 2009, confirmed widespread support for the objectives of IATI, and consensus on the key information needs of different stakeholders. At the same time, it was clear during the IATI conference that a number of stakeholders would welcome greater clarity on how IATI might work in practice, so that they can consider the full implications of the initiative for their agencies. Although detailed work on the precise practical and technical mechanisms for implementing IATI is only just beginning, this paper presents a proposal on how IATI would work, what this framework would mean for different stakeholders, and what added value it is envisaged that IATI would offer as a result.

Notes
1. http://www.whitehouse.gov/the_press_office/TransparencyandOpenGovernment/

The task at hand is to gather the right people, agencies, nonprofits, and businesses around their areas of expertise to execute. From there, there would be so much data to report, analyze, and gain insight from that we will be busy for a while before reaching a general consensus on how people can contribute to the issues in their municipality’s backlog. This is a proposal to change the 2025 vision.


http://selection.datavisualization.ch/

Wolfram|Alpha Pro

[http://tributary.io/] for D3 prototyping and RStudio Server for R (which is amazing if you haven’t tried it)

What matters most is the centralization of this data in an easily scrapable and visualizable format.

Geodata: the data used to make maps, from the location of roads and buildings to topography and boundaries.

Culture: data about cultural works and artefacts, for example titles and authors, generally collected and held by galleries, libraries, archives, and museums.

Science: data produced as part of scientific research, from astronomy to zoology.

Financial: data such as government accounts (expenditure and revenue) and information on financial markets (stocks, shares, bonds, etc.).

Statistics: data produced by statistical offices, such as the census and key socioeconomic indicators.

Weather: the many types of information used to understand and predict the weather and climate.

Environment: information related to the natural environment, such as the presence and levels of pollutants and the quality of rivers and seas.

Transport: data such as timetables, routes, and on-time statistics (e.g. public bus statistics).

Transparency. In a well-functioning, democratic society citizens need to know what their government is doing. To do that, they must be able freely to access government data and information and to share that information with other citizens. Transparency isn’t just about access, it is also about sharing and reuse — often, to understand material it needs to be analyzed and visualized and this requires that the material be open so that it can be freely used and reused.


Releasing social and commercial value. In a digital age, data is a key resource for social and commercial activities. Everything from finding your local post office to building a search engine requires access to data, much of which is created or held by government. By opening up data, government can help drive the creation of innovative business and services that deliver social and commercial value.

 

Participation and engagement. Open data supports participatory governance, or, for businesses and organizations, engagement with users and audiences. Much of the time citizens are only able to engage with their own governance sporadically, perhaps just at an election every 4 or 5 years. By opening up data, citizens are enabled to be much more directly informed and involved in decision-making. This is more than transparency: it is about building a full “read/write” society, in which citizens not only know what is happening in the process of governance but are able to contribute to it.

 

Data APIs

We provide a number of public APIs that expose the data in our services to developers who want to re-use it.

okfn / ckan

The Comprehensive Knowledge Archive Network (CKAN) stores metadata and data for datasets.

Various deployments exist, but an API is available for all of them.

Documentation

Python Client

JavaScript Client

Endpoints: demo.ckan.org, datahub.io, data.gov.uk, more…
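As a rough illustration of what these APIs look like to a developer, here is a minimal Python sketch of calling a CKAN-style Action API to list datasets. The /api/3/action/package_list route is the standard CKAN v3 action, but endpoint availability and behaviour vary by deployment, so treat this as a sketch rather than a guaranteed recipe:

```python
import json
import urllib.request

def action_url(base, action):
    # Build a CKAN Action API (v3) URL for a given action name.
    return "%s/api/3/action/%s" % (base.rstrip("/"), action)

def parse_response(payload):
    # CKAN wraps every response as {"success": bool, "result": ...}.
    if not payload.get("success"):
        raise RuntimeError("CKAN API call failed: %s" % payload.get("error"))
    return payload["result"]

def list_datasets(base="https://demo.ckan.org"):
    # Fetch the list of dataset names from a CKAN instance.
    with urllib.request.urlopen(action_url(base, "package_list")) as resp:
        return parse_response(json.load(resp))
```

The same pattern works for other actions; the Python and JavaScript clients linked above wrap these calls for you.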

openspending / openspending

OpenSpending.org is a datamart for government financial data. It stores budgets and transactional expenditure and offers search, export and aggregation APIs.

Documentation

JavaScript Client

okfn / bibserver

BibServer is a tool for quickly and easily sharing collections of bibliographic metadata. Most of the data stored internally can be read through the API.

Documentation

BibSoup.net instance

pybossa / pybossa

PyBossa is a crowd-sourcing platform where users can help to complete tasks, such as image analysis or text transcription. The application can be completely controlled via its API.

JavaScript Client

Python Client

Documentation

okfn / activityapi

The activity API collects data about project-related activities on various collaborative platforms, including GitHub, Twitter and mailing list feeds.

Endpoint documentation

Frontend source code

pudo / nomenklatura

Nomenklatura is a very simplistic data linking tool. It maintains authoritative lists of names (e.g. politicians, companies, streets) and offers an API and a web-based interactive reconciliation tool to match variant spellings of these names to the canonical form.
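The core idea (matching variant spellings against an authoritative list) can be sketched in a few lines of Python. The normalisation rule and alias table below are hypothetical illustrations, not Nomenklatura's actual logic:

```python
def normalize(name):
    # Collapse case and whitespace so trivially different spellings compare equal.
    return " ".join(name.lower().split())

def canonicalize(name, aliases):
    # Look a variant spelling up in an alias table; fall back to the input.
    return aliases.get(normalize(name), name)

# Hypothetical alias table mapping normalized variants to a canonical form.
aliases = {
    "international business machines": "IBM",
    "i.b.m.": "IBM",
}

canonicalize("International  Business Machines", aliases)  # "IBM"
```

A real reconciliation service adds fuzzier matching and human review on top of this lookup, but the canonical-form mapping is the heart of it.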

Python client

Endpoint documentation

http://opendatahandbook.org/en/

1/14/13 The Open Data Handbook — Open Data Handbook

Open Data Handbook

The Open Data Handbook

This handbook discusses the legal, social and technical aspects of open data. It can be used by anyone, but is especially designed for those seeking to open up data. It discusses the why, what and how of open data: why to go open, what open is, and how to open data.

To get started, you may wish to look at the Introduction. You can navigate through the report using the Table of Contents (see sidebar or below).

We warmly welcome comments on the text and will incorporate feedback as we go forward.

We also welcome contributions or suggestions for additional sections and areas to examine.

Table of Contents

Introduction

Target Audience

Credits

Credits and Copyright

Why Open Data?

What is Open Data?

What is Open?

What Data are You Talking About?

How to Open up Data

Choose Dataset(s)

Asking the community

Cost basis

Ease of release

Observe peers

Apply an Open License (Legal Openness)

Make Data Available (Technical Openness)

Online methods

Search

Make data discoverable

Existing tools

For government

So I’ve Opened Up Some Data, Now What?

Tell the world!

Understanding your audience

Post your material on third-party sites

Making your communications more social-media friendly

Social media

Getting folks in a room: Unconferences, Meetups and Barcamps

Making things! Hackdays, prizes and prototypes

Examples for Competitions

Conferences, Barcamps, Hackdays

Glossary

Appendices

File Formats

An Overview of File Formats

Open File Formats

How do I use a given format?

What Legal (IP) Rights Are There in Data(bases)

Indices and tables¶

Index

Search Page

An Open Knowledge Foundation project.

© 2010–2012, Open Knowledge Foundation. Licensed under Creative Commons Attribution (Unported) v3.0 License.

Source — Issues — Mailing List — Twitter @OKFN

Related Projects: OpenGovernmentData.org — TheDataHub.org — DataCatalogs.org — OpenSpending.org — DataPatterns.org

http://www.isitopendata.org/about/

With open data, the rise of the citizen scientist, citizen journalist and others will help us achieve common goals faster and accomplish tasks that would be too expensive or time-consuming to tackle through other means.

http://www.citizencyberscience.net/

http://crowdcrafting.org/

http://opencharities.org/

http://opencorporates.com/

Get this info as JSON, XML or RDF

Data Sources

Tools & Resources

http://datahub.io

Find out more about working with open data by exploring these resources:

Inclusive Planning Outreach with Web-based Tools

PlanningPress is a web toolkit for inclusive, responsive, authentic citizen engagement in transportation planning.

The web has opened up new modes of communication between governments and the public, introduced new possibilities for collaborative work, and made dynamic data visualization and analysis possible. PlanningPress makes it straightforward to apply these opportunities to community transportation planning. Everyone involved can review and engage in dialog on ideas and proposals, using maps and a user-friendly interface.

Intended for use by transportation departments and agencies, PlanningPress complements and extends the reach of an existing planning process. It enables regular, non-technical team members to publish updates. The simple content management system is built on WordPress, a widely-used publishing platform.

NYCDOT’s Jackson Heights portal is powered by PlanningPress. The website introduces the changes proposed for the neighborhood and shows them in detail, allowing residents to comment on the plans. It lays out a timeline for events concerning the project, and has an interactive map, news updates and other resources to help people understand the project as it develops.

Open Source

Stamen is an active contributor to and author of multiple open source projects. These collaborative efforts often play a valuable role in our commercial work, and lessons learned from working for clients have a way of making their way into code releases that the public at large can benefit from.

Aight

A collection of tools for making reasonable JavaScript and CSS work in IE8.

Polymaps

Polymaps provides speedy display of multi-zoom datasets over maps, and supports a variety of visual presentations for tiled vector data, in addition to the usual cartography from OpenStreetMap, CloudMade, Bing, and other providers of image-based web maps.

Because Polymaps can load data at a full range of scales, it’s ideal for showing information from country level on down to states, cities, neighborhoods, and individual streets. Because Polymaps uses SVG (Scalable Vector Graphics) to display information, you can use familiar, comfortable CSS rules to define the design of your data. And because Polymaps uses the well known spherical mercator tile format for its imagery and its data, publishing information is a snap.
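The spherical mercator tile scheme Polymaps relies on is worth seeing up close, because the math is so small. This sketch uses the standard slippy-map tile formula; it is an illustration of the tile scheme, not code from Polymaps itself (which is JavaScript):

```python
import math

def latlon_to_tile(lat, lon, zoom):
    # Standard spherical-mercator ("slippy map") tile indices for a point.
    n = 2 ** zoom  # number of tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

latlon_to_tile(51.5074, -0.1278, 10)  # central London's tile at zoom 10
```

Every provider that serves tiles in this format (imagery or data) can be addressed with the same (zoom, x, y) coordinates, which is why publishing information on top of them is such a snap.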

CityTracking

Dotspotting is the first project Stamen is releasing as part of Citytracking, a project funded by the Knight News Challenge.

We’re making tools to help people gather data about cities and make that data more legible. The code for Dotspotting is available for download on GitHub, and licensed for use under the GNU General Public License.

Modest Maps

Modest Maps is a BSD-licensed display and interaction library for tile-based maps in Flash (ActionScript 2.0 and ActionScript 3.0) and Python.

Our intent is to provide a minimal, extensible, customizable, and free display library for discriminating designers and developers who want to use interactive maps in their own projects. Modest Maps provides a core set of features in a tight, clean package, with plenty of hooks for additional functionality.

Cascadenik

Cascadenik implements cascading stylesheets for Mapnik, a Free Toolkit for developing mapping applications.

It’s an abstraction layer and preprocessor that converts special, CSS-like syntax into Mapnik-compatible style definitions. It’s easier to write complex style rules using the alternative syntax, because it allows for separation of symbolizers and provides a mechanism for inheritance.

Tile Drawer

Tile Drawer makes designing and hosting custom maps simple and straightforward. The project lets anyone run their own OpenStreetMap server in the cloud with one-step configuration and zero administration. You can use the rendered map tiles in a number of ways: with other GIS data in OpenLayers, in a Flash application built on Modest Maps, or layered into a Google Map as a custom map tile overlay.

Walking Papers

OpenStreetMap, the wiki-style map of the world that anyone can edit, is in need of a new way to add content. Walking Papers is a way to “round trip” map data through paper, making it easier to perform the kinds of eyes-on-the-street edits that OSM most needs right now, as well as distributing the load by making it possible for legible, easy notes to be shared and turned into real geographical data.

TileStache

TileStache is a Python-based server application that can serve up map tiles based on rendered geographic data.

You might be familiar with TileCache, the venerable open source WMS server from MetaCarta. TileStache is similar, but we hope simpler and better-suited to the needs of designers and cartographers.

http://geodjango.org/

http://postgis.refractions.net/

https://docs.google.com/spreadsheet/ccc?key=0Aon3JiuouxLUdFZPM25HN2pHUk1XSXl0RFg5YkFId0E#gid=0

http://opendatahandbook.org/

http://datawrapper.de/docs/tutorial

Computerworld - Reporters wrangle all sorts of data, from analyzing property tax valuations to mapping fatal accidents — and, here at Computerworld, for stories about IT salaries and H-1B visas. In fact, tools used by data-crunching journalists are generally useful for a wide range of other, non-journalistic tasks — and that includes software that’s been specifically designed for newsroom use. And, given the generally thrifty culture of your average newsroom, these tools often have the added appeal of little or no cost.

I came back from last year’s National Institute for Computer-Assisted Reporting (NICAR) conference with 22 free tools for data visualization and analysis – most of which are still popular and worth a look. At this year’s conference, I learned about other free (or at least inexpensive) tools for data analysis and presentation.

Want to see all the tools from last year and 2012?

For quick reference, check out our chart listing all 30 free data visualization and analysis tools.

Like that previous group of 22 tools, these range from easy enough for a beginner (i.e., anyone who can do rudimentary spreadsheet data entry) to expert (requiring hands-on coding). Here are eight of the best:

CSVKit

What it does: This utility suite available from Christopher Groskopf’s GitHub account has a host of Unix-like command-line tools for importing, analyzing and reformatting comma-separated data files.

What’s cool: Sure, you could pull your file into Excel to examine it, but CSVKit makes it quick and easy to preview, slice and summarize.

For example, you can see all your column headers in a list — which is handy for super-wide, many-column files — and then just pull data from a few of those columns. In addition to inputting CSV files, it can import several fixed-width file formats — for example, there are libraries available for the specific fixed-width formats used by the Census Bureau and Federal Election Commission.

Two simple commands will generate a data structure that can, in turn, be used by several SQL database formats (Mr. Data Converter handles only MySQL). The SQL code will create a table, inferring the proper data type for each field as well as the insert commands for adding data to the table.
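That type inference is simpler than it sounds. Here is a hedged Python sketch of the idea — a simplified illustration, not CSVKit's actual implementation:

```python
def infer_sql_type(values):
    # Guess a SQL column type from sample string values, the way
    # csvsql does when generating CREATE TABLE statements (simplified).
    def all_cast(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False
    if all_cast(int):
        return "INTEGER"
    if all_cast(float):
        return "FLOAT"
    return "VARCHAR"

infer_sql_type(["12", "7", "104"])  # "INTEGER"
```

The real tool also handles dates, booleans, nulls and column widths, but the try-the-strictest-type-first strategy is the same.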

CSVKit

CSVKit offers Unix-like command-line tools for importing, analyzing and reformatting comma-separated data files.

The Unix-like interface will be familiar to anyone who has worked on a *nix system, and makes it easy to save multiple frequently used commands in a batch file.

Drawbacks: Working on a command line means learning new text commands (not to mention the likely risk of typing errors), which might not be worthwhile unless you work with CSV files fairly often. Also, be advised that this tool suite is written in Python, so Windows users will need that installed on their system as well.

Skill level: Expert

Runs on: Any Windows, Mac or Linux system with Python installed.

Learn more: The documentation includes an easy-to-follow tutorial. There’s also a brief introductory slide presentation that was given at the NICAR conference last month.

Related tools: Google Refine is a desktop application that can do some rudimentary file analysis as well as its core task of data cleaning; and The R Project for Statistical Computing can do more powerful statistical analysis on CSV and other files.

DataTables

What it does: This popular jQuery plug-in (which was designed and created by Allan Jardine) creates sortable, searchable HTML tables from a variety of data sources — say, an existing, static HTML table, a JavaScript array, JSON or server-side SQL.

Apple device sales (first 10 of 17 entries)

Quarter ending   Unit sales (millions)   Device
2010-06          3.3                     iPad
2010-09          4.2                     iPad
2010-12          7.3                     iPad
2010-12          16.2                    iPhone
2010-12          4.1                     Mac
2011-03          4.7                     iPad
2011-03          18.6                    iPhone
2011-03          3.8                     Mac
2011-06          9.3                     iPad
2011-06          20.3                    iPhone

Source: Apple earnings statements

What’s cool: In addition to sortable tables, results can be searched in real time (results are narrowed further with each search-entry keystroke).

Drawbacks: Search capability is fairly basic and cannot be narrowed by column or by using wildcard or Boolean searches.

Skill level: Expert

Runs on: JavaScript-enabled Web browsers

Learn more: Numerous examples on the DataTables site show many ways to use this plug-in.

FreeDive

What it does: This alpha project from the Knight Digital Media Center at UC Berkeley turns a Google Docs spreadsheet into an interactive, sortable database that can be posted on the Web.

What’s cool: In addition to text searching, you can include numerical range-based sliders. Usage is free. End users can easily create their own databases from spreadsheets without writing code.

FreeDive

FreeDive turns a Google Docs spreadsheet into an interactive, sortable database

FreeDive’s chief current attraction is the ability to create databases without programming; however, freeDive source code will be posted and available for use once the project is more mature. That could appeal to IT departments seeking a way to offer this type of service in-house, allowing end users to turn a Google Doc into a filterable, sortable Web database using the Google Visualization API, Google Query Language, JavaScript and jQuery — without needing to manually generate that code.
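Under the hood, this kind of tool queries the spreadsheet through the Google Visualization API's query endpoint, passing a Google Query Language string. As a rough sketch only — the URL shape shown here is an assumption and has changed over time, and the spreadsheet key is a placeholder, so check Google's current documentation before relying on it:

```python
from urllib.parse import quote

def gviz_query_url(spreadsheet_key, query):
    # Build a Google Visualization API query URL for a spreadsheet.
    # The gviz/tq endpoint takes a Google Query Language string in `tq`.
    return ("https://docs.google.com/spreadsheets/d/%s/gviz/tq?tq=%s"
            % (spreadsheet_key, quote(query)))

gviz_query_url("SPREADSHEET_KEY", "select A, B where C > 10")
```

FreeDive's wizard generates this sort of query (plus the JavaScript that renders the results) so end users never have to.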

Drawbacks: My test application ran into some intermittent problems; for example, it wouldn’t display my data list when using the “show all records” button. This is an alpha project, and should be treated as such.

In addition, the current iteration limits spreadsheets to 10 columns and a single sheet. One column must have numbers, so this won’t work for text-only information. The search widget is currently limited to a few specific choices of fields to search, although this might increase as the project matures. (A paid service like Caspio would offer more customization.) The nine-step wizard might get cumbersome after frequent use.

Skill level: Advanced beginner.

Runs on: Current Web browsers

Learn more: The freeDive site includes several video tutorials at the bottom of the home page as well as test data to try out the wizard.

Related tools: Caspio is a well-established commercial alternative. For a JavaScript alternative with more control over the table created from a Google Docs spreadsheet, you might want to investigate Tabletop, which makes a Google Docs spreadsheet accessible to JavaScript code.

Highcharts JS

What it does: This JavaScript library from Highsoft Solutions provides an easy way to create professional-looking interactive charts for the Web. jQuery, MooTools or Prototype is required.

What’s cool: With Highcharts, users can mouse over items for more details; they can also click on items in the chart legend to turn them on and off. There are many different chart types available, from basic line, bar, column and area charts to zoomable time series; each comes with six stylesheet options. Little customization is needed to get a sleek-looking chart — and charts will display on iOS and Android devices as well as on desktop browsers.

Highcharts example with data about Apple device sales. Mouse over the graph to see details; click items in the legend to turn them on or off.

Drawbacks: Highcharts, like Google Maps, does have a distinctive look, so you may want to customize the Highcharts stylesheets so your visualizations don’t look like the numerous other Highcharts charts on the Web. While charts displayed fine for me on an Android phone, they weren’t interactive (they were on an iPad).

And unlike most JavaScript/jQuery libraries, Highcharts is free only for non-commercial use, although a site-wide license for many companies costs only $80. (The cost jumps to $300 per developer seat in some cases — for example, if charts are customized for individual users.) Rendering can be slow in some older browsers (notably Internet Explorer 6 and 7).

Skill level: Intermediate to Expert.

Runs on: Web browsers

Learn more: The Highcharts demo gallery includes easy-to-view source code; the documentation explains other options.

Related tools: Google Chart Tools create static image charts and graphs or more interactive JavaScript-based visualizations; there are also JavaScript libraries such as Protovis and the JavaScript InfoVis Toolkit. Exhibit is an MIT Simile Project spinoff designed for presenting data on the Web with filtering, sorting and interactive capabilities.

Mr. Data Converter

What it does: How often do you have data in one format — while your application needs it in another? New York Times interactive graphics editor Shan Carter ran into this situation often enough that he coded a tool that converts comma- or tab-delimited data into nine different formats. It’s available as either a service on the Web or an open source tool.

Mr. Data Converter

Mr. Data Converter can generate XML, JSON, ASP/VBScript or basic HTML table formatting.

What’s cool: Mr. Data Converter can generate XML, JSON, ASP/VBScript or basic HTML table formatting as well as arrays in PHP, Python (as a dictionary) and Ruby. It will even generate MySQL code to create a table (guessing at field formats based on the data) and insert your data. If your data is in an Excel spreadsheet, you don’t need to save it as a CSV or TSV; you can just copy and paste it into the tool.
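The simplest of those conversions, CSV to JSON, fits in a few lines of Python. This is a sketch of the idea, not Mr. Data Converter's own (JavaScript) code:

```python
import csv
import io
import json

def csv_to_json(csv_text):
    # Read comma-delimited text and emit a JSON array of row objects,
    # one of the output formats Mr. Data Converter offers.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)

csv_to_json("device,units\niPad,3.3\niPhone,16.2\n")
```

The other formats (XML, PHP arrays, MySQL INSERTs and so on) are just different serializations of the same parsed rows, which is why one small tool can offer so many of them.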

Drawbacks: Input is limited to CSV and TSV formats, plus data copied and pasted from Excel.

Skill level: Beginner

Runs on: JavaScript-enabled Web browsers

Learn more: You can follow Mr. Data Converter on Twitter at @mrdataconverter.

Related tools: Data Wrangler is a Web-based tool that reformats data to your specifications.

Panda Project

What it does: Panda is less about analyzing or presenting data than finding it amidst the pile of standalone spreadsheets scattered around an organization. It was specifically designed for newsrooms, but could be used by any organization where individuals collect information on their desktops that would be worth sharing. Billed as a “newsroom appliance,” Panda lets users upload CSV or Excel files and then search across all available data sets or within a single file.

Panda

Panda makes it simple to give others access to information that’s been sitting in different stand-alone spreadsheets.

What’s cool: Panda makes it simple to give others access to information that’s been sitting on individuals’ hard drives in different stand-alone spreadsheets. Even non-technical users can easily upload and search data. Search is extremely fast, using Apache Solr.
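Conceptually, the search is just a match against every cell of every uploaded file. Here is a naive in-memory Python sketch of that query — Panda itself delegates this to Solr, and this is not its code:

```python
def search_datasets(datasets, term):
    # Return (dataset name, row) pairs whose cells contain the term,
    # matched case-insensitively across every uploaded data set.
    term = term.lower()
    hits = []
    for name, rows in datasets.items():
        for row in rows:
            if any(term in str(value).lower() for value in row.values()):
                hits.append((name, row))
    return hits

datasets = {"salaries.csv": [{"name": "A. Smith", "city": "Washington"}]}
search_datasets(datasets, "washington")
```

Note that the match ignores which column a value sits in, which is exactly why a search for a place name can also return people with that surname.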

Drawbacks: Queries are basic — you can’t specify a particular column/field to search, so a search for “Washington” would bring back items containing both the place and a person’s name. The required hosting platform is quite specific, requiring Ubuntu 11.10. (Panda’s developers have created an Amazon Community Image with the required server setup for hosting on Amazon Web Services EC2.)

Skill level: Beginner (Advanced Beginner for administration)

Runs on: Must be hosted on Amazon EC2 or a server running Ubuntu 11.10. Clients can use any Web browser.

Learn more: Panda documentation, still in the works, gives basics on setup, configuration and use. Nieman Journalism Lab has some background on the project, which was funded by a $150,000 Knight News Challenge grant.

PowerPivot

What it does: This free plugin from Microsoft allows Excel 2010 to handle massively large data sets much more efficiently than the basic version of Excel does. It also lets Excel act like a relational database by adding the capacity to truly join columns in different tables instead of relying on Excel’s somewhat cumbersome VLOOKUP command. PowerPivot includes its own formula language, Data Analysis Expressions (DAX), which has a similar syntax to Excel’s conventional formulas.

PowerPivot

PowerPivot allows Excel 2010 to handle massively large data sets more efficiently.

What’s cool: PowerPivot can handle millions of records — data sets that would usually grind PowerPivot-less Excel to a halt. And by joining tables, you can make more “intelligent” pivot tables and charts to explore and visualize large data sets with Excel’s point-and-click interface.

Drawbacks: This is limited to Excel 2010 on Windows systems. Also, SQL jocks might prefer using a true relational database for multi-table data in order to build complex data queries.

Skill level: Intermediate

Runs on: Excel 2010 on Windows only.

Learn more: There are links to demos and videos on the PowerPivot main page, as well as an introductory tutorial on Microsoft’s TechNet.

Related tools: Zoho Reports can take data from various file formats and turn it into charts, tables and pivot tables.

Weave

What it does: This general-purpose visualization platform allows creation of interactive dashboards with multiple, related visualizations — for example, a bar chart, scatter plot and map. The open-source project was created by the University of Massachusetts at Lowell in partnership with a consortium of government agencies and is still in beta.

Weave visualization

Weave demo visualization of foreclosures in Lowell, Mass. See the interactive version.

What’s cool: The visualizations are slick and highly interactive; clicking an area in one visualization also affects others in the dashboard. The platform includes powerful statistical analysis capabilities. Users can create their own visualizations on a Weave-based Web system, or save and alter the tools and appearances of visualizations that have been publicly shared by others.

Drawbacks: Requires Flash for end-user viewing. It’s currently somewhat difficult to install, although a one-click install is scheduled for this summer. And because it’s so powerful, some users say that implementations must consider how to winnow down functionality so as not to overwhelm end users.

Skill level: Intermediate for those just creating visualizations; Expert for those implementing a Weave system.

Runs on: Flash-enabled browsers. The server requires a Java servlet container (Tomcat or Glassfish), MySQL or PostgreSQL, Linux and the Adobe Flex 3.6 SDK.

Learn more: The Weave site includes demos, videos and a user guide. For more examples of visualizations that can be built using a Weave platform, see one planner’s MetroBoston DataCommon gallery. In addition, I wrote more detailed Computerworld coverage of Weave following a presentation at Northeastern University.

Related tools: Tableau Public is a robust general-purpose visualization platform.

Show Me the Numbers: Designing Tables and G…

Visualize This: The FlowingData Guide to De…
by Nathan Yau
$23.73

Visual Thinking: for Design (Morgan Kaufman…
by Colin Ware
$36.68

The Wall Street Journal Guide to Informatio…
by Dona M. Wong
$18.15

Beautiful Visualization: Looking at Data th…
$50.46

The Visual Display of Quantitative Informat…
by Edward R. Tufte
$25.26

The Visual Miscellaneum: A Colorful Guide t…
by David McCandless
$17.81

The Functional Art: An introduction to info…
by Alberto Cairo
$26.06

slide:ology: The Art and Science of Creatin…
by Nancy Duarte
$19.52

Now You See It: Simple Visualization Techni…
by Stephen Few
$25.97

Information Dashboard Design: The Effective…
by Stephen Few
$20.13

Designing Data Visualizations
by Noah Iliinsky
$19.29

Beautiful Evidence
by Edward R. Tufte
$37.34

Envisioning Information
by Edward R. Tufte
$34.05

Visual Explanations: Images and Quantities,…
by Edward R. Tufte
$30.24

Information Visualization, Third Edition: P…
by Colin Ware
$43.65

Resonate: Present Visual Stories that Trans…
by Nancy Duarte
$18.54

Presentation Zen: Simple Ideas on Presentat…
by Garr Reynolds
$18.39

Consider Your Message When Choosing What Chart to Use

Forbes contributor Naomi Robbins on the different types of charts and their uses for emphasising a particular message.

Read more

 

Datawrapper 1.0 Is Released

Following a successful beta version, Datawrapper version 1.0 is released.

Read more

 

Verification Tools for Journalists

A list of tools for verifying people, places and images, from EmergencyJournalism.net.

Read more

 

LearnStreet: Coding Starts Here

LearnStreet, a new California-based start-up, aims to change the code learning process.

Read more

 

Must Zero Be Included on Scales of Graphs? Another Look at Fox News’ Graph and Huff’s Gee-Whiz Graph

A post by Forbes contributor Naomi Robbins on the use of zero baselines in graphs by the media.

Read more

 

Comparing Graphics from The Guardian and The New York Times: A Project by Marije Rooze

A recent project by Dutch MA student Marije Rooze compares interactive graphics from the Guardian and the New York Times, with unexpected results.

Read more

 

Some Useful Statistical Blogs

Forbes.com contributor Naomi Robbins shares some respected stats blogs and recent discoveries.

Read more

 

The Journalist’s ‘Learn to Code’ Resource Guide

This is a list of resources you can use to begin to write your own programs, written with journalists in mind.

Read more

 

Torque: An Open Source Mapping Tool for Big Data by CartoDB

A new visualisation tool for Big Data brought to you by the people at CartoDB.

Read more

 

Statwing: Powerful Data Analysis, Simple to Use

Given the rising interest in data analytics, there will be new tools. Here is one fresh and promising approach.

Read more

 

The KoBo Platform: Handheld Data Collection for Real Practitioners

Introducing KoBo, an integrated suite of applications for handheld data collection that are specifically designed for a non-technical audience.

Read more

 

Tips for Working with Numbers in the News

The best tip for handling data is to enjoy yourself. Data can appear forbidding. But allow it to intimidate you and you’ll get nowhere. Treat it as something to play with and explore and it will often yield secrets and stories with surprising ease.

Read more

 

Geofeedia: Next Generation Crisis Mapping Technology?

Situational awareness is absolutely key to emergency response, hence the rise of crisis mapping. In this post we introduce Geofeedia and discuss its potential applications for humanitarian response.

Read more

 

Using Data Visualization to Find Insights in Data

In order to be able to see and make any sense of data, we need to visualize it. Data visualisation expert Gregor Aisch explains the steps you need to take in order to make finding insights in data more effective.

Read more

 

How to Create the Perfect Line Chart

Award-winning data visualisation expert Gregor Aisch outlines the best practices for creating line charts.

Read more

 

Getting Data: A Five Minute Field Guide

Looking for data on a particular topic or issue? Not sure what exists or where to find it? Don’t know where to start? This post shows how to get started with finding public data sources on the web.

Read more

 

Video: School of Data Journalism – Precision Journalism Workshop

This video will guide you through the very basics of how to use Excel for data journalism projects.

Read more

 

Video: School of Data Journalism – Spending Stories workshop

An Open Knowledge Foundation project helps journalists to find stories in spending data.

Read more

 

Meet data mapping platform CartoDB

Introducing CartoDB – an open source, cloud-based data mapping platform which makes mapping accessible, even for complete beginners.

Read more

 

Delivering data: How to build a news app

A look at what news apps are and what they do, when it is useful to build them and how to build them.

Read more

 

Video: The joy of stats with Hans Rosling

Join Prof. Rosling on an exciting trip through the history and development of one of the pillars of data journalism: good old statistics.

Read more

 

Video: School of Data Journalism – Making Data Pretty workshop

An effective visualisation is the key element to engage your audience around data projects. This video from the School of Data Journalism explains the secrets of the trade.

Read more

 

Video: School of Data Journalism – Getting Stories from Data workshop

Caelainn Barr and Steve Doig explain how to turn public data sets into a goldmine of information.

Read more

 

Video: School of Data Journalism – Information Wants to Be Free workshop

The second workshop of the Data Journalism School in Perugia looked at how journalists can use open data and Freedom of Information legislation to get access to the information they need.

Read more

 

Introduction to open-source GIS tools for journalists

Location is quickly becoming a core value of journalism and geographic literacy is on the rise. A look at geocoding tools.

Read more

 

Creating dot density maps with Chicago Tribune’s new open source toolkit

Chicago Tribune hacker Christopher Groskopf explains the tools and techniques behind the creation of dot density maps with U.S. census data.

Read more

 

The limitations of red-green colour scales in infographics

Information visualization expert Gregor Aisch explains the end of his love story with diverging red-green colour scales.

Read more

 

Beginner’s guide for journalists who want to understand API documentation

There are three letters that have been floating around the media world for several years now: API. There aren’t many resources that explain API documentation to non-coders. Here’s an overview of how to figure it out.

Read more

 

Reading data from Flash sites

Adobe Flash can make data difficult to extract. This tutorial will teach you how to find and examine raw data files that are sent to your web browser, without worrying how the data is visually displayed.

Read more

 

Essential visualisation resources: Tools for analysis, collection and enterprise

This is the first part of a multi-part series designed to share with readers an inspiring collection of the most important, effective, useful and practical data visualisation resources. The series will cover visualisation tools, resources for sourcing…

Read more

 

Essential visualisation resources: Tools for mapping

This is the fourth part of a multi-part series designed to share with readers an inspiring collection of the most important, effective, useful and practical data visualisation resources. The series will cover visualisation tools, resources for sourcing…

Read more

 

Power tools for aspiring data journalists: Funnel Plots in R

In the following post Tony Hirst describes a quick way of analysing a mortality dataset using R, a very powerful statistical programming environment that should probably be part of your toolkit if you ever want to get round to doing some serious stats…

Read more

 

The top 10 data-mining links of 2011

Overview is a project to create an open-source document-mining system for investigative journalists and other curious people. We’ve written before about the goals of the project, and we’re developing some new technology, but mostly we’re…

Read more

 

A computational journalism reading list

There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there’s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of “programmer…

Read more

 

The Bastards Book of Ruby

The Bastards Book of Ruby is an introduction to programming for non-programmers. The online book focuses on the use of programming for the gathering, organizing, and analyzing of data in all its forms.

Read more

 

Programmer-journalist job openings

A spreadsheet listing over 50 programmer-journalist jobs has been circulating online for some time now. All the jobs require technical skills and range from newsroom developer to interactive designer, multimedia producer and social media editor.

Read more

 

Getting text out of an image-only PDF

In the previous guide, we describe several methods for turning PDFs into data usable for spreadsheets. However, those only handle PDFs that have actual text embedded within them. When a PDF contains just images of text, as they do in scanned documents,…

Read more

 

Turning PDFs to text

Adobe’s Portable Document Format is a great format for digital documents when it’s important to maintain the layout of the original format. However, it’s a document format and not a data format.

Read more

 

Using Google Refine to clean messy data

Google Refine (the program formerly known as Freebase Gridworks) is described by its creators as a “power tool for working with messy data” but could very well be advertised as “remedy for eye fatigue, migraines, depression, and other symptoms of…

Read more

 

Manual on Excel for data journalists

The Centre for Investigative Journalism came out with a handbook this year for journalists who want to master the art of interrogating and questioning numbers competently.

Read more

 

Tableau Public

Tableau Public is a data visualisation tool that enables users to condense complex datasets into simple and easy to read graphs, which allow for better understanding of the datasets.

Read more

 

Where are the bodies buried on the web? Big data for journalists

The following post is the introduction to the free online ebook ‘Where are the bodies buried on the web? Big data for journalists’ published by former Apple engineer Pete Warden in January this year.

Read more

 

10 tools that can help data journalists do better work, be more efficient

It’s hard to be equally good at all of the tasks that fall under data journalism. To make matters worse (or better, really), data journalists are discovering and applying new methods and tools all the time. As a beginning data journalist, you’ll want…

Read more

 

How to scrape Toronto data: a basic tutorial

This post is a step-by-step tutorial on scraping for beginners with video clips.

Read more

 

Visualizing Toronto’s water usage: a tutorial

This post is a tutorial on data visualisation for those who are just starting out. You will learn how to take a big data file, clean it, filter it and turn it into a visualisation.

Read more

 

List of tutorials for journalists on how to use spreadsheets

This post is a list of the best free tutorials on the web for journalists who want to learn spreadsheet skills.

Read more

 

Video archive EJC @PICNIC11: From database cities to urban stories (II)

In this post you can find the videos of the talks from the second European Journalism Centre session: ‘From database cities to urban stories: What are the success stories?’, at the 2011 edition of the leading media festival PICNIC in Amsterdam.

Read more

 

How to Find Stories in EU Spending Data

Caelainn Barr, EU data journalist, talks about how to find stories in EU spending data at the EJC/OKF data driven journalism workshop in Utrecht in September.

Read more

 

Video archive EJC @PICNIC11: From database cities to urban stories (I)

In this post you can find the videos of the talks from the first European Journalism Centre session: ‘Using technology to run our cities: promises and perils’, at the 2011 edition of the leading media festival PICNIC.

Read more

 

BuzzData

There’s a buzz going around about BuzzData, a new data-sharing hub and social network for open data.

Read more

 

Google Public Data Explorer

Released in August 2010, Google’s Public Data Explorer makes public data and statistics easier to understand and share.

Read more

 

The Guardian Data Store

The Guardian Data Store is an online directory providing a selection of datasets on topics of public interest and tools to explore them, along with demonstrations of original or guest visualisations of the datasets.

Read more

source: http://datadrivenjournalism.net/resources

http://www.learnstreet.com/

spreadsheets:

Knight Digital Media Center tutorial on spreadsheets. This is a useful and detailed tutorial for absolute beginners. It has 25 sections explaining, among other things, how to import data into a spreadsheet, how to use formulas and how to format cells.

  • Forjournalists.com Excel basics and advanced features, tips and tricks and a four-part Excel training course (towards intermediate/advanced level).

  • McGill University guide to exporting a table from a PDF file into an Excel spreadsheet.

Nokogiri, an XML parsing library for Ruby

Instead of using Firebug, you can also use Safari’s built-in Activity window, or Chrome’s Developer Tools, for the inspection part. To parse the result, we use Ruby and Nokogiri, which is an essential library for any kind of web scraping with Ruby.

Series of Tubes…and Files

While the site makes the data difficult to download, it’s not impossible. In fact, it’s fairly easy with some understanding of web browser interaction. The content of a web page doesn’t consist of a single file. For instance, images are downloaded separately from the webpage’s HTML.

Flash applications are also discrete files, and sometimes they act as shells for data that come in separate text files, all of which are downloaded by the browser when visiting Cephalon’s page. So, while Cephalon designed a Flash application to format and display its payments list, we can just view the list as raw text.


Viewing Cephalon’s page. The Firebug panel is circled

Firebug can tell you what files your browser is receiving. In Firefox, open up Firebug by clicking on the bug icon on the status bar, then click on the Net panel. This panel shows every file that was received by your web browser when it accessed Cephalon’s page.


Close-up of the Firebug panel. The Net tab is circled in yellow, the relevant .swf file is circled in green.

We know we’re looking for the Flash file, so let’s look for that first. Flash applets use the suffix .swf. The only one listed is spend_data.swf. In Firebug, right-click on the listing, copy the URL, and paste it into a new browser window:

http://www.cephalon.com/Media/flash/spend_data-2009.swf


You can see the Flash file in its context here: 
http://www.cephalon.com/our-responsibility/relationships-with-healthcare-professionals/archive/2009-fees-for-services.html.

You’ll get a larger-screen view of the list, though that doesn’t really help our data analysis. As you may have noticed in the Firebug Net panel, spend_data.swf is less than 45 kilobytes, which doesn’t seem large enough to contain the entire list of doctors and payments. So where is the actual data stored?

Sniffing Out the Data

Here’s how to find it: First, clear your cache in Firefox by going to Tools->Clear Recent History and selecting Cache. With Firebug still open, refresh the browser window that has spend_data.swf open.


Relevant XML file is circled here.

Firebug’s window tells us that besides receiving spend_data.swf, our browser downloaded two xml files. One of these is more than 100 kilobytes, which is about what we would expect for an XML-formatted list of a few hundred doctors.

Now right-click on the file in Firebug and select Open in New Tab, and then View Page Source by right-clicking in the new tab. You should see a text file full of entries like the following:

[Screenshot of the XML source: a list of “row” entries, one per doctor and payment]

That’s what we were looking for: a well-structured list of the doctors and what they got paid. Now it’s a simple matter of using an xml parser, like Ruby’s Nokogiri, to iterate through each “row” node and pick up the essential values.

Parsing with Nokogiri

The following is a brief example of Nokogiri’s most basic methods. It assumes you have Ruby and Nokogiri installed, and a little familiarity with basic programming.

The two Nokogiri methods we’re most interested in are:

  • css – this lets us select tags inside XML and HTML documents. In this example, we want the value and row tags.

  • text – for each element returned by css, text will give us the actual characters enclosed by the element’s tags.

Each row represents a record, and each value represents a data field, like name and location. So, we simply want to read each row and select the values we’re interested in.
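The original code for this step is only available as screenshots, but the row-and-value loop it describes can be sketched as follows. This sketch uses Ruby’s built-in REXML so it runs without installing a gem; the Nokogiri equivalents are noted in comments, and the sample XML is invented to mirror the structure described above.

```ruby
require 'rexml/document'

# Invented sample shaped like the Cephalon data: one <row> per doctor,
# with <value> elements for name, location, and payment.
xml = <<~XML
  <rows>
    <row><value>Dr. Jane Doe</value><value>Springfield, IL</value><value>1500</value></row>
    <row><value>Dr. John Roe</value><value>Shelbyville, KY</value><value>2300</value></row>
  </rows>
XML

doc = REXML::Document.new(xml)
records = []
doc.elements.each('rows/row') do |row|
  # Nokogiri equivalent: row.css('value').map(&:text)
  records << row.elements.to_a('value').map(&:text)
end

records.each { |name, location, amount| puts [name, location, amount].join("\t") }
```

The same pattern works on the real file: load the XML your browser downloaded, walk each row, and collect the values into whatever structure (CSV, database) you need.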

[Screenshot of the Nokogiri code that iterates through each row and prints its values]

Here’s a compact variation of the above code that writes the result into a file:

[Screenshot of the compact variation that writes the result to a file]

So, what first appeared to be the most difficult report to parse ends up being the easiest. Whether you’re dealing with a Flash application or an HTML database-backed website, your first step should be to see what text files your browser receives when accessing the page.

The fundamental question: What can this API do for me?

Look for mentions of the word “requests.” If you don’t see that, look for the words “REST API,” or something that looks like the latter part of a URL.

Within those sections, look for the words “get” and “post.” These are called methods, the specific actions the API can do. (Some developers will quibble and call them functions. For this tutorial, we’ll stick to methods.)

If the documentation is written in plain English, it will be easy to understand what the method is doing. If not, you’ll need someone with more coding experience to help interpret what’s going on. But know this:

“Get” asks for something from the API server — as in, GET me the number of times an address shows up in the database.

“Post” changes the database by creating, adding or removing something from it — as in, POST a new address to the database.

In what format can I get the data?

An API usually lets you choose how the data will come back to you, also known as the response format. You’ll usually see “json” or “XML.” Sometimes, you’ll see “txt” or other formats. The format is best decided by your developer, but at least you’ll know what’s available.

To find format options, search for the word “format” or “response.” Sometimes the format is mentioned at the start of documentation; sometimes, you’ll find “format” in the methods.

What does the API need in exchange for what I want?

Sometimes you can make an API request or post without identifying yourself. But API creators often want to know how the API is being used and by whom. In addition, they want to prevent server overload and head off developer hijinks, so many APIs require a key — an ID unique to the person or program making a request.

Getting a key is generally straightforward. Look for the word “authentication,” “API key” or “APIkey” to get the instructions, and to see which methods (which “gets” and “posts”) require authentication.

Can I test API requests even if I’m not a developer?

Yes. You can build your own test request by copying the example request found in the method and changing the variables, usually referred to as parameters.

For example, let’s try getting New York Times reviews for the “Harry Potter” movies as an XML-formatted response. Use your favorite search engine to find The New York Times movie reviews API. This API is not perfect (it’s in beta, after all). The steps below can be compressed with shortcuts once you become more experienced, but since we’re assuming this is your first time, we’re going to take the slow road.

Once you’re on the API page:

1) Look for something that allows you to get reviews using keywords. In this case, that’s the “Reviews by Keyword” method. Within the method description is a URI example (the text in the gray box). That’s the template for your request.

Copy it, paste it into a text editor [TextWrangler (Mac), TextMate (Mac) or TextPad (Windows)] and start replacing the parameters, the things in braces and brackets. They’re bolded below for easy reference.

In the Reviews by Keyword method, there are two required parameters: version, which is the API version (use v2), and API-key, which you can get right here.

You’d go from this:
http://api.nytimes.com/svc/movies/{version}/reviews/search[.response_format]?[optional-param1=value1]&[...]&api-key={your-API-key}

To this:
http://api.nytimes.com/svc/movies/v2/reviews/search[.response_format]?[optional-param1=value1]&[...]&api-key={paste your API key here and delete the surrounding braces}

2) Next, set up two additional parameters, which are described a little further down in the same section of the Movie Reviews API documentation:

  • The response-format, which will be .xml

  • A keyword query — we’ll use query=Potter because searching for ‘Harry+Potter’ doesn’t work. (I know because I tried. Remember, the API is in beta.)

  • An opening-date range, from the first film (which came out in November 2001) through the last film (which comes out this week). As the documentation tells you, the format for a range is YYYY-MM-DD;YYYY-MM-DD, so we’ll use opening-date=2001-11-01;2011-07-31

Your URI example should now look like this (the new parameters are in bold):
http://api.nytimes.com/svc/movies/v2/reviews/search.xml?&query=Potter&opening-date=2001-11-01;2011-07-31&api-key={paste your API key and delete the surrounding braces}

3) Copy and paste the URI you made into a Web browser address bar. Hit return.

If you made the changes correctly, you’ll get a response similar to what’s on this page. In fact, if you want, you can copy the URI above up to the = before the {, paste it into your browser’s address bar, and add your API key to the end and hit return to see the XML output.

Voilà. You’ve just made your first API call and pulled New York Times “Harry Potter” movie reviews. (Plus a straggler. Again, beta.)
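If you’d rather not hand-edit the template, the same request URI can be assembled in a few lines of code. This is just a sketch in Ruby using the parameters from the walkthrough above; the API key is a placeholder you’d replace with your own.

```ruby
base = 'http://api.nytimes.com/svc/movies/v2/reviews/search.xml'
params = {
  'query'        => 'Potter',
  'opening-date' => '2001-11-01;2011-07-31',  # YYYY-MM-DD;YYYY-MM-DD range
  'api-key'      => 'YOUR-API-KEY'            # placeholder: paste your real key
}
# These values are already URL-safe (the API uses ';' itself as the range
# separator), so a simple join is enough here.
uri = base + '?' + params.map { |k, v| "#{k}=#{v}" }.join('&')
puts uri
```

Swapping in a different keyword or date range is then a one-line change rather than careful find-and-replace in the address bar.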

Some API developers are nice enough to include a console, sandbox or fill-in-the-blank form so you can test your requests without hand-building them. Better yet, these tools usually generate both the properly formatted request and the result, which you and your developers can then copy, paste and use as you wish.

You will come across lots of documentation styles as you begin to explore what’s available to you. If you have questions about what you find, feel free to ask them on the Hacks/Hackers help board.



R: specifically the lattice & ggplot2 libraries

Learn the power of conditioning on a categorical variable to make multiple plots:

xyplot( numphone ~ year | country, data=wp, type="b", scales="free")

Full Example (gist)

The key here is the “| country” notation, which “conditions” on country to create one plot for each country in the dataset. This can be a great way to rapidly explore a dataset of counties, states, school districts, whatever. You decide what you want on your x and y axes and then condition on an appropriate categorical variable; in one line of code you can output thousands of plots showing how your x and y vary across categories. This is the simplest case; there is much more power under the hood to unleash.

To do multiple pages and store to pdf just do:

pdf('~/path/to/mydoc.pdf', width=11, height=8.5)
xyplot( numphone ~ year | country, data=wp, type="b", scales="free", layout=c(1,1) )
dev.off()

You’ll then get one page per country in the dataset.

ggplot2 has a similar feature known as “faceting” the data.

This concept of “conditioning” and “faceting” can let you slice massive datasets into digestible morsels very rapidly.

jonahduckles


  1. Yes yes yes. After using it for a few weeks I can’t believe I forgot to mention R + lattice. It’s tremendously powerful. +1

 

Data Sources

Although we gather some information by hand, GovTrack pulls much of the information you see from a variety of other sources. We also make available the information we collect in a normalized XML format for other projects to reuse. For more information on that, see the Developer Documentation.

Some information comes from these official government sources:

  • The Library of Congress and the Congressional Research Service via THOMAS.gov for the status of legislation, subject terms of bills, bill summaries, and upcoming House committee meetings (from the Daily Digest). We have been actively campaigning Congress and the Library of Congress to publish this information in a structured data format. Until then, we “screen-scrape” their web pages, extracting the information in a semi-reliable automated way.

  • The House of Representatives and the Senate for information on Members of Congress, committee membership, voting records, upcoming bills, and upcoming committee meetings.

  • The House Majority Leader’s docs.house.gov website for legislation scheduled for the week ahead.

  • The Government Printing Office for the text of legislation and photos of members of congress from the Congressional Pictorial Directory.

  • The Congressional Biographical Directory for biographical and historical information on members of Congress.

  • The Census Bureau for geographic data on congressional districts.

Other federal information comes from:

State legislative information comes from:

Source: http://www.govtrack.us/sources

Here’s a list of websites using data assembled by GovTrack

http://palewi.re/posts/2008/04/20/python-recipe-grab-a-page-scrape-a-table-download-a-file/

http://www.poynter.org/how-tos/digital-strategies/146263/introduction-to-open-source-gis-tools-for-journalists/

http://www.texastribune.org/library/data/

http://overview.ap.org/

https://scraperwiki.com/

http://openblockproject.org/

DocumentCloud

http://www.publiclaboratory.org/home

Apache Accumulo.

 InfoChimps offers a big data stack managed as a service within private data centers. For those content to run in the public cloud, Qubole takes the concept one level further, with a turnkey Hadoop and Hive analysis platform that runs on Amazon EC2.

One to watch: new entries into enterprise Hadoop infrastructure will include WANdisco

Berkeley Data Analytics Stack offers an alternative platform that performs much faster than Hadoop MapReduce for some applications focused on data mining and machine learning.

At the same time, Hadoop is reinventing itself. Hadoop distributions this year will embrace Hadoop 2.0, and in particular YARN

http://strata.oreilly.com/2013/01/u-s-house-open-data-open-government.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+oreilly%2Fstrata+%28O%27Reilly+Strata%29

If you’d like to learn more about the possible uses of this tool, I’d just recommend the materials linked in the blog post above (which I’ve also listed below): “An introduction to Paper Machines,” by Chris Johnson Roberson; “Understanding Paper Machines,” by Jo Guldi; “Supercharge Your Zotero Library Using Paper Machines, Part I,” by Sarita Alami; and “Supercharge Your Zotero Library Using Paper Machines, Part II,” by Sarita Alami.

http://pinboard.in/

http://quibb.com/

http://launch.co/#/rooms/Ticker

http://www.news.me/

https://hackpad.com/0PLD#QuantifiedSelf-Berlin

https://github.com/explore

http://www.barebones.com/products/yojimbo/

http://getcloudapp.com/

https://workflowy.com/

http://measuredvoice.com/

https://medium.com/

http://getlittlebird.com/

https://www.rebelmouse.com/

https://clipboard.com/

https://www.spundge.com/

http://www.gocast.it/

http://thinkupapp.com/

http://getprismatic.com/

https://www.gittip.com/

https://habitrpg.aws.af.cm/

In the previous blog post we explained why we think Open311 is a good idea. In this post we’ll explain what it actually does.

Open311 is very simple, but because it’s fundamentally a technical thing it’s usually explained from a technical point of view. So this post describes what Open311 does without the nerdy language (but with some nerdy references for good measure). At the end there’s a round-up of the terms so you can see how it fits in with the actual specification.

We’re using an unusual example here — a blue cat stuck up a tree — to show how applicable Open311 is to a wide range of problems. Or, to put it another way, this is not just about potholes.

Cat up a tree and an Open311 robot

So… someone has a problem they want to report (for this discussion, she’s using a service like FixMyStreet).

There’s one place where that report needs to be sent (in the UK, that’s your council). That administrative body (the council) almost certainly has a database full of problems which only their staff can access.

I have a problem :–(

the “client”

I fix problems!

the “server”

In this example, FixMyStreet is an Open311 client and the council is an Open311 server. The server is available over HTTP(S), so the client can access it, and the server itself connects to the council’s database. In reality it’s a little bit more complicated than that (for now we’ll ignore clients that implement only part of Open311, multiple servers, and decent security around these connections), but that is the gist of it.

Although it’s not technically correct to confuse the client with the user, or the server with the council, it makes things a lot easier to see it this way, so we’ll use those terms throughout.

Service discovery

To start things off, the client can ask the server: what services do you provide?

Until the client has asked the server what problems it can fix, it can’t sensibly request any of them.

What services do you offer?

I can:
POT: fix potholes
TELE: clean public teleports
PET: get pets down from trees
JET: renew jetpack licenses …

FixMyStreet can use the response it gets from such a service discovery to offer different categories to people reporting problems. We actually put them into the drop-down menu that appears on the report-a-problem page.

In the Open311 API, this is handled by GET Service List. Each service has its own service_code, which the client must use when requesting it. Note that these services and their codes are decided by the server; they are not defined by the Open311 specification. This means that service discovery can easily fit around whatever services the council already offers. The list of services can (and does) vary widely from one council to the next.
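As a sketch of what a client does with that response: the XML below is invented to match the blue-cat example (a real server returns its own codes and names), and Ruby’s built-in REXML stands in for whatever parser a real client would use.

```ruby
require 'rexml/document'

# Invented response in the general shape of an Open311 service list.
xml = <<~XML
  <services>
    <service>
      <service_code>POT</service_code>
      <service_name>Fix potholes</service_name>
      <metadata>false</metadata>
    </service>
    <service>
      <service_code>PET</service_code>
      <service_name>Get pets down from trees</service_name>
      <metadata>true</metadata>
    </service>
  </services>
XML

# Build a lookup of service_code => details, which a client like
# FixMyStreet could feed into its report-a-problem drop-down menu.
services = {}
REXML::Document.new(xml).elements.each('services/service') do |s|
  services[s.elements['service_code'].text] = {
    name: s.elements['service_name'].text,
    # metadata true means the client should also ask GET Service Definition
    needs_details: s.elements['metadata'].text == 'true'
  }
end
```

The service_code is what the client sends back later when it actually requests the service.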

Service definitions

Some services require specific information when they are requested. For example, it might be important to know how deep a pothole is, but it’s not relevant for a streetlight repair.

Tell me more about the PET service!

I can get pets down from trees, but when you request the service, you *must* tell me what kind of animal the pet is, OK?

In the Open311 API, this is handled by the GET Service Definition method. It’s not necessary for a simple Open311 implementation. In fact, it only makes sense if the service discovery explicitly told the client to ask about the extra details, which the server does by adding metadata="true" to its response for a given service.

Requesting a service

This is where it gets useful. The client can request a service: this really means they can report a problem to the server for the body to deal with. Some submissions can be automatically rejected:

My hoverboots are broken :–( I need BOOT service!

404: Bzzzt error! I don’t fix hoverboots (use service discovery to see what I *do* fix)

Hey! Blueblue is up a tree! I need PET service (for cats)!

400: error! You forgot to tell me where it is.

If the report is in good order, it will be accepted into the system. Open311 insists that every problem has a location. In practice this is usually the exact position, coordinates on planet Earth, of the pin that the reporter placed on the map in the client application (in this case FixMyStreet.com).

I need PET service (for cats)! Blueblue is stuck up the biggest tree in the park :–(

200: OK, got it… the unique ID for your request is now 981276

In the Open311 API, this is handled by POST Service Request. You need an API key to do this, which simply means the server needs to know which client this is. Sometimes it makes sense for the server to have additional security such as IP address restriction, and login criteria that’s handled by the machines (not the user).
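The client’s side of that exchange might look like the following sketch. The endpoint, key and coordinates are all placeholders, and the final POST itself is left as a comment since it needs a live server.

```ruby
require 'uri'

endpoint = URI('https://open311.example.org/requests.xml')  # placeholder server
params = {
  'api_key'      => 'YOUR-KEY',   # identifies the client application
  'service_code' => 'PET',        # taken from GET Service List
  'lat'          => '51.5010',    # every request must carry a location
  'long'         => '-0.1416',
  'description'  => 'Blueblue is stuck up the biggest tree in the park',
  'attribute[pet_type]' => 'cat'  # extra field demanded by GET Service Definition
}
body = URI.encode_www_form(params)
# Net::HTTP.post_form(endpoint, params) would send it; the server's reply
# carries the new request's unique ID (981276 in the dialogue above).
puts body
```

If a required field such as the location is missing, the server rejects the report with an HTTP error, exactly as in the hoverboots exchange above.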

Listing known requests

The server doesn’t keep its reports secret: if asked, it will list them out. The client can ask for a specific report (using the ID that the server gave when the report was submitted, for example) or for a range of dates.

Did anyone ask you for help yesterday?

Yes, I got two requests:

request 981299: TELE dirty teleport at the cantina (I’m waiting for a new brush)

request 971723: POT pothole at the junction of Kirk and Solo (I filled it in)

In the Open311 API this is handled by GET Service Request(s). The client can indicate which requests should be listed by specifying the required service request id, service code, start date, end date or status.

Does Open311 work?

Oh yes. On the Open311 website, you can see the growing list of places, organisations, and suppliers who are using it.

The technical bit

In a nutshell: Open311 responds to HTTP requests with XML data (and JSON, if it’s wanted). There’s no messing around with SOAP and failures are reported as the HTTP status code with details provided in the content body.

You can see the specification for Open311 (GeoReport v2). It doesn’t feature blue cats, but if you look at the XML examples you’ll be able to recognise the same interaction described here. And remember the specification is an open standard, which means anyone can (and, we think, should) implement it when connecting a client and server in order to request civic services.

Coming next…

In the next blog post we’ll look at how FixMyStreet uses Open311 to integrate with local council systems, and explain why we’re proposing, and utilising, some additions to the Open311 specification.

1) Building credibility (and engagement) in digital news

Most news publications are not as transparent about their own reporters and their sources as they could be, and many don’t report retroactively on whether pundits/sources got things right or wrong. Notable exceptions, like Wikipedia which footnotes all entries, have become very trusted (and popular) sources of information. How can news orgs move towards embedding more credibility into news? Also, can news animations be credible?

2) Improving content recommendations

Currently content is largely recommended based on relevance (you are reading about bridges, here’s another article about bridges), social context (your friends are reading about bridges, read this too) or editorial selection (Our editor believes you need to know about bridges, read this). What other ways could be available to enable discovery of content? How can we broaden the perspectives being shared?

3) Freeing news content from news sites (in a big big way)

Only 1.1% of total page views online take place on news sites (even less on mobile). If content is restricted to news sites, digital ad revenue will remain small. Can news makers develop a standard to allow news to be distributed as easily across web and mobile as ads and preserve some benefit for content creators?

4) Scaling media-focused social enterprises  (while safeguarding mission)

Media companies have traditionally had a source of revenue (advertising) that is not directly tied to content or customers. New, socially-focused media companies (PolicyMic, Upworthy, Zeega) have ads as well as other revenue sources (lead generation, content creation). How best to balance these and other potential revenues against the needs of the people whose lives they are trying to change?

5) Making data visualization work on mobile

Data visualizations are 30x as likely to be shared as traditional text articles, and have become an important part of the news landscape. But as more news gets read on phones, the impact of these visualizations is mitigated. Can we develop standard practices to maximize the impact of data visualizations on mobile phones?

Because of the nature of the conference, it is entirely possible that another News Foo attendee identified a completely different series of issues that folks are focused on. I look forward to hearing these as Knight gathers more perspectives on the conference.

http://alpha.zeega.org/

http://www.upworthy.com/

http://www.policymic.com

http://watchup.com/

We want anyone who comes to Syria Deeply to walk away smarter and better informed about what’s happening in our world. We’re fielding your feedback and story ideas through info@syriadeeply.org.

http://transterramedia.com/

http://www.submittable.com/

http://www.branch.com/

OpenBlock Rural, which builds on the OpenPlans project OpenBlock to aggregate news and data using the system that powers EveryBlock (got that?);

  • SwiftRiver, a tool to manage data (like tweets) in real-time;

  • FrontlineSMS, a large-scale text messaging platform for non-governmental organizations.

https://www.umbel.com/

all social changes and projects

Links to Content Below: General | Innovating Media | Engaging Communities | Fostering Arts

General Resources

Foundation Center – Tools and Resources for Assessing Social Impact (TRASI)
This database contains approaches to impact assessment, guidelines for creating and conducting an assessment, and actionable tools for measuring social change.

Urban Institute – Outcomes Indicators Project
This resource supports nonprofit performance tracking by suggesting outcomes and indicators to assist nonprofits in developing new measurement approaches and enhancing existing systems.

Grantcraft
This website provides materials that offer insights and approaches to improve the effectiveness of social sector organizations, including several guides on evaluation and assessment.

Innovation Network – Point K Learning Center
Point K’s resources offer a set of tools, including an Organizational Assessment Tool, Logic Model Builder and Evaluation Plan Builder, to support non-profits in designing and implementing assessments for their own programs.

Root Cause – Building a Performance Measurement System
This guide provides a practical, five-step process for developing a performance measurement approach to support nonprofits as they select measures, design reports, and communicate impact.

W.K. Kellogg Foundation – Evaluation Handbook & Logic Model Development Guide
This workbook provides a framework for approaching nonprofit program evaluations that support program performance. The guide introduces the logic model tool to nonprofits seeking to strengthen program design and delivery, and disseminate results.

Innovating Media & Information Resources

IMPACT: A Practical Guide to Evaluating Community Information Projects
This guide, produced by Knight Foundation and FSG Social Impact Advisors, helps organizations collect information about the effectiveness and impact of their community news, information and media projects.

Measuring the Online Impact of Your Information Project
This report, produced by Knight Foundation, FSG Social Impact Advisors and journalism professor Dana Chinn, outlines how funders and their grant partners reach and engage online audiences. It identifies metrics for measuring impact for projects creating informed and engaged communities and includes a set of useful examples.

Center for Social Impact (American University) & The Media Consortium – Investing in Impact
This paper outlines reasons to assess public interest media, synthesizes primary evaluation needs, and proposes new tools to help those involved in public interest media track their work.

Center for International Media Activists – Planning and Evaluation for Media Activists

This resource outlines recommendations, case studies and tools for performing strategic planning and evaluation for media justice projects.

Engaging Communities

Evidence of Change: Exploring Civic Engagement Evaluation – Building Movement Project
The report presents a brief summary of key findings from the 2010 Civic Engagement Evaluation Summit.

Community Tool Box – University of Kansas
This online toolkit offers extensive information about approaches to building healthy communities, including guides for evaluating community programs and initiatives.

Center for Information and Research on Civic Learning and Engagement – Tufts University
The center conducts research on the civic and political engagement of young Americans and offers several research and evaluation tools for gauging civic engagement.

Harvard Family Research Project – The Evaluation Exchange
This periodical regularly provides lessons and emerging best practices for evaluating programs and policies, specifically those focused on children, families, and communities.

Fostering Arts

IMPACT Arts 
This repository provides resources for those working in the arts who want to understand the social impact of their projects.

Grantmakers in the Arts – Digest: Studies, Books, Web Sites
This resource showcases publications and tools for informing strategy and assessment of arts funders and their nonprofit partners in the field.

Boston Youth Arts Evaluation Project
This project, a collaboration between nonprofits working in youth arts and national leaders in research, promotes innovative evaluation methods and tools for measuring youth arts.

Yahoo! Style Guide

“Learn how to write and edit for a global audience through best practices from Yahoo!”

Beginning Reporting

A website for beginning reporters, those studying the craft and their teachers.

Digital Journalist Survival Guide: A Glossary of Tech Terms You Should Know

A comprehensive glossary of terms associated with Internet journalism. Terms every digitally enthused journalist should know.

My High School Journalism

My High School Journalism describes itself as the world’s largest host of teen-generated news.

Citizen Media Law Project

The Citizen Media Law Project is a new pro bono initiative hosted by the Berkman Center for Internet & Society at Harvard University.

Knight Digital Media Center

The Knight Digital Media Center is a partnership between the Annenberg School for Communication at the University of Southern California in Los Angeles and the University of California at Berkeley Graduate School of Journalism that provides fellowships and multimedia training resources for aspiring New Media journalists.

Rich Gordon’s Online Community Cookbook

In the past year or so, the newspaper industry has devoted considerable attention to online communities. Newspapers have launched blogs, opened up discussion via article comments, built new online communities themselves (for instance, dozens of “moms” sites) and begun to experiment with the new world of social network sites such as MySpace and Facebook. Medill’s Rich Gordon ties all of these developments together into a structured format in order to understand, build, and sustain online communities.

Center for Social Media’s Guide to Fair Use in Online Video

This guide by the Center for Social Media at American University’s School of Communication is a code of best practices that helps creators, online providers, copyright holders, and others interested in the making of online video interpret the copyright doctrine of fair use. Fair use is the right to use copyrighted material without permission or payment under some circumstances.

Journalism 2.0 PDF Downloads

Download PDF versions of Journalism 2.0: How to Survive and Thrive in various languages here.

IJNet’s 10 Steps to Citizen Journalism Online

The International Center for Journalists and IJNet.org created this interactive training module as a basic introduction to hyperlocal news sites and blogs. You will need the Adobe Flash player to view the module.

The New West FAQ for Online Community Journalism Entrepreneurs

Jonathan Weber, editor and founder of NewWest.net, created this FAQ for those interested in creating local online news sites. Weber covers why he started New West, its revenue models and expected profits, how to get content, what technology is available, who the competitors are and more.

Journalism 2.0: How to Survive and Thrive

A guide to help professional and amateur news producers understand and implement digital tools to enhance their reporting. Written by Mark Briggs, assistant managing editor for interactive news at The News Tribune in Tacoma, Washington.

Community News Sites

Our list of community news sites.

Things We Like

KCNN is constantly exploring citizen media sites for good ideas to share with you. Check them out. Suggest things we should look at.

Jump Start Your Reporting

Do you need to find an expert fast, research your U.S. Senator’s voting record, investigate a local nonprofit? Here are some databases that can provide some shortcuts.

Journalism Training Sites

Here are some web sites that offer even more journalism training. Check them out.


Launching a Nonprofit News Site

The number of nonprofit news ventures is increasing rapidly and you may be thinking about becoming a part of it. This guide will walk you through the process – including the hurdles and the requirements – whether you are seeking to establish a federally recognized 501(c)3 organization or a project within a university or college.

Outside-the-Box Community Engagement

Engaging readers is why your online news community exists. You can’t use the wisdom of the crowds if the crowd isn’t talking. Without fast and substantive engagement, you might as well publish a newspaper. So when you build it and they don’t come, what do you do, short of waiting?

Pulitzer Center’s Media on the Move

This learning module is filled with text and videos that will guide journalists from story idea, through the reporting and distribution process. This approach treats the issues covered as campaigns, not just stand-alone stories. That means wide collaborations, embracing new technologies and taking the journalism out to classrooms and universities to engage the next generation.

Making the Most of Metrics

Whether you’re running a small hyperlocal community Web site or a large regional citizen media site, you can use free or inexpensive tools to measure how many people are visiting your site and where they like to go most. With the right analytics tools, you can also get very specific details in addition to total traffic numbers. This knowledge will then empower you to improve your site, increase traffic and give accurate information to potential advertisers and sponsors.

Networked Journalism: What Works

Engaging Audiences: Measuring Interactions, Engagement and Conversions

The rise of social media tools in recent years has empowered online news startups to increase content distribution, market their sites and track users. But most say they cannot lasso data to track whether they are turning users into supporters who will help their sites survive.

Philadelphia Enterprise Reporting Awards

Learn how collaborative journalism projects that received grants from J-Lab turned out over 300 stories, blogs, podcasts, videos, databases and maps. View the press release and the report on the first year.

Rules of the Road: Navigating the New Ethics of Local Journalism

Download PDF

With journalism entrepreneurs launching local news startups at a rapid pace, the local news landscape is evolving – and so are the rules of the road guiding ethical decisions. Where a bright ethical line once separated a newsroom from its business operations, one person now often wears multiple hats, as editor, business manager and grants writer. Site publishers navigate new kinds of critical decisions daily. This guide examines a number of them. You can click to any topic in any order. Or, you can cruise through the Table of Contents. On every page you’ll find a box that says, “Share your story.” We invite you to weigh in with an ethical problem you faced – and your solution. Your participation will help inform a work in progress.

New Media Makers Toolkit

This learning module is filled with original reporting that will help you learn about the innovative community news initiatives that are cropping up around the United States – and securing grants from foundations that have not traditionally supported journalism. In the case studies and accompanying videos, you’ll meet citizen journalists and professional journalists who have launched news initiatives that either partner with or supplement their metro news outlets. A key part of this toolkit is a searchable database, where you can see the kinds of news ventures that foundations have supported since 2005.

Likes & Tweets: Leveraging Social Media for News Sites

If you’re like most journalists and media entrepreneurs, you use social media daily, but that doesn’t mean you’re doing all you could with it to engage with your community, listen and monitor the conversation, or use it to plan outreach campaigns around news events, real world meet-ups and breaking stories.

That’s where this guide comes in. It’s a roadmap for improving both your understanding of social media and your use of it. This learning module focuses on the principles of authenticity, transparency and crowd-sourced, real-time communication that make social media so strikingly different from traditional media. It will also give you hands-on tools, tips and tactics that can make your daily use of Facebook, Twitter and other resources much more effective.

Interviewing: A practical guide for citizen journalists

Interviews are integral to good journalism. They provide more than just additional voices; they provide facts, expertise, balance, depth and credibility. They also breathe life into information that might otherwise fall flat. Whether you already interview or are daunted by the prospect, learn what types of interviews you should go for and how they can improve your journalism. Figure out where to quote or paraphrase. Learn how to navigate the unique ethical pitfalls that confront citizen journalists. Module developed by Lynne Perri and Angie Chuang at American University’s School of Communication.

Independent Metro News Sites Database

As daily metro newspapers continue to lose ground, a new model is emerging: Independent metro news sites with paid staff members. Primarily online only ventures, these sites continue to gain traction and attract attention for coverage of their communities. This living database tracks the business side of these news operations, offering a glimpse at their funding sources, budget, staffing levels, and visitor traffic.

The Freebies List for Frugal Journalists

In the era of new media, it’s important for new skills to be learned to keep up with growing audience demand. Editing audio and video for the Web is commonplace now, as is using the Internet for research and sharing. While there are plenty of good software programs out there to buy, comparable ones can be found all over the Internet for free or next-to-free. We have compiled a growing list of our favorites for anyone to use. Comment on the ones you find useful and let us know if you find any more out there.

The Citizen Journalist’s Guide to Open Government

This extensive, multimedia e-learning module helps new media makers understand how to obtain public records and get into public meetings. The guide features a unique, interactive map that tells citizens how they can locate open-government information on each of the 50 state Web sites. Produced by Geanne Rosenberg, founding chair of Baruch College’s new undergraduate Department of Journalism and the Writing Professions.

Twelve Tips for Optimizing Your Site for Search Engines

There’s good news for even solo citizen journalists who want to improve how their sites are found through search engines like Google: Your own homegrown search engine optimization can get you many of the benefits of a professional retooling. Search engine optimization, or SEO, just means making your site as easy to find and highly ranked as possible by search engines like Google, Yahoo, MSN and Ask.com. That way, people using those engines to look for relevant content can find what you have to offer. That’s increasingly important as more and more visitors find their way to sites like yours not by typing in your Web address, but by plugging a few choice words into their favorite search engine. Learn some easy ways to boost your ranking and get more traffic.

Twitter Tips: Today’s Must-Have Tool for Citizen Journalists

Twitter has finally hit its stride as a leading tool for finding and sharing timely information from all sorts of places and sources. Its usefulness for breaking news is obvious. However, Twitter is equally useful for tracking ongoing stories and issues, getting fast answers or feedback, finding sources, building community, collaborating on coverage, and discovering emerging issues or trends. Learn how to sign up, log on and start posting “tweets” to enhance your hyperlocal coverage.

Top 10 Rules for Limiting Legal Risk

If you’re running a citizen media site or contributing to one, these 10 rules will help you avoid potential legal pitfalls. Get advice in videos from Harvard Berkman Center experts and Media Law Resource Center attorneys. Module produced by Geanne Rosenberg, associate professor at City University of New York’s Graduate School of Journalism and Baruch College.
Read the press release from CUNY.

Tools for Citizen Journalists

This six-chapter training module will help site operators and citizen journalists cope with the challenges of covering communities on small budgets with little or no staff. Get tips on where to sniff out great ideas and turn them into a compelling story, how to use data to punch up your coverage, how to manage a site when you don’t have a staff to help out, who to consider for partnerships that might help move your site along, and how to tap into the knowledge and passion of your readers. Module developed by Wendell Cochran and Amy Eisman, American University School of Communication.

Twelve Tips for Growing Positive Communities Online

Your site is up and all is running well until the conversation heats up and a flame war erupts. Here are a dozen ways to keep the discussion going while maintaining a civil environment and positive direction on your site.

Make Internet TV

Make Internet TV is an easy to read multimedia manual for publishing internet video. It has step-by-step instructions for everything from choosing a camera to publishing and promoting videos on the internet.

Principles of Citizen Journalism

Whether you are writing a blog or running a full-scale hyperlocal news site, you are going to face a higher degree of skepticism than traditional media. That means fairness, accuracy, transparency and independence are paramount to success. See what citizen media veterans say about those topics and other foundations of citizen journalism.

Training Citizen Journalists

In these seven case studies from around the United States, get a bird’s-eye view of citizen journalism today.

Using E-mail to Jumpstart your Newsgathering

Even professional journalists, pressed by 24/7 deadlines, are using e-mail to jump-start their reporting on breaking news stories and to find excellent examples to illustrate more ambitious enterprise stories.

http://openblockproject.org/

http://timetric.com/about/media-center/

http://www-958.ibm.com/software/data/cognos/manyeyes/

https://scraperwiki.com/

http://datahub.io/en/group/data-journalism

http://www.kobotoolbox.org/products/kobokit

http://opendatakit.org/participate/

Below are the released and supported ODK projects.

  • Build - ODK Build enables users to generate forms using a drag-and-drop form designer. Build is implemented as an HTML5 web-based application and targets the common use case of a simple form.

  • Collect - ODK Collect is a powerful phone-based replacement for your paper forms. Collect is built on the Android platform and can collect a variety of form data types: text, location, photos, video, audio, and barcodes.

  • Aggregate - ODK Aggregate provides a ready-to-deploy online repository to store, view and export collected data. Aggregate can run on Google’s reliable and free infrastructure as well as on local servers backed by MySQL and PostgreSQL.

  • Form Uploader - ODK Form Uploader lets you easily upload a blank form and its media files to ODK Aggregate.

  • Briefcase - ODK Briefcase is the best way to transfer data from Collect and Aggregate.

  • Validate - ODK Validate ensures that you have an OpenRosa-compliant form — one that will also work with all the ODK tools.

  • XLS2XForm - ODK XLS2XForm allows XForms to be designed with Excel.

http://datadrivenjournalism.net/resources/Meet_data_mapping_platform_CartoDB#When:18:19:19Z

Fusion Tables vs. CartoDB; the species sphere example uses D3 and the CartoDB API.

Tutorials for the implementation and use of freeDive can be found on the KDMC webpage, and source code is available on GitHub. Disappointingly, some of the links on the project website lead to blank pages, as is the case with the FAQ page, but such aspects will hopefully be improved in later stages of the project.

http://data.worldbank.org/developers

Syllabus

Posted on September 13, 2012

Aims of the course

The aim of the course is to familiarise students with current areas of research and development within computer science that have a direct relevance to the field of journalism, so that they are capable of participating in the design of future public information systems.

The course is built around a “design” frame that examines technology from the point of view of its possible applications and social context. It will familiarize the students with both the major unsolved problems of internet-era journalism, and the major areas of research within computer science that are being brought to bear on these problems. The scope is wide enough to include both relatively traditional journalistic work, such as computer-assisted investigative reporting, and the broader information systems that we all use every day to inform ourselves, such as search engines. The course will provide students with a thorough understanding of how particular fields of computational research relate to products being developed for journalism, and provoke ideas for their own research and projects.

Research-level computer science material will be discussed in class, but the emphasis will be on understanding the capabilities and limitations of this technology. Students with a CS background will have opportunity for algorithmic exploration and innovation, however the primary goal of the course is thoughtful, application-centered research and design.

Assignments will be completed in groups and involve experimentation with fundamental computational techniques. There will be some light coding, but the emphasis will be on thoughtful and critical analysis.

Format of the class, grading and assignments.

It is a fourteen-week course for master’s students which has both a six-point and a three-point version. The six-point version is designed for dual-degree candidates in the journalism and computer science concentration, while the three-point version is designed for those cross-listing from other concentrations and schools.

The class is conducted in a seminar format. Assigned readings and computational techniques will form the basis of class discussion. Throughout the semester we will be inviting guest speakers with expertise in the relevant areas to talk about their related research and product development.

The output of the course for a six-point candidate will be one research assignment in the form of a 25-page research paper. The three-point course will require a shorter research paper, and both versions of the course will also have approximately bi-weekly written assignments, which will frequently involve experimentation with computational techniques. For those in the dual-degree program or who have strong technical skills, there is an option to produce a prototype as part of the final assignment. The class is graded on a pass/fail basis, in line with the journalism school’s grading system.

Week 1. – Basics
We set out the expectations of the course, and frame our work as the task of designing public information production and distribution systems. Computer science techniques can help in four different areas: data-driven reporting, story presentation, information filtering, and effect tracking. The recommended readings aim to give you an understanding of the landscape of technical disruption in the news industry, and the ways in which computer science techniques can help to build something better.

Required

Recommended

Viewed in class

Weeks 2-3: Technical fundamentals
We’ll spend the next couple weeks examining the techniques that will form the basis of much of the rest of our work in the course: clustering and the document vector space model.

Week 2: Clustering
A vector of numbers is a fundamental data representation which forms the basis of very many algorithms in data mining, language processing, machine learning, and visualization. This week we will explore two things: representing objects as vectors, and clustering them, which might be the most basic thing you can do with this sort of data. This requires a distance metric and a clustering algorithm — both of which involve editorial choices! In journalism we can use clusters to find groups of similar documents, analyze how politicians vote together, or automatically detect groups of crimes.
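As an illustration of those editorial choices, here is a minimal sketch (with made-up voting data) of representing politicians as vectors and grouping them with a simple k-means loop; both the distance metric and the data are assumptions for the example:

```python
import math
import random

def euclidean(a, b):
    # The distance metric: one of the editorial choices mentioned above.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=20, seed=0):
    """Minimal k-means: assign each vector to its nearest centroid,
    then recompute each centroid as the mean of its members."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: euclidean(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return clusters

# Hypothetical data: each politician as a vector of votes (1 = yea, 0 = nay).
votes = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
groups = kmeans(votes, k=2)
```

Swapping the distance metric, the number of clusters k, or the initialization changes which groups you find, which is exactly where the editorial judgment enters.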

Required

Recommended

Viewed in class

Assignment: you must choose your groups of 2-3 students, and pick a data set to work with for the rest of the course. Due next week.

Week 3: Document topic modelling
The text processing algorithms we will discuss this week are used in just about everything: search engines, document set visualization, figuring out when two different articles are about the same story, finding trending topics. The vector space document model is fundamental to algorithmic handling of news content, and we will need it to understand how just about every filtering and personalization system works.

Required

  • Online Natural Language Processing Course, Stanford University

    • Week 7: Information Retrieval, Term-Document Incidence Matrix

    • Week 7: Ranked Information Retrieval, Introducing Ranked Retrieval

    • Week 7: Ranked Information Retrieval, Term Frequency Weighting

    • Week 7: Ranked Information Retrieval, Inverse Document Frequency Weighting

    • Week 7: Ranked Information Retrieval, TF-IDF weighting

  • Probabilistic Topic Models, David M. Blei

Recommended:

Assignment – due in three weeks
You will perform document clustering with the gensim Python library, and analyze the results.

  1. Choose a document set. You can use the Reuters corpus if you like but you are encouraged to try other sources.

  2. Import the documents and score them in TF-IDF form. Then query the document set by retrieving the top ten closest documents (as ranked by cosine distance) for a variety of different queries. Choose three different queries that show interesting strengths and weaknesses of this approach, and write an analysis of the results.

  3. Choose a topic modelling method (such as connected components, LSA, or LDA) and cluster your documents. Hand in the extracted topics and comment on the results.

  4. Choose a clustering method (such as k-means) and cluster the documents based on the extracted topics. How do the resulting clusters compare to how a human might categorize the documents?
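For orientation, the TF-IDF scoring and cosine ranking in step 2 can be sketched in a few lines of plain Python; the three toy documents and the query below are invented for illustration, and the assignment itself should use gensim:

```python
import math
from collections import Counter

# Three toy documents standing in for a corpus.
docs = {
    "d1": "radiation levels near the plant rose sharply",
    "d2": "the city council votes on the new budget",
    "d3": "health risks from radiation exposure",
}
tokenized = {name: text.split() for name, text in docs.items()}
N = len(tokenized)
# Document frequency: in how many documents does each term appear?
df = Counter(term for toks in tokenized.values() for term in set(toks))

def vectorize(tokens):
    # TF-IDF: term frequency weighted by log inverse document frequency.
    # Terms unseen in the corpus get idf = log(N/N) = 0, i.e. no signal.
    return {t: tf * math.log(N / df.get(t, N)) for t, tf in Counter(tokens).items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vectors = {name: vectorize(toks) for name, toks in tokenized.items()}
query = vectorize("radiation health risks".split())
ranked = sorted(vectors, key=lambda name: cosine(query, vectors[name]), reverse=True)
```

Here the query about radiation health risks ranks d3 first, d1 (which shares only “radiation”) second, and the unrelated d2 last.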

Weeks 4-5: Filtering
Over the next few weeks we will explore various types of collaborative filters: social, algorithmic, and hybrid; classic correlation-based filtering algorithms (“users who bought X also bought Y”, the Netflix Prize); and location- and context-based filtering. Our study will include the technical fundamentals of clustering and recommendation algorithms.
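The classic “users who bought X also bought Y” idea can be sketched with simple co-occurrence counts; the reading histories below are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical reading histories: user -> set of stories read.
histories = {
    "ana": {"budget", "radiation", "election"},
    "ben": {"budget", "election"},
    "carla": {"radiation", "health"},
}

# Count how often each pair of items is read by the same user.
cooccur = Counter()
for items in histories.values():
    for a, b in combinations(sorted(items), 2):
        cooccur[(a, b)] += 1
        cooccur[(b, a)] += 1

def also_read(item, k=2):
    """'Users who read X also read Y': rank items by co-occurrence with X."""
    scores = Counter({b: n for (a, b), n in cooccur.items() if a == item})
    return [b for b, _ in scores.most_common(k)]
```

Real systems replace raw counts with correlation or similarity scores, but the underlying idea, recommending by shared behaviour rather than content, is the same.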

Week 4: Information overload and algorithmic filtering
This week we begin our study of filtering with some basic ideas about its role in journalism. Then we shift gears to pure algorithmic approaches to filtering, with a look at how the Newsblaster system works (similar to Google News).

Required

Recommended

Week 5: Social software and social filtering
We have now studied purely algorithmic modes of filtering, and this week we will bring in the social. First we’ll look at the entire concept of “social software,” which is a new interdisciplinary field with its own dynamics. We’ll use the metaphor of “architecture,” suggested by Joel Spolsky, to think about how software influences behaviour. Then we’ll study social media and its role in journalism, including its role in information distribution and collection, and emerging techniques to help find sources.

Required

Recommended

Week 6: Hybrid filters, recommendation, and conversation
We have now studied purely algorithmic and mostly social modes of filtering. This week we’re going to study systems that combine software and people. We’ll look at “recommendation” systems and the socially-driven algorithms behind them. Then we’ll turn to online discussions, and hybrid techniques for ensuring a “good conversation” — a social outcome with no single definition. We’ll finish by looking at an example of using human preferences to drive machine learning algorithms: Google Web search.

Required

Recommended

Assignment – due in two weeks:
Design a filtering algorithm for Facebook status updates. The filtering function will be of the form (status update, user data) => boolean. That is, given all previously collected user data and a new status update from a friend, you must decide whether or not to show the new update in the user’s news feed. Turn in a design document with the following items:

  1. List all available information that Facebook has about you. Include a description of how this information is collected or changes over time.

  2. Argue for the factors that you would like to influence the filtering, both in terms of properties that are desirable to the user and properties that are desirable socially. Specify as concretely as possible how each of these (probably conflicting) goals might be implemented in code.

  3. Write pseudo-code for the filter function. It does not need to be executable and may omit details; however, it must be specific enough that a competent programmer can turn it into working code in an obvious way.
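A skeletal example of what such a filter function might look like; every signal and weight here is a hypothetical placeholder, not anything Facebook actually uses:

```python
# Every signal and weight below is a made-up placeholder for illustration.
def show_update(status_update, user_data, threshold=0.5):
    """(status update, user data) => boolean: score a few (conflicting)
    factors and show the update only if the total clears a threshold."""
    author = status_update["author"]
    score = 0.0
    # Affinity: how often has the user interacted with this friend?
    score += 0.6 * user_data["interaction_rate"].get(author, 0.0)
    # Diversity: a socially desirable goal that conflicts with pure
    # affinity; boost authors the user has not seen much lately.
    score += 0.2 * (1.0 - user_data["recently_shown_rate"].get(author, 0.0))
    # Recency: linearly decay updates over 24 hours.
    score += 0.2 * max(0.0, 1.0 - status_update["age_hours"] / 24.0)
    return score >= threshold

user = {"interaction_rate": {"ana": 0.9}, "recently_shown_rate": {"ana": 0.8}}
update = {"author": "ana", "age_hours": 2}
```

The interesting design work is in choosing the signals and weights, which is exactly what items 1 and 2 of the assignment ask you to argue for.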

Weeks 7-9: Knowledge mining

Week 7: Visualization
An introduction to how visualisation helps people interpret information. The difference between infographics and visualization, and between exploration and presentation. Design principles from user experience considerations, graphic design, and the study of the human visual system. Also, what is specific about visualization in journalism, as opposed to the many other fields that use it?

Required

Recommended

Week 8: Structured journalism and knowledge representation
Is journalism in the text/video/audio business, or is it in the knowledge business? This week we’ll look at this question in detail, which gets us deep into the issue of how knowledge is represented in a computer. The traditional relational database model is often inappropriate for journalistic work, so we’re going to concentrate on so-called “linked data” representations. Such representations are widely used and increasingly popular. For example Google recently released the Knowledge Graph. But generating this kind of data from unstructured text is still very tricky, as we’ll see when we look at the Reverb algorithm.
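A minimal sketch of the linked-data idea: facts stored as (subject, predicate, object) triples, the kind of structure that Reverb-style extractors aim to produce from raw text. The triples and names here are invented:

```python
# Linked-data style: each fact is a (subject, predicate, object) triple.
# All names below are hypothetical.
triples = [
    ("Acme Corp", "donated_to", "Mayor's Fund"),
    ("Mayor's Fund", "paid", "Acme Corp"),
    ("Jane Doe", "directs", "Acme Corp"),
]

def query(triples, subject=None, predicate=None, obj=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]
```

Unlike a fixed relational schema, new kinds of relationships can be added at any time, which is what makes this representation attractive for open-ended reporting.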

Required

Recommended

Assignment: Use Reverb to extract propositions from a subset of your data set (if applicable, otherwise the Reuters corpus). Analyze the results. What types of propositions are extracted? What types of propositions are not? Does it depend on the wording of the original text? What mistakes does Reverb make? What is the error rate? Are there different error rates for different types of statements, sources, or other categories?

Week 9: Network analysis
Network analysis (aka social network analysis, link analysis) is a promising and popular technique for uncovering relationships between diverse individuals and organizations. It is widely used in intelligence and law enforcement, but not so much in journalism. We’ll look at basic techniques and algorithms and try to understand the promise — and the many practical problems.
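As a taste of the basic techniques, here is a sketch of degree centrality, one of the simplest network-analysis measures, computed over an invented set of relationships:

```python
from collections import defaultdict

# Hypothetical links between people and organizations in a story.
edges = [
    ("mayor", "developer"), ("mayor", "charity"),
    ("developer", "contractor"), ("charity", "contractor"),
    ("mayor", "contractor"),
]

# Build an undirected adjacency structure.
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

def degree_centrality(g):
    """Degree centrality: a node's share of possible connections, the
    simplest measure of who sits at the center of a network."""
    n = len(g)
    return {node: len(neigh) / (n - 1) for node, neigh in g.items()}

central = degree_centrality(graph)
```

In practice the hard part is not the algorithm but the data: deciding which relationships count as edges is itself an editorial judgment.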

Required

Recommended

Examples of journalistic network analysis

Week 10: Drawing conclusions from data
You’ve loaded up all the data. You’ve run the algorithms. You’ve completed your analysis. But how do you know that you are right? It’s incredibly easy to fool yourself, but fortunately, there is a long history of fields grappling with the problem of determining truth in the face of uncertainty, from statistics to intelligence analysis.

Required

Recommended

Week 11: Security, Surveillance, and Censorship
‘On the internet everyone knows you are a dog.’ In both commercial and editorial terms, the issues of online privacy, identity and surveillance are important for journalism. Who is watching our online work? How do you protect a source in the 21st century? Who gets access to all of this mass intelligence, and what does the ability to surveil everything all the time mean, both practically and ethically, for journalism?

Required

Recommended

Cryptographic security

Anonymity

Assignment: Come up with a situation in which a source and a journalist need to collaborate to keep a secret. Describe in detail:

  1. The threat model. What are the risks?

  2. The adversary model. Who must the information be kept secret from? What are their capabilities, interests, and costs?

  3. A plan to keep the information secure, including tools and practices

  4. An evaluation of the costs, possible sources of failure, and remaining risks
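Part of any such plan is message integrity: how does the journalist know a received message really came from the source and wasn't altered? A minimal sketch of an authentication tag using Python's standard library, assuming a shared secret exchanged in person (this illustrates the concept only; for real source protection use vetted tools, not hand-rolled crypto, and note this provides integrity, not confidentiality or anonymity):

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret, exchanged out of band (e.g. in person)
key = secrets.token_bytes(32)

def tag(message: bytes, key: bytes) -> bytes:
    """Authentication tag proving the message wasn't altered in transit."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, received_tag: bytes, key: bytes) -> bool:
    # compare_digest avoids leaking information via comparison timing
    return hmac.compare_digest(tag(message, key), received_tag)

msg = b"meet at the usual place, 9pm"
t = tag(msg, key)
print(verify(msg, t, key))          # True
print(verify(b"tampered", t, key))  # False
```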

Week 12: Tracking flow and impact
How does information flow in the online ecosystem? What happens to a story after it’s published? How do items spread through social networks? We’re just beginning to be able to track ideas as they move through the network, by combining techniques from social network analysis and bioinformatics.

Required

Recommended

Week 13 – Project review
We will spend this week discussing your final projects and figuring out the best approaches to your data and/or topic.

Week 14
Review of course. An invited guest panel of computer scientists, working both within journalism and in related fields concerned with public information, will discuss their priorities and answer questions.

Source:http://www.compjournalism.com/

http://www.votewatch.eu/

http://www.everyblock.com

http://www.djangobook.com/en/2.0/index.html

Bar, line and pie charts

Planning and management charts

Meters and gauges charts

Other types of charts

source: http://www.rgraph.net/examples/index.html

He broke the team’s strategy down into a few key objectives, the four main ones being:

Provide context

Describe processes

Reveal patterns

Explain the geography

Here is some of what Ericson told the audience and some of the examples he gave during the session, broken down under the different headers.

Provide context

Graphics should bring something new to the story, not just repeat the information in the lede.

Ericson emphasised that a graphics team which simply illustrates what the reporter has already told the audience is not doing its job properly. “A graphic can bring together a variety of stories and provide context,” he said, citing his team’s work on the Fukushima nuclear crisis.

We would have reporters with information about the health risks, and some who were working on radiation levels, and then population, and we can bring these things together with graphics and show the context.

Describe processes

The Fukushima nuclear crisis has spurred a lot of graphics work at news organisations across the world, and Ericson showed a few different examples of work on the situation to the #ijf11 audience. One graphic demonstrated the process of a nuclear meltdown, and what exactly was happening at the Fukushima plant.

As we approach stories, we are not interested in a graphic showing how a standard nuclear reactor works, we want to show what is particular to a situation and what will help a reader understand this particular new story.

Like saying: “You’ve been reading about these fuel rods all over the news, this is what they actually look like and how they work”.

From nuclear meltdown to dancing. A very different graphic under the ‘describe processes’ umbrella neatly demonstrated that graphics work is not just for mapping and data.

Dissecting a Dance broke down a signature piece by US choreographer Merce Cunningham in order to explain his style.

The NYT dance critic narrated the video, over which simple outlines were overlaid at stages to demonstrate what he was saying. See the full video at this link.

Reveal patterns

This is perhaps the objective most associated with data visualisation, taking a dataset and revealing the patterns that may tell us a story: crime is going up here, population density down there, immigration changing over time, etc.

Ericson showed some of the NYT’s work on voting and immigration patterns, but more interesting was a “narrative graphic” that charted the geothermal changes in the bedrock under California created by attempts to exploit energy in hot areas of rock, which can cause earthquakes.

These so-called narrative graphics take what we think of as visualisation close to what we have been seeing for a while in broadcast news bulletins.

Explain geography

The final main objective was to show the audience the geographical element of stories.

Examples for this section included mapping the flooding of New Orleans following Hurricane Katrina: showing what parts of the region were below sea level, overlaying population density, showing where levees had broken and showing what parts of the land were underwater.

Geography was also a feature of demonstrating the size and position of the oil slick in the Gulf following the BP Deepwater Horizon accident, and comparing it with previous major oil spills.

Some of the tools in use by the NYT team, with examples:

Google Fusion Tables
Tableau Public: Power Hitters
Google Charts from New York State Test Scores – The New York Times
HTML, CSS and Javascript: 2010 World Cup Rankings
jQuery: The Write Less, Do More, JavaScript Library
jQuery UI – Home
Protovis
Raphaël—JavaScript Library
The R Project for Statistical Computing
Processing.org

An important formula 

Data + story > data

It doesn’t take a skilled mathematician to work that one out. But don’t be fooled by its simplicity; it underpinned a key message to take away from the workshop. The message is equally simple: graphics and data teams have the skills to make sense of data for their audience, and throwing a ton of data online without adding analysis and extracting a story is not the right way to go about it.
http://www.simile-widgets.org/exhibit3/

http://log.liminastudio.com/programming/soundcloudscraper-download-all-of-an-artists-soundcloud-tracks-automatically

http://jdownloader.org/download/index

http://www.documentcloud.org/home

http://panda.readthedocs.org/en/latest/index.html

http://overview.ap.org/

http://csvkit.readthedocs.org/en/latest/tutorial/getting_started.html

http://betterexplained.com/articles/a-visual-guide-to-version-control/

http://lifeandcode.tumblr.com/page/39

http://www.propublica.org/

http://hackshackers.com/

Resources


Resources: Blog Roll

November 13th, 2012 | by EJC

A list of useful and informative blogs from media organisations, emergency management professionals and networks, web 2.0 and crisis mappers


Useful Links: Verification Tools

October 16th, 2012 | by EJC

A list of expert-recommended verification tools to assess the reliability of sources and the authenticity of user-generated content


Useful Links: Official Data Sources

October 4th, 2012 | by EJC

A list of useful links to datasets from government agencies, international organisations and research centres.


Useful Links: Curation Platforms

September 4th, 2012 | by EJC

A list of useful curation platforms and curated news outlets


Useful Links: Mapping and Crowdsourcing Networks

August 30th, 2012 | by EJC

A list of useful mapping and crowdsourcing tools and networks that centre on emergency situations and humanitarian response.


Useful Links: Conflict Reporting

August 29th, 2012 | by EJC

A list of useful resources and organisations particularly focusing on conflict reporting.


Useful Links: Disaster Reporting

August 28th, 2012 | by EJC

A list of useful resources and organisations particularly focusing on disaster reporting.

http://emergencyjournalism.net/category/resources/

http://www.rhok.org/

http://kartograph.org/

Learn-to-Program Resources

Javascript

JQuery

Python

Ruby

PHP

Perl

Erlang

Processing

Tutorial Sites

  • NetTuts

  • Tutorialzine - Excellent video/text tutorials on a variety of web development topics.

  • The New Boston - Free educational video tutorials by Bucky Roberts. 

Free Online Computer Science Courses

  • Intro to Computer Science, Harvard - Videos of all lectures and notes are available for free online as part of the Open Courseware movement. Very engaging instructor. 

  • EdX - Harvard and MIT’s online education portal, which now has many courses. Free.

  • Stanford Engineering Everywhere - Full video of lectures of many computer science classes, along with downloadable study materials. Free.

  • Intro to Computer Science, MIT - Full video and course materials via MIT Open Courseware.

  • Google Code University -  “This site provides sample course content and tutorials for Computer Science (CS) students and educators on current computing technologies and paradigms.”

  • Lecturefox