Important Tools for Visualising and Communicating Data
This list of resources represents an ongoing and growing series of blog posts presenting the most inspiring collection of important, effective, useful and practical data visualisation tools. You can also view these resources via a publicly accessible Google Spreadsheet.
** This series of posts will be undergoing a thorough update during January and February 2013! **
Part 1: Tools for Analysis, Graphing and Enterprise
Part 2: Visual Programming Languages and Environments
Part 3: Google’s Charting and Visualisation Tools
Part 4: Tools for Mapping
Part 5: Specialist Tools and Visualisation Communities
Part 6: Visualisation Presentation and Publishing Tools
The Most Influential Data Visualisation Books
Part 7: A Personal Collection of Influential Books on Data Visualisation and Other Related Subjects (1)
Part 8: A Personal Collection of Influential Books on Data Visualisation and Other Related Subjects (2)
Part 9: A Personal Collection of Influential Books on Data Visualisation and Other Related Subjects (3)
http://code.google.com/p/google-refine/,
http://www.google.com/fusiontables/Home, http://multimedia.journalism.berkeley.edu/tutorials/google-refine-export-json/json/, http://www.google.com/fusiontables/public/tour/index.html, http://www.google.com/support/fusiontables/bin/answer.py?hl=en&answer=184641, http://www.computerworld.com/s/article/9196283/H_1B_visa_data_Visual_and_interactive_tools, https://sites.google.com/site/fusiontablestalks/stories?ft_source=tour_defaulttab&__utma=1.1922599578.1299797723.1299797723.1299797723.1&__utmb=173272373.2.10.1299797450&__utmc=173272373&__utmx=-&__utmz=1.1299797723.1.1.utmcsr=%28direct%29|utmccn=%28direct%29|utmcmd=%28none%29&__utmv=-&__utmk=27954108
http://www.impure.com/ (http://www.youtube.com/watch?v=4oc47BB374U), http://www.youtube.com/watch?v=XYVAPfb8k5U, http://www.guardian.co.uk/news/datablog/interactive/2011/mar/08/pay-gap-gender-women-men,
http://vis.stanford.edu/wrangler/,
http://www.tableausoftware.com/public/blog/2011/04/data-shaping, http://www.tableausoftware.com/public/training, http://www.computerworld.com/s/article/9210078/Tech_unemployment_higher_than_white_collar_average#interactive_graph,
http://code.google.com/p/google-refine/wiki/Screencasts,
http://www-958.ibm.com/software/data/cognos/manyeyes/, http://www-958.ibm.com/software/data/cognos/manyeyes/page/Visualization_Options.html, http://www-958.ibm.com/software/data/cognos/manyeyes/page/Data_Format.html,
http://www.dataviz.org/, http://www.w3.org/TR/html4/present/frames.html, http://www.dataviz.org/,
http://simile-widgets.org/exhibit/, http://people.csail.mit.edu/karger/Exhibit/CAR/, http://simile-widgets.org/wiki/Getting_Started_with_Exhibit
http://www.filamentgroup.com/lab/update_to_jquery_visualize_accessible_charts_with_html5_from_designing_with/
http://sixrevisions.com/javascript/20-fresh-javascript-data-visualization-libraries/ (extract)
http://code.google.com/apis/charttools/index.html, http://code.google.com/apis/visualization/documentation/queries.html, http://code.google.com/apis/chart/docs/gallery/dynamic_icons.html, https://chart.googleapis.com/chart?chs=75x50&cht=gom&chd=t:70&chco=FF0000,FF8040,FFFF00,00FF00,00FFFF,0000FF,800080, http://code.google.com/apis/visualization/documentation/using_overview.html, http://code.google.com/apis/chart/docs/making_charts.html, http://code.google.com/apis/visualization/documentation/gallery.html
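The Google-o-meter URL in the list above is just a set of query parameters, so it can be assembled programmatically. The following is a minimal sketch, assuming the usual meaning of the image-chart parameters (`chs` for size, `cht` for chart type, `chd` for data, `chco` for colours); the function name and defaults are hypothetical. Google has since deprecated this image-chart endpoint, so treat the URL as illustrative.

```python
from urllib.parse import urlencode

CHART_ENDPOINT = "https://chart.googleapis.com/chart"


def google_o_meter_url(value, size="75x50", colors=("FF0000", "00FF00")):
    """Build a Google-o-meter (cht=gom) image-chart URL for a single value."""
    params = {
        "chs": size,                # chart size as WIDTHxHEIGHT in pixels
        "cht": "gom",               # chart type: Google-o-meter
        "chd": f"t:{value}",        # data in simple text encoding
        "chco": ",".join(colors),   # colour gradient, hex RGB values
    }
    # Keep ':' and ',' unescaped so the URL matches the style used above.
    return CHART_ENDPOINT + "?" + urlencode(params, safe=":,")


if __name__ == "__main__":
    print(google_o_meter_url(70))
```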
http://code.google.com/p/google-refine/wiki/DocumentationForUsers,
https://www.statwing.com/, http://blogs.computerworld.com/business-intelligenceanalytics/20909/startup-aims-simplify-data-analysis, The idea behind Statwing is to provide some basic, automated statistical analysis on data that users upload to the site (correlations, frequencies, visualizations and so on) without requiring you to know when, say, to use a chi-squared distribution versus a z-test.
statistical functions: http://office.microsoft.com/en-us/excel-help/statistical-functions-HP005203066.aspx
protovis: http://vis.stanford.edu/protovis/ex/
http://vis.stanford.edu/protovis/docs/start.html
sort tools by skill levels
http://www.qgis.org/
http://tbarmann.webfactional.com/nicar/qgis_tutorial/
Learn more: Timothy Barmann of The Providence Journal posted two very useful tutorials for the CAR conference that are still available: Introduction to QGIS and The Latest in Mapping With JavaScript and jQuery. Barmann also offers a sample: Rhode Island’s Ethnic Mosaic. Another resource to help you get started: QGIS Tutorial Labs from Richard E. Plant, professor emeritus at the University of California, Davis.
Note: If you’re interested in GIS and want to consider other free software options, download this PDF listing of Open Source/Non-Commercial GIS Products. And if you’re looking for a free open-source desktop GIS program that might be fairly easy to use, Jacob Fenton, director of computer-assisted reporting at American University’s Investigative Reporting Workshop, recommends taking a look at the System for Automated Geoscientific Analyses (SAGA) site. Finally, if analyzing geographic data in a conventional database sounds interesting, PostGIS “spatially enables” the PostgreSQL relational database, according to the site.
http://www.arl.org/sparc/openaccess/
Most of these tools are outdated and no longer widely used. The goal is to simplify this list to the best tools; any outdated, irrelevant or deprecated solutions will be deleted.
Google (which has a number of third-party front ends such as Map A List, an add-on that adds info to a Google Map from a spreadsheet). There’s also Yahoo Maps Web Services and Bing Maps – all with APIs. But there are numerous oth
OpenHeatMap
“How OpenHeatMap Can Help Journalists”
http://www.computerworld.com/s/article/9215504/22_free_tools_for_data_visualization_and_analysis?taxonomyId=18&pageNumber=8
OpenLayers
http://fuzzytolerance.info/code/openlayers-with-a-google-street-view-widget/
http://www.geoext.org/index.html
OpenLayers Simple Example. A good sample is Ushahidi’s Haiti map.
There are other JavaScript libraries for overlaying information on maps, such as Polymaps. And there are a number of other mapping platforms, such as Google Maps, which offers numerous mapping APIs; Yahoo Maps Web Services, with its own APIs; the Bing Maps platform and APIs; and GeoCommons.
“Links and resources available below may be useful for those interested in pursuing open access publication or advocating for open access to others in the academic community, to grant-making institutions, or even to bodies of government. Resources supplied here include guides, presentation materials, and handbooks produced by SPARC and other organizations. These provide definitions and developments in the field, and point those interested to the growing success of Open Access. Please write to sparc[at]arl[dot]org with additions or corrections.”
http://www.worldbank.org/open/
https://openknowledge.worldbank.org/
http://www.transparency.org/
What is the Open Aid Partnership?
Transparency of development assistance, public budgets and service delivery is critical for citizen engagement. Innovative technologies, such as mapping, provide powerful new tools for strategic planning and for greater transparency and accountability. Recognizing the significant impact that these innovations and an empowered civil society can have on improving development effectiveness, the World Bank Institute and bilateral donor partners, foundations and civil society have formed an Open Aid Partnership. The Partnership will be working in close collaboration with the International Aid Transparency Initiative (IATI) and the Open Government Partnership (OGP). The partnership brings development partners together to enhance the openness and effectiveness of development assistance.
What are the Open Aid Partnership’s main objectives?
- improve aid transparency and coordination by developing an Open Aid Map that visualizes the location of donor-financed programs at the local level;
- better monitor the impact of development programs on citizens;
- enhance the targeting of development programs;
- foster accountability by empowering citizens to provide direct feedback on project results;
- strengthen capacity of civil society and citizens to use open aid data.
http://www.openaidmap.org/
Putting Development on a Map (Mapping for Results)
The Partnership builds on the World Bank’s Mapping for Results Initiative, which has mapped 30,000 activities in all 143 of its client countries, and overlays these data with sub-national poverty and human development indicators at the local level. The initiative is based on the premise that the combination of visualization technologies and open data on development assistance can enable a more transparent, inclusive and effective development process.
Main Components of Open Aid Partnership:
- Map activities supported by development assistance and create a web-based collaborative Open Aid Map that helps improve coordination, efficiency, transparency and accountability of development assistance.
- Support developing countries in building national mapping platforms.
- Promote citizen feedback initiatives for better reporting on development assistance and public service provision in order to enhance transparency and accountability.
- Build capacity of civil society to act as information intermediaries for citizens and make these maps more accessible, as well as the capacity of public service providers to receive and respond to feedback.
- Evaluate the development impact of national mapping platforms and feedback initiatives on public services and related capacity building.
http://www.open-contracting.org/
The information available through the AidData database serves as a platform for testing new ways to make aid information more relevant for different audiences. For example, recent work on geocoding aid can help civil society organizations identify the aid-funded activities that are underway in their communities. AidData’s work supports the efforts of the International Aid Transparency Initiative (IATI) by allowing users to download data in IATI format. Additionally, AidData Raw serves as a repository for datasets that have not yet been vetted or that are not appropriate for inclusion in the main AidData database but provide added informational value.
http://www.aiddata.org/content/index/Services/geocoding
1) Collect, standardize, and organize geo-enabled data. Teams of experienced researchers are available to geocode project data so that it can be used for maps and other visualizations. AidData has partnered with the World Bank Institute, through the Mapping for Results initiative, and works with the African Development Bank, the Kellogg Foundation, and the Malawi Ministry of Finance to geocode information.
2) Prepare visualizations and analytics that leverage the power of geo-enabled data. Once an organization has geo-enabled data, we work in partnership with Esri to visualize this information on state-of-the-art interactive maps. Custom dashboards that combine maps with graphs and charts support monitoring and evaluation efforts, and help analysts and decision-makers identify risks and define next steps.
3) Prepare implementation reports with recommendations. AidData works with organizations to determine challenges and opportunities to geo-enable their data collection and dissemination efforts, and prepare reports with actionable recommendations. Reports can include roadmaps for compliance with international data standards such as the aid information reporting standard developed by the International Aid Transparency Initiative (IATI).
4) Build and implement custom IT solutions. Based on these reports and recommendations, Development Gateway can help design, integrate, and implement custom geo-enabled modules and applications that extend current client systems/processes to create sustainability. Mobile applications can be developed to enable real-time data collection and increase accessibility to broader user groups. Seamlessly integrated web and mobile applications offer organizations a comprehensive way to make their work more efficient and effective.
http://openlayers.org/QuickTutorial/ [open streetmap]
TimeFlow
https://github.com/FlowingMedia/TimeFlow/wiki/Top-Tips
https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=AW-0VW
http://www.wordle.net/
http://www.simile-widgets.org/timeline/
http://nodexl.codeplex.com/
http://www.analytictech.com/ucinet/
d free NodeXL tutorial (PDF) or these basic step-by-step instructions on analyzing your own Facebook social network (PDF). One Facebook app for downloading your own friend information for use in NodeXL is Name Gen Web.
gephi tutorial: http://gephi.org/tutorials/gephi-tutorial-quick_start.pdf
http://thejit.org/demos/, http://thejit.org/
http://www.spatstat.org/spatstat/,
http://www.peteraldhous.com/CAR/Aldhous_CAR2011_RforStats.pdf,
http://jacobfenton.s3.amazonaws.com/R-handson.pdf,
http://cran.r-project.org/doc/manuals/R-intro.html,
http://www.r-statistics.com/tag/visualization/,
http://csvkit.readthedocs.org/
Source: http://www.computerworld.com/s/article/9215504/22_free_tools_for_data_visualization_and_analysis?taxonomyId=18&pageNumber=2
Data visualization and analysis tools
| Tool | Category | Multi-purpose | Mapping | Platform | Skill | Data stored | Designed for |
|---|---|---|---|---|---|---|---|
|  | Data cleaning | No | No | Browser | 2 | External server | No |
|  | Data cleaning | No | No | Browser | 2 | Local | No |
|  | Statistical analysis | Yes | With plugin | Linux, Mac OS X, Unix, Windows XP or later | 4 | Local | No |
|  | Visualization app/service | Yes | Yes | Browser | 1 | External server | Yes |
|  | Visualization app/service | Yes | No | Browser | 3 | Varies | Yes |
|  | Visualization app/service | Yes | Limited | Browser | 1 | Public external server | Yes |
|  | Visualization app/service | Yes | Yes | Windows | 3 | Public external server | Yes |
|  | Visualization app/service | Yes | Yes | Browser | 1 | External server | Yes |
|  | Visualization app/service | Yes | No | Browser | 2 | External server | Yes |
|  | Framework | Yes | Yes | Chrome, Firefox, Safari | 4 | Local or external server | Not yet |
|  | Library | Yes | Yes | Code editor and browser | 4 | Local or external server | Yes |
|  | Library and Visualization app/service | Yes | Yes | Code editor and browser | 2 | Local or external server | Yes |
|  | Library | Yes | No | Code editor and browser | 4 | Local or external server | Yes |
|  | Library | Yes | Yes | Code editor and browser | 4 | Local or external server | Yes |
|  | GIS/mapping: Desktop | No | Yes | Linux, Unix, Mac OS X, Windows | 4 | Local | With plugin |
|  | GIS/mapping: Web | No | Yes | Browser | 1 | External server | Yes |
|  | GIS/mapping: Web, Library | No | Yes | Code editor and browser | 4 | Local or external server | Yes |
|  | GIS/mapping: Web | No | Yes | Browser or desktops running Java | 3 | Local or external server | Yes |
|  | Temporal data analysis | No | No | Desktops running Java | 1 | Local | No |
|  | Word clouds | No | No | Desktops running Java | 2 | Local | As image |
|  | Network analysis | No | No | Desktops running Java | 4 | Local | As image |
|  | Network analysis | No | No | Excel 2007 and 2010 on Windows | 4 | Local | As image |
|  | CSV file analysis | No | No | Linux, Mac OS X or Linux with Python installed | 3 | Local | No |
|  | Create sortable, searchable tables | No | No | Code editor and browser | 3 | Local or external server | Yes |
|  | Create sortable, searchable tables | No | No | Browser | 2 | External server | Yes |
|  | Library | Yes | No | Code editor and browser | 3 | Local or external server | Yes |
|  | Data reformatting | No | No | Browser | 1 | Local or external server | No |
|  | Create searchable tables | No | No | Browser with Amazon EC2 or Ubuntu Linux | 2 | Local or external server | No |
|  | Analysis and charting | Yes | No | Excel 2010 on Windows | 3 | Local | No |
|  | Visualization app/service | Yes | Yes | Flash-enabled browsers; Linux server on backend | 4 | Local or external server | Yes |
|  | Visualization app/service | Yes | No | Browser | 1 | External server | Not yet |
|  | Visualization app/service | Yes | Limited | Browser | 1 | External server | Yes |
|  | Visualization app/service | Yes | No | Browser | 1 | Local or external server | Yes |

*Highcharts is free for non-commercial use and $80 for most single-site-wide licenses.
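The Skill column in the table above makes it straightforward to sort tools by skill level. The sketch below groups a handful of rows by that column. It assumes lower numbers mean less expertise is required, and because the tool names are missing from the table it uses the Category value as a stand-in label. The function name is hypothetical.

```python
from collections import defaultdict

# (category, skill) pairs copied from a few rows of the table above.
ROWS = [
    ("Data cleaning", 2),
    ("Statistical analysis", 4),
    ("Visualization app/service", 1),
    ("GIS/mapping: Desktop", 4),
    ("Temporal data analysis", 1),
    ("CSV file analysis", 3),
]


def group_by_skill(rows):
    """Return {skill_level: [categories]} with levels in ascending order."""
    groups = defaultdict(list)
    for category, skill in rows:
        groups[skill].append(category)
    return {level: groups[level] for level in sorted(groups)}


if __name__ == "__main__":
    for level, categories in group_by_skill(ROWS).items():
        print(f"Skill {level}: {', '.join(categories)}")
```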
This paper outlines proposals for meeting
the objectives of the International Aid
Transparency Initiative (IATI) without
disproportionate cost, and explains what
value IATI would add to existing systems
for reporting aid. Detailed work on
implementation issues is scheduled
through the IATI Technical Advisory Group
(TAG) during 2010. Membership of the
TAG is open, and so far, over 100
individuals have contributed to its work,
including representatives of each
stakeholder group.
There are many people and organisations
with diverse, legitimate and important
needs for information about aid.
Developing country governments need
information about how aid is being spent
in their country. Parliamentarians in
developing countries and in donor
countries want to hold their government to
account. Communities in developing
countries need to know what resources
are available for their development
priorities and in what way they can
influence how those resources are used. A
village council wants to know what aid is
available to improve water in its area.
Researchers need better data to
understand how aid can be more effective.
Taxpayers want to know how their money
is being spent.
No single database can satisfactorily
meet the needs of all these potential
users.
These users all want information tailored
to their own needs. Often they want
information from many different donors,
combined with information from other
sources, such as the government’s
spending, or disease surveillance data.
Yet it is unrealistic to expect donors to
provide information separately to
hundreds of possible information systems.
This then is the dilemma: users need
information presented in ways specific
to their needs, but donors cannot
provide information to each of them
individually.
There are broadly two ways to respond to
this challenge. A limited response is for
those donors who currently report to the
Development Assistance Committee
(DAC) databases to step up
the information that they already provide,
and for all donors to improve reporting to
country government aid management
systems (AIMS).
This paper sets out a more
comprehensive response and shows how
IATI could improve reporting to existing
systems, and at the same time meet a
much wider range of needs for
information, including documents as well
as data.
Donors would extend their existing
processes for collecting information about
aid, which they currently use to report to
the DAC and other systems. They would
include additional information needed by
other stakeholders, much of which is
currently collected and provided
separately. As now, donors would choose
their own systems to manage this data
collection. They would put this combined
information into the public domain more
rapidly and in a common format. They
would register the location of the data in a
“registry” – a kind of online catalogue
which enables users to find it.
This approach can be summarised as
“publish once, use often”.
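As a rough illustration of "publish once, use often", the sketch below models the registry as a simple catalogue that maps each publisher to the locations of its data files. The class, method names, and example URLs are hypothetical and are not part of any IATI specification. Donors register a location once, and any number of users can then look it up.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """A toy online catalogue: publisher name -> list of data file URLs."""
    entries: dict = field(default_factory=dict)

    def register(self, publisher, url):
        """A donor records where its published data can be found."""
        self.entries.setdefault(publisher, []).append(url)

    def find(self, publisher=None):
        """Return URLs for one publisher, or for all publishers if None."""
        if publisher is not None:
            return list(self.entries.get(publisher, []))
        return [url for urls in self.entries.values() for url in urls]


if __name__ == "__main__":
    registry = Registry()
    # Hypothetical donors and locations, for illustration only.
    registry.register("Donor A", "https://donor-a.example/aid-activities.xml")
    registry.register("Donor B", "https://donor-b.example/aid-activities.xml")
    print(registry.find())           # every registered data file
    print(registry.find("Donor A"))  # one donor's files
```

The point of the design is that donors keep their own systems and publish in one place, while intermediaries only need to query the registry rather than visit each donor website.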
The combination of common, open
formats plus the registry would add huge
value to the information already being
published by donors, and the additional
information they would publish as a result
of IATI, because users would be able to
access information of particular interest to
them, in a format that is useful to them,
without having to trawl round all the donor
websites individually. This would open up
the information to a wider range of users
and democratise access to information
through services such as mobile phones
or Google.
The information collected and published
under IATI would provide the information
needed for donor reporting to existing
systems, such as DAC and country AIMS
and national budgets. This would reduce
duplicate information collection and
reporting.
To meet their commitments under the
Accra Agenda for Action (AAA), and in the
context of growing calls for government
transparency, donors are increasingly
publishing more information about aid.
Clearly this will involve some costs to
donors. These IATI proposals are
designed to minimise the additional
burden of this greater transparency, and
yet obtain the maximum benefits from their
efforts by ensuring that the information,
once collected, is universally accessible.
Based on extensive stakeholder consultation summarised in Chapter One, aidinfo concludes that the
system to implement the IATI declaration signed in Accra in 2008 should:
1. meet in full the information needs of developing country government AIMS and budgets without
imposing a burden on developing countries, including complying with local definitions and
classifications;
2. build on the work that has been done through the DAC to develop common definitions and reporting
processes, and avoid the establishment of duplicate or parallel reporting processes;
3. produce information which is easily accessible to parliamentarians, civil society, the media and
citizens as well as to governments (in line with the expanded definition of country ownership agreed
at Accra);
4. provide accurate, high quality and meaningful information, and enable users to distinguish official
statistics, which have been professionally scrutinised, from management information about projects
and programmes;
5. include information about spending by non-DAC donors, multilateral organisations, foundations and
NGOs;
6. be easy to understand, reconcile, compare, add up, read alongside other sources of information,
and be easy to organise and present in ways that are useful to information users;
7. be legally open, with as few barriers to access and reuse as possible;
8. reduce duplicate reporting by donor agencies and minimise additional costs;
9. be electronically accessible in an open format so a wide range of third party intermediaries can
access the information and present it either as comprehensive information or niche analysis;
10. result in access to information about aid which is more timely, more detailed, more forward looking
and more comprehensive than existing data, and which includes wider information on aid, such as
key policy and appraisal documents and the outputs and outcomes it achieves.
The International Aid Transparency
Initiative (IATI) was launched at the Accra
High Level Forum on Aid Effectiveness in
September 2008. IATI is a multi-stakeholder initiative to accelerate access
to aid information to increase
effectiveness of aid in reducing poverty.
The Accra Agenda for Action (AAA)
recognised that increased transparency is
central to the objectives of the Paris
Declaration. Transparency is essential to
meet the five underlying principles of
ownership, alignment, harmonisation,
managing for results, and mutual
accountability. The AAA expanded the
concept of country ownership to include
parliamentarians, civil society
organisations (CSOs), academics, the
media and citizens. Donors agreed to
support efforts to increase the capacity of
all development actors to play an active
role in policy dialogue. The AAA
committed donors to “disclose regular,
detailed and timely information about our
aid flows” and to “support information
systems for managing aid”.
IATI provides a way for donors to meet
this commitment in a coherent and
consistent way. IATI has 18 signatories, of
whom 13 are DAC members. These
signatories resolved to “give strong
political direction” and “invest the
necessary resources in accelerating the
availability of aid information”.
IATI also contributes to Cluster C on
Transparent and Responsible Aid, which
sits under the Working Party on Aid
Effectiveness (WP-EFF). IATI has been
tasked by the Cluster with developing
reporting formats and definitions for
sharing information about aid, drawing on
the expertise of the Working Party on
Statistics (WP-STAT). Proposals
developed by IATI will be available to
inform the Cluster’s work.
IATI aims to agree a four-part standard
consisting of:
(1) an agreement on what would be
published
(2) common definitions for sharing
information
(3) a common electronic data format
(4) a code of conduct.
The details of what would be covered by
IATI and how this would be published will
be decided by the IATI members, following
detailed research by the Technical
Advisory Group (TAG) and consultation
with stakeholders. It is intended that the
standard will be adopted at first by IATI
members but it may over time be adopted
by other DAC donors, and by other non-DAC donors, other foundations and non-governmental organisations (NGOs).
There is widespread support among
developing country governments for
extending the coverage of aid information
to non-traditional donors.
IATI responds to growing demands from
civil society and citizens for greater
transparency of information about
spending and results, and for access to
key documents as well as data. The
ambitions of IATI are consistent with many
other recent initiatives to increase
transparency, for example President
Obama’s August 2009 memo on
transparency, the World Bank’s new
disclosure policy, which represents a
paradigm shift to proactive disclosure with
limited exceptions, and the development
of online information portals for citizens,
such as in Brazil. IATI seeks to harness
the power of new technology to deliver
real improvements in the lives of the
world’s poorest people, in the same way
that email, internet access and mobile
phone networks have revolutionised the
way that aid agencies themselves do their
business.
Since its launch in September 2008, IATI
has focused on consultation with
developing countries and CSOs, fact-finding missions to a number of donor
countries, and detailed work by the TAG
on parts 1 and 4 of the proposed IATI
standard, covering an agreement on what
would be published and a code of
conduct.
The IATI Conference, held in The Hague
in October 2009, confirmed widespread
support for the objectives of IATI, and
consensus on the key information needs
of different stakeholders. At the same
time, it was clear during the IATI
conference that a number of stakeholders
would welcome greater clarity on how IATI
might work in practice, so that they can
consider the full implications of the
initiative for their agencies.
Although detailed work on the precise
practical and technical mechanisms for
implementing IATI is only just beginning,
this paper presents a proposal on how
IATI would work, what this framework
would mean for different stakeholders, and
what added-value it is envisaged that IATI
would offer as a result.
Notes
1. http://www.whitehouse.gov/the_press_office/TransparencyandOpenGovernment/
The task is to gather the right people, agencies, nonprofits, and businesses around their areas of expertise so they can execute. From there, there would be so much data to report, analyze and gain insight from that we’ll be busy for a while before we come to a general consensus on how people can contribute to the issues in the backlog of their municipality. This is a proposal to change the 2025 vision.
http://selection.datavisualization.ch/
Wolfram|Alpha Pro
[http://tributary.io/] for D3 prototyping and RStudio Server for R (which is amazing if you haven’t tried it)

what matters most is the centralization of this data in an easily scrapable and visualizable format
Geodata: the data that is used to make maps, from the location of roads and buildings to topography and boundaries.
Culture: data about cultural works and artefacts, for example titles and authors, generally collected and held by galleries, libraries, archives, and museums.
Science: data that is produced as part of scientific research, from astronomy to zoology.
Financial: data such as government accounts (expenditure and revenue) and information on financial markets (stocks, shares, bonds, etc.).
Statistics: data produced by statistical offices, such as the census and key socioeconomic indicators.
Weather: the many types of information used to understand and predict the weather and climate.
Environment: information related to the natural environment, such as the presence and level of pollutants and the quality of rivers and seas.
Transport: data such as timetables, routes, and on-time statistics (for example, public bus statistics).
Transparency. In a well-functioning, democratic society citizens need to know what their government is doing. To do that, they must be able freely to access government data and information and to share that information with other citizens. Transparency isn’t just about access, it is also about sharing and reuse — often, to understand material it needs to be analyzed and visualized and this requires that the material be open so that it can be freely used and reused.
Releasing social and commercial value. In a digital age, data is a key resource for social and commercial activities. Everything from finding your local post office to building a search engine requires access to data, much of which is created or held by government. By opening up data, government can help drive the creation of innovative business and services that deliver social and commercial value.
Participation and engagement – participatory governance or for business and organizations engaging with your users and audience. Much of the time citizens are only able to engage with their own governance sporadically — maybe just at an election every 4 or 5 years. By opening up data, citizens are enabled to be much more directly informed and involved in decision-making. This is more than transparency: it’s about making a full “read/write” society, not just about knowing what is happening in the process of governance but being able to contribute to it.
Data APIs
We provide a number of public APIs that expose the data in our services to developers who want
to re-use it.
okfn / ckan
The Comprehensive Knowledge Archive Network (CKAN) stores metadata and data for datasets.
Various deployments exist, but an API is available for all of them.
Documentation
Python Client
JavaScript Client
Endpoints: demo.ckan.org, datahub.io, data.gov.uk, more…
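To show how a CKAN deployment might be queried, here is a minimal sketch that assumes the deployment exposes the version 3 action API with a `package_search` action returning a JSON object with `success` and `result.results` fields. The helper names are hypothetical, and older or customised deployments may differ, so check the Documentation link for the endpoint you use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, query, rows=5):
    """Build a package_search URL for a CKAN site (action API v3 assumed)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(payload):
    """Extract dataset titles (or names) from a package_search response."""
    if not payload.get("success"):
        return []
    results = payload.get("result", {}).get("results", [])
    return [d.get("title") or d.get("name") for d in results]


def search_datasets(base_url, query, rows=5):
    """Query a live CKAN endpoint; requires network access."""
    with urlopen(build_search_url(base_url, query, rows), timeout=10) as resp:
        return dataset_titles(json.load(resp))


if __name__ == "__main__":
    # demo.ckan.org is one of the endpoints listed above.
    print(search_datasets("https://demo.ckan.org", "spending"))
```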
openspending / openspending
OpenSpending.org is a datamart for government financial data. It stores budgets and
transactional expenditure and offers search, export and aggregation APIs.
Documentation
JavaScript Client
okfn / bibserver
BibServer is a tool for quickly and easily sharing collections of bibliographic metadata. Most of
the data stored internally can be read through the API.
Documentation
BibSoup.net instance
pybossa / pybossa
PyBossa is a crowd-sourcing platform where users can help to complete tasks, such as image
analysis or text transcription. The application can be completely controlled via its API.
1/14/13 Data Sources
okfnlabs.org/data/ 2/2
JavaScript Client
Python Client
Documentation
okfn / activityapi
The activity API collects data about project-related activities on various collaborative platforms,
including GitHub, Twitter and Mailing List feeds.
Endpoint documentation
Frontend source code
pudo / nomenklatura
Nomenklatura is a very simplistic data linking tool. It maintains authoritative lists of names (e.g.
politicians, companies, streets) and offers an API and web-based interactive recon tool to match
variant spellings of these names to the canonical form.
Python client
Endpoint documentation
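The idea of matching variant spellings to a canonical form can be sketched with the standard library alone. This is not Nomenklatura's API or its matching algorithm. It is a hypothetical illustration that uses `difflib` similarity scores against a small example list of authoritative names.

```python
from difflib import get_close_matches

# A small example list of authoritative names (illustrative only).
CANONICAL_NAMES = ["Barack Obama", "Angela Merkel", "World Bank"]


def reconcile(name, canonical=CANONICAL_NAMES, cutoff=0.8):
    """Return the canonical name closest to `name`, or None if none is close."""
    # Compare case-insensitively, but return the original canonical spelling.
    lookup = {entry.lower(): entry for entry in canonical}
    matches = get_close_matches(
        name.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


if __name__ == "__main__":
    for variant in ["Barak Obama", "world bnk", "Unknown Person"]:
        print(variant, "->", reconcile(variant))
```

A real reconciliation tool would typically also store confirmed aliases so that a human decision only has to be made once per variant.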
http://opendatahandbook.org/en/
1/14/13 The Open Data Handbook — Open Data Handbook
opendatahandbook.org/en/ 1/3
Open Data Handbook
The Open Data Handbook
This handbook discusses the legal, social and technical aspects of open data. It
can be used by anyone but is especially designed for those seeking to open up data. It
discusses the why, what and how of open data – why to go open, what open is, and
how to ‘open’ data.
To get started, you may wish to look at the Introduction. You can navigate through the
report using the Table of Contents (see sidebar or below).
We warmly welcome comments on the text and will incorporate feedback as we go forward.
We also welcome contributions or suggestions for additional sections and areas to examine.
Table of Contents
Introduction
Target Audience
Credits
Credits and Copyright
Why Open Data?
What is Open Data?
What is Open?
What Data are You Talking About?
How to Open up Data
Choose Dataset(s)
Asking the community
Cost basis
Ease of release
Observe peers
Apply an Open License (Legal Openness)
Make Data Available (Technical Openness)
Online methods
Search
Make data discoverable
Existing tools
For government
So I’ve Opened Up Some Data, Now What?
Tell the world!
Understanding your audience
Post your material on third-party sites
Making your communications more social-media friendly
Social media
Getting folks in a room: Unconferences, Meetups and Barcamps
Making things! Hackdays, prizes and prototypes
Examples for Competitions
Conferences, Barcamps, Hackdays
Glossary
Appendices
File Formats
An Overview of File Formats
Open File Formats
How do I use a given format?
What Legal (IP) Rights Are There in Data(bases)
Indices and tables¶
Index
Search Page
An Open Knowledge Foundation project.
© 2010–2012, Open Knowledge Foundation. Licensed under Creative Commons Attribution (Unported) v3.0 License
Source | Issues | Mailing List | Twitter @OKFN
Related Projects: OpenGovernmentData.org, TheDataHub.org, DataCatalogs.org, OpenSpending.org, DataPatterns.org
http://www.isitopendata.org/about/
With open data, the rise of the citizen scientist, citizen journalist and others will help us achieve common goals faster and accomplish tasks that would be too expensive or time-consuming to accomplish through other means.
http://www.citizencyberscience.net/
Get this info as JSON, XML, RDF
Data Sources
- Transparency International’s Corruption Perceptions Index
- Annual Reports from individual companies
Tools & Resources
- Excel
- Evan Raskob’s Intro to Programming
- Data Expeditions Character Sheet
Find out more about working with open data by exploring these resources:
Inclusive Planning Outreach with Web-based Tools
PlanningPress is a web toolkit for inclusive, responsive, authentic citizen engagement in transportation planning.
The web has opened up new modes of communication between governments and the public, introduced new possibilities for collaborative work, and made dynamic data visualization and analysis possible. PlanningPress makes it straightforward to apply these opportunities to community transportation planning. Everyone involved can review and engage in dialog on ideas and proposals, using maps and a user-friendly interface.
Intended for use by transportation departments and agencies, PlanningPress complements and extends the reach of an existing planning process. It enables regular, non-technical team members to publish updates. The simple content management system is built on WordPress, a widely-used publishing platform.
NYCDOT’s Jackson Heights portal is powered by PlanningPress. The website introduces the changes proposed for the neighborhood and shows them in detail allowing residents to comment on the plans. It lays out a timeline for events concerning the project, has an interactive map, news updates and other resources to help people understand the project as it develops.
Open Source
Stamen is an active contributor to and author of multiple open source projects. These collaborative efforts often play a valuable role in our commercial work, and lessons learned from working for clients have a way of making their way into code releases that the public at large can benefit from.
Aight
A collection of tools for making reasonable JavaScript and CSS work in IE8.
Polymaps
Polymaps provides speedy display of multi-zoom datasets over maps, and supports a variety of visual presentations for tiled vector data, in addition to the usual cartography from OpenStreetMap, CloudMade, Bing, and other providers of image-based web maps.
Because Polymaps can load data at a full range of scales, it’s ideal for showing information from country level on down to states, cities, neighborhoods, and individual streets. Because Polymaps uses SVG (Scalable Vector Graphics) to display information, you can use familiar, comfortable CSS rules to define the design of your data. And because Polymaps uses the well known spherical mercator tile format for its imagery and its data, publishing information is a snap.
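For readers unfamiliar with the spherical mercator tiling mentioned above, the sketch below shows the commonly used formula for finding which tile contains a given longitude and latitude at a given zoom level. It is written in Python for illustration and is not Polymaps code. The `zoom/x/y.png` path layout is an assumption about how a tile server might be organised.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) indices of the spherical mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_path(lon, lat, zoom):
    """Build a zoom/x/y path of the kind many tile servers use (assumed layout)."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return f"{zoom}/{x}/{y}.png"


if __name__ == "__main__":
    # Approximate coordinates for central San Francisco.
    print(tile_path(-122.4194, 37.7749, 10))
```

Each zoom level doubles the number of tiles per axis, which is why one scheme can cover everything from country level down to individual streets.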
CityTracking
Dotspotting is the first project Stamen is releasing as part of Citytracking, a project funded by the Knight News Challenge.
We’re making tools to help people gather data about cities and make that data more legible. The code for Dotspotting is available for download on GitHub, and licensed for use under the GNU General Public License.
Modest Maps
Modest Maps is a BSD-licensed display and interaction library for tile-based maps in Flash (ActionScript 2.0 and ActionScript 3.0) and Python.
Our intent is to provide a minimal, extensible, customizable, and free display library for discriminating designers and developers who want to use interactive maps in their own projects. Modest Maps provides a core set of features in a tight, clean package, with plenty of hooks for additional functionality.
Cascadenik
Cascadenik implements cascading stylesheets for Mapnik, a Free Toolkit for developing mapping applications.
It’s an abstraction layer and preprocessor that converts special, CSS-like syntax into Mapnik-compatible style definitions. It’s easier to write complex style rules using the alternative syntax, because it allows for separation of symbolizers and provides a mechanism for inheritance.
Tile Drawer
Tile Drawer makes designing and hosting custom maps simple and straightforward. The project lets anyone run their own OpenStreetMap server in the cloud with one-step configuration and zero administration. You can use the rendered map tiles in a number of ways: with other GIS data in OpenLayers, in a Flash application built on Modest Maps, or layered into a Google Map as a custom map tile overlay.
Walking Papers
OpenStreetMap, the wiki-style map of the world that anyone can edit, is in need of a new way to add content. Walking Papers is a way to “round trip” map data through paper, to make it easier to perform the kinds of eyes-on-the-street edits that OSM needs now the most, as well as distributing the load by making it possible for legible, easy notes to be shared and turned into real geographical data.
TileStache
TileStache is a Python-based server application that can serve up map tiles based on rendered geographic data.
You might be familiar with TileCache, the venerable open source WMS server from MetaCarta. TileStache is similar, but we hope simpler and better-suited to the needs of designers and cartographers.
http://postgis.refractions.net/
https://docs.google.com/spreadsheet/ccc?key=0Aon3JiuouxLUdFZPM25HN2pHUk1XSXl0RFg5YkFId0E#gid=0
http://datawrapper.de/docs/tutorial
Computerworld - Reporters wrangle all sorts of data, from analyzing property tax valuations to mapping fatal accidents — and, here at Computerworld, for stories about IT salaries and H-1B visas. In fact, tools used by data-crunching journalists are generally useful for a wide range of other, non-journalistic tasks — and that includes software that’s been specifically designed for newsroom use. And, given the generally thrifty culture of your average newsroom, these tools often have the added appeal of little or no cost.
I came back from last year’s National Institute for Computer-Assisted Reporting (NICAR) conference with 22 free tools for data visualization and analysis – most of which are still popular and worth a look. At this year’s conference, I learned about other free (or at least inexpensive) tools for data analysis and presentation.
Want to see all the tools from last year and 2012?
For quick reference, check out our chart listing all 30 free data visualization and analysis tools.
Like that previous group of 22 tools, these range from easy enough for a beginner (i.e., anyone who can do rudimentary spreadsheet data entry) to expert (requiring hands-on coding). Here are eight of the best:
CSVKit
What it does: This utility suite available from Christopher Groskopf’s GitHub account has a host of Unix-like command-line tools for importing, analyzing and reformatting comma-separated data files.
8 tools for data analysis
What’s cool: Sure, you could pull your file into Excel to examine it, but CSVKit makes it quick and easy to preview, slice and summarize.
For example, you can see all your column headers in a list — which is handy for super-wide, many-column files — and then just pull data from a few of those columns. In addition to inputting CSV files, it can import several fixed-width file formats — for example, there are libraries available for the specific fixed-width formats used by the Census Bureau and Federal Elections Commission.
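The header-listing and column-picking workflow described here can be approximated with Python's built-in `csv` module, as in the sketch below. This is not CSVKit itself (which provides command-line tools for the same job), and the sample data and function names are made up for illustration.

```python
import csv
import io


def column_names(csv_text):
    """Return (position, name) pairs for the header row, numbered from 1."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return list(enumerate(header, start=1))


def select_columns(csv_text, wanted):
    """Return each data row reduced to the named columns, in the order given."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[row[name] for name in wanted] for row in reader]


# Made-up sample data for illustration.
SAMPLE = (
    "quarter,units_millions,device,region\n"
    "2010-06,3.3,iPad,World\n"
    "2010-09,4.2,iPad,World\n"
)

if __name__ == "__main__":
    for position, name in column_names(SAMPLE):
        print(position, name)
    print(select_columns(SAMPLE, ["device", "quarter"]))
```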
Two simple commands will generate a data structure that can, in turn, be used by several SQL database formats (Mr. Data Converter handles only MySQL). The SQL code will create a table, inferring the proper data type for each field as well as the insert commands for adding data to the table.
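To show roughly what inferring a data type for each field involves, here is a simplified Python sketch that guesses a column type and emits a `CREATE TABLE` statement. It is an assumption-laden illustration, not CSVKit's implementation: it only distinguishes integers, decimals and text, it does not generate the insert commands, and the type names are generic SQL rather than tuned to any one database.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest SQL type that fits every non-empty value in a column."""
    filled = [value for value in values if value != ""]
    for sql_type, parse in (("INTEGER", int), ("REAL", float)):
        try:
            for value in filled:
                parse(value)
            return sql_type
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "TEXT"


def create_table_sql(table, csv_text):
    """Generate a CREATE TABLE statement from a CSV header and its data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = [
        f'"{name}" {infer_type([row[i] for row in data])}'
        for i, name in enumerate(header)
    ]
    return f'CREATE TABLE "{table}" ({", ".join(columns)});'


if __name__ == "__main__":
    sample = "quarter,units,device,count\n2010-06,3.3,iPad,1\n2010-09,4.2,iPad,2\n"
    print(create_table_sql("sales", sample))
```

Guessing from the data alone can go wrong, for instance with postal codes that look like integers, so the generated statement is best treated as a draft to review.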
CSVKit offers Unix-like command-line tools for importing, analyzing and reformatting comma-separated data files.
The Unix-like interface will be familiar to anyone who has worked on a *nix system, and makes it easy to save multiple frequently used commands in a batch file.
Drawbacks: Working on a command line means learning new text commands (not to mention the likely risk of typing errors), which might not be worthwhile unless you work with CSV files fairly often. Also, be advised that this tool suite is written in Python, so Windows users will need that installed on their system as well.
Skill level: Expert
Runs on: Any Windows, Mac or Linux system with Python installed.
Learn more: The documentation includes an easy-to-follow tutorial. There’s also a brief introductory slide presentation that was given at the NICAR conference last month.
Related tools: Google Refine is a desktop application that can do some rudimentary file analysis as well as its core task of data cleaning; and The R Project for Statistical Computing can do more powerful statistical analysis on CSV and other files.
DataTables
What it does: This popular jQuery plug-in (which was designed and created by Allan Jardine) creates sortable, searchable HTML tables from a variety of data sources — say, an existing, static HTML table, a JavaScript array, JSON or server-side SQL.
Apple device sales (first 10 of 17 entries)
| Quarter ending | Unit sales (millions) | Device |
|---|---|---|
| 2010-06 | 3.3 | iPad |
| 2010-09 | 4.2 | iPad |
| 2010-12 | 7.3 | iPad |
| 2010-12 | 16.2 | iPhone |
| 2010-12 | 4.1 | Mac |
| 2011-03 | 4.7 | iPad |
| 2011-03 | 18.6 | iPhone |
| 2011-03 | 3.8 | Mac |
| 2011-06 | 9.3 | iPad |
| 2011-06 | 20.3 | iPhone |
Source: Apple earnings statements
What’s cool: In addition to sortable tables, results can be searched in real time (results are narrowed further with each search-entry keystroke).
Drawbacks: Search capability is fairly basic and cannot be narrowed by column or by using wildcard or Boolean searches.
Skill level: Expert
Runs on: JavaScript-enabled Web browsers
Learn more: Numerous examples on the DataTables site show many ways to use this plug-in.
FreeDive
What it does: This alpha project from the Knight Digital Media Center at UC Berkeley turns a Google Docs spreadsheet into an interactive, sortable database that can be posted on the Web.
What’s cool: In addition to text searching, you can include numerical range-based sliders. Usage is free. End users can easily create their own databases from spreadsheets without writing code.
FreeDive turns a Google Docs spreadsheet into an interactive, sortable database
FreeDive’s chief current attraction is the ability to create databases without programming; however, freeDive source code will be posted and available for use once the project is more mature. That could appeal to IT departments seeking a way to offer this type of service in-house, allowing end users to turn a Google Doc into a filterable, sortable Web database using the Google Visualization API, Google Query Language, JavaScript and jQuery — without needing to manually generate that code.
Drawbacks: My test application ran into some intermittent problems; for example, it wouldn’t display my data list when using the “show all records” button. This is an alpha project, and should be treated as such.
In addition, the current iteration limits spreadsheets to 10 columns and a single sheet. One column must have numbers, so this won’t work for text-only information. The search widget is currently limited to a few specific choices of fields to search, although this might increase as the project matures. (A paid service like Caspio would offer more customization.) The nine-step wizard might get cumbersome after frequent use.
Skill level: Advanced beginner.
Runs on: Current Web browsers
Learn more: The freeDive site includes several video tutorials at the bottom of the home page as well as test data to try out the wizard.
Related tools: Caspio is a well-established commercial alternative. For a JavaScript alternative with more control over the table created from a Google Docs spreadsheet, you might want to investigate Tabletop, which makes a Google Docs spreadsheet accessible to JavaScript code.
Highcharts JS
What it does: This JavaScript library from Highsoft Solutions provides an easy way to create professional-looking interactive charts for the Web. jQuery, MooTools or Prototype is required.
What’s cool: With Highcharts, users can mouse over items for more details; they can also click on items in the chart legend to turn them on and off. There are many different chart types available, from basic line, bar, column and area charts to zoomable time series; each comes with six stylesheet options. Little customization is needed to get a sleek-looking chart — and charts will display on iOS and Android devices as well as on desktop browsers.
[Chart: Apple device sales (millions of units). Series: iPad sales, iPhone sales, Mac sales. X-axis: Dec-10 to Dec-11. Y-axis: 0 to 40.]
Highcharts example with data about Apple device sales. Mouse over the graph to see details; click items in the legend to turn them on or off.
Drawbacks: Highcharts, like Google Maps, does have a distinctive look, so you may want to customize the Highcharts stylesheets so your visualizations don’t look like numerous other Highcharts on the Web. While charts displayed fine for me on an Android phone, they weren’t interactive (they were on an iPad).
And unlike most JavaScript/jQuery libraries, Highcharts is free only for non-commercial use, although a site-wide license for many companies costs only $80. (The cost jumps to $300 per developer seat in some cases — for example, if charts are customized for individual users.) Rendering can be slow in some older browsers (notably Internet Explorer 6 and 7).
Skill level: Intermediate to Expert.
Runs on: Web browsers
Learn more: The Highcharts demo gallery includes easy-to-view source code; the documentation explains other options.
Related tools: Google Chart Tools create static image charts and graphs or more interactive JavaScript-based visualizations; there are also JavaScript libraries such as Protovis and the JavaScript InfoVis Toolkit. Exhibit is an MIT Simile Project spinoff designed for presenting data on the Web with filtering, sorting and interactive capabilities.
Mr. Data Converter
What it does: How often do you have data in one format — while your application needs it in another? New York Times interactive graphics editor Shan Carter ran into this situation often enough that he coded a tool that converts comma- or tab-delimited data into nine different formats. It’s available as either a service on the Web or an open source tool.
Mr. Data Converter can generate XML, JSON, ASP/VBScript or basic HTML table formatting.
What’s cool: Mr. Data Converter can generate XML, JSON, ASP/VBScript or basic HTML table formatting as well as arrays in PHP, Python (as a dictionary) and Ruby. It will even generate MySQL code to create a table (guessing at field formats based on the data) and insert your data. If your data is in an Excel spreadsheet, you don’t need to save it as a CSV or TSV; you can just copy and paste it into the tool.
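The core of this kind of conversion is small enough to sketch in a few lines of Python. The example below is an illustration only, not Mr. Data Converter's code: it assumes the first row holds column names, guesses the delimiter from that row, and covers just the JSON output.

```python
import csv
import io
import json


def delimited_to_records(text):
    """Parse comma- or tab-delimited text (header row first) into dicts."""
    first_line = text.splitlines()[0]
    # Treat the input as TSV if the header row contains a tab, otherwise CSV.
    delimiter = "\t" if "\t" in first_line else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def to_json(text):
    """Convert delimited text to a JSON array of objects (values stay strings)."""
    return json.dumps(delimited_to_records(text), indent=2)


if __name__ == "__main__":
    print(to_json("device\tunits\niPad\t3.3\nMac\t4.1\n"))
```

Cells copied from a spreadsheet typically arrive on the clipboard as tab-separated text, which is why checking for a tab in the header row is a reasonable first guess.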
Drawbacks: Only CSV or TSV formats can be input, as well as copying and pasting in data from Excel.
Skill level: Beginner
Runs on: JavaScript-enabled Web browsers
Learn more: You can follow Mr. Data Converter on Twitter at @mrdataconverter.
Related tools: Data Wrangler is a Web-based tool that reformats data to your specifications.
Panda Project
What it does: Panda is less about analyzing or presenting data than finding it amidst the pile of standalone spreadsheets scattered around an organization. It was specifically designed for newsrooms, but could be used by any organization where individuals collect information on their desktops that would be worth sharing. Billed as a “newsroom appliance,” Panda lets users upload CSV or Excel files and then search across all available data sets or within a single file.
Panda makes it simple to give others access to information that’s been sitting in different stand-alone spreadsheets.
What’s cool: Panda makes it simple to give others access to information that’s been sitting on individuals’ hard drives in different stand-alone spreadsheets. Even non-technical users can easily upload and search data. Search is extremely fast, using Apache Solr.
Drawbacks: Queries are basic. You can’t specify a particular column/field to search, so a search for “Washington” would bring back items containing both the place and a person’s name. The required hosting platform is quite specific: Ubuntu 11.10. (Panda’s developers have created an Amazon Community Image with the required server setup for hosting on Amazon Web Services EC2.)
Skill level: Beginner (Advanced Beginner for administration)
Runs on: Must be hosted on Amazon EC2 or a server running Ubuntu 11.10. Clients can use any Web browser.
Learn more: Panda documentation, still in the works, gives basics on setup, configuration and use. Nieman Journalism Lab has some background on the project, which was funded by a $150,000 Knight News Challenge grant.
PowerPivot
What it does: This free plugin from Microsoft allows Excel 2010 to handle massively large data sets much more efficiently than the basic version of Excel does. It also lets Excel act like a relational database by adding the capacity to truly join columns in different tables instead of relying on Excel’s somewhat cumbersome VLOOKUP command. PowerPivot includes its own formula language, Data Analysis Expressions (DAX), which has a similar syntax to Excel’s conventional formulas.
PowerPivot allows Excel 2010 to handle massively large data sets more efficiently.
What’s cool: PowerPivot can handle millions of records — data sets that would usually grind PowerPivot-less Excel to a halt. And by joining tables, you can make more “intelligent” pivot tables and charts to explore and visualize large data sets with Excel’s point-and-click interface.
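For anyone unsure what joining tables buys you over a lookup formula, the idea can be sketched in plain Python: index one table by the shared key, attach its columns to every matching row of the other, then summarise. This is only a conceptual illustration, not PowerPivot or DAX. The unit figures echo the Apple sales table earlier in this article, while the category labels and function names are invented for the example.

```python
def join_tables(left_rows, right_rows, key):
    """Inner-join two lists of dicts on a shared key column."""
    # Index the lookup table once, so each left row is matched in constant time.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left in left_rows:
        for right in index.get(left[key], []):
            joined.append({**right, **left})
    return joined


def total_by(rows, group_key, value_key):
    """Sum one column grouped by another, like a one-field pivot table."""
    totals = {}
    for row in rows:
        totals[row[group_key]] = totals.get(row[group_key], 0.0) + row[value_key]
    return totals


SALES = [
    {"device": "iPad", "quarter": "2010-12", "units": 7.3},
    {"device": "iPhone", "quarter": "2010-12", "units": 16.2},
    {"device": "Mac", "quarter": "2010-12", "units": 4.1},
    {"device": "iPad", "quarter": "2011-03", "units": 4.7},
]
# Hypothetical lookup table; the category labels are invented.
DEVICES = [
    {"device": "iPad", "category": "tablet"},
    {"device": "iPhone", "category": "phone"},
    {"device": "Mac", "category": "computer"},
]

if __name__ == "__main__":
    print(total_by(join_tables(SALES, DEVICES, "device"), "category", "units"))
```

Once the tables are joined, any column from either table can serve as a grouping field, which is what makes the resulting pivot tables more flexible.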
Drawbacks: This is limited to Excel 2010 on Windows systems. Also, SQL jocks might prefer using a true relational database for multi-table data in order to build complex data queries.
Skill level: Intermediate
Runs on: Excel 2010 on Windows only.
Learn more: There are links to demos and videos on the PowerPivot main page, as well as an introductory tutorial on Microsoft’s TechNet.
Related tools: Zoho Reports can take data from various file formats and turn it into charts, tables and pivot tables.
Weave
What it does: This general-purpose visualization platform allows creation of interactive dashboards with multiple, related visualizations — for example, a bar chart, scatter plot and map. The open-source project was created by the University of Massachusetts at Lowell in partnership with a consortium of government agencies and is still in beta.

Weave demo visualization of foreclosures in Lowell, Mass. See the interactive version.
What’s cool: The visualizations are slick and highly interactive; clicking an area in one visualization also affects others in the dashboard. The platform includes powerful statistical analysis capabilities. Users can create their own visualizations on a Weave-based Web system, or save and alter the tools and appearances of visualizations that have been publicly shared by others.
Drawbacks: Requires Flash for end-user viewing. It’s currently somewhat difficult to install, although a one-click install is scheduled for this summer. And because it’s so powerful, some users say that implementations must consider how to winnow down functionality so as not to overwhelm end users.
Skill level: Intermediate for those just creating visualizations; Expert for those implementing a Weave system.
Runs on: Flash-enabled browsers. Server requires a Java servlet container (Tomcat or Glassfish), MySQL or PostgreSQL, Linux and the Adobe Flex 3.6 SDK.
Learn more: The Weave site includes demos, videos and a user guide. For more examples of visualizations that can be built using a Weave platform, see one planner’s MetroBoston DataCommon gallery. In addition, I wrote more detailed Computerworld coverage of Weave following a presentation at Northeastern University.
Related tools: Tableau Public is a robust general-purpose visualization platform.
Visualize This: The FlowingData Guide to De…
Visual Thinking: for Design (Morgan Kaufman…
The Wall Street Journal Guide to Informatio…
The Visual Display of Quantitative Informat…
The Visual Miscellaneum: A Colorful Guide t…
The Functional Art: An introduction to info…
slide:ology: The Art and Science of Creatin…
Now You See It: Simple Visualization Techni…
Information Dashboard Design: The Effective…
Designing Data Visualizations
Beautiful Evidence
Envisioning Information
Visual Explanations: Images and Quantities,…
Information Visualization, Third Edition: P…
Resonate: Present Visual Stories that Trans…
Presentation Zen: Simple Ideas on Presentat…
Consider Your Message When Choosing What Chart to Use
Forbes contributor Naomi Robbins on the different types of charts and their uses for emphasising a particular message.
Datawrapper 1.0 Is Released
Following a successful beta version, Datawrapper version 1.0 is released.
Verification Tools for Journalists
A list of tools for verifying people, places and images, from EmergencyJournalism.net.
LearnStreet: Coding Starts Here
LearnStreet, a new California-based start-up, aims to change the code learning process.
Must Zero Be Included on Scales of Graphs? Another Look at Fox News’ Graph and Huff’s Gee-Whiz Graph
A post by Forbes contributor Naomi Robbins on the use of zero baselines in graphs by the media.
Comparing Graphics from The Guardian and The New York Times: A Project by Marije Rooze
A recent project by Dutch MA student Marije Rooze compares interactive graphics from the Guardian and the New York Times, with unexpected results.
Some Useful Statistical Blogs
Forbes.com contributor, Naomi Robbins, shares some respected stats blogs and recent discoveries.
The Journalist’s ‘Learn to Code’ Resource Guide
This is a list of resources you can use to begin to write your own programs, written with journalists in mind.
Torque: An Open Source Mapping Tool for Big Data by CartoDB
A new visualisation tool for Big Data brought to you by the people at CartoDB.
Statwing: Powerful Data Analysis, Simple to Use
Given the rising interest in data analytics, there will be new tools. Here is one fresh and promising approach.
The KoBo Platform: Handheld Data Collection for Real Practitioners
Introducing KoBo, an integrated suite of applications for handheld data collection that are specifically designed for a non-technical audience.
Tips for Working with Numbers in the News
The best tip for handling data is to enjoy yourself. Data can appear forbidding. But allow it to intimidate you and you’ll get nowhere. Treat it as something to play with and explore and it will often yield secrets and stories with surprising ease.
Geofeedia: Next Generation Crisis Mapping Technology?
Situational awareness is absolutely key to emergency response, hence the rise of crisis mapping. In this post we introduce Geofeedia and discuss its potential applications for humanitarian response.
Using Data Visualization to Find Insights in Data
In order to be able to see and make any sense of data, we need to visualize it. Data visualisation expert Gregor Aisch explains the steps you need to take in order to make finding insights in data more effective.
How to Create the Perfect Line Chart
Award-winning data visualisation expert Gregor Aisch outlines the best practices for creating line charts.
Getting Data: A Five Minute Field Guide
Looking for data on a particular topic or issue? Not sure what exists or where to find it? Don’t know where to start? This post shows how to get started with finding public data sources on the web.
Video: School of Data Journalism – Precision Journalism Workshop
This video will guide you through the very basics of how to use Excel for data journalism projects.
Video: School of Data Journalism – Spending Stories workshop
An Open Knowledge Foundation project helps journalists to find stories in spending data.
Meet data mapping platform CartoDB
Introducing CartoDB – an open source, cloud-based data mapping platform which makes mapping accessible, even for complete beginners.
Delivering data: How to build a news app
A look at what news apps are and what they do, when it is useful to build them and how to build them.
Video: The joy of stats with Hans Rosling
Join Prof. Rosling on an exciting trip through the history and development of one of the pillars of data journalism: good old statistics.
Video: School of Data Journalism – Making Data Pretty workshop
An effective visualisation is the key element to engage your audience around data projects. This video from the School of Data Journalism explains the secrets of the trade.
Video: School of Data Journalism – Getting Stories from Data workshop
Caelainn Barr and Steve Doig explain how to turn public data sets into a goldmine of information.
Video: School of Data Journalism – Information Wants to Be Free workshop
The second workshop of the Data Journalism School in Perugia looked at how journalists can use open data and Freedom of Information legislation to get access to the information they need.
Introduction to open-source GIS tools for journalists
Location is quickly becoming a core value of journalism and geographic literacy is on the rise. A look at geocoding tools.
Creating dot density maps with Chicago Tribune’s new open source toolkit
Chicago Tribune hacker Christopher Groskopf explains the tools and techniques behind the creation of dot density maps with U.S. census data.
The limitations of red-green colour scales in infographics
Information visualization expert Gregor Aisch explains the end of his love story with diverging red-green colour scales.
Beginner’s guide for journalists who want to understand API documentation
There are three letters that have been floating around the media world for several years now: API. There aren’t many resources that explain API documentation to non-coders. Here’s an overview of how to figure it out.
Reading data from Flash sites
Adobe Flash can make data difficult to extract. This tutorial will teach you how to find and examine raw data files that are sent to your web browser, without worrying how the data is visually displayed.
Essential visualisation resources: Tools for analysis, collection and enterprise
This is the first part of a multi-part series designed to share with readers an inspiring collection of the most important, effective, useful and practical data visualisation resources. The series will cover visualisation tools, resources for sourcing…
Essential visualisation resources: Tools for mapping
This is the fourth part of a multi-part series designed to share with readers an inspiring collection of the most important, effective, useful and practical data visualisation resources. The series will cover visualisation tools, resources for sourcing…
Power tools for aspiring data journalists: Funnel Plots in R
In the following post Tony Hirst describes a quick way of analysing a mortality dataset using R, a very powerful statistical programming environment that should probably be part of your toolkit if you ever want to get round to doing some serious stats…
The top 10 data-mining links of 2011
Overview is a project to create an open-source document-mining system for investigative journalists and other curious people. We’ve written before about the goals of the project, and we’re developing some new technology, but mostly we’re…
A computational journalism reading list
There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there’s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of “programmer…
The Bastards Book of Ruby
The Bastards Book of Ruby is an introduction to programming for non-programmers. The online book focuses on the use of programming for the gathering, organizing, and analyzing of data in all its forms.
Programmer-journalist job openings
A spreadsheet listing over 50 programmer-journalist jobs has been circulating online for some time now. All the jobs require technical skills and range from newsroom developer to interactive designer, multimedia producer and social media editor.
Getting text out of an image-only PDF
In the previous guide, we describe several methods for turning PDFs into data usable for spreadsheets. However, those only handle PDFs that have actual text embedded within them. When a PDF contains just images of text, as they do in scanned documents,…
Turning PDFs to text
Adobe’s Portable Document Format is a great format for digital documents when it’s important to maintain the layout of the original format. However, it’s a document format and not a data format.
Using Google Refine to clean messy data
Google Refine (the program formerly known as Freebase Gridworks) is described by its creators as a “power tool for working with messy data” but could very well be advertised as “remedy for eye fatigue, migraines, depression, and other symptoms of…
Manual on Excel for data journalists
The Centre for Investigative Journalism came out with a handbook this year for journalists who want to master the art of interrogating and questioning numbers competently.
Tableau Public
Tableau Public is a data visualisation tool that enables users to condense complex datasets into simple and easy to read graphs, which allow for better understanding of the datasets.
Where are the bodies buried on the web? Big data for journalists
The following post is the introduction to the free online ebook ‘Where are the bodies buried on the web? Big data for journalists’ published by former Apple engineer Pete Warden in January this year.
10 tools that can help data journalists do better work, be more efficient
It’s hard to be equally good at all of the tasks that fall under data journalism. To make matters worse (or better, really), data journalists are discovering and applying new methods and tools all the time. As a beginning data journalist, you’ll want…
How to scrape Toronto data: a basic tutorial
This post is a step-by-step tutorial on scraping for beginners with video clips.
Visualizing Toronto’s water usage: a tutorial
This post is a tutorial on data visualisation for those who are just starting out. You will learn how to take a big data file, clean it, filter it and turn it into a visualisation.
List of tutorials for journalists on how to use spreadsheets
This post is a list of the best free tutorials on the web for journalists who want to learn spreadsheet skills.
Video archive EJC @PICNIC11: From database cities to urban stories (II)
In this post you can find the videos of the talks from the second European Journalism Centre session: ‘From database cities to urban stories: What are the success stories?’, at the 2011 edition of the leading media festival PICNIC in Amsterdam.
How to Find Stories in EU Spending Data
Caelainn Barr, EU data journalist, talks about how to find stories in EU spending data at the EJC/OKF data driven journalism workshop in Utrecht in September.
Video archive EJC @PICNIC11: From database cities to urban stories (I)
In this post you can find the videos of the talks from the first European Journalism Centre session: ‘Using technology to run our cities: promises and perils’, at the 2011 edition of the leading media festival PICNIC.
BuzzData
There’s a buzz going around about BuzzData, the new data-sharing hub and social network for open source data.
Google Public Data Explorer
Released in August 2010, Google’s Public Data Explorer makes public data and statistics easier to understand and share.
The Guardian Data Store
The Guardian Data Store is an online directory providing a selection of datasets on topics of public interest and tools to explore them, along with demonstrations of original or guest visualisations of the datasets.
source: http://datadrivenjournalism.net/resources
spreadsheets:
- Knight Digital Media Center tutorial on spreadsheets. This is a useful and detailed tutorial for absolute beginners. It has 25 sections explaining, among other things, how to import data into a spreadsheet, how to use formulas and how to format cells.
- Forjournalists.com Excel basics and advanced features, tips and tricks and a four-part Excel training course (towards intermediate/advanced level).
- McGill University guide to exporting a table from a PDF file into an Excel spreadsheet.
Nokogiri, an XML parsing library for Ruby
Instead of using Firebug, you can also use Safari’s built-in Activity window, or Chrome’s Developer Tools, for the inspection part. To parse the result, we use Ruby and Nokogiri, which is an essential library for any kind of web scraping with Ruby.
A Series of Tubes…and Files
While the site makes the data difficult to download, it’s not impossible. In fact, it’s fairly easy with some understanding of web browser interaction. The content of a web page doesn’t consist of a single file. For instance, images are downloaded separately from the webpage’s HTML.
Flash applications are also discrete files, and sometimes they act as shells for data that come in separate text files, all of which is downloaded by the browser when visiting Cephalon’s page. So, while Cephalon designed a Flash application to format and display its payments list, we can just view the list as raw text.
Viewing Cephalon’s page. The Firebug panel is circled
Firebug can tell you what files your browser is receiving. In Firefox, open up Firebug by clicking on the bug icon on the status bar, then click on the Net panel. This panel shows every file that was received by your web browser when it accessed Cephalon’s page.
Close-up of the Firebug panel. The Net tab is circled in yellow, the relevant .swf file is circled in green.
We know we’re looking for the Flash file, so let’s look for that first. Flash applets use the suffix swf. The only one listed is spend_data.swf. In Firebug, right-click on the listing, copy the url, and paste it into a new browser window:
http://www.cephalon.com/Media/flash/spend_data-2009.swf
You can see the Flash file in its context here: http://www.cephalon.com/our-responsibility/relationships-with-healthcare-professionals/archive/2009-fees-for-services.html.
You’ll get a larger-screen view of the list, though that doesn’t really help our data analysis. As you may have noticed in the Firebug Net panel, spend_data.swf is less than 45 kilobytes, which doesn’t seem large enough to contain the entire list of doctors and payments. So where is the actual data stored?
Sniffing Out the Data
Here’s how to find it: First, clear your cache in Firefox by going to Tools->Clear Recent History and selecting Cache. With Firebug still open, refresh the browser window that has spend_data.swf open.
Relevant XML file is circled here.
Firebug’s window tells us that besides receiving spend_data.swf, our browser downloaded two xml files. One of these is more than 100 kilobytes, which is about what we would expect for an XML-formatted list of a few hundred doctors.
Now right-click on the file in Firebug and select Open in New Tab, and then View Page Source by right-clicking in the new tab. You should see a text file full of XML entries, one “row” per doctor.
[Image: sample entries from the XML file]
That’s what we were looking for: a well-structured list of the doctors and what they got paid. Now it’s a simple matter of using an xml parser, like Ruby’s Nokogiri, to iterate through each “row” node and pick up the essential values.
Parsing with Nokogiri
The following is a brief example of Nokogiri’s most basic methods. It assumes you have Ruby and Nokogiri installed, and a little familiarity with basic programming.
The two Nokogiri methods we’re most interested in are:
- css – this lets us select tags inside XML and HTML documents. In this example, we want the value and row tags.
- text – with each element returned by css, text will give us the actual characters enclosed by the element’s tags.
Each row represents a record, and each value represents a datafield, like name and location. So, we simply want to read each row and select the values we’re interested in.

Here’s a compact variation of the above code that writes the result into a file:

So, what first appeared to be the most difficult report to parse ends up being the easiest. Whether you’re dealing with a Flash application or an HTML database-backed website, your first step should be to see what text files your browser receives when accessing the page.
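As a rough illustration of the row/value loop described above, here is a minimal sketch in Python, standing in for Ruby and Nokogiri. The sample XML is hypothetical: the nesting of `row` and `value` tags, the order of the fields, and the names and amounts are all assumptions, not Cephalon’s actual data. The function names are illustrative too.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample. The real file's layout is not reproduced here, so the
# <row>/<value> nesting and the field order are assumptions for illustration.
SAMPLE_XML = """
<rows>
  <row>
    <value>Doe, Jane</value>
    <value>Springfield, IL</value>
    <value>1500</value>
  </row>
  <row>
    <value>Roe, Richard</value>
    <value>Portland, OR</value>
    <value>250</value>
  </row>
</rows>
"""


def rows_from_xml(xml_text):
    """Return one list of field strings per <row> element."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for row in root.iter("row"):
        # Each <value> holds one data field; keep its text content.
        records.append([(value.text or "").strip() for value in row.iter("value")])
    return records


def rows_to_csv(records):
    """Serialise the records as CSV text, ready to write to a file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(records)
    return buffer.getvalue()


records = rows_from_xml(SAMPLE_XML)
csv_text = rows_to_csv(records)
print(csv_text)
```

Writing `csv_text` to a file gives a spreadsheet-ready version of the list. The csv module quotes any field that contains a comma, so names like “Doe, Jane” stay in one column.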
The fundamental question: What can this API do for me?
Look for mentions of the word “requests.” If you don’t see that, look for the words “REST API,” or something that looks like the latter part of a URL.
Within those sections, look for the words “get” and “post.” These are called methods, the specific actions the API can do. (Some developers will quibble and call them functions. For this tutorial, we’ll stick to methods.)
If the documentation is written in plain English, it will be easy to understand what the method is doing. If not, you’ll need someone with more coding experience to help interpret what’s going on. But know this:
“Get” asks for something from the API server — as in, GET me the number of times an address shows up in the database.
“Post” changes the database by creating, adding or removing something from it — as in POST a new address to the database.
In what format can I get the data?
An API usually lets you choose how the data will come back to you, also known as the response format. You’ll usually see “json” or “XML.” Sometimes, you’ll see “txt” or other formats. The format is best decided by your developer, but at least you’ll know what’s available.
To find format options, search for the word “format” or “response.” Sometimes the format is mentioned at the start of documentation; sometimes, you’ll find “format” in the methods.
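To make the idea of a response format concrete, here is a small Python sketch that reads the same made-up record from a JSON response and from an XML response. Both payloads and their field names are hypothetical; a real API’s documentation defines its own structure.

```python
import json
import xml.etree.ElementTree as ET

# Two hypothetical responses carrying the same record in different formats.
JSON_RESPONSE = '{"results": [{"title": "Example Film", "year": 2001}]}'
XML_RESPONSE = (
    "<results><result><title>Example Film</title>"
    "<year>2001</year></result></results>"
)


def titles_from_json(text):
    """Pull every title out of the JSON-formatted response."""
    return [item["title"] for item in json.loads(text)["results"]]


def titles_from_xml(text):
    """Pull every title out of the XML-formatted response."""
    return [node.findtext("title") for node in ET.fromstring(text).iter("result")]


json_titles = titles_from_json(JSON_RESPONSE)
xml_titles = titles_from_xml(XML_RESPONSE)
print(json_titles, xml_titles)
```

The data is the same either way; the format only changes which parser your developer reaches for.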
What does the API need in exchange for what I want?
Sometimes you can make an API request or post without identifying yourself. But API creators often want to know how the API is being used and by whom. In addition, they want to prevent server overload and head off developer hijinks, so many APIs require a key: an ID unique to the person or program making a request.
Getting a key is generally straightforward. Look for the word “authentication,” “API key” or “APIkey” to get the instructions, and to see which methods (which “gets” and “posts”) require authentication.
Can I test API requests even if I’m not a developer?
Yes. You can build your own test request by copying the example response found in the method and changing the variables, usually referred to as parameters.
For example, let’s try getting New York Times reviews for the “Harry Potter” movies as an XML-formatted response. Use your favorite search engine to find The New York Times movie reviews API. This API is not perfect (it’s in beta, after all). The steps below can be compressed with shortcuts once you become more experienced, but since we’re assuming this is your first time, we’re going to take the slow road.
Once you’re on the API page:
1) Look for something that allows you to get reviews using keywords. In this case, that’s the “Reviews by Keyword” method. Within the method description is a URI example (the text in the gray box). That’s the template for your request.
Copy it, paste it into a text editor [TextWrangler (Mac), TextMate (Mac) or TextPad (Windows)] and start replacing the parameters, the things in braces and brackets. They’re bolded below for easy reference.
In the Reviews by Keyword method, there are two required parameters: version, which is the API version (use v2), and API-key, which you can get right here.
You’d go from this:
http://api.nytimes.com/svc/movies/{version}/reviews/search[.response_format]?[optional-param1=value1]&[...]&api-key={your-API-key}
To this:
http://api.nytimes.com/svc/movies/v2/reviews/search[.response_format]?[optional-param1=value1]&[...]&api-key={paste your API key here and delete the surrounding braces}
2) Next, set up two additional parameters, which are described a little further down in the same section of the Movie Reviews API documentation:
- The response-format, which will be .xml
- A keyword query: we’ll use query=Potter because searching for ‘Harry+Potter’ doesn’t work. (I know because I tried. Remember, the API is in beta.)
- An opening-date range, from the first film (which came out in November 2001) through the last film (which comes out this week). As the documentation tells you, the format for a range is YYYY-MM-DD;YYYY-MM-DD, so we’ll use opening-date=2001-11-01;2011-07-31
Your URI example should now look like this (the new parameters are in bold):
http://api.nytimes.com/svc/movies/v2/reviews/search.xml?&query=Potter&opening-date=2001-11-01;2011-07-31&api-key={paste your API key and delete the surrounding French braces}
3) Copy and paste the URI you made into a Web browser address bar. Hit return.
If you made the changes correctly, you’ll get a response similar to what’s on this page. In fact, if you want, you can copy the URI above up to the = before the {, paste it into your browser’s address bar, and add your API key to the end and hit return to see the XML output.
Voilà. You’ve just made your first API call and pulled New York Times “Harry Potter” movie reviews. (Plus a straggler. Again, beta.)
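If you would rather not edit the URI by hand, the same substitution can be sketched in a few lines of Python. This only assembles the string from the template and parameters shown above; it does not send the request, and the function name and the placeholder key are made up for illustration.

```python
from urllib.parse import urlencode

# Template from the steps above, with the two path parameters left open.
BASE = "http://api.nytimes.com/svc/movies/{version}/reviews/search{response_format}"


def build_review_search_url(api_key, query, opening_date,
                            version="v2", response_format=".xml"):
    """Fill in the path parameters, then append the query string."""
    endpoint = BASE.format(version=version, response_format=response_format)
    params = [
        ("query", query),
        ("opening-date", opening_date),
        ("api-key", api_key),
    ]
    # safe=";" keeps the semicolon in the date range from being percent-encoded.
    return endpoint + "?" + urlencode(params, safe=";")


# "YOUR-API-KEY" is a placeholder, not a working key.
url = build_review_search_url("YOUR-API-KEY", "Potter", "2001-11-01;2011-07-31")
print(url)
```

Swapping `response_format` to `.json` or changing `query` gives you a new request without retyping the rest of the URI.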
Some API developers are nice enough to include a console, sandbox or fill-in-the-blank form so you can test your requests without hand-building them. Better yet, the tools usually generate both the properly formatted request and the result, which you and your developers can then copy and paste and use as you wish.
You will come across lots of documentation styles as you begin to explore what’s available to you. If you have questions about what you find, feel free to ask them on the Hacks/Hackers help board.
R, specifically the lattice and ggplot2 libraries
Learn the power of conditioning on a categorical variable to make multiple plots:
xyplot( numphone ~ year | country, data=wp, type="b", scales="free")
The key here is the “| country” notation, which “conditions” on country to create one plot for each country in the dataset. This can be a great way to rapidly explore a dataset of counties, states, school districts, whatever. You decide what you want on your x and y axes, then condition on an appropriate variable, and in one line of code you can have thousands of plots showing how x and y vary across categories. This is the simplest case; there is much more power under the hood to unleash.
To do multiple pages and store to pdf just do:
pdf('~/path/to/mydoc.pdf', width=11, height=8.5)
xyplot( numphone ~ year | country, data=wp, type="b", scales="free", layout=c(1,1) )
dev.off()
You’ll then get one page for as many countries as are in the dataset.
ggplot2 has a similar feature known as “faceting” the data.
This concept of “conditioning” and “faceting” can let you slice massive datasets into digestible morsels very rapidly.
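For readers who think in code rather than plots, conditioning boils down to splitting the rows by a categorical variable and handling each group separately. The sketch below shows that split in plain Python; the records are invented stand-ins for the `wp` data, and `condition_on` is a hypothetical helper, not part of lattice or ggplot2.

```python
from collections import defaultdict

# Made-up records standing in for the `wp` data frame used above.
records = [
    {"country": "A", "year": 1951, "numphone": 10},
    {"country": "A", "year": 1956, "numphone": 14},
    {"country": "B", "year": 1951, "numphone": 3},
    {"country": "B", "year": 1956, "numphone": 5},
]


def condition_on(rows, key):
    """Split rows into one list per distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


panels = condition_on(records, "country")
for country, rows in sorted(panels.items()):
    # Each group is what one panel (or one facet) would plot.
    series = [(r["year"], r["numphone"]) for r in rows]
    print(country, series)
```

Each entry in `panels` corresponds to one panel in the lattice output or one facet in ggplot2.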
Data Sources
Although we gather some information by hand, GovTrack pulls much of the information you see from a variety of other sources. We also make available the information we collect in a normalized XML format for other projects to reuse. For more information on that, see the Developer Documentation.
Some information comes from these official government sources:
- The Library of Congress and the Congressional Research Service via THOMAS.gov for the status of legislation, subject terms of bills, bill summaries, and upcoming House committee meetings (from the Daily Digest). We have been actively campaigning Congress and the Library of Congress to publish this information in a structured data format. Until then, we “screen-scrape” their web pages, extracting the information in a semi-reliable automated way.
- The House of Representatives and the Senate for information on Members of Congress, committee membership, voting records, upcoming bills, and upcoming committee meetings.
- The House Majority Leader’s docs.house.gov website for legislation scheduled for the week ahead.
- The Government Printing Office for the text of legislation and photos of Members of Congress from the Congressional Pictorial Directory.
- The Congressional Biographical Directory for biographical and historical information on Members of Congress.
- The Census Bureau for geographic data on congressional districts.
Other federal information comes from:
- Congressional Committees, Historical Standing Committees data set by Garrison Nelson and Charles Stewart for biographical and historical information on Members of Congress, including congressional district assignments.
- Rosenthal, Howard L., and Keith T. Poole. United States Congressional Roll Call Voting Records, 1789-1990. Pittsburgh, PA: Keith T. Poole, Carnegie Mellon University, Graduate School of Industrial Administration, 1991. (1st-100th Congresses)
- Martis’s “The Historical Atlas of Political Parties in the United States Congress”, via Keith Poole’s roll call votes data set, for political party affiliation for Members of Congress from 1789 through about year 2000.
- The Sunlight Labs Real Time Congress API for live House activity on our Live Video page.
- The House Republican Conference for some photos of Members of Congress and their legislative summaries.
State legislative information comes from:
- Some data is additionally from Open States.
Source: http://www.govtrack.us/sources
Here’s a list of websites using data assembled by GovTrack
- OpenCongress: Quite like GovTrack with a focus on social networking-type tools.
- MAPLight.org: Computes correlations between votes and campaign contributions.
- Follow the Oil Money: A money-and-votes site directed at energy and oil.
- WhereABill.org: Follow the progress of a bill through the capitol complex, visually, on a map.
- TheMiddleClass.org: Tracking legislation of significance to the middle class.
- National Journal: Committee Chairmen: An interactive game where you can play musical chairs with the committee leadership of the incoming 111th Congress.
- Filibusted.us: Who is filibustering the most bills? Sunlight Apps for America winner (2009).
- Govit: Citizens vote on bills.
- Laws I Like: Vote on bills and post your votes to your profile.
- RepresentedBy: Post your representatives to your profile and keep up with what they’re doing.
- NewBallot: You vote on the same laws Congress is voting on. Every week your representative gets a report on how their constituency wants them to vote.
- VoteReports: Use VoteReports to see if your representatives truly represent you
- Congress111, by Mike Bluestein, has a variety of information on Members of Congress, based in part on GovTrack’s database.
- WatchDog.net: Another GovTrack clone with some additional information.
- CongressDB: Analysis of voting records.
- Almost half of the entries to Sunlight Foundation’s Apps for America contest used data from GovTrack
- Congress app for Android phones by the Sunlight Foundation.
- Agreement Groups in the United States Senate, by Adrien Friggeri
- Scout by the Sunlight Foundation
- Call on Congress voice service app by the Sunlight Foundation
http://palewi.re/posts/2008/04/20/python-recipe-grab-a-page-scrape-a-table-download-a-file/
http://www.texastribune.org/library/data/
document cloud
http://www.publiclaboratory.org/home
InfoChimps offers a big data stack managed as a service within private data centers. For those content to run in the public cloud, Qubole takes the concept one level further, with a turnkey Hadoop and Hive analysis platform that runs on Amazon EC2.
One to watch: new entries into enterprise Hadoop infrastructure will include WANdisco,
Berkeley Data Analytics Stack offers an alternative platform that performs much faster than Hadoop MapReduce for some applications focused on data mining and machine learning.
At the same time, Hadoop is reinventing itself. Hadoop distributions this year will embrace Hadoop 2.0, and in particular YARN,
If you’d like to learn more about the possible uses of this tool, I’d recommend the materials linked in the blog post above (which I’ve also listed below): “An introduction to Paper Machines,” by Chris Johnson Roberson; “Understanding Paper Machines,” by Jo Guldi; “Supercharge Your Zotero Library Using Paper Machines, Part I,” by Sarita Alami; and “Supercharge Your Zotero Library Using Paper Machines, Part II,” by Sarita Alami.
http://launch.co/#/rooms/Ticker
https://hackpad.com/0PLD#QuantifiedSelf-Berlin
http://www.barebones.com/products/yojimbo/
https://workflowy.com/
In the previous blog post we explained why we think Open311 is a good idea. In this post we’ll explain what it actually does.
Open311 is very simple, but because it’s fundamentally a technical thing it’s usually explained from a technical point of view. So this post describes what Open311 does without the nerdy language (but with some nerdy references for good measure). At the end there’s a round-up of the terms so you can see how it fits in with the actual specification.
We’re using an unusual example here — a blue cat stuck up a tree — to show how applicable Open311 is to a wide range of problems. Or, to put it another way, this is not just about potholes.
So… someone has a problem they want to report (for this discussion, she’s using a service like FixMyStreet).
There’s one place where that report needs to be sent (in the UK, that’s your council). That administrative body (the council) almost certainly has a database full of problems which only their staff can access.
The “client”: I have a problem :–(
The “server”: I fix problems!
In this example, FixMyStreet is an Open311 client and the council is an Open311 server. The server is available over HTTP(S), so the client can access it, and the server itself connects to the council’s database. In reality it’s a little bit more complicated than that (for now we’ll ignore clients that implement only part of Open311, multiple servers, and decent security around these connections), but that is the gist of it.
Although it’s not technically correct to confuse the client with the user, or the server with the council, it makes things a lot easier to see it this way, so we’ll use those terms throughout.
Service discovery
To start things off, the client can ask the server: what services do you provide?
Until the client has asked the server what problems it can fix, it can’t sensibly request any of them.
Client: What services do you offer?
Server: I can:
POT: fix potholes
TELE: clean public teleports
PET: get pets down from trees
JET: renew jetpack licenses …
FixMyStreet can use the response it gets from such a service discovery to offer different categories to people reporting problems. We actually put them into the drop-down menu that appears on the report-a-problem page.
In the Open311 API, this is handled by GET Service List. Each service has its own service_code which the client must use when requesting it. Note that these services and their codes are decided by the server; they are not defined by the Open311 specification. This means that service discovery can easily fit around whatever services the council already offers. The list of services can (and does) vary widely from one council to the next.
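To show how a client might turn a service list into the drop-down menu mentioned above, here is a minimal Python sketch. The XML is a hand-written stand-in that uses the made-up services from the dialogue; the element names (`service`, `service_code`, `service_name`) are my assumption of how a GeoReport v2 response is laid out, so check them against the specification before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a service list response. Element names are an
# assumption about the GeoReport v2 layout; the services are fictional.
SERVICE_LIST_XML = """<services>
  <service>
    <service_code>POT</service_code>
    <service_name>fix potholes</service_name>
  </service>
  <service>
    <service_code>PET</service_code>
    <service_name>get pets down from trees</service_name>
  </service>
</services>"""


def parse_services(xml_text):
    """Return (service_code, service_name) pairs in the order the server lists them."""
    root = ET.fromstring(xml_text)
    return [
        (node.findtext("service_code"), node.findtext("service_name"))
        for node in root.iter("service")
    ]


dropdown_options = parse_services(SERVICE_LIST_XML)
for code, name in dropdown_options:
    print(code, "->", name)
```

Because the codes come from the server, the client never hard-codes them: it shows whatever the council’s list contains.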
Service definitions
Some services require specific information when they are requested. For example, it might be important to know how deep a pothole is, but it’s not relevant for a streetlight repair.
Client: Tell me more about the PET service!
Server: I can get pets down from trees, but when you request the service, you *must* tell me what kind of animal the pet is, OK?
In the Open311 API, this is handled by the GET Service Definition method. It’s not necessary for a simple Open311 implementation. In fact, it only makes sense if the service discovery explicitly told the client to ask about the extra details, which the server does by adding metadata="true" to its response for a given service.
Requesting a service
This is where it gets useful. The client can request a service: this really means they can report a problem to the server for the body to deal with. Some submissions can be automatically rejected:
Client: My hoverboots are broken :–( I need BOOT service!
Server: 404: Bzzzt error! I don’t fix hoverboots (use service discovery to see what I *do* fix)
Client: Hey! Blueblue is up a tree! I need PET service (for cats)!
Server: 400: error! You forgot to tell me where it is.
If the report is in good order, it will be accepted into the system. Open311 insists that every problem has a location. In practice this is usually the exact position, coordinates on planet Earth, of the pin that the reporter placed on the map in the client application (in this case FixMyStreet.com).
Client: I need PET service (for cats)! Blueblue is stuck up the biggest tree in the park :–(
Server: 200: OK, got it… the unique ID for your request is now 981276
In the Open311 API, this is handled by POST Service Request. You need an API key to do this, which simply means the server needs to know which client this is. Sometimes it makes sense for the server to have additional security such as IP address restriction, and login criteria that are handled by the machines (not the user).
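The three outcomes in the conversation above (404, 400, 200) can be mimicked with a toy, in-memory server. This is a sketch in Python under loud assumptions: the class, method and parameter names are invented, the coordinates are arbitrary, and API-key checking is left out for brevity. It is not how any real Open311 server is implemented.

```python
import itertools

# Service codes taken from the example dialogue; a real server defines its own.
SERVICES = {
    "POT": "fix potholes",
    "TELE": "clean public teleports",
    "PET": "get pets down from trees",
    "JET": "renew jetpack licenses",
}


class ToyOpen311Server:
    """In-memory stand-in for a council's Open311 server (hypothetical)."""

    def __init__(self, first_id=981276):
        self._next_id = itertools.count(first_id)
        self.requests = {}

    def post_service_request(self, service_code, lat=None, lon=None, description=""):
        """Return (http_status, payload), mimicking the three outcomes above."""
        if service_code not in SERVICES:
            return 404, "unknown service; use service discovery to see what is offered"
        if lat is None or lon is None:
            return 400, "a location is required"
        request_id = next(self._next_id)
        self.requests[request_id] = {
            "service_code": service_code,
            "lat": lat,
            "lon": lon,
            "description": description,
        }
        return 200, request_id


server = ToyOpen311Server()
rejected = server.post_service_request("BOOT", lat=51.5, lon=-0.1)
incomplete = server.post_service_request("PET", description="Blueblue is up a tree")
accepted = server.post_service_request(
    "PET", lat=51.5, lon=-0.1,
    description="Blueblue is stuck up the biggest tree in the park",
)
print(rejected, incomplete, accepted)
```

The order of the checks mirrors the dialogue: an unknown service is rejected first, then a missing location, and only a complete report gets an ID.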
Listing known requests
The server doesn’t keep its reports secret: if asked, it will list them out. The client can ask for a specific report (using the ID that the server gave when the report was submitted, for example) or for a range of dates.
Client: Did anyone ask you for help yesterday?
Server: Yes, I got two requests:
request 981299: TELE dirty teleport at the cantina (I’m waiting for a new brush)
request 971723: POT pothole at the junction of Kirk and Solo (I filled it in)
In the Open311 API this is handled by GET Service Request(s). The client can indicate which requests should be listed by specifying the required service request id, service code, start date, end date or status.
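The filters listed above can be pictured as a simple search over the server’s stored reports. The Python sketch below uses the two requests from the conversation; the dates, the field names and the `open`/`closed` status values are assumptions added for illustration.

```python
from datetime import date

# The two requests from the dialogue. Dates and status values are assumed.
KNOWN_REQUESTS = [
    {"service_request_id": 981299, "service_code": "TELE",
     "status": "open", "requested": date(2013, 1, 14)},
    {"service_request_id": 971723, "service_code": "POT",
     "status": "closed", "requested": date(2013, 1, 14)},
]


def get_service_requests(requests, service_request_id=None, service_code=None,
                         start_date=None, end_date=None, status=None):
    """Return the requests matching every filter that was supplied."""
    matches = []
    for req in requests:
        if service_request_id is not None and req["service_request_id"] != service_request_id:
            continue
        if service_code is not None and req["service_code"] != service_code:
            continue
        if status is not None and req["status"] != status:
            continue
        if start_date is not None and req["requested"] < start_date:
            continue
        if end_date is not None and req["requested"] > end_date:
            continue
        matches.append(req)
    return matches


yesterday = get_service_requests(
    KNOWN_REQUESTS, start_date=date(2013, 1, 14), end_date=date(2013, 1, 14)
)
still_open = get_service_requests(KNOWN_REQUESTS, status="open")
print(len(yesterday), [r["service_request_id"] for r in still_open])
```

Leaving a filter out means “don’t care”, which is why asking only for a date range returns both reports.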
Does Open311 work?

Oh yes. On the Open311 website, you can see the growing list of places, organisations, and suppliers who are using it.
The technical bit
In a nutshell: Open311 responds to HTTP requests with XML data (and JSON, if it’s wanted). There’s no messing around with SOAP and failures are reported as the HTTP status code with details provided in the content body.
You can see the specification for Open311 (GeoReport v2). It doesn’t feature blue cats, but if you look at the XML examples you’ll be able to recognise the same interaction described here. And remember the specification is an open standard, which means anyone can (and, we think, should) implement it when connecting a client and server in order to request civic services.
Coming next…
In the next blog post we’ll look at how FixMyStreet uses Open311 to integrate with local council systems, and explain why we’re proposing, and utilising, some additions to the Open311 specification.
1) Building credibility (and engagement) in digital news
Most news publications are not as transparent about their own reporters and their sources as they could be, and many don’t report retroactively on whether pundits/sources got things right or wrong. Notable exceptions, like Wikipedia, which footnotes all entries, have become very trusted (and popular) sources of information. How can news orgs move towards embedding more credibility into news? Also, can news animations be credible?
2) Improving content recommendations
Currently content is largely recommended based on relevance (you are reading about bridges, here’s another article about bridges), social context (your friends are reading about bridges, read this too) or editorial selection (Our editor believes you need to know about bridges, read this). What other ways could be available to enable discovery of content? How can we broaden the perspectives being shared?
3) Freeing news content from news sites (in a big big way)
Only 1.1% of total page views online take place on news sites (even less on mobile). If content is restricted to news sites, digital ad revenue will remain small. Can news makers develop a standard to allow news to be distributed as easily across web and mobile as ads and preserve some benefit for content creators?
4) Scaling media-focused social enterprises (while safeguarding mission)
Media companies have traditionally had a source of revenue (advertising) that is not directly tied to content or customers. New, socially-focused media companies (PolicyMic, Upworthy, Zeega) have ads as well as other revenue sources (lead generation, content creation). How best to balance these and other potential revenues against the needs of the people whose lives they are trying to change?
5) Making data visualization work on mobile
Data visualizations are 30x as likely to be shared as traditional text articles, and have become an important part of the news landscape. But as more news gets read on phones, the impact of these visualizations is diminished. Can we develop standard practices to maximize the impact of data visualizations on mobile phones?
Because of the nature of the conference, it is entirely possible that another news-fooer identified a completely different series of issues that folks are focused on. I look forward to hearing these as Knight gathers more perspectives on the conference.
We want anyone who comes to Syria Deeply to walk away smarter and better informed about what’s happening in our world. We’re fielding your feedback and story ideas through info@syriadeeply.org.
- OpenBlock Rural, which builds on the OpenPlans project OpenBlock to aggregate news and data using the system that powers EveryBlock (got that?);
- SwiftRiver, a tool to manage data (like tweets) in real-time;
- FrontlineSMS, a large-scale text messaging platform for non-governmental organizations.
all social changes and projects
General Resources
Foundation Center – Tools and Resources for Assessing Social Impact (TRASI)
This database contains approaches to impact assessment, guidelines for creating and conducting an assessment, and actionable tools for measuring social change.
Urban Institute – Outcomes Indicators Project
This resource supports nonprofit performance tracking by suggesting outcomes and indicators to help nonprofits develop new measurement approaches and enhance existing systems.
Grantcraft
This website provides materials that offer insights and approaches to improve the effectiveness of social sector organizations, including several guides on evaluation and assessment.
Innovation Network – Point K Learning Center
Point K offers a set of tools, including an Organizational Assessment Tool, Logic Model Builder and Evaluation Plan Builder, to support non-profits in designing and implementing assessments for their own programs.
Root Cause – Building a Performance Measurement System
This guide provides a practical, five-step process for developing a performance measurement approach to support nonprofits as they select measures, design reports, and communicate impact.
W.K. Kellogg Foundation – Evaluation Handbook & Logic Model Development Guide
This workbook provides a framework for approaching nonprofit program evaluations that support program performance. The guide introduces the logic model tool to nonprofits seeking to strengthen program design and delivery, and disseminate results.
Innovating Media & Information Resources
IMPACT: A Practical Guide to Evaluating Community Information Projects
This guide, produced by Knight Foundation and FSG Social Impact Advisors, helps organizations collect information about the effectiveness and impact of their community news, information and media projects.
Measuring the Online Impact of Your Information Project
This report, produced by Knight Foundation, FSG Social Impact Advisors and journalism professor Dana Chinn, outlines how funders and their grant partners reach and engage online audiences. It identifies metrics for measuring impact for projects creating informed and engaged communities and includes a set of useful examples.
Center for Social Impact (American University) & The Media Consortium – Investing in Impact
This paper outlines reasons to assess public interest media, synthesizes primary evaluation needs, and proposes new tools to help those involved in public interest media track their work.
Center for International Media Activists – Planning and Evaluation for Media Activists
This resource outlines recommendations, case studies and tools for performing strategic planning and evaluation for media justice projects.
Evidence of Change: Exploring Civic Engagement Evaluation – Building Movement Project
The report presents a brief summary of key findings from the 2010 Civic Engagement Evaluation Summit.
Community Tool Box – University of Kansas
This online toolkit offers extensive information about approaches to building healthy communities, including guides for evaluating community programs and initiatives.
Center for Information and Research on Civic Learning and Engagement – Tufts University
The center conducts research on the civic and political engagement of young Americans and offers several research and evaluation tools for gauging civic engagement.
Harvard Family Research Project – The Evaluation Exchange
This periodical regularly provides lessons and emerging best practices for evaluating programs and policies, specifically those focused on children, families, and communities.
IMPACT Arts
This repository provides resources for those working in the arts who want to understand the social impact of their projects.
Grantmakers in the Arts – Digest: Studies, Books, Web Sites
This resource showcases publications and tools for informing strategy and assessment of arts funders and their nonprofit partners in the field.
Boston Youth Arts Evaluation Project
This project, a collaboration between nonprofits working in youth arts and national leaders in research, promotes innovative evaluation methods and tools for measuring youth arts.
Yahoo! Style Guide
“Learn how to write and edit for a global audience through best practices from Yahoo!”
Beginning Reporting
A website for beginning reporters, those studying the craft and their teachers.
Digital Journalist Survival Guide: A Glossary of Tech Terms You Should Know
A comprehensive glossary of terms associated with Internet journalism. Terms every digitally enthused journalist should know.
My High School Journalism
My High School Journalism describes itself as the world’s largest host of teen-generated news.
Citizen Media Law Project
The Citizen Media Law Project is a new pro bono initiative hosted by the Berkman Center for Internet & Society at Harvard University.
Knight Digital Media Center
The Knight Digital Media Center is a partnership between the Annenberg School for Communication at the University of Southern California in Los Angeles and the University of California at Berkeley Graduate School of Journalism that provides fellowships and multimedia training resources for aspiring New Media journalists.
Rich Gordon’s Online Community Cookbook
In the past year or so, the newspaper industry has devoted considerable attention to online communities. Newspapers have launched blogs, opened up discussion via article comments, built new online communities themselves (for instance, dozens of “moms” sites) and begun to experiment with the new world of social network sites such as MySpace and Facebook. Medill’s Rich Gordon ties all of these developments together into a structured format in order to understand, build, and sustain online communities.
Center for Social Media’s Guide to Fair Use in Online Video
This guide by the Center for Social Media at American University’s School of Communication is a code of best practices that helps creators, online providers, copyright holders, and others interested in the making of online video interpret the copyright doctrine of fair use. Fair use is the right to use copyrighted material without permission or payment under some circumstances.
Journalism 2.0 PDF Downloads
Download PDF versions of Journalism 2.0: How to Survive and Thrive in various languages here.
IJNet’s 10 Steps to Citizen Journalism Online
The International Center for Journalists and IJNet.org created this interactive training module as a basic introduction to hyperlocal news sites and blogs. You will need the Adobe Flash player to view the module.
The New West FAQ for Online Community Journalism Entrepreneurs
Jonathan Weber, editor and founder of NewWest.net, created this FAQ for those interested in creating local online news sites. Weber covers why he started New West, its revenue models and expected profits, how to get content, what technology is available, who the competitors are and more.
Journalism 2.0: How to Survive and Thrive
A guide to help professional and amateur news producers understand and implement digital tools to enhance their reporting. Written by Mark Briggs, assistant managing editor for interactive news at The News Tribune in Tacoma, Washington.
Community News Sites
Our list of community news sites.
Things We Like
KCNN is constantly exploring citizen media sites for good ideas to share with you. Check them out. Suggest things we should look at.
Jump Start Your Reporting
Do you need to find an expert fast, research your U.S. Senator’s voting record, investigate a local nonprofit? Here are some databases that can provide some shortcuts.
Journalism Training Sites
Here are some web sites that offer even more journalism training. Check them out.
Launching a Nonprofit News Site
The number of nonprofit news ventures is increasing rapidly and you may be thinking about becoming a part of it. This guide will walk you through the process – including the hurdles and the requirements – whether you are seeking to establish a federally recognized 501(c)3 organization or a project within a university or college.
Outside-the-Box Community Engagement
Engaging readers is why your online news community exists. You can’t use the wisdom of the crowds if the crowd isn’t talking. Without fast and substantive engagement, you might as well publish a newspaper. So when you build it and they don’t come, what do you do, short of waiting?
Pulitzer Center’s Media on the Move
This learning module is filled with text and videos that will guide journalists from story idea, through the reporting and distribution process. This approach treats the issues covered as campaigns, not just stand-alone stories. That means wide collaborations, embracing new technologies and taking the journalism out to classrooms and universities to engage the next generation.
Making the Most of Metrics
Whether you’re running a small hyperlocal community Web site or a large regional citizen media site, you can use free or inexpensive tools to measure how many people are visiting your site and where they like to go most. With the right analytics tools, you can also get very specific details in addition to total traffic numbers. This knowledge will then empower you to improve your site, increase traffic and give accurate information to potential advertisers and sponsors.
Networked Journalism: What Works
Engaging Audiences: Measuring Interactions, Engagement and Conversions
The rise of social media tools in recent years has empowered online news startups to increase content distribution, market their sites and track users. But most say they cannot lasso data to track whether they are turning users into supporters who will help their sites survive.
Philadelphia Enterprise Reporting Awards
Learn how collaborative journalism projects that received grants from J-Lab turned out over 300 stories, blogs, podcasts, videos, databases and maps. View the press release and the report on the first year.
Rules of the Road: Navigating the New Ethics of Local Journalism
With journalism entrepreneurs launching local news startups at a rapid pace, the local news landscape is evolving – and so are the rules of the road guiding ethical decisions. Where a bright ethical line once separated a newsroom from its business operations, one person now often wears multiple hats, as editor, business manager and grants writer. Site publishers navigate new kinds of critical decisions daily. This guide examines a number of them. You can click to any topic in any order. Or, you can cruise through the Table of Contents. On every page you’ll find a box that says, “Share your story.” We invite you to weigh in with an ethical problem you faced – and your solution. Your participation will help inform a work in progress.
New Media Makers Toolkit
This learning module is filled with original reporting that will help you learn about the innovative community news initiatives that are cropping up around the United States – and securing grants from foundations that have not traditionally supported journalism. In the case studies and accompanying videos, you’ll meet citizen journalists and professional journalists who have launched news initiatives that either partner with or supplement their metro news outlets. A key part of this toolkit is a searchable database, where you can see the kinds of news ventures that foundations have supported since 2005.
Likes & Tweets: Leveraging Social Media for News Sites
If you’re like most journalists and media entrepreneurs, you use social media daily, but that doesn’t mean you’re doing all you could with it to engage with your community, listen and monitor the conversation, or use it to plan outreach campaigns around news events, real world meet-ups and breaking stories.
That’s where this guide comes in. It’s a roadmap for improving both your understanding of social media and your use of it. This learning module focuses on the principles of authenticity, transparency and crowd-sourced, real-time communication that make social media so strikingly different from traditional media. It will also give you hands-on tools, tips and tactics that can make your daily use of Facebook, Twitter and other resources much more effective.
Interviewing: A practical guide for citizen journalists
Interviews are integral to good journalism. They provide more than just additional voices; they provide facts, expertise, balance, depth and credibility. They also breathe life into information that might otherwise fall flat. Whether you already interview or are daunted by the prospect, learn what types of interviews you should go for and how they can improve your journalism. Figure out where to quote or paraphrase. Learn how to navigate the unique ethical pitfalls that confront citizen journalists. Module developed by Lynne Perri and Angie Chuang at American University’s School of Communication.
Independent Metro News Sites Database
As daily metro newspapers continue to lose ground, a new model is emerging: Independent metro news sites with paid staff members. Primarily online only ventures, these sites continue to gain traction and attract attention for coverage of their communities. This living database tracks the business side of these news operations, offering a glimpse at their funding sources, budget, staffing levels, and visitor traffic.
The Freebies List for Frugal Journalists
In the era of new media, it’s important for new skills to be learned to keep up with growing audience demand. Editing audio and video for the Web is commonplace now, as is using the Internet for research and sharing. While there are plenty of good software programs out there to buy, comparable ones can be found all over the Internet for free or next-to-free. We have compiled a growing list of our favorites for anyone to use. Comment on the ones you find useful and let us know if you find any more out there.
The Citizen Journalist’s Guide to Open Government
This extensive, multimedia e-learning module helps new media makers understand how to obtain public records and get into public meetings. The guide features a unique, interactive map that tells citizens how they can locate open-government information on each of the 50 state Web sites. Produced by Geanne Rosenberg, founding chair of Baruch College’s new undergraduate Department of Journalism and the Writing Professions.
Twelve Tips for Optimizing Your Site for Search Engines
There’s good news for even solo citizen journalists who want to improve how their sites are found through search engines like Google: Your own homegrown search engine optimization can get you many of the benefits of a professional retooling. Search engine optimization, or SEO, just means making your site as easy to find and highly ranked as possible by search engines like Google, Yahoo, MSN and Ask.com. That way, people using those engines to look for relevant content can find what you have to offer. That’s increasingly important as more and more visitors find their way to sites like yours not by typing in your Web address, but by plugging a few choice words into their favorite search engine. Learn some easy ways to boost your ranking and get more traffic.
Twitter Tips: Today’s Must-Have Tool for Citizen Journalists
Twitter has finally hit its stride as a leading tool for finding and sharing timely information from all sorts of places and sources. Its usefulness for breaking news is obvious. However, Twitter is equally useful for tracking ongoing stories and issues, getting fast answers or feedback, finding sources, building community, collaborating on coverage, and discovering emerging issues or trends. Learn how to sign up, log on and start posting “tweets” to enhance your hyperlocal coverage.
Top 10 Rules for Limiting Legal Risk
If you’re running a citizen media site or contributing to one, these 10 rules will help you avoid potential legal pitfalls. Get advice in videos from Harvard Berkman Center experts and Media Law Resource Center attorneys. Module produced by Geanne Rosenberg, associate professor at City University of New York’s Graduate School of Journalism and Baruch College.
Read the press release from CUNY.
Tools for Citizen Journalists
This six-chapter training module will help site operators and citizen journalists cope with the challenges of covering communities on small budgets with little or no staff. Get tips on where to sniff out great ideas and turn them into a compelling story, how to use data to punch up your coverage, how to manage a site when you don’t have a staff to help out, who to consider for partnerships that might help move your site along, and how to tap into the knowledge and passion of your readers. Module developed by Wendell Cochran and Amy Eisman, American University School of Communication.
Twelve Tips for Growing Positive Communities Online
Your site is up and all is running well until the conversation heats up and a flame war erupts. Here are a dozen ways to keep the discussion going while maintaining a civil environment and positive direction on your site.
Make Internet TV
Make Internet TV is an easy to read multimedia manual for publishing internet video. It has step-by-step instructions for everything from choosing a camera to publishing and promoting videos on the internet.
Principles of Citizen Journalism
Whether you are writing a blog or involved in a full-scale hyperlocal news site, you are going to face a higher degree of skepticism than traditional media. That means fairness, accuracy, transparency and independence are essential to success. See what citizen media veterans say about those topics and other foundations of citizen journalism.
Training Citizen Journalists
In these seven case studies from around the United States, get a bird’s-eye view of citizen journalism today.
Using E-mail to Jumpstart your Newsgathering
Even professional journalists, pressed by 24/7 deadlines, are finding a way to help jump-start their reporting on breaking news stories and find excellent examples to illustrate more ambitious enterprise stories.
http://timetric.com/about/media-center/
http://www-958.ibm.com/software/data/cognos/manyeyes/
http://datahub.io/en/group/data-journalism
http://www.kobotoolbox.org/products/kobokit
http://opendatakit.org/participate/
Below are the released and supported ODK projects.
-
Build - ODK Build enables users to generate forms using a drag-and-drop form designer. Build is implemented as an HTML5 web-based application and targets the common use case of a simple form.
-
Collect - ODK Collect is a powerful phone-based replacement for your paper forms. Collect is built on the Android platform and can collect a variety of form data types: text, location, photos, video, audio, and barcodes.
-
Aggregate - ODK Aggregate provides a ready to deploy online repository to store, view and export collected data. Aggregate can run on Google’s reliable and free infrastructure as well as on local servers backed by MySQL and PostgreSQL.
-
Form Uploader - ODK Form Uploader easily uploads a blank form and its media files to ODK Aggregate.
-
Briefcase - ODK Briefcase is the best way to transfer data from Collect and Aggregate.
-
Validate - ODK Validate ensures that you have an OpenRosa-compliant form, one that will also work with all the ODK tools.
-
XLS2XForm - ODK XLS2XForm allows XForms to be designed with Excel.
http://datadrivenjournalism.net/resources/Meet_data_mapping_platform_CartoDB#When:18:19:19Z
Fusion Tables vs. CartoDB; Species Sphere uses D3 and the CartoDB API.
Tutorials for the implementation and use of freeDive can be found on the KDMC webpage and source code is available on GitHub. Disappointingly some of the links on the project website lead to blank pages, as is the case with the FAQ page, but such aspects will hopefully be improved in later stages of the project.
http://data.worldbank.org/developers
Syllabus
Posted on September 13, 2012
Aims of the course
The aim of the course is to familiarise students with current areas of research and development within computer science that have a direct relevance to the field of journalism, so that they are capable of participating in the design of future public information systems.
The course is built around a “design” frame that examines technology from the point of view of its possible applications and social context. It will familiarize the students with both the major unsolved problems of internet-era journalism, and the major areas of research within computer science that are being brought to bear on these problems. The scope is wide enough to include both relatively traditional journalistic work, such as computer-assisted investigative reporting, and the broader information systems that we all use every day to inform ourselves, such as search engines. The course will provide students with a thorough understanding of how particular fields of computational research relate to products being developed for journalism, and provoke ideas for their own research and projects.
Research-level computer science material will be discussed in class, but the emphasis will be on understanding the capabilities and limitations of this technology. Students with a CS background will have the opportunity for algorithmic exploration and innovation; however, the primary goal of the course is thoughtful, application-centered research and design.
Assignments will be completed in groups and involve experimentation with fundamental computational techniques. There will be some light coding, but the emphasis will be on thoughtful and critical analysis.
Format of the class, grading and assignments.
It is a fourteen-week course for Master’s students which has both a six-point and a three-point version. The six-point version is designed for dual degree candidates in the journalism and computer science concentration, while the three-point version is designed for those cross-listing from other concentrations and schools.
The class is conducted in a seminar format. Assigned readings and computational techniques will form the basis of class discussion. Throughout the semester we will be inviting guest speakers with expertise in the relevant areas to talk about their related research and product development.
The output of the course for a six-point candidate will be one research assignment in the form of a 25-page research paper. The three-point course will require a shorter research paper, and both versions of the course will also have approximately bi-weekly written assignments, which will frequently involve experimentation with computational techniques. For those in the dual degree program or who have strong technical skills, there is an option to produce a prototype as part of the final assignment. The class is graded on a pass/fail basis, in line with the journalism school’s grading system.
Week 1: Basics
We set out the expectations of the course, and frame our work as the task of designing public information production and distribution systems. Computer science techniques can help in four different areas: data-driven reporting, story presentation, information filtering, and effect tracking. The recommended readings aim to give you an understanding of the landscape of technical disruption in the news industry, and the ways in which computer science techniques can help to build something better.
Required
-
What should the digital public sphere do?, Jonathan Stray
-
Dilemmas in a General Theory of Planning, Horst Rittel, Melvin Webber
Recommended
-
Newspapers and thinking the Unthinkable, Clay Shirky
-
Computational Journalism, Cohen, Turner, Hamilton
-
Precision Journalism, Ch.1, Journalism and the Scientific Tradition, Philip Meyer
Viewed in class
-
The Jobless rate for People Like You, New York Times
-
Dollars for Docs, ProPublica
-
What did private security contractors do in Iraq and document mining methodology, Jonathan Stray
-
The network of global corporate control, Vitali et al.
-
World Government Data, The Guardian UK
Weeks 2-3: Technical fundamentals
We’ll spend the next couple of weeks examining the techniques that will form the basis of much of the rest of our work in the course: clustering and the document vector space model.
Week 2: Clustering
A vector of numbers is a fundamental data representation which forms the basis of very many algorithms in data mining, language processing, machine learning, and visualization. This week we will explore two things: representing objects as vectors, and clustering them, which might be the most basic thing you can do with this sort of data. This requires a distance metric and a clustering algorithm — both of which involve editorial choices! In journalism we can use clusters to find groups of similar documents, analyze how politicians vote together, or automatically detect groups of crimes.
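To make the two choices above concrete, here is a minimal sketch of k-means clustering on voting records, using only the Python standard library. The legislators and votes are invented for illustration, Euclidean distance is just one possible metric, and the value of k is picked by hand; each of those is exactly the kind of editorial choice the paragraph above refers to.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means: returns the cluster index assigned to each vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # start from k randomly chosen vectors
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical roll-call data: 1 = yes, 0 = no, one column per bill.
votes = {
    "Legislator A": [1, 1, 1, 0, 0, 0],
    "Legislator B": [1, 1, 0, 0, 0, 0],
    "Legislator C": [0, 0, 0, 1, 1, 1],
    "Legislator D": [0, 0, 0, 0, 1, 1],
}
names = list(votes)
labels = kmeans([votes[n] for n in names], k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

The cluster numbers themselves carry no meaning; only which legislators share a number does. K-means can also settle on a poor grouping depending on its starting centroids, so on real data it is worth rerunning with several seeds.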
Required
-
Cluster Analysis, Wikipedia
-
General purpose computer-assisted clustering and conceptualization, Justin Grimmer, Gary King
Recommended
-
‘GOP 5’ make strange bedfellows in budget fight, Chase Davis, California Watch
-
The Challenges of Clustering High Dimensional Data, Steinbach, Ertöz, Kumar
-
Survey of clustering data mining techniques, Pavel Berkhin
Viewed in class
-
Message Machine, ProPublica
-
Data mining in politics, Aleks Jakulin
-
A House Divided, Delaware Online
Assignment: you must choose your groups of 2-3 students, and pick a data set to work with for the rest of the course. Due next week.
Week 3: Document topic modelling
The text processing algorithms we will discuss this week are used in just about everything: search engines, document set visualization, figuring out when two different articles are about the same story, finding trending topics. The vector space document model is fundamental to algorithmic handling of news content, and we will need it to understand how just about every filtering and personalization system works.
Required
-
Online Natural Language Processing Course, Stanford University
-
Week 7: Information Retrieval, Term-Document Incidence Matrix
-
Week 7: Ranked Information Retrieval, Introducing Ranked Retrieval
-
Week 7: Ranked Information Retrieval, Term Frequency Weighting
-
Week 7: Ranked Information Retrieval, Inverse Document Frequency Weighting
-
Week 7: Ranked Information Retrieval, TF-IDF weighting
-
Probabilistic Topic Models, David M. Blei
Recommended:
-
A full-text visualization of the Iraq war logs, Jonathan Stray
-
Latent Semantic Analysis, Peter Wiemer-Hastings
-
Introduction to Information Retrieval Chapter 6, Scoring, Term Weighting, and The Vector Space Model, Manning, Raghavan, and Schütze.
Assignment – due in three weeks
You will perform document clustering with the gensim Python library, and analyze the results.
-
Choose a document set. You can use the Reuters corpus if you like but you are encouraged to try other sources.
-
Import the documents and score them in TF-IDF form. Then query the document set by retrieving the top ten closest documents (as ranked by cosine distance) for a variety of different queries. Choose three different queries that show interesting strengths and weaknesses of this approach, and write an analysis of the results.
-
Choose a topic modelling method (such as connected components, LSA, or LDA) and cluster your documents. Hand in the extracted topics and comment on the results.
-
Choose a clustering method (such as k-means) and cluster the documents based on the extracted topics. How do the resulting clusters compare to how a human might categorize the documents?
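As a reference point for the TF-IDF and query step, the following is a rough standard-library sketch of the underlying computation. It is not the gensim API the assignment asks for; the whitespace tokenizer, the plain log(N/df) weighting, and the three sample headlines are simplifying assumptions. Ranking by descending cosine similarity gives the same order as ranking by ascending cosine distance.

```python
import math
from collections import Counter

def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()

def build_tfidf(docs):
    """Return (idf, vectors), where vectors holds one sparse dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return idf, vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, docs, top_n=10):
    """Return up to top_n (similarity, document index) pairs, best match first."""
    idf, vectors = build_tfidf(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scored, reverse=True)[:top_n]

# Invented headlines standing in for a real corpus.
docs = [
    "city council votes on school budget",
    "school board debates budget cuts",
    "local team wins championship game",
]
results = search("school budget", docs)
for score, index in results:
    print(round(score, 3), docs[index])
```

Note that the shorter matching headline outranks the longer one even though both contain both query terms: length normalization is one of the strengths and weaknesses worth examining in the analysis.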
Weeks 4-6: Filtering
Over the next few weeks we will explore various types of collaborative filters: social, algorithmic, and hybrid approaches; classic correlation-based filtering algorithms (“users who bought X also bought Y”, the Netflix Prize); and location- and context-based filtering. Our study will include the technical fundamentals of clustering and recommendation algorithms.
Week 4: Information overload and algorithmic filtering
This week we begin our study of filtering with some basic ideas about its role in journalism. Then we shift gears to pure algorithmic approaches to filtering, with a look at how the Newsblaster system works (similar to Google News).
Required
-
Who should see what when? Three design principles for personalized news, Jonathan Stray
-
Tracking and summarizing news on a daily basis with Columbia Newsblaster, McKeown et al
Recommended
-
Guess what? Automated news doesn’t quite work, Gabe Rivera
-
The Hermeneutics of Screwing Around, or What You Do With a Million Books, Stephen Ramsay
-
Can an algorithm be wrong?, Tarleton Gillespie
-
The Netflix Prize, Wikipedia
Week 5: Social software and social filtering
We have now studied purely algorithmic modes of filtering, and this week we will bring in the social. First we’ll look at the entire concept of “social software,” which is a new interdisciplinary field with its own dynamics. We’ll use the metaphor of “architecture,” suggested by Joel Spolsky, to think about how software influences behaviour. Then we’ll study social media and its role in journalism, including its role in information distribution and collection, and emerging techniques to help find sources.
Required
-
A Group is its own worst enemy, Clay Shirky
-
What’s the point of social news?, Jonathan Stray
-
Finding and Assessing Social Information Sources in the Context of Journalism, Nick Diakopoulos et al.
Recommended
-
Learning from Stackoverflow, first fifteen minutes, Joel Spolsky
-
Norms, Laws, and Code, Jonathan Stray
-
What is Twitter, a Social Network or a News Media?, Haewoon Kwak et al.
-
International reporting in the age of participatory media, Ethan Zuckerman
-
We The Media, Introduction and Chapter 1, Dan Gillmor
-
Are we stuck in filter bubbles? Here are five potential paths out, Jonathan Stray
Week 6: Hybrid filters, recommendation, and conversation
We have now studied purely algorithmic and mostly social modes of filtering. This week we’re going to study systems that combine software and people. We’ll look at “recommendation” systems and the socially-driven algorithms behind them. Then we’ll turn to online discussions, and hybrid techniques for ensuring a “good conversation”, a social outcome with no single definition. We’ll finish by looking at an example of using human preferences to drive machine learning algorithms: Google Web search.
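One way to see how a recommendation system turns collective behaviour into an algorithm is a small item-based collaborative filter, sketched below in the general spirit of the item-based approach in this week’s reading. It is a simplification, not a reproduction of any published method: similarity is plain cosine over raw ratings, and the readers, stories, and scores are made up.

```python
import math
from collections import defaultdict

def item_similarities(ratings):
    """ratings: {user: {item: rating}} -> {(item_a, item_b): cosine similarity}."""
    by_item = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item[item][user] = rating
    sims = {}
    for a in by_item:
        for b in by_item:
            if a == b:
                continue
            # Only users who rated both items contribute to the dot product.
            shared = set(by_item[a]) & set(by_item[b])
            dot = sum(by_item[a][u] * by_item[b][u] for u in shared)
            norm_a = math.sqrt(sum(r * r for r in by_item[a].values()))
            norm_b = math.sqrt(sum(r * r for r in by_item[b].values()))
            sims[(a, b)] = dot / (norm_a * norm_b)
    return sims

def recommend(user, ratings, top_n=3):
    """Rank unseen items by a similarity-weighted average of the user's ratings."""
    sims = item_similarities(ratings)
    seen = ratings[user]
    all_items = {item for items in ratings.values() for item in items}
    scores = {}
    for candidate in all_items - set(seen):
        weighted = sum(sims[(candidate, item)] * r for item, r in seen.items())
        total = sum(abs(sims[(candidate, item)]) for item in seen)
        if total > 0:  # skip items with no overlap with anything the user rated
            scores[candidate] = weighted / total
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical readers rating story topics on a 1-5 scale.
ratings = {
    "ann": {"budget": 5, "schools": 4},
    "bob": {"budget": 5, "schools": 5, "transit": 4},
    "cat": {"sports": 5, "transit": 2},
    "dan": {"budget": 4, "transit": 5},
}
print(recommend("ann", ratings))
```

Here "sports" is never suggested to ann because no reader who rated it also rated anything she has seen, a small-scale version of the cold-start problem.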
Required
-
Item-Based Collaborative Filtering Recommendation Algorithms, Sarwar et al.
-
How Reddit Ranking Algorithms Work, Amir Salihefendic
Recommended
-
Google News Personalization: Scalable Online Collaborative Filtering, Das et al
-
Slashdot Moderation, Rob Malda
-
Pay attention to what Nick Denton is doing with comments, Clay Shirky
-
How does Google use human raters in web search?, Matt Cutts
Assignment – due in two weeks:
Design a filtering algorithm for Facebook status updates. The filtering function will be of the form (status update, user data) => boolean. That is, given all previously collected user data and a new status update from a friend, you must decide whether or not to show the new update in the user’s news feed. Turn in a design document with the following items:
-
List all available information that Facebook has about you. Include a description of how this information is collected or changes over time.
-
Argue for the factors that you would like to influence the filtering, both in terms of properties that are desirable to the user and properties that are desirable socially. Specify as concretely as possible how each of these (probably conflicting) goals might be implemented in code.
-
Write pseudo-code for the filter function. It does not need to be executable and may omit details; however, it must be specific enough that a competent programmer can turn it into working code in an obvious way.
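For orientation only, the required shape (status update, user data) => boolean might look like the skeleton below. Every field, weight, and threshold here is a hypothetical placeholder rather than anything Facebook is known to use; the point of the assignment is to argue for your own factors and replace these.

```python
from dataclasses import dataclass, field

@dataclass
class StatusUpdate:
    author: str
    text: str
    like_count: int = 0

@dataclass
class UserData:
    # Hypothetical signals; a real design document would justify each one.
    close_friends: set = field(default_factory=set)
    muted_authors: set = field(default_factory=set)
    interests: set = field(default_factory=set)

def should_show(update: StatusUpdate, user: UserData, threshold: float = 1.0) -> bool:
    """Return True if the update should appear in the user's news feed."""
    if update.author in user.muted_authors:
        return False  # a hard rule overrides any score
    score = 0.0
    if update.author in user.close_friends:
        score += 1.0  # placeholder weight for social closeness
    words = set(update.text.lower().split())
    score += 0.5 * len(words & user.interests)  # placeholder topical relevance
    score += 0.1 * min(update.like_count, 5)    # capped popularity signal
    return score >= threshold

user = UserData(close_friends={"maria"}, muted_authors={"spam"}, interests={"election"})
print(should_show(StatusUpdate("maria", "hello"), user))
print(should_show(StatusUpdate("lee", "lunch"), user))
```

Separating hard rules from a weighted score makes the conflicting goals explicit: each weight is a place where a user-desirable property and a socially desirable one may pull in different directions.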
Weeks 7-9: Knowledge mining
Week 7: Visualization
An introduction to how visualization helps people interpret information. The difference between infographics and visualization, and between exploration and presentation. Design principles from user experience considerations, graphic design, and the study of the human visual system. Also, what is specific about visualization in journalism, as opposed to the many other fields that use it?
Required
-
Designing Data Visualizations, Noah Iliinsky and Julie Steele, O’Reilly
-
Computational Information Design chapters 1 and 2, Ben Fry
Recommended
-
Journalism in an age of data, Geoff McGhee
-
Visualization Rhetoric: Framing Effects in Narrative Visualization, Hullman and Diakopoulos
-
Visualization, Tamara Munzner
Week 8: Structured journalism and knowledge representation
Is journalism in the text/video/audio business, or is it in the knowledge business? This week we’ll look at this question in detail, which gets us deep into the issue of how knowledge is represented in a computer. The traditional relational database model is often inappropriate for journalistic work, so we’re going to concentrate on so-called “linked data” representations. Such representations are widely used and increasingly popular. For example, Google recently released the Knowledge Graph. But generating this kind of data from unstructured text is still very tricky, as we’ll see when we look at the Reverb algorithm.
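The core idea behind linked data is that each fact is stored as a (subject, predicate, object) triple instead of a row in a fixed table. The toy store below is only meant to show that idea; it is not RDF or any particular library, and the people, predicates, and bill are invented.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# Hypothetical facts for illustration only.
triples = [
    ("Jane Doe", "memberOf", "City Council"),
    ("Jane Doe", "votedFor", "Budget Bill 12"),
    ("John Roe", "memberOf", "City Council"),
    ("John Roe", "votedAgainst", "Budget Bill 12"),
]

# Who sits on the council?
council = [s for s, _, _ in match(triples, predicate="memberOf", obj="City Council")]
print(council)

# Everything we know about one person.
print(match(triples, subject="Jane Doe"))
```

New kinds of facts can be added without changing any schema, which is the flexibility that makes this representation attractive for reporting, where the shape of the story is rarely known in advance.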
Required
-
A fundamental way newspaper websites need to change, Adrian Holovaty
-
The next web of open, linked data – Tim Berners-Lee TED talk
-
Identifying Relations for Open Information Extraction, Fader, Soderland, and Etzioni (Reverb algorithm)
Recommended
-
What the semantic web can represent – Tim Berners-Lee
-
Can an algorithm write a better story than a reporter?, Wired, 2012
Assignment: Use Reverb to extract propositions from a subset of your data set (if applicable, otherwise the Reuters corpus). Analyze the results. What types of propositions are extracted? What types of propositions are not? Does it depend on the wording of the original text? What mistakes does Reverb make? What is the error rate? Are there different error rates for different types of statements, sources, or other categories?
Week 9: Network analysis
Network analysis (aka social network analysis, link analysis) is a promising and popular technique for uncovering relationships between diverse individuals and organizations. It is widely used in intelligence and law enforcement, but not so much in journalism. We’ll look at basic techniques and algorithms and try to understand the promise — and the many practical problems.
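As a small taste of the basic techniques, the sketch below computes two of the simplest network measures, degree centrality and connected components, on an invented network of firms and directors. Real analyses typically rely on a dedicated graph library and on richer measures such as betweenness or community detection; this version uses only the standard library.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}

def connected_components(graph):
    """Group nodes that can reach each other, using breadth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components

# Hypothetical board memberships: an edge links a firm to one of its directors.
edges = [
    ("Firm A", "Director X"),
    ("Firm B", "Director X"),
    ("Firm B", "Director Y"),
    ("Firm C", "Director Z"),
]
graph = build_graph(edges)
centrality = degree_centrality(graph)
components = connected_components(graph)
print(max(centrality, key=centrality.get))
print(components)
```

Even this toy example hints at a practical problem: the result depends entirely on entity resolution, since "Director X" only links two firms if both records were recognized as the same person.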
Required
-
Sections I and II of Community Detection in Graphs, Fortunato
-
Centrality and Network Flow, Borgatti
Recommended
-
Visualizing Communities, Jonathan Stray
-
The network of global corporate control, Vitali et al.
-
The Dynamics of Protest Recruitment through an Online Network, Sandra González-Bailón, et al.
-
Exploring Enron, Jeffrey Heer
Examples of journalistic network analysis
-
Galleon’s Web, Wall Street Journal
-
Who Runs Hong Kong?, South China Morning Post
Week 10: Drawing conclusions from data
You’ve loaded up all the data. You’ve run the algorithms. You’ve completed your analysis. But how do you know that you are right? It’s incredibly easy to fool yourself, but fortunately, there is a long history of fields grappling with the problem of determining truth in the face of uncertainty, from statistics to intelligence analysis.
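One simple guard against fooling yourself is to ask how often chance alone would produce a pattern as strong as the one you found. The permutation test sketched below does this for a correlation. The two data series are made up, and the number of trials and the random seed are arbitrary choices.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data is at least as correlated as the real data."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

# Hypothetical data: eight districts, two measurements each.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 5, 4, 6, 8, 9, 9]
print("correlation:", round(pearson(xs, ys), 3))
print("p-value estimate:", permutation_p_value(xs, ys))
```

A small p-value only says the pattern is unlikely to be a shuffling accident. It says nothing about causation, data quality, or how many other comparisons were tried before this one, which is where the readings for this week come in.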
Required
-
basic stats concepts?
-
Correlation and causation, Business Insider
-
The Psychology of Intelligence Analysis, chapters 1, 2, 3 and 8, Richards J. Heuer
Recommended
-
If correlation doesn’t imply causation, then what does?, Michael Nielsen
-
Graphical Inference for Infovis, Hadley Wickham et al.
-
Why most published research findings are false, John P. A. Ioannidis
Week 11: Security, Surveillance, and Censorship
intro to crypto?
‘On the internet everyone knows you are a dog’. In both commercial and editorial terms, the issues of online privacy, identity and surveillance are important for journalism. Who is watching our online work? How do you protect a source in the 21st century? Who gets access to all of this mass intelligence, and what does the ability to survey everything all the time mean, both practically and ethically, for journalism?
Required
-
Chris Soghoian, Why secrets aren’t safe with journalists, New York Times, 2011
-
Hearst New Media Lecture 2012, Rebecca MacKinnon
Recommended
-
CPJ journalist security guide section 3, Information Security
-
Global Internet Filtering Map, Open Net Initiative
-
The NSA is building the country’s biggest spy center, James Bamford, Wired
Cryptographic security
Anonymity
-
Who is harmed by a real-names policy, Geek Feminism
Assignment: Come up with a situation in which a source and a journalist need to collaborate to keep a secret. Describe in detail:
-
The threat model. What are the risks?
-
The adversary model. Who must the information be kept secret from? What are their capabilities, interests, and costs?
-
A plan to keep the information secure, including tools and practices
-
An evaluation of the costs, possible sources of failure, and remaining risks
Week 12: Tracking flow and impact
How does information flow in the online ecosystem? What happens to a story after it’s published? How do items spread through social networks? We’re just beginning to be able to track ideas as they move through the network, by combining techniques from social network analysis and bioinformatics.
Required
-
Metrics, Metrics everywhere: Can we measure the impact of journalism?, Jonathan Stray
-
Meme-tracking and the Dynamics of the News Cycle, Leskovec et al.
-
The role of social networks in information diffusion, Eytan Bakshy et al.
Recommended
-
Defining Moments in Risk Communication Research: 1996–2005, Katherine McComas
-
Chain Letters and Evolutionary Histories, Charles H. Bennett, Ming Li and Bin Ma
-
Competition among memes in a world with limited attention, Weng et al.
-
Zach Seward, In the news cycle, memes spread more like a heartbeat than a virus
-
How hidden networks orchestrated Stop Kony 2012, Gilad Lotan
Week 13 – Project review
We will spend this week discussing your final projects and figuring out the best approaches to your data and/or topic.
Week 14
Review of course. An invited guest panel of computer scientists, working both within journalism and in related fields concerned with public information, will discuss their priorities and answer questions about them.
Source: http://www.compjournalism.com/
http://www.votewatch.eu/
http://www.djangobook.com/en/2.0/index.html
Bar, line and pie charts
Planning and management charts
Meters and gauges charts
Other types of charts
source: http://www.rgraph.net/examples/index.html
He broke the team’s strategy down into a few key objectives, the four main ones being:
Provide context
Describe processes
Reveal patterns
Explain the geography
Here is some of what Ericson told the audience and some of the examples he gave during the session, broken down under the different headers.
Provide context
Graphics should bring something new to the story, not just repeat the information in the lede.
Ericson emphasised that a graphics team that simply illustrates what the reporter has already told the audience is not doing its job properly. “A graphic can bring together a variety of stories and provide context,” he said, citing his team’s work on the Fukushima nuclear crisis.
We would have reporters with information about the health risks, and some who were working on radiation levels, and then population, and we can bring these things together with graphics and show the context.
Describe processes
The Fukushima nuclear crisis has spurred a lot of graphics work at news organisations across the world, and Ericson showed a few different examples of work on the situation to the #ijf11 audience. Another graphic demonstrated the process of a nuclear meltdown, and what exactly was happening at the Fukushima plant.
As we approach stories, we are not interested in a graphic showing how a standard nuclear reactor works, we want to show what is particular to a situation and what will help a reader understand this particular new story.
Like saying: “You’ve been reading about these fuel rods all over the news, this is what they actually look like and how they work”.
From nuclear meltdown to dancing. A very different graphic under the ‘describe processes’ umbrella neatly demonstrated that graphics work is not just for mapping and data.
Dissecting a Dance broke down a signature piece by US choreographer Merce Cunningham in order to explain his style.
The NYT dance critic narrated the video, over which simple outlines were overlaid at stages to demonstrate what he was saying. See the full video at this link.
Reveal patterns
This is perhaps the objective most associated with data visualisation, taking a dataset and revealing the patterns that may tell us a story: crime is going up here, population density down there, immigration changing over time, etc.
Ericson showed some of the NYT’s work on voting and immigration patterns, but more interesting was a “narrative graphic” that charted the geothermal changes in the bedrock under California created by attempts to exploit energy in hot areas of rock, which can cause earthquakes.
These so-called narrative graphics take what we think of as visualisation close to what we have been seeing for a while in broadcast news bulletins.
Explain geography
The final main objective was to show the audience the geographical element of stories.
Examples for this section included mapping the flooding of New Orleans following Hurricane Katrina, including showing which parts of the region were below sea level, overlaying population density, and showing where levees had broken and which parts of the land were underwater.
Geography was also a feature of demonstrating the size and position of the oil slick in the Gulf following the BP Deepwater Horizon accident, and comparing it with previous major oil spills.
Some of the tools in use by the NYT team, with examples:
- Google Fusion Tables
- Tableau Public: Power Hitters
- Google Charts from New York State Test Scores – The New York Times
- HTML, CSS and Javascript: 2010 World Cup Rankings
- jQuery: The Write Less, Do More, JavaScript Library
- jQuery UI – Home
- Protovis
- Raphaël—JavaScript Library
- The R Project for Statistical Computing
- Processing.org
An important formula
Data + story > data
- It doesn’t take a skilled mathematician to work that one out. But don’t be fooled by its simplicity: it underpinned a key message to take away from the workshop. The message is equally simple: graphics and data teams have the skill to make sense of data for their audience, and throwing a ton of data online without adding analysis and extracting a story is not the right way to go about it.
- http://www.simile-widgets.org/exhibit3/
http://jdownloader.org/download/index
http://www.documentcloud.org/home
http://panda.readthedocs.org/en/latest/index.html
http://csvkit.readthedocs.org/en/latest/tutorial/getting_started.html
http://betterexplained.com/articles/a-visual-guide-to-version-control/
http://lifeandcode.tumblr.com/page/39
Resources
Resources: Blog Roll
November 13th, 2012 | by EJC
A list of useful and informative blogs from media organisations, emergency management professionals and networks, web 2.0 and crisis mappers
Resources
Useful Links: Verification Tools
October 16th, 2012 | by EJC
A list of expert-recommended verification tools to assess the reliability of sources and the authenticity of user-generated content
Resources
Useful Links: Official Data Sources
October 4th, 2012 | by EJC
A list of useful links to datasets from government agencies, international organisations and research centres.
Resources
Useful Links: Curation Platforms
September 4th, 2012 | by EJC
A list of useful curation platforms and curated news outlets
Resources
Useful Links: Mapping and Crowdsourcing Networks
August 30th, 2012 | by EJC
A list of useful mapping and crowdsourcing tools and networks that centre on emergency situation and humanitarian response.
Resources
Useful Links: Conflict Reporting
August 29th, 2012 | by EJC
A list of useful resources and organisations particularly focusing on conflict reporting.
Resources
Useful Links: Disaster Reporting
August 28th, 2012 | by EJC
A list of useful resources and organisations particularly focusing on disaster reporting.
http://emergencyjournalism.net/category/resources/
http://kartograph.org/
Learn-to-Program Resources
JavaScript
jQuery
Python
-
learnpython.org – A site that lets you do exercises online, much like Codecademy, except for Python.
-
The Django Book - Free e-book with tutorials to help you learn Django, a web framework for Python.
Ruby
-
TryRuby.org - A site that lets you do exercises online, much like Codecademy, except for Ruby.
PHP
Perl
Erlang
Processing
Tutorial Sites
-
Tutorialzine - Excellent video/text tutorials on a variety of web development topics.
-
The New Boston - Free educational video tutorials by Bucky Roberts.
Free Online Computer Science Courses
-
Intro to Computer Science, Harvard - Videos of all lectures and notes are available for free online as part of the Open Courseware movement. Very engaging instructor.
-
EdX is Harvard’s new online education portal, which now has many courses.
-
Stanford Engineering Everywhere offers full video of lectures of many computer science classes, along with downloadable study materials. Free.
-
Intro to Computer Science, MIT - Full video and course materials via MIT Open Courseware.
-
Google Code University - “This site provides sample course content and tutorials for Computer Science (CS) students and educators on current computing technologies and paradigms.”
-
Lecturefox has computer science lectures and courses available for free online from many universities.
-
Programming Concepts, A Tutorial for Novice Programmers - CUNY.
-
Programming Literacy - A comprehensive introduction to programming.
-
Khan Academy says it wants to “make a world class education available to everyone for free.” They have many courses, from K-12 to college level courses, with video lectures, exercises, and more. Here are their <
Yes yes yes. After using it for a few weeks I can’t believe I forgot to mention R + lattice. It’s tremendously powerful. +1