Add QtGraphs demo with CSV-reader

The demo shows how to use a third-party CSV library with
QtGraphs.

Pick-to: 6.10
Task-number: QTBUG-122326
Change-Id: I08dc6c74953fde9dd35c9bf8540ea58234e567f1
Reviewed-by: Axel Spoerl <axel.spoerl@qt.io>
Sami Varanka 2025-02-25 12:00:41 +02:00
parent e0781dc6e4
commit 6b47b0356d
52 changed files with 19197 additions and 0 deletions

LICENSES/BSL-1.0.txt Normal file

@ -0,0 +1,7 @@
Boost Software License - Version 1.0 - August 17th, 2003
Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:
The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

LICENSES/CC0-1.0.txt Normal file

@ -0,0 +1,121 @@
Creative Commons Legal Code
CC0 1.0 Universal
CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
HEREUNDER.
Statement of Purpose
The laws of most jurisdictions throughout the world automatically confer
exclusive Copyright and Related Rights (defined below) upon the creator
and subsequent owner(s) (each and all, an "owner") of an original work of
authorship and/or a database (each, a "Work").
Certain owners wish to permanently relinquish those rights to a Work for
the purpose of contributing to a commons of creative, cultural and
scientific works ("Commons") that the public can reliably and without fear
of later claims of infringement build upon, modify, incorporate in other
works, reuse and redistribute as freely as possible in any form whatsoever
and for any purposes, including without limitation commercial purposes.
These owners may contribute to the Commons to promote the ideal of a free
culture and the further production of creative, cultural and scientific
works, or to gain reputation or greater distribution for their Work in
part through the use and efforts of others.
For these and/or other purposes and motivations, and without any
expectation of additional consideration or compensation, the person
associating CC0 with a Work (the "Affirmer"), to the extent that he or she
is an owner of Copyright and Related Rights in the Work, voluntarily
elects to apply CC0 to the Work and publicly distribute the Work under its
terms, with knowledge of his or her Copyright and Related Rights in the
Work and the meaning and intended legal effect of CC0 on those rights.
1. Copyright and Related Rights. A Work made available under CC0 may be
protected by copyright and related or neighboring rights ("Copyright and
Related Rights"). Copyright and Related Rights include, but are not
limited to, the following:
i. the right to reproduce, adapt, distribute, perform, display,
communicate, and translate a Work;
ii. moral rights retained by the original author(s) and/or performer(s);
iii. publicity and privacy rights pertaining to a person's image or
likeness depicted in a Work;
iv. rights protecting against unfair competition in regards to a Work,
subject to the limitations in paragraph 4(a), below;
v. rights protecting the extraction, dissemination, use and reuse of data
in a Work;
vi. database rights (such as those arising under Directive 96/9/EC of the
European Parliament and of the Council of 11 March 1996 on the legal
protection of databases, and under any national implementation
thereof, including any amended or successor version of such
directive); and
vii. other similar, equivalent or corresponding rights throughout the
world based on applicable law or treaty, and any national
implementations thereof.
2. Waiver. To the greatest extent permitted by, but not in contravention
of, applicable law, Affirmer hereby overtly, fully, permanently,
irrevocably and unconditionally waives, abandons, and surrenders all of
Affirmer's Copyright and Related Rights and associated claims and causes
of action, whether now known or unknown (including existing as well as
future claims and causes of action), in the Work (i) in all territories
worldwide, (ii) for the maximum duration provided by applicable law or
treaty (including future time extensions), (iii) in any current or future
medium and for any number of copies, and (iv) for any purpose whatsoever,
including without limitation commercial, advertising or promotional
purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
member of the public at large and to the detriment of Affirmer's heirs and
successors, fully intending that such Waiver shall not be subject to
revocation, rescission, cancellation, termination, or any other legal or
equitable action to disrupt the quiet enjoyment of the Work by the public
as contemplated by Affirmer's express Statement of Purpose.
3. Public License Fallback. Should any part of the Waiver for any reason
be judged legally invalid or ineffective under applicable law, then the
Waiver shall be preserved to the maximum extent permitted taking into
account Affirmer's express Statement of Purpose. In addition, to the
extent the Waiver is so judged Affirmer hereby grants to each affected
person a royalty-free, non transferable, non sublicensable, non exclusive,
irrevocable and unconditional license to exercise Affirmer's Copyright and
Related Rights in the Work (i) in all territories worldwide, (ii) for the
maximum duration provided by applicable law or treaty (including future
time extensions), (iii) in any current or future medium and for any number
of copies, and (iv) for any purpose whatsoever, including without
limitation commercial, advertising or promotional purposes (the
"License"). The License shall be deemed effective as of the date CC0 was
applied by Affirmer to the Work. Should any part of the License for any
reason be judged legally invalid or ineffective under applicable law, such
partial invalidity or ineffectiveness shall not invalidate the remainder
of the License, and in such case Affirmer hereby affirms that he or she
will not (i) exercise any of his or her remaining Copyright and Related
Rights in the Work or (ii) assert any associated claims and causes of
action with respect to the Work, in either case contrary to Affirmer's
express Statement of Purpose.
4. Limitations and Disclaimers.
a. No trademark or patent rights held by Affirmer are waived, abandoned,
surrendered, licensed or otherwise affected by this document.
b. Affirmer offers the Work as-is and makes no representations or
warranties of any kind concerning the Work, express, implied,
statutory or otherwise, including without limitation warranties of
title, merchantability, fitness for a particular purpose, non
infringement, or the absence of latent or other defects, accuracy, or
the present or absence of errors, whether or not discoverable, all to
the greatest extent permissible under applicable law.
c. Affirmer disclaims responsibility for clearing rights of other persons
that may apply to the Work or any use thereof, including without
limitation any person's Copyright and Related Rights in the Work.
Further, Affirmer disclaims responsibility for obtaining any necessary
consents, permissions or other rights required for any use of the
Work.
d. Affirmer understands and acknowledges that Creative Commons is not a
party to this document and has no duty or obligation with respect to
this CC0 or use of the Work.

LICENSES/MIT.txt Normal file

@ -0,0 +1,18 @@
MIT License
Copyright (c) <year> <copyright holders>
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the
following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial
portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.


@ -8,6 +8,7 @@ if(TARGET Qt6::Quick)
endif()
if(TARGET Qt6::Quick AND TARGET Qt6::Graphs)
qt_internal_add_example(stocqt)
qt_internal_add_example(graphs_csv)
endif()
if(TARGET Qt6::Quick AND TARGET Qt6::QuickControls2)
qt_internal_add_example(colorpaletteclient)


@ -0,0 +1 @@
add_subdirectory(csv-parser)


@ -0,0 +1,32 @@
version = 1
[[annotations]]
# Apply to everything in the csv-parser repo
path = ["csv-parser/**"]
comment = "Vince's CSV Parser library"
precedence = "closest"
SPDX-FileCopyrightText = "Copyright (c) 2017-2019 Vincent La"
SPDX-License-Identifier = "MIT"
[[annotations]]
path = ["csv-parser/include/external/hedley.h"]
comment = "Hedley - public domain (CC0)"
SPDX-License-Identifier = "CC0-1.0"
[[annotations]]
path = ["csv-parser/include/external/mio.hpp"]
comment = "Mio - memory mapped IO"
SPDX-FileCopyrightText = "Copyright 2017 https://github.com/mandreyel"
SPDX-License-Identifier = "MIT"
[[annotations]]
path = ["csv-parser/include/external/string_view.hpp"]
comment = "String View (BSL)"
SPDX-FileCopyrightText = "Copyright 2017-2019 by Martin Moene"
SPDX-License-Identifier = "BSL-1.0"
[[annotations]]
path = ["csv-parser/include/internal/csv_row_json.cpp"]
comment = "JSON serialization adapted from JSON for Modern C++"
SPDX-FileCopyrightText = "Copyright © 2013-2015 Niels Lohmann."
SPDX-License-Identifier = "MIT"


@ -0,0 +1,62 @@
cmake_minimum_required(VERSION 3.9)
project(csv)
if(CSV_CXX_STANDARD)
set(CMAKE_CXX_STANDARD ${CSV_CXX_STANDARD})
else()
set(CMAKE_CXX_STANDARD 17)
endif(CSV_CXX_STANDARD)
message("Building CSV library using C++${CMAKE_CXX_STANDARD}")
# Defines CSV_HAS_CXX17 in compatibility.hpp
if (CMAKE_VERSION VERSION_LESS "3.12.0")
add_definitions(-DCMAKE_CXX_STANDARD=${CMAKE_CXX_STANDARD})
else()
add_compile_definitions(CMAKE_CXX_STANDARD=${CMAKE_CXX_STANDARD})
endif()
set(THREADS_PREFER_PTHREAD_FLAG TRUE)
find_package(Threads QUIET REQUIRED)
if(MSVC)
# Make Visual Studio report accurate C++ version
# See: https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
# /Wall emits warnings about the C++ standard library
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /EHsc /GS- /Zc:__cplusplus /W4")
else()
# Ignore Visual Studio pragma regions
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unknown-pragmas")
# set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} --coverage -Og")
endif(MSVC)
set(CSV_ROOT_DIR ${CMAKE_CURRENT_LIST_DIR})
set(CSV_BUILD_DIR ${CMAKE_CURRENT_BINARY_DIR})
set(CSV_INCLUDE_DIR ${CMAKE_CURRENT_LIST_DIR}/include/)
set(CSV_SOURCE_DIR ${CSV_INCLUDE_DIR}/internal/)
set(CSV_TEST_DIR ${CMAKE_CURRENT_LIST_DIR}/tests)
include_directories(${CSV_INCLUDE_DIR})
## Load developer specific CMake settings
if (CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
SET(CSV_DEVELOPER TRUE)
endif()
## Main Library
add_subdirectory(${CSV_SOURCE_DIR})
## Developer settings
if (CSV_DEVELOPER)
# Allow for performance profiling
if (MSVC)
target_link_options(csv PUBLIC /PROFILE)
endif()
# More error messages.
if (UNIX)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} \
-Wall -Wextra -Wsign-compare \
-Wwrite-strings -Wpointer-arith -Winit-self \
-Wconversion -Wno-sign-conversion")
endif()
endif()


@ -0,0 +1,30 @@
{
"configurations": [
{
"name": "x64-Release",
"generator": "Ninja",
"configurationType": "RelWithDebInfo",
"inheritEnvironments": [
"msvc_x64_x64"
],
"buildRoot": "${projectDir}\\build\\${name}",
"installRoot": "${projectDir}\\install\\${name}",
"cmakeCommandArgs": "",
"buildCommandArgs": "-v",
"ctestCommandArgs": ""
},
{
"name": "x64-Debug",
"generator": "Ninja",
"configurationType": "Debug",
"inheritEnvironments": [
"msvc_x64_x64"
],
"buildRoot": "${projectDir}\\build\\${name}",
"installRoot": "${projectDir}\\install\\${name}",
"cmakeCommandArgs": "",
"buildCommandArgs": "-v",
"ctestCommandArgs": ""
}
]
}


@ -0,0 +1,21 @@
MIT License
Copyright (c) 2017-2019 Vincent La
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@ -0,0 +1,96 @@
# Makefile used for building/testing on Travis CI
# Force Travis to use updated compilers
ifeq ($(TRAVIS_COMPILER), gcc)
CXX = g++-8
else ifeq ($(TRAVIS_COMPILER), clang)
CXX = clang++
endif
ifeq ($(STD), )
STD = c++11
endif
BUILD_DIR = build
SOURCE_DIR = include
SINGLE_INCLUDE_DIR = single_include
TEST_DIR = tests
CFLAGS = -pthread -std=$(STD)
TEST_OFLAGS =
ifeq ($(CXX), g++-8)
TEST_OFLAGS = -Og
endif
TEST_FLAGS = -Itests/ $(CFLAGS) $(TEST_OFLAGS) -g --coverage -Wno-unknown-pragmas -Wall
# Main Library
SOURCES = $(wildcard include/internal/*.cpp)
OBJECTS = $(subst .cpp,.o,$(subst src/,$(BUILD_DIR)/,$(SOURCES)))
TEST_SOURCES = $(wildcard tests/*.cpp)
TEST_SOURCES_NO_EXT = $(subst tests/,,$(subst .cpp,,$(TEST_SOURCES)))
all: csv_parser test_all clean distclean
################
# Main Library #
################
csv:
$(CXX) -c -O3 $(CFLAGS) $(SOURCES)
mkdir -p $(BUILD_DIR)
mv *.o $(BUILD_DIR)
libcsv.a:
make csv
ar rvs libcsv.a $(wildcard build/*.o)
docs:
doxygen Doxyfile
############
# Programs #
############
csv_stats:
$(CXX) -o csv_stats -O3 $(CFLAGS) programs/csv_stats.cpp -I$(SINGLE_INCLUDE_DIR)
#########
# Tests #
#########
csv_test:
$(CXX) -o csv_test $(SOURCES) $(TEST_SOURCES) -I${SOURCE_DIR} $(TEST_FLAGS)
run_csv_test: csv_test
mkdir -p tests/temp
./csv_test
# Test Clean-Up
rm -rf $(TEST_DIR)/temp
# Run code coverage analysis
code_cov: csv_test
mkdir -p test_results
mv *.gcno *.gcda $(PWD)/test_results
gcov-8 $(SOURCES) -o test_results --relative-only
mv *.gcov test_results
# Generate report
code_cov_report:
cd test_results
lcov --capture --directory test_results --output-file coverage.info
genhtml coverage.info --output-directory out
valgrind: csv_stats
# Can't run valgrind against csv_test because it mangles the working directory
# which causes csv_test to not be able to find test files
valgrind --leak-check=full ./csv_stats $(TEST_DIR)/data/real_data/2016_Gaz_place_national.txt
.PHONY: all clean distclean
clean:
rm -f build/*
rm -f *.gc*
rm -f libcsv.a
rm -f csv_*
distclean: clean


@ -0,0 +1,376 @@
# Vince's CSV Parser
[![CMake on Windows](https://github.com/vincentlaucsb/csv-parser/actions/workflows/cmake-multi-platform.yml/badge.svg)](https://github.com/vincentlaucsb/csv-parser/actions/workflows/cmake-multi-platform.yml)
* [Motivation](#motivation)
* [Documentation](#documentation)
* [Integration](#integration)
* [C++ Version](#c-version)
* [Single Header](#single-header)
* [CMake Instructions](#cmake-instructions)
* [Features & Examples](#features--examples)
* [Reading an Arbitrarily Large File (with Iterators)](#reading-an-arbitrarily-large-file-with-iterators)
* [Memory Mapped Files vs. Streams](#memory-mapped-files-vs-streams)
* [Indexing by Column Names](#indexing-by-column-names)
* [Numeric Conversions](#numeric-conversions)
* [Specifying the CSV Format](#specifying-the-csv-format)
* [Trimming Whitespace](#trimming-whitespace)
* [Handling Variable Numbers of Columns](#handling-variable-numbers-of-columns)
* [Setting Column Names](#setting-column-names)
* [Converting to JSON](#converting-to-json)
* [Parsing an In-Memory String](#parsing-an-in-memory-string)
* [Writing CSV Files](#writing-csv-files)
* [Contributing](#contributing)
## Motivation
There are plenty of other CSV parsers in the wild, but I had a hard time finding what I wanted. Inspired by Python's `csv` module, I wanted a library with **simple, intuitive syntax**. Furthermore, I wanted support for special use cases such as calculating statistics on very large files. Thus, this library was created with the following goals in mind.
### Performance and Memory Requirements
A high performance CSV parser allows you to take advantage of the deluge of large datasets available. By using overlapped threads, memory mapped IO, and
minimal memory allocation, this parser can quickly tackle large CSV files--even if they are larger than RAM.
In fact, [according to Visual Studio's profiler](https://github.com/vincentlaucsb/csv-parser/wiki/Microsoft-Visual-Studio-CPU-Profiling-Results) this
CSV parser **spends almost 90% of its CPU cycles actually reading your data** as opposed to getting hung up in hard disk I/O or pushing around memory.
#### Show me the numbers
On my computer (12th Gen Intel(R) Core(TM) i5-12400 @ 2.50 GHz/Western Digital Blue 5400RPM HDD), this parser can read
* the [69.9 MB 2015_StateDepartment.csv](https://github.com/vincentlaucsb/csv-data/tree/master/real_data) in 0.19 seconds (360 MBps)
* a [1.4 GB Craigslist Used Vehicles Dataset](https://www.kaggle.com/austinreese/craigslist-carstrucks-data/version/7) in 1.18 seconds (1.2 GBps)
* a [2.9GB Car Accidents Dataset](https://www.kaggle.com/sobhanmoosavi/us-accidents) in 8.49 seconds (352 MBps)
### Robust Yet Flexible
#### RFC 4180 and Beyond
This CSV parser is much more than a fancy string splitter, and parses all files following [RFC 4180](https://www.rfc-editor.org/rfc/rfc4180.txt).
However, in reality we know that RFC 4180 is just a suggestion, and there are many "flavors" of CSV such as tab-delimited files. Thus, this library has:
* Automatic delimiter guessing
* Ability to ignore comments in leading rows and elsewhere
* Ability to handle rows of different lengths
* Ability to handle arbitrary line endings (as long as they are some combination of carriage return and newline)
By default, rows of variable length are silently ignored, although you may elect to keep them or throw an error.
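To make the idea of delimiter guessing concrete, here is a minimal, self-contained sketch of one plausible heuristic. It is an assumption for illustration only and is not claimed to be this library's algorithm: `guess_delimiter` is a hypothetical helper that scores each candidate by how consistently it splits the first lines of a file, and it ignores quoting entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of csv-parser's API.
// For each candidate delimiter, build a histogram of "fields per line" and
// score the candidate by (field count) * (number of lines with that count).
// Lines on which the candidate never appears do not contribute.
inline char guess_delimiter(const std::string& head,
                            const std::vector<char>& candidates) {
    char best = candidates.empty() ? ',' : candidates.front();
    std::size_t best_score = 0;

    for (char delim : candidates) {
        std::map<std::size_t, std::size_t> histogram; // field count -> line count
        std::istringstream lines(head);
        std::string line;
        while (std::getline(lines, line)) {
            if (line.empty()) continue;
            const std::size_t fields = 1 + static_cast<std::size_t>(
                std::count(line.begin(), line.end(), delim));
            ++histogram[fields];
        }
        for (const auto& entry : histogram) {
            if (entry.first < 2) continue; // delimiter absent on these lines
            const std::size_t score = entry.first * entry.second;
            if (score > best_score) {
                best_score = score;
                best = delim;
            }
        }
    }
    return best;
}
```

Calling `guess_delimiter(head, { ',', '\t', ';' })` on the first few kilobytes of a file returns the candidate that produces the most consistent row shape. A real guesser would also have to respect quoted fields.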
#### Encoding
This CSV parser is encoding-agnostic and will handle ANSI and UTF-8 encoded files.
It does not try to decode UTF-8, except for detecting and stripping UTF-8 byte order marks.
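As an illustration of what stripping a byte order mark involves, the following sketch removes a leading UTF-8 BOM (the bytes `EF BB BF`) and leaves every other byte untouched. `strip_utf8_bom` is a hypothetical helper written for this example, not a function exposed by the library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of csv-parser's API.
// Drops a leading UTF-8 byte order mark if one is present; the rest of
// the text is returned byte-for-byte, without any decoding.
inline std::string strip_utf8_bom(std::string text) {
    static const char bom[] = "\xEF\xBB\xBF";
    if (text.compare(0, 3, bom) == 0) {
        text.erase(0, 3);
    }
    return text;
}
```

Because only the first three bytes are inspected, the check costs the same regardless of file size.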
### Well Tested
This CSV parser has an extensive test suite and is checked for memory safety with Valgrind. If you still manage to find a bug,
do not hesitate to report it.
## Documentation
In addition to the [Features & Examples](#features--examples) below, the [full online documentation](https://vincela.com/csv/) contains more examples, details, interesting features, and instructions for less common use cases.
## Integration
This library was developed with Microsoft Visual Studio and is compatible with g++ versions newer than 7.5 and with clang.
All of the code required to build this library, aside from the C++ standard library, is contained under `include/`.
### C++ Version
While C++17 is recommended, C++11 is the minimum version required. This library makes extensive use of string views, and uses
[Martin Moene's string view library](https://github.com/martinmoene/string-view-lite) if `std::string_view` is not available.
### Single Header
This library is available as a single `.hpp` file under [`single_include/csv.hpp`](single_include/csv.hpp).
### CMake Instructions
If you're including this in another CMake project, you can simply clone this repo into your project directory,
and add the following to your CMakeLists.txt:
```
# Optional: Defaults to C++ 17
# set(CSV_CXX_STANDARD 11)
add_subdirectory(csv-parser)
# ...
add_executable(<your program> ...)
target_link_libraries(<your program> csv)
```
#### Avoid cloning with FetchContent
Don't want to clone? No problem. There's also [a simple example documenting how to use CMake's FetchContent module to integrate this library](https://github.com/vincentlaucsb/csv-parser/wiki/Example:-Using-csv%E2%80%90parser-with-CMake-and-FetchContent).
## Features & Examples
### Reading an Arbitrarily Large File (with Iterators)
With this library, you can easily stream over a large file without reading its entirety into memory.
**C++ Style**
```cpp
# include "csv.hpp"
using namespace csv;
...
CSVReader reader("very_big_file.csv");
for (CSVRow& row: reader) { // Input iterator
for (CSVField& field: row) {
// By default, get<>() produces a std::string.
// A more efficient get<string_view>() is also available, where the resulting
// string_view is valid as long as the parent CSVRow is alive
std::cout << field.get<>() << ...
}
}
...
```
**Old-Fashioned C Style Loop**
```cpp
...
CSVReader reader("very_big_file.csv");
CSVRow row;
while (reader.read_row(row)) {
// Do stuff with row here
}
...
```
#### Memory-Mapped Files vs. Streams
By default, passing in a file path string to the constructor of `CSVReader`
causes memory-mapped IO to be used. In general, this option is the most
performant.
However, `std::ifstream` may also be used, as may in-memory sources via `std::stringstream`.
**Note**: Currently CSV guessing only works for memory-mapped files. The CSV dialect
must be manually defined for other sources.
```cpp
CSVFormat format;
// custom formatting options go here
CSVReader mmap("some_file.csv", format);
std::ifstream infile("some_file.csv", std::ios::binary);
CSVReader ifstream_reader(infile, format);
std::stringstream my_csv;
CSVReader sstream_reader(my_csv, format);
```
### Indexing by Column Names
Retrieving values using a column name string is a cheap, constant time operation.
```cpp
# include "csv.hpp"
using namespace csv;
...
CSVReader reader("very_big_file.csv");
double sum = 0;
for (auto& row: reader) {
// Note: Can also use index of column with [] operator
sum += row["Total Salary"].get<double>();
}
...
```
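One common way to get constant-time lookup by column name is a hash map from name to position, built once when the header row is read. The sketch below shows that idea in isolation. `ColumnIndex` is a hypothetical class for illustration and is not claimed to match the library's internal data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration, not csv-parser's internal class.
// Maps each column name to its position so that lookups by name take
// constant time on average.
class ColumnIndex {
public:
    explicit ColumnIndex(const std::vector<std::string>& names) {
        for (std::size_t i = 0; i < names.size(); ++i) {
            // emplace keeps the first position if a name is duplicated
            positions_.emplace(names[i], i);
        }
    }

    // Returns the zero-based position of `name`, or -1 if it is unknown.
    int index_of(const std::string& name) const {
        const auto it = positions_.find(name);
        return it == positions_.end() ? -1 : static_cast<int>(it->second);
    }

private:
    std::unordered_map<std::string, std::size_t> positions_;
};
```

With such an index, resolving `"Total Salary"` costs one hash lookup per row rather than a scan over all column names.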
### Numeric Conversions
If your CSV has lots of numeric values, you can also have this parser (lazily)
convert them to the proper data type.
* Type checking is performed on conversions to prevent undefined behavior and integer overflow
* Negative numbers cannot be blindly converted to unsigned integer types
* `get<float>()`, `get<double>()`, and `get<long double>()` are capable of parsing numbers written in scientific notation.
* **Note:** Conversions to floating point types are not currently checked for loss of precision.
```cpp
# include "csv.hpp"
using namespace csv;
...
CSVReader reader("very_big_file.csv");
for (auto& row: reader) {
if (row["timestamp"].is_int()) {
// Can use get<>() with any integer type, but negative
// numbers cannot be converted to unsigned types
row["timestamp"].get<int>();
// You can also attempt to parse hex values
int value;
if (row["hexValue"].try_parse_hex(value)) {
std::cout << "Hex value is " << value << std::endl;
}
// Non-imperial decimal numbers can be handled this way
long double decimalValue;
if (row["decimalNumber"].try_parse_decimal(decimalValue, ',')) {
std::cout << "Decimal value is " << decimalValue << std::endl;
}
// ..
}
}
```
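The kind of checking described above (rejecting overflow and refusing to turn negative numbers into unsigned values) can be sketched with the standard library alone. The example assumes C++17 for `std::from_chars`, and `try_get_integer` is a hypothetical helper, not the library's `get<>()` implementation.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical helper, not part of csv-parser's API.
// Succeeds only if the whole field is an integer that fits in T.
// std::from_chars reports out-of-range values instead of wrapping, and it
// does not accept a minus sign when T is unsigned.
template <typename T>
bool try_get_integer(std::string_view field, T& out) {
    T value{};
    const char* first = field.data();
    const char* last = first + field.size();
    const auto result = std::from_chars(first, last, value);
    if (result.ec != std::errc() || result.ptr != last) {
        return false; // not a number, out of range, or trailing characters
    }
    out = value;
    return true;
}
```

On failure `out` is left unchanged, so callers can keep a default value. Note that `std::from_chars` accepts neither leading whitespace nor a leading `+`, so fields would need trimming first.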
### Converting to JSON
You can serialize individual rows as JSON objects, where the keys are column names, or as
JSON arrays (which don't contain column names). The outputted JSON contains properly escaped
strings with minimal whitespace and no quoting for numeric values. How these JSON fragments are
assembled into a larger JSON document is an exercise left for the user.
```cpp
# include <sstream>
# include "csv.hpp"
using namespace csv;
...
CSVReader reader("very_big_file.csv");
std::stringstream my_json;
for (auto& row: reader) {
my_json << row.to_json() << std::endl;
my_json << row.to_json_array() << std::endl;
// You can pass in a vector of column names to
// slice or rearrange the outputted JSON
my_json << row.to_json({ "A", "B", "C" }) << std::endl;
my_json << row.to_json_array({ "C", "B", "A" }) << std::endl;
}
```
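For readers curious about what "properly escaped" means here, the sketch below escapes a single string value according to the JSON grammar. `json_escape` is a hypothetical stand-alone function for illustration; it is not the library's serializer, and it passes non-ASCII bytes through unchanged on the assumption that the input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of csv-parser's API.
// Escapes quotes, backslashes, and control characters so the result can be
// placed between double quotes in a JSON document.
inline std::string json_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20) {
                // Remaining control characters use the \u00XX form
                char buf[7];
                std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}
```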
### Specifying the CSV Format
Although the CSV parser has a decent guessing mechanism, in some cases it is preferable to specify the exact parameters of a file.
```cpp
# include "csv.hpp"
# include ...
using namespace csv;
CSVFormat format;
format.delimiter('\t')
.quote('~')
.header_row(2); // Header is on 3rd row (zero-indexed)
// .no_header(); // Parse CSVs without a header row
// .quote(false); // Turn off quoting
// Alternatively, we can use format.delimiter({ '\t', ',', ... })
// to tell the CSV guesser which delimiters to try out
CSVReader reader("wierd_csv_dialect.csv", format);
for (auto& row: reader) {
// Do stuff with rows here
}
```
#### Trimming Whitespace
This parser can efficiently trim off leading and trailing whitespace. Of course,
make sure you don't include your intended delimiter or newlines in the list of characters
to trim.
```cpp
CSVFormat format;
format.trim({ ' ', '\t' });
```
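Trimming can be made cheap by turning the list of trim characters into a 256-entry lookup table once, then scanning inward from both ends of each field. The sketch below assumes C++17 for `std::string_view`. `make_trim_table` and `trim_field` are hypothetical names for illustration, not the library's functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helpers, not part of csv-parser's API.
// Builds a table with one flag per possible byte value.
inline std::array<bool, 256> make_trim_table(const std::vector<char>& trim_chars) {
    std::array<bool, 256> table{}; // all false
    for (char c : trim_chars) {
        table[static_cast<unsigned char>(c)] = true;
    }
    return table;
}

// Returns a view of `field` without leading and trailing trim characters.
// No characters are copied; the view points into the original buffer.
inline std::string_view trim_field(std::string_view field,
                                   const std::array<bool, 256>& table) {
    std::size_t begin = 0;
    std::size_t end = field.size();
    while (begin < end && table[static_cast<unsigned char>(field[begin])]) {
        ++begin;
    }
    while (end > begin && table[static_cast<unsigned char>(field[end - 1])]) {
        --end;
    }
    return field.substr(begin, end - begin);
}
```

A field consisting only of trim characters yields an empty view, which is why the second loop stops at `begin` rather than at zero.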
#### Handling Variable Numbers of Columns
Sometimes, the rows in a CSV are not all of the same length. Whether this was intentional or not,
this library is built to handle all use cases.
```cpp
CSVFormat format;
// Default: Silently ignoring rows with missing or extraneous columns
format.variable_columns(false); // Short-hand
format.variable_columns(VariableColumnPolicy::IGNORE_ROW);
// Case 2: Keeping variable-length rows
format.variable_columns(true); // Short-hand
format.variable_columns(VariableColumnPolicy::KEEP);
// Case 3: Throwing an error if variable-length rows are encountered
format.variable_columns(VariableColumnPolicy::THROW);
```
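The three policies differ only in what happens to a row whose length does not match the expected column count. The following stand-alone sketch spells that out. `RowPolicy` and `filter_rows` are hypothetical names that mirror the behavior described above, not the library's `VariableColumnPolicy` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the three policies; not the library's enum.
enum class RowPolicy { IgnoreRow, Keep, Throw };

using Row = std::vector<std::string>;

// Returns the rows that survive `policy`, given the expected column count.
inline std::vector<Row> filter_rows(const std::vector<Row>& rows,
                                    std::size_t expected_columns,
                                    RowPolicy policy) {
    std::vector<Row> kept;
    for (const Row& row : rows) {
        if (row.size() == expected_columns || policy == RowPolicy::Keep) {
            kept.push_back(row);
        } else if (policy == RowPolicy::Throw) {
            throw std::runtime_error("row has an unexpected number of columns");
        }
        // RowPolicy::IgnoreRow: the mismatched row is dropped silently
    }
    return kept;
}
```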
#### Setting Column Names
If a CSV file does not have column names, you can specify your own:
```cpp
std::vector<std::string> col_names = { ... };
CSVFormat format;
format.column_names(col_names);
```
### Parsing an In-Memory String
```cpp
# include "csv.hpp"
using namespace csv;
...
// Method 1: Using parse()
std::string csv_string = "Actor,Character\r\n"
"Will Ferrell,Ricky Bobby\r\n"
"John C. Reilly,Cal Naughton Jr.\r\n"
"Sacha Baron Cohen,Jean Giard\r\n";
auto rows = parse(csv_string);
for (auto& r: rows) {
// Do stuff with row here
}
// Method 2: Using _csv operator
auto rows = "Actor,Character\r\n"
"Will Ferrell,Ricky Bobby\r\n"
"John C. Reilly,Cal Naughton Jr.\r\n"
"Sacha Baron Cohen,Jean Giard\r\n"_csv;
for (auto& r: rows) {
// Do stuff with row here
}
```
### Writing CSV Files
```cpp
# include "csv.hpp"
# include ...
using namespace csv;
using namespace std;
...
stringstream ss; // Can also use ofstream, etc.
auto writer = make_csv_writer(ss);
// auto writer = make_tsv_writer(ss); // For tab-separated files
// DelimWriter<stringstream, '|', '"'> writer(ss); // Your own custom format
// set_decimal_places(2); // How many places after the decimal will be written for floats
writer << vector<string>({ "A", "B", "C" })
<< deque<string>({ "I'm", "too", "tired" })
<< list<string>({ "to", "write", "documentation." });
writer << array<string, 3>({ "The quick brown", "fox", "jumps over the lazy dog" });
writer << make_tuple(1, 2.0, "Three");
...
```
You can pass in arbitrary types into `DelimWriter` by defining a conversion function
for that type to `std::string`.


@ -0,0 +1,4 @@
// Hint files help the Visual Studio IDE interpret Visual C++ identifiers
// such as names of functions and macros.
// For more information see https://go.microsoft.com/fwlink/?linkid=865984
#define CONSTEXPR


@ -0,0 +1,39 @@
/*
CSV for C++, version 2.3.0
https://github.com/vincentlaucsb/csv-parser
MIT License
Copyright (c) 2017-2024 Vincent La
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/
#pragma once
#ifndef CSV_HPP
#define CSV_HPP
#include "internal/csv_reader.hpp"
#include "internal/csv_stat.hpp"
#include "internal/csv_utility.hpp"
#include "internal/csv_writer.hpp"
/** INSERT_CSV_SOURCES **/
#endif

File diff suppressed because it is too large (three files).

@ -0,0 +1,28 @@
add_library(csv STATIC "")
target_sources(csv
PRIVATE
basic_csv_parser.hpp
basic_csv_parser.cpp
col_names.cpp
col_names.hpp
common.hpp
csv_format.hpp
csv_format.cpp
csv_reader.hpp
csv_reader.cpp
csv_reader_iterator.cpp
csv_row.hpp
csv_row.cpp
csv_row_json.cpp
csv_stat.cpp
csv_stat.hpp
csv_utility.cpp
csv_utility.hpp
csv_writer.hpp
"data_type.hpp"
)
set_target_properties(csv PROPERTIES LINKER_LANGUAGE CXX)
target_link_libraries(csv PRIVATE Threads::Threads)
target_include_directories(csv INTERFACE ../)


@ -0,0 +1,266 @@
#include "basic_csv_parser.hpp"
namespace csv {
namespace internals {
CSV_INLINE size_t get_file_size(csv::string_view filename) {
std::ifstream infile(std::string(filename), std::ios::binary);
const auto start = infile.tellg();
infile.seekg(0, std::ios::end);
const auto end = infile.tellg();
return end - start;
}
CSV_INLINE std::string get_csv_head(csv::string_view filename) {
return get_csv_head(filename, get_file_size(filename));
}
CSV_INLINE std::string get_csv_head(csv::string_view filename, size_t file_size) {
const size_t bytes = 500000;
std::error_code error;
size_t length = std::min((size_t)file_size, bytes);
auto mmap = mio::make_mmap_source(std::string(filename), 0, length, error);
if (error) {
throw std::runtime_error("Cannot open file " + std::string(filename));
}
return std::string(mmap.begin(), mmap.end());
}
#ifdef _MSC_VER
#pragma region IBasicCSVParser
#endif
CSV_INLINE IBasicCSVParser::IBasicCSVParser(
const CSVFormat& format,
const ColNamesPtr& col_names
) : _col_names(col_names) {
if (format.no_quote) {
_parse_flags = internals::make_parse_flags(format.get_delim());
}
else {
_parse_flags = internals::make_parse_flags(format.get_delim(), format.quote_char);
}
_ws_flags = internals::make_ws_flags(
format.trim_chars.data(), format.trim_chars.size()
);
}
CSV_INLINE void IBasicCSVParser::end_feed() {
using internals::ParseFlags;
bool empty_last_field = this->data_ptr
&& this->data_ptr->_data
&& !this->data_ptr->data.empty()
&& (parse_flag(this->data_ptr->data.back()) == ParseFlags::DELIMITER
|| parse_flag(this->data_ptr->data.back()) == ParseFlags::QUOTE);
// Push field
if (this->field_length > 0 || empty_last_field) {
this->push_field();
}
// Push row
if (this->current_row.size() > 0)
this->push_row();
}
CSV_INLINE void IBasicCSVParser::parse_field() noexcept {
using internals::ParseFlags;
auto& in = this->data_ptr->data;
// Trim off leading whitespace
while (data_pos < in.size() && ws_flag(in[data_pos]))
data_pos++;
if (field_start == UNINITIALIZED_FIELD)
field_start = (int)(data_pos - current_row_start());
// Optimization: Since NOT_SPECIAL characters tend to occur in contiguous
// sequences, use the loop below to avoid having to go through the outer
// switch statement as much as possible
while (data_pos < in.size() && compound_parse_flag(in[data_pos]) == ParseFlags::NOT_SPECIAL)
data_pos++;
field_length = data_pos - (field_start + current_row_start());
// Trim off trailing whitespace, this->field_length constraint matters
// when field is entirely whitespace
for (size_t j = data_pos - 1; ws_flag(in[j]) && this->field_length > 0; j--)
this->field_length--;
}
CSV_INLINE void IBasicCSVParser::push_field()
{
// Update
if (field_has_double_quote) {
fields->emplace_back(
field_start == UNINITIALIZED_FIELD ? 0 : (unsigned int)field_start,
field_length,
true
);
field_has_double_quote = false;
}
else {
fields->emplace_back(
field_start == UNINITIALIZED_FIELD ? 0 : (unsigned int)field_start,
field_length
);
}
current_row.row_length++;
// Reset field state
field_start = UNINITIALIZED_FIELD;
field_length = 0;
}
/** @return The number of characters parsed that belong to complete rows */
CSV_INLINE size_t IBasicCSVParser::parse()
{
using internals::ParseFlags;
this->quote_escape = false;
this->data_pos = 0;
this->current_row_start() = 0;
this->trim_utf8_bom();
auto& in = this->data_ptr->data;
while (this->data_pos < in.size()) {
switch (compound_parse_flag(in[this->data_pos])) {
case ParseFlags::DELIMITER:
this->push_field();
this->data_pos++;
break;
case ParseFlags::NEWLINE:
this->data_pos++;
// Catches CRLF (or LFLF, CRCRLF, or any other nonsensical combination of newlines)
while (this->data_pos < in.size() && parse_flag(in[this->data_pos]) == ParseFlags::NEWLINE)
this->data_pos++;
// End of record -> Write record
this->push_field();
this->push_row();
// Reset
this->current_row = CSVRow(data_ptr, this->data_pos, fields->size());
break;
case ParseFlags::NOT_SPECIAL:
this->parse_field();
break;
case ParseFlags::QUOTE_ESCAPE_QUOTE:
if (data_pos + 1 == in.size()) return this->current_row_start();
else if (data_pos + 1 < in.size()) {
auto next_ch = parse_flag(in[data_pos + 1]);
if (next_ch >= ParseFlags::DELIMITER) {
quote_escape = false;
data_pos++;
break;
}
else if (next_ch == ParseFlags::QUOTE) {
// Case: Escaped quote
data_pos += 2;
this->field_length += 2;
this->field_has_double_quote = true;
break;
}
}
// Case: Unescaped single quote => not strictly valid but we'll keep it
this->field_length++;
data_pos++;
break;
default: // Quote (currently not quote escaped)
if (this->field_length == 0) {
quote_escape = true;
data_pos++;
if (field_start == UNINITIALIZED_FIELD && data_pos < in.size() && !ws_flag(in[data_pos]))
field_start = (int)(data_pos - current_row_start());
break;
}
// Case: Unescaped quote
this->field_length++;
data_pos++;
break;
}
}
return this->current_row_start();
}
CSV_INLINE void IBasicCSVParser::push_row() {
current_row.row_length = fields->size() - current_row.fields_start;
this->_records->push_back(std::move(current_row));
}
CSV_INLINE void IBasicCSVParser::reset_data_ptr() {
this->data_ptr = std::make_shared<RawCSVData>();
this->data_ptr->parse_flags = this->_parse_flags;
this->data_ptr->col_names = this->_col_names;
this->fields = &(this->data_ptr->fields);
}
CSV_INLINE void IBasicCSVParser::trim_utf8_bom() {
auto& data = this->data_ptr->data;
if (!this->unicode_bom_scan && data.size() >= 3) {
if (data[0] == '\xEF' && data[1] == '\xBB' && data[2] == '\xBF') {
this->data_pos += 3; // Remove BOM from input string
this->_utf8_bom = true;
}
this->unicode_bom_scan = true;
}
}
#ifdef _MSC_VER
#pragma endregion
#endif
#ifdef _MSC_VER
#pragma region Specializations
#endif
CSV_INLINE void MmapParser::next(size_t bytes = ITERATION_CHUNK_SIZE) {
// Reset parser state
this->field_start = UNINITIALIZED_FIELD;
this->field_length = 0;
this->reset_data_ptr();
// Create memory map
size_t length = std::min(this->source_size - this->mmap_pos, bytes);
std::error_code error;
this->data_ptr->_data = std::make_shared<mio::basic_mmap_source<char>>(mio::make_mmap_source(this->_filename, this->mmap_pos, length, error));
this->mmap_pos += length;
if (error) throw error;
auto mmap_ptr = (mio::basic_mmap_source<char>*)(this->data_ptr->_data.get());
// Create string view
this->data_ptr->data = csv::string_view(mmap_ptr->data(), mmap_ptr->length());
// Parse
this->current_row = CSVRow(this->data_ptr);
size_t remainder = this->parse();
if (this->mmap_pos == this->source_size || no_chunk()) {
this->_eof = true;
this->end_feed();
}
this->mmap_pos -= (length - remainder);
}
#ifdef _MSC_VER
#pragma endregion
#endif
}
}
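
MmapParser::next() above rewinds mmap_pos by the bytes that did not belong to a complete row, so the next window starts on a row boundary. The following is a minimal sketch of that re-alignment idea on an in-memory string. It is not the library's code: read_rows_in_chunks is a hypothetical helper, and it looks for plain '\n' characters instead of running the real parser, so quoted newlines are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: split `source` into rows while only ever looking at
// `chunk_size` bytes at a time.
std::vector<std::string> read_rows_in_chunks(const std::string& source, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<std::string> rows;
    std::size_t pos = 0;

    while (pos < source.size()) {
        const std::size_t length = std::min(source.size() - pos, chunk_size);
        const std::string window = source.substr(pos, length);
        const bool last_window = (pos + length == source.size());

        // Emit every row that is terminated by '\n' inside this window.
        std::size_t row_start = 0;
        std::size_t newline = window.find('\n', row_start);
        while (newline != std::string::npos) {
            rows.push_back(window.substr(row_start, newline - row_start));
            row_start = newline + 1;
            newline = window.find('\n', row_start);
        }

        if (last_window) {
            // End of input: flush a trailing row that has no newline.
            if (row_start < window.size())
                rows.push_back(window.substr(row_start));
            break;
        }

        // Simplification of this sketch: a row must fit into one window.
        if (row_start == 0)
            throw std::runtime_error("row is longer than the chunk size");

        // Re-align: the next window starts at the first incomplete row.
        pos += row_start;
    }
    return rows;
}
```

Calling read_rows_in_chunks("a,b\nc,d\ne,f", 5) yields the same three rows as a single large window would, because the partially read row at the end of each window is read again from its start.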


@ -0,0 +1,392 @@
/** @file
* @brief Contains the main CSV parsing algorithm and various utility functions
*/
#pragma once
#include <algorithm>
#include <array>
#include <condition_variable>
#include <deque>
#include <fstream>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <thread>
#include <vector>
#include "../external/mio.hpp"
#include "col_names.hpp"
#include "common.hpp"
#include "csv_format.hpp"
#include "csv_row.hpp"
namespace csv {
namespace internals {
/** Create an array v where slot i + 128 corresponds to the character
* with (signed) ASCII number i, and v[i + 128] labels that character
* according to the internals::ParseFlags enum
*/
HEDLEY_CONST CONSTEXPR_17 ParseFlagMap make_parse_flags(char delimiter) {
std::array<ParseFlags, 256> ret = {};
for (int i = -128; i < 128; i++) {
const int arr_idx = i + 128;
char ch = char(i);
if (ch == delimiter)
ret[arr_idx] = ParseFlags::DELIMITER;
else if (ch == '\r' || ch == '\n')
ret[arr_idx] = ParseFlags::NEWLINE;
else
ret[arr_idx] = ParseFlags::NOT_SPECIAL;
}
return ret;
}
/** Same as make_parse_flags(char), but additionally labels quote_char
* as ParseFlags::QUOTE
*/
HEDLEY_CONST CONSTEXPR_17 ParseFlagMap make_parse_flags(char delimiter, char quote_char) {
std::array<ParseFlags, 256> ret = make_parse_flags(delimiter);
ret[(size_t)quote_char + 128] = ParseFlags::QUOTE;
return ret;
}
/** Create an array v where slot i + 128 corresponds to the character c
* with (signed) ASCII number i, and v[i + 128] is true if
* c is a whitespace character
*/
HEDLEY_CONST CONSTEXPR_17 WhitespaceMap make_ws_flags(const char* ws_chars, size_t n_chars) {
std::array<bool, 256> ret = {};
for (int i = -128; i < 128; i++) {
const int arr_idx = i + 128;
char ch = char(i);
ret[arr_idx] = false;
for (size_t j = 0; j < n_chars; j++) {
if (ws_chars[j] == ch) {
ret[arr_idx] = true;
}
}
}
return ret;
}
inline WhitespaceMap make_ws_flags(const std::vector<char>& flags) {
return make_ws_flags(flags.data(), flags.size());
}
CSV_INLINE size_t get_file_size(csv::string_view filename);
CSV_INLINE std::string get_csv_head(csv::string_view filename);
/** Read the first 500KB of a CSV file */
CSV_INLINE std::string get_csv_head(csv::string_view filename, size_t file_size);
/** A std::deque wrapper which lets multiple reader and writer threads
* access it concurrently and lets reader threads wait for the deque
* to become populated
*/
template<typename T>
class ThreadSafeDeque {
public:
ThreadSafeDeque(size_t notify_size = 100) : _notify_size(notify_size) {};
ThreadSafeDeque(const ThreadSafeDeque& other) {
this->data = other.data;
this->_notify_size = other._notify_size;
}
ThreadSafeDeque(const std::deque<T>& source) : ThreadSafeDeque() {
this->data = source;
}
void clear() noexcept { this->data.clear(); }
bool empty() const noexcept {
return this->data.empty();
}
T& front() noexcept {
return this->data.front();
}
T& operator[](size_t n) {
return this->data[n];
}
void push_back(T&& item) {
std::lock_guard<std::mutex> lock{ this->_lock };
this->data.push_back(std::move(item));
if (this->size() >= _notify_size) {
this->_cond.notify_all();
}
}
T pop_front() noexcept {
std::lock_guard<std::mutex> lock{ this->_lock };
T item = std::move(data.front());
data.pop_front();
return item;
}
size_t size() const noexcept { return this->data.size(); }
/** Returns true if a thread is actively pushing items to this deque */
constexpr bool is_waitable() const noexcept { return this->_is_waitable; }
/** Wait for an item to become available */
void wait() {
if (!is_waitable()) {
return;
}
std::unique_lock<std::mutex> lock{ this->_lock };
this->_cond.wait(lock, [this] { return this->size() >= _notify_size || !this->is_waitable(); });
lock.unlock();
}
typename std::deque<T>::iterator begin() noexcept {
return this->data.begin();
}
typename std::deque<T>::iterator end() noexcept {
return this->data.end();
}
/** Tell listeners that this deque is actively being pushed to */
void notify_all() {
std::unique_lock<std::mutex> lock{ this->_lock };
this->_is_waitable = true;
this->_cond.notify_all();
}
/** Tell all listeners to stop */
void kill_all() {
std::unique_lock<std::mutex> lock{ this->_lock };
this->_is_waitable = false;
this->_cond.notify_all();
}
private:
bool _is_waitable = false;
size_t _notify_size;
std::mutex _lock;
std::condition_variable _cond;
std::deque<T> data;
};
constexpr const int UNINITIALIZED_FIELD = -1;
}
/** Standard type for storing collection of rows */
using RowCollection = internals::ThreadSafeDeque<CSVRow>;
namespace internals {
/** Abstract base class which provides CSV parsing logic.
*
* Concrete implementations may customize this logic across
* different input sources, such as memory mapped files, stringstreams,
* etc...
*/
class IBasicCSVParser {
public:
IBasicCSVParser() = default;
IBasicCSVParser(const CSVFormat&, const ColNamesPtr&);
IBasicCSVParser(const ParseFlagMap& parse_flags, const WhitespaceMap& ws_flags
) : _parse_flags(parse_flags), _ws_flags(ws_flags) {};
virtual ~IBasicCSVParser() {}
/** Whether or not we have reached the end of source */
bool eof() { return this->_eof; }
/** Parse the next block of data */
virtual void next(size_t bytes) = 0;
/** Indicate the last block of data has been parsed */
void end_feed();
CONSTEXPR_17 ParseFlags parse_flag(const char ch) const noexcept {
return _parse_flags.data()[ch + 128];
}
CONSTEXPR_17 ParseFlags compound_parse_flag(const char ch) const noexcept {
return quote_escape_flag(parse_flag(ch), this->quote_escape);
}
/** Whether or not this CSV has a UTF-8 byte order mark */
CONSTEXPR bool utf8_bom() const { return this->_utf8_bom; }
void set_output(RowCollection& rows) { this->_records = &rows; }
protected:
/** @name Current Parser State */
///@{
CSVRow current_row;
RawCSVDataPtr data_ptr = nullptr;
ColNamesPtr _col_names = nullptr;
CSVFieldList* fields = nullptr;
int field_start = UNINITIALIZED_FIELD;
size_t field_length = 0;
/** An array where the (i + 128)th slot gives the ParseFlags for ASCII character i */
ParseFlagMap _parse_flags;
///@}
/** @name Current Stream/File State */
///@{
bool _eof = false;
/** The size of the incoming CSV */
size_t source_size = 0;
///@}
/** Whether or not source needs to be read in chunks */
CONSTEXPR bool no_chunk() const { return this->source_size < ITERATION_CHUNK_SIZE; }
/** Parse the current chunk of data
*
* @returns How many characters were read that are part of complete rows
*/
size_t parse();
/** Create a new RawCSVDataPtr for a new chunk of data */
void reset_data_ptr();
private:
/** An array where the (i + 128)th slot determines whether ASCII character i should
* be trimmed
*/
WhitespaceMap _ws_flags;
bool quote_escape = false;
bool field_has_double_quote = false;
/** Where we are in the current data block */
size_t data_pos = 0;
/** Whether or not an attempt to find Unicode BOM has been made */
bool unicode_bom_scan = false;
bool _utf8_bom = false;
/** Where complete rows should be pushed to */
RowCollection* _records = nullptr;
CONSTEXPR_17 bool ws_flag(const char ch) const noexcept {
return _ws_flags.data()[ch + 128];
}
size_t& current_row_start() {
return this->current_row.data_start;
}
void parse_field() noexcept;
/** Finish parsing the current field */
void push_field();
/** Finish parsing the current row */
void push_row();
/** Handle possible Unicode byte order mark */
void trim_utf8_bom();
};
/** A class for parsing CSV data from a `std::stringstream`
* or an `std::ifstream`
*/
template<typename TStream>
class StreamParser: public IBasicCSVParser {
using RowCollection = ThreadSafeDeque<CSVRow>;
public:
StreamParser(TStream& source,
const CSVFormat& format,
const ColNamesPtr& col_names = nullptr
) : IBasicCSVParser(format, col_names), _source(std::move(source)) {};
StreamParser(
TStream& source,
internals::ParseFlagMap parse_flags,
internals::WhitespaceMap ws_flags) :
IBasicCSVParser(parse_flags, ws_flags),
_source(std::move(source))
{};
~StreamParser() {}
void next(size_t bytes = ITERATION_CHUNK_SIZE) override {
if (this->eof()) return;
this->reset_data_ptr();
this->data_ptr->_data = std::make_shared<std::string>();
if (source_size == 0) {
const auto start = _source.tellg();
_source.seekg(0, std::ios::end);
const auto end = _source.tellg();
_source.seekg(0, std::ios::beg);
source_size = end - start;
}
// Read data into buffer
size_t length = std::min(source_size - stream_pos, bytes);
std::unique_ptr<char[]> buff(new char[length]);
_source.seekg(stream_pos, std::ios::beg);
_source.read(buff.get(), length);
stream_pos = _source.tellg();
((std::string*)(this->data_ptr->_data.get()))->assign(buff.get(), length);
// Create string_view
this->data_ptr->data = *((std::string*)this->data_ptr->_data.get());
// Parse
this->current_row = CSVRow(this->data_ptr);
size_t remainder = this->parse();
if (stream_pos == source_size || no_chunk()) {
this->_eof = true;
this->end_feed();
}
else {
this->stream_pos -= (length - remainder);
}
}
private:
TStream _source;
size_t stream_pos = 0;
};
/** Parser for memory-mapped files
*
* @par Implementation
* This class constructs moving windows over a file to avoid
* creating massive memory maps which may require more RAM
* than the user has available. It contains logic to automatically
* re-align each memory map to the beginning of a CSV row.
*
*/
class MmapParser : public IBasicCSVParser {
public:
MmapParser(csv::string_view filename,
const CSVFormat& format,
const ColNamesPtr& col_names = nullptr
) : IBasicCSVParser(format, col_names) {
// A string_view is not guaranteed to be null-terminated, so copy by length
this->_filename = std::string(filename);
this->source_size = get_file_size(filename);
};
~MmapParser() {}
void next(size_t bytes) override;
private:
std::string _filename;
size_t mmap_pos = 0;
};
}
}
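
The `ch + 128` indexing used by make_parse_flags() and parse_flag() above can be tried out in isolation. The following is a minimal sketch of such a 256-slot lookup table. CharClass, slot, make_table and classify are hypothetical names for illustration. Unlike the header, the sketch casts to signed char before adding 128, which is an assumption made here so that the index stays within 0..255 even on platforms where plain char is unsigned.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for ParseFlags.
enum class CharClass { NotSpecial, Delimiter, Newline, Quote };

using CharTable = std::array<CharClass, 256>;

// Map any char to a slot in 0..255, independent of the signedness of char.
inline std::size_t slot(char ch) {
    return static_cast<std::size_t>(static_cast<signed char>(ch) + 128);
}

// Build the table once; afterwards each character is classified with a
// single array read instead of a chain of comparisons.
inline CharTable make_table(char delimiter, char quote_char) {
    assert(delimiter != quote_char);
    CharTable table;
    table.fill(CharClass::NotSpecial);
    table[slot('\r')] = CharClass::Newline;
    table[slot('\n')] = CharClass::Newline;
    table[slot(delimiter)] = CharClass::Delimiter;
    table[slot(quote_char)] = CharClass::Quote;
    return table;
}

inline CharClass classify(const CharTable& table, char ch) {
    return table[slot(ch)];
}
```

With a table built by make_table(',', '"'), classify(table, ',') returns CharClass::Delimiter, and bytes above 127 such as '\xEF' fall through to CharClass::NotSpecial.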


@ -0,0 +1,30 @@
#include "col_names.hpp"
namespace csv {
namespace internals {
CSV_INLINE std::vector<std::string> ColNames::get_col_names() const {
return this->col_names;
}
CSV_INLINE void ColNames::set_col_names(const std::vector<std::string>& cnames) {
this->col_names = cnames;
for (size_t i = 0; i < cnames.size(); i++) {
this->col_pos[cnames[i]] = i;
}
}
CSV_INLINE int ColNames::index_of(csv::string_view col_name) const {
// A string_view is not guaranteed to be null-terminated, so copy by length
auto pos = this->col_pos.find(std::string(col_name));
if (pos != this->col_pos.end())
return (int)pos->second;
return CSV_NOT_FOUND;
}
CSV_INLINE size_t ColNames::size() const noexcept {
return this->col_names.size();
}
}
}
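
ColNames above keeps both the ordered list of names and a name-to-position map, so a lookup by column name does not have to scan the list. The following is a minimal standalone sketch of that structure. ColumnIndex and kNotFound are hypothetical names; kNotFound plays the role of CSV_NOT_FOUND. As in set_col_names(), a duplicated name resolves to its last position.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical counterpart of CSV_NOT_FOUND.
constexpr int kNotFound = -1;

// Hypothetical, simplified version of ColNames.
class ColumnIndex {
public:
    explicit ColumnIndex(const std::vector<std::string>& names) : names_(names) {
        // index_of() reports positions as int, so the count must fit into one.
        assert(names.size() <= static_cast<std::size_t>(INT_MAX));
        for (std::size_t i = 0; i < names.size(); i++)
            positions_[names[i]] = i;
    }

    // Position of `name`, or kNotFound if there is no such column.
    int index_of(const std::string& name) const {
        const auto it = positions_.find(name);
        if (it != positions_.end())
            return static_cast<int>(it->second);
        return kNotFound;
    }

    std::size_t size() const noexcept { return names_.size(); }

private:
    std::vector<std::string> names_;
    std::unordered_map<std::string, std::size_t> positions_;
};
```

For a ColumnIndex built from the names "time" and "value", index_of("value") returns 1 and index_of("missing") returns kNotFound.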


@ -0,0 +1,40 @@
#pragma once
#include <memory>
#include <unordered_map>
#include <string>
#include <vector>
#include "common.hpp"
namespace csv {
namespace internals {
struct ColNames;
using ColNamesPtr = std::shared_ptr<ColNames>;
/** @struct ColNames
* A data structure for handling column name information.
*
* These are created by CSVReader and passed (via smart pointer)
* to CSVRow objects it creates, thus
* allowing for indexing by column name.
*/
struct ColNames {
public:
ColNames() = default;
ColNames(const std::vector<std::string>& names) {
set_col_names(names);
}
std::vector<std::string> get_col_names() const;
void set_col_names(const std::vector<std::string>&);
int index_of(csv::string_view) const;
bool empty() const noexcept { return this->col_names.empty(); }
size_t size() const noexcept;
private:
std::vector<std::string> col_names;
std::unordered_map<std::string, size_t> col_pos;
};
}
}


@ -0,0 +1,208 @@
/** @file
* A standalone header file containing shared code
*/
#pragma once
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdlib>
#include <deque>
#if defined(_WIN32)
# ifndef WIN32_LEAN_AND_MEAN
# define WIN32_LEAN_AND_MEAN
# endif
# include <windows.h>
# undef max
# undef min
#elif defined(__linux__)
# include <unistd.h>
#endif
/** Helper macro which should be #defined as "inline"
* in the single header version
*/
#define CSV_INLINE
#pragma once
#include <type_traits>
#include "../external/string_view.hpp"
// If there is another version of Hedley, then the newer one
// takes precedence.
// See: https://github.com/nemequ/hedley
#include "../external/hedley.h"
namespace csv {
#ifdef _MSC_VER
#pragma region Compatibility Macros
#endif
/**
* @def IF_CONSTEXPR
* Expands to `if constexpr` in C++17 and `if` otherwise
*
* @def CONSTEXPR_VALUE
* Expands to `constexpr` in C++17 and `const` otherwise.
* Mainly used for global variables.
*
* @def CONSTEXPR
* Expands to `constexpr` in decent compilers and `inline` otherwise.
* Intended for functions and methods.
*/
#define STATIC_ASSERT(x) static_assert(x, "Assertion failed")
#if CMAKE_CXX_STANDARD == 17 || __cplusplus >= 201703L
#define CSV_HAS_CXX17
#endif
#if CMAKE_CXX_STANDARD >= 14 || __cplusplus >= 201402L
#define CSV_HAS_CXX14
#endif
#ifdef CSV_HAS_CXX17
#include <string_view>
/** @typedef string_view
* The string_view class used by this library.
*/
using string_view = std::string_view;
#else
/** @typedef string_view
* The string_view class used by this library.
*/
using string_view = nonstd::string_view;
#endif
#ifdef CSV_HAS_CXX17
#define IF_CONSTEXPR if constexpr
#define CONSTEXPR_VALUE constexpr
#define CONSTEXPR_17 constexpr
#else
#define IF_CONSTEXPR if
#define CONSTEXPR_VALUE const
#define CONSTEXPR_17 inline
#endif
#ifdef CSV_HAS_CXX14
template<bool B, class T = void>
using enable_if_t = std::enable_if_t<B, T>;
#define CONSTEXPR_14 constexpr
#define CONSTEXPR_VALUE_14 constexpr
#else
template<bool B, class T = void>
using enable_if_t = typename std::enable_if<B, T>::type;
#define CONSTEXPR_14 inline
#define CONSTEXPR_VALUE_14 const
#endif
// Resolves g++ bug with regard to constexpr methods
// See: https://stackoverflow.com/questions/36489369/constexpr-non-static-member-function-with-non-constexpr-constructor-gcc-clang-d
#if defined __GNUC__ && !defined __clang__
#if (__GNUC__ >= 7 &&__GNUC_MINOR__ >= 2) || (__GNUC__ >= 8)
#define CONSTEXPR constexpr
#endif
#else
#ifdef CSV_HAS_CXX17
#define CONSTEXPR constexpr
#endif
#endif
#ifndef CONSTEXPR
#define CONSTEXPR inline
#endif
#ifdef _MSC_VER
#pragma endregion
#endif
namespace internals {
// PAGE_SIZE macro could be already defined by the host system.
#if defined(PAGE_SIZE)
#undef PAGE_SIZE
#endif
// Get operating system specific details
#if defined(_WIN32)
inline int getpagesize() {
_SYSTEM_INFO sys_info = {};
GetSystemInfo(&sys_info);
return std::max(sys_info.dwPageSize, sys_info.dwAllocationGranularity);
}
const int PAGE_SIZE = getpagesize();
#elif defined(__linux__)
const int PAGE_SIZE = getpagesize();
#else
/** Size of a memory page in bytes. Used by
* csv::internals::CSVFieldArray when allocating blocks.
*/
const int PAGE_SIZE = 4096;
#endif
/** For functions that lazy load a large CSV, this determines how
* many bytes are read at a time
*/
constexpr size_t ITERATION_CHUNK_SIZE = 10000000; // 10MB
template<typename T>
inline bool is_equal(T a, T b, T epsilon = 0.001) {
/** Returns true if two floating point values are about the same */
static_assert(std::is_floating_point<T>::value, "T must be a floating point type.");
return std::abs(a - b) < epsilon;
}
/** @typedef ParseFlags
* An enum used for describing the significance of each character
* with respect to CSV parsing
*
* @see quote_escape_flag
*/
enum class ParseFlags {
QUOTE_ESCAPE_QUOTE = 0, /**< A quote inside or terminating a quote_escaped field */
QUOTE = 2 | 1, /**< Characters which may signify a quote escape */
NOT_SPECIAL = 4, /**< Characters with no special meaning or escaped delimiters and newlines */
DELIMITER = 4 | 2, /**< Characters which signify a new field */
NEWLINE = 4 | 2 | 1 /**< Characters which signify a new row */
};
/** Transform the ParseFlags given the context of whether or not the current
* field is quote escaped */
constexpr ParseFlags quote_escape_flag(ParseFlags flag, bool quote_escape) noexcept {
return (ParseFlags)((int)flag & ~((int)ParseFlags::QUOTE * quote_escape));
}
// Assumed to be true by parsing functions: allows for testing
// if an item is DELIMITER or NEWLINE with a >= statement
STATIC_ASSERT(ParseFlags::DELIMITER < ParseFlags::NEWLINE);
/** Optimizations for reducing branching in parsing loop
*
* Idea: The meaning of all non-quote characters changes depending
* on whether or not the parser is in a quote-escaped mode (0 or 1)
*/
STATIC_ASSERT(quote_escape_flag(ParseFlags::NOT_SPECIAL, false) == ParseFlags::NOT_SPECIAL);
STATIC_ASSERT(quote_escape_flag(ParseFlags::QUOTE, false) == ParseFlags::QUOTE);
STATIC_ASSERT(quote_escape_flag(ParseFlags::DELIMITER, false) == ParseFlags::DELIMITER);
STATIC_ASSERT(quote_escape_flag(ParseFlags::NEWLINE, false) == ParseFlags::NEWLINE);
STATIC_ASSERT(quote_escape_flag(ParseFlags::NOT_SPECIAL, true) == ParseFlags::NOT_SPECIAL);
STATIC_ASSERT(quote_escape_flag(ParseFlags::QUOTE, true) == ParseFlags::QUOTE_ESCAPE_QUOTE);
STATIC_ASSERT(quote_escape_flag(ParseFlags::DELIMITER, true) == ParseFlags::NOT_SPECIAL);
STATIC_ASSERT(quote_escape_flag(ParseFlags::NEWLINE, true) == ParseFlags::NOT_SPECIAL);
/** An array which maps ASCII chars to a parsing flag */
using ParseFlagMap = std::array<ParseFlags, 256>;
/** An array which maps ASCII chars to a flag indicating if it is whitespace */
using WhitespaceMap = std::array<bool, 256>;
}
/** Integer indicating a requested column wasn't found. */
constexpr int CSV_NOT_FOUND = -1;
}
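
The bit layout of ParseFlags above is what lets quote_escape_flag() turn delimiters and newlines into ordinary characters with a single mask. The following standalone sketch copies the enum values and the function from common.hpp so the effect can be checked without the rest of the library. effective_flag is a hypothetical helper added for illustration.

```cpp
#include <cassert>

// Values copied from ParseFlags in common.hpp.
enum class ParseFlags {
    QUOTE_ESCAPE_QUOTE = 0,
    QUOTE = 2 | 1,
    NOT_SPECIAL = 4,
    DELIMITER = 4 | 2,
    NEWLINE = 4 | 2 | 1
};

// Same expression as in common.hpp: while inside a quoted field, the two
// low bits (the bits of QUOTE) are cleared from the flag.
constexpr ParseFlags quote_escape_flag(ParseFlags flag, bool quote_escape) noexcept {
    return (ParseFlags)((int)flag & ~((int)ParseFlags::QUOTE * quote_escape));
}

// DELIMITER (6) and NEWLINE (7) both collapse to NOT_SPECIAL (4) ...
static_assert(quote_escape_flag(ParseFlags::DELIMITER, true) == ParseFlags::NOT_SPECIAL, "");
static_assert(quote_escape_flag(ParseFlags::NEWLINE, true) == ParseFlags::NOT_SPECIAL, "");
// ... while QUOTE (3) collapses to QUOTE_ESCAPE_QUOTE (0).
static_assert(quote_escape_flag(ParseFlags::QUOTE, true) == ParseFlags::QUOTE_ESCAPE_QUOTE, "");

// Hypothetical helper: resolve a flag for the current parser state and check
// that nothing inside a quoted field can end a field or a row.
inline ParseFlags effective_flag(ParseFlags flag, bool in_quoted_field) {
    const ParseFlags result = quote_escape_flag(flag, in_quoted_field);
    assert(!in_quoted_field || result <= ParseFlags::NOT_SPECIAL);
    return result;
}
```

Outside a quoted field the mask is `~0`, so every flag passes through unchanged and the parser's switch statement needs no separate branch for the quoted state.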


@ -0,0 +1,92 @@
/** @file
* Defines an object used to store CSV format settings
*/
#include <algorithm>
#include <set>
#include "csv_format.hpp"
namespace csv {
CSV_INLINE CSVFormat& CSVFormat::delimiter(char delim) {
this->possible_delimiters = { delim };
this->assert_no_char_overlap();
return *this;
}
CSV_INLINE CSVFormat& CSVFormat::delimiter(const std::vector<char> & delim) {
this->possible_delimiters = delim;
this->assert_no_char_overlap();
return *this;
}
CSV_INLINE CSVFormat& CSVFormat::quote(char quote) {
this->no_quote = false;
this->quote_char = quote;
this->assert_no_char_overlap();
return *this;
}
CSV_INLINE CSVFormat& CSVFormat::trim(const std::vector<char> & chars) {
this->trim_chars = chars;
this->assert_no_char_overlap();
return *this;
}
CSV_INLINE CSVFormat& CSVFormat::column_names(const std::vector<std::string>& names) {
this->col_names = names;
this->header = -1;
return *this;
}
CSV_INLINE CSVFormat& CSVFormat::header_row(int row) {
if (row < 0) this->variable_column_policy = VariableColumnPolicy::KEEP;
this->header = row;
this->col_names = {};
return *this;
}
CSV_INLINE void CSVFormat::assert_no_char_overlap()
{
auto delims = std::set<char>(
this->possible_delimiters.begin(), this->possible_delimiters.end()),
trims = std::set<char>(
this->trim_chars.begin(), this->trim_chars.end());
// Stores intersection of possible delimiters and trim characters
std::vector<char> intersection = {};
// Find which characters overlap, if any
std::set_intersection(
delims.begin(), delims.end(),
trims.begin(), trims.end(),
std::back_inserter(intersection));
// Make sure quote character is not contained in possible delimiters
// or whitespace characters
if (delims.find(this->quote_char) != delims.end() ||
trims.find(this->quote_char) != trims.end()) {
intersection.push_back(this->quote_char);
}
if (!intersection.empty()) {
std::string err_msg = "There should be no overlap between the quote character, "
"the set of possible delimiters "
"and the set of whitespace characters. Offending characters: ";
// Create a pretty error message with the list of overlapping
// characters
for (size_t i = 0; i < intersection.size(); i++) {
err_msg += "'";
err_msg += intersection[i];
err_msg += "'";
if (i + 1 < intersection.size())
err_msg += ", ";
}
throw std::runtime_error(err_msg + '.');
}
}
}
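
assert_no_char_overlap() above relies on std::set_intersection, which requires sorted input ranges; copying the characters into std::set provides that ordering and removes duplicates. The following is a minimal standalone sketch of the same check. overlapping_chars is a hypothetical helper, and it returns the offending characters instead of throwing.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical helper: characters that appear in both `delims` and `trims`.
std::vector<char> overlapping_chars(const std::vector<char>& delims,
                                    const std::vector<char>& trims) {
    // std::set keeps its elements sorted and unique, as set_intersection needs.
    const std::set<char> delim_set(delims.begin(), delims.end());
    const std::set<char> trim_set(trims.begin(), trims.end());

    std::vector<char> common;
    std::set_intersection(
        delim_set.begin(), delim_set.end(),
        trim_set.begin(), trim_set.end(),
        std::back_inserter(common));

    // The output of set_intersection is sorted as well.
    assert(std::is_sorted(common.begin(), common.end()));
    return common;
}
```

A caller could treat a non-empty result as a configuration error, for example when '\t' is listed both as a delimiter and as a character to trim.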


@ -0,0 +1,167 @@
/** @file
* Defines an object used to store CSV format settings
*/
#pragma once
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>
#include "common.hpp"
namespace csv {
namespace internals {
class IBasicCSVParser;
}
class CSVReader;
/** Determines how to handle rows that are shorter or longer than the majority */
enum class VariableColumnPolicy {
THROW = -1,
IGNORE_ROW = 0,
KEEP = 1
};
/** Stores the inferred format of a CSV file. */
struct CSVGuessResult {
char delim;
int header_row;
};
/** Stores information about how to parse a CSV file.
* Can be used to construct a csv::CSVReader.
*/
class CSVFormat {
public:
/** Settings for parsing a RFC 4180 CSV file */
CSVFormat() = default;
/** Sets the delimiter of the CSV file
*
* @throws `std::runtime_error` thrown if trim, quote, or possible delimiting characters overlap
*/
CSVFormat& delimiter(char delim);
/** Sets a list of potential delimiters
*
* @throws `std::runtime_error` thrown if trim, quote, or possible delimiting characters overlap
* @param[in] delim An array of possible delimiters to try parsing the CSV with
*/
CSVFormat& delimiter(const std::vector<char> & delim);
/** Sets the whitespace characters to be trimmed
*
* @throws `std::runtime_error` thrown if trim, quote, or possible delimiting characters overlap
* @param[in] ws An array of whitespace characters that should be trimmed
*/
CSVFormat& trim(const std::vector<char> & ws);
/** Sets the quote character
*
* @throws `std::runtime_error` thrown if trim, quote, or possible delimiting characters overlap
*/
CSVFormat& quote(char quote);
/** Sets the column names.
*
* @note Unsets any values set by header_row()
*/
CSVFormat& column_names(const std::vector<std::string>& names);
/** Sets the header row
*
* @note Unsets any values set by column_names()
*/
CSVFormat& header_row(int row);
/** Tells the parser that this CSV has no header row
*
* @note Equivalent to `header_row(-1)`
*
*/
CSVFormat& no_header() {
this->header_row(-1);
return *this;
}
/** Turn quoting on or off */
CSVFormat& quote(bool use_quote) {
this->no_quote = !use_quote;
return *this;
}
/** Tells the parser how to handle rows whose number of columns differs from the others */
CONSTEXPR_14 CSVFormat& variable_columns(VariableColumnPolicy policy = VariableColumnPolicy::IGNORE_ROW) {
this->variable_column_policy = policy;
return *this;
}
/** Boolean shorthand: `true` maps to VariableColumnPolicy::KEEP and `false` to VariableColumnPolicy::IGNORE_ROW */
CONSTEXPR_14 CSVFormat& variable_columns(bool policy) {
this->variable_column_policy = (VariableColumnPolicy)policy;
return *this;
}
#ifndef DOXYGEN_SHOULD_SKIP_THIS
char get_delim() const {
// This error should never be received by end users.
if (this->possible_delimiters.size() > 1) {
throw std::runtime_error("There is more than one possible delimiter.");
}
return this->possible_delimiters.at(0);
}
CONSTEXPR bool is_quoting_enabled() const { return !this->no_quote; }
CONSTEXPR char get_quote_char() const { return this->quote_char; }
CONSTEXPR int get_header() const { return this->header; }
std::vector<char> get_possible_delims() const { return this->possible_delimiters; }
std::vector<char> get_trim_chars() const { return this->trim_chars; }
CONSTEXPR VariableColumnPolicy get_variable_column_policy() const { return this->variable_column_policy; }
#endif
/** CSVFormat for guessing the delimiter */
CSV_INLINE static CSVFormat guess_csv() {
CSVFormat format;
format.delimiter({ ',', '|', '\t', ';', '^' })
.quote('"')
.header_row(0);
return format;
}
bool guess_delim() {
return this->possible_delimiters.size() > 1;
}
friend CSVReader;
friend internals::IBasicCSVParser;
private:
/** Throws an error if the quote character, delimiters and trim characters overlap */
void assert_no_char_overlap();
/** Set of possible delimiters */
std::vector<char> possible_delimiters = { ',' };
/** Set of whitespace characters to trim */
std::vector<char> trim_chars = {};
/** Row number with columns (ignored if col_names is non-empty) */
int header = 0;
/** Whether quoting is disabled */
bool no_quote = false;
/** Quote character */
char quote_char = '"';
/** Should be left empty unless file doesn't include header */
std::vector<std::string> col_names = {};
/** How to handle rows whose number of columns differs from the others */
VariableColumnPolicy variable_column_policy = VariableColumnPolicy::IGNORE_ROW;
};
}


@ -0,0 +1,309 @@
/** @file
* @brief Defines functionality needed for basic CSV parsing
*/
#include "csv_reader.hpp"
namespace csv {
namespace internals {
CSV_INLINE std::string format_row(const std::vector<std::string>& row, csv::string_view delim) {
/** Print a CSV row */
std::stringstream ret;
for (size_t i = 0; i < row.size(); i++) {
ret << row[i];
if (i + 1 < row.size()) ret << delim;
else ret << '\n';
}
ret.flush();
return ret.str();
}
/** Return a CSV's column names
*
* @param[in] head Leading portion of the CSV data (see get_csv_head())
* @param[in] format Format of the CSV file
*
*/
CSV_INLINE std::vector<std::string> _get_col_names(csv::string_view head, CSVFormat format) {
// Parse the CSV
auto trim_chars = format.get_trim_chars();
std::stringstream source(head.data());
RowCollection rows;
StreamParser<std::stringstream> parser(source, format);
parser.set_output(rows);
parser.next();
return CSVRow(std::move(rows[format.get_header()]));
}
CSV_INLINE GuessScore calculate_score(csv::string_view head, const CSVFormat& format) {
// Frequency counter of row length
std::unordered_map<size_t, size_t> row_tally = { { 0, 0 } };
// Map row lengths to row num where they first occurred
std::unordered_map<size_t, size_t> row_when = { { 0, 0 } };
// Parse the CSV
std::stringstream source(head.data());
RowCollection rows;
StreamParser<std::stringstream> parser(source, format);
parser.set_output(rows);
parser.next();
for (size_t i = 0; i < rows.size(); i++) {
auto& row = rows[i];
// Ignore zero-length rows
if (row.size() > 0) {
if (row_tally.find(row.size()) != row_tally.end()) {
row_tally[row.size()]++;
}
else {
row_tally[row.size()] = 1;
row_when[row.size()] = i;
}
}
}
double final_score = 0;
size_t header_row = 0;
// Final score is equal to the largest
// row size times rows of that size
for (auto& pair : row_tally) {
auto row_size = pair.first;
auto row_count = pair.second;
double score = (double)(row_size * row_count);
if (score > final_score) {
final_score = score;
header_row = row_when[row_size];
}
}
return {
final_score,
header_row
};
}
/** Guess the delimiter used by a delimiter-separated values file */
CSV_INLINE CSVGuessResult _guess_format(csv::string_view head, const std::vector<char>& delims) {
/** For each candidate delimiter, score the parse by the largest value of
* (row length * number of rows with that length). The highest-scoring
* delimiter wins, and the header row is the first row that has the
* winning row length.
*/
CSVFormat format;
size_t max_score = 0,
header = 0;
char current_delim = delims[0];
for (char cand_delim : delims) {
auto result = calculate_score(head, format.delimiter(cand_delim));
if ((size_t)result.score > max_score) {
max_score = (size_t)result.score;
current_delim = cand_delim;
header = result.header;
}
}
return { current_delim, (int)header };
}
}
/** Return a CSV's column names
*
* @param[in] filename Path to CSV file
* @param[in] format Format of the CSV file
*
*/
CSV_INLINE std::vector<std::string> get_col_names(csv::string_view filename, CSVFormat format) {
auto head = internals::get_csv_head(filename);
/** Guess delimiter and header row */
if (format.guess_delim()) {
auto guess_result = guess_format(filename, format.get_possible_delims());
format.delimiter(guess_result.delim).header_row(guess_result.header_row);
}
return internals::_get_col_names(head, format);
}
/** Guess the delimiter used by a delimiter-separated values file */
CSV_INLINE CSVGuessResult guess_format(csv::string_view filename, const std::vector<char>& delims) {
auto head = internals::get_csv_head(filename);
return internals::_guess_format(head, delims);
}
/** Reads an arbitrarily large CSV file using memory-mapped IO.
*
* **Details:** Reads the first block of a CSV file synchronously to get information
* such as column names and delimiting character.
*
* @param[in] filename Path to CSV file
* @param[in] format Format of the CSV file
*
* \snippet tests/test_read_csv.cpp CSVField Example
*
*/
CSV_INLINE CSVReader::CSVReader(csv::string_view filename, CSVFormat format) : _format(format) {
auto head = internals::get_csv_head(filename);
using Parser = internals::MmapParser;
/** Guess delimiter and header row */
if (format.guess_delim()) {
auto guess_result = internals::_guess_format(head, format.possible_delimiters);
format.delimiter(guess_result.delim);
format.header = guess_result.header_row;
this->_format = format;
}
if (!format.col_names.empty())
this->set_col_names(format.col_names);
this->parser = std::unique_ptr<Parser>(new Parser(filename, format, this->col_names)); // For C++11
this->initial_read();
}
/** Return the format of the original raw CSV */
CSV_INLINE CSVFormat CSVReader::get_format() const {
CSVFormat new_format = this->_format;
// Since users are normally not allowed to set
// column names and header row simultaneously,
// we will set the backing variables directly here
new_format.col_names = this->col_names->get_col_names();
new_format.header = this->_format.header;
return new_format;
}
/** Return the CSV's column names as a vector of strings. */
CSV_INLINE std::vector<std::string> CSVReader::get_col_names() const {
if (this->col_names) {
return this->col_names->get_col_names();
}
return std::vector<std::string>();
}
/** Return the index of the column name if found or
* csv::CSV_NOT_FOUND otherwise.
*/
CSV_INLINE int CSVReader::index_of(csv::string_view col_name) const {
auto _col_names = this->get_col_names();
for (size_t i = 0; i < _col_names.size(); i++)
if (_col_names[i] == col_name) return (int)i;
return CSV_NOT_FOUND;
}
CSV_INLINE void CSVReader::trim_header() {
if (!this->header_trimmed) {
for (int i = 0; i <= this->_format.header && !this->records->empty(); i++) {
if (i == this->_format.header && this->col_names->empty()) {
this->set_col_names(this->records->pop_front());
}
else {
this->records->pop_front();
}
}
this->header_trimmed = true;
}
}
/**
* @param[in] names Column names
*/
CSV_INLINE void CSVReader::set_col_names(const std::vector<std::string>& names)
{
this->col_names->set_col_names(names);
this->n_cols = names.size();
}
/**
* Read a chunk of CSV data.
*
* @note This method is meant to be run on its own thread. Only one `read_csv()` thread
* should be active at a time.
*
* @param[in] bytes Number of bytes to read.
*
* @see CSVReader::read_csv_worker
* @see CSVReader::read_row()
*/
CSV_INLINE bool CSVReader::read_csv(size_t bytes) {
// Tell read_row() to listen for CSV rows
this->records->notify_all();
this->parser->set_output(*this->records);
this->parser->next(bytes);
if (!this->header_trimmed) {
this->trim_header();
}
// Tell read_row() to stop waiting
this->records->kill_all();
return true;
}
/**
* Retrieve rows as CSVRow objects, returning true if more rows are available.
*
* @par Performance Notes
* - Reads chunks of data that are csv::internals::ITERATION_CHUNK_SIZE bytes large at a time
* - For performance details, read the documentation for CSVRow and CSVField.
*
* @param[out] row The variable where the parsed row will be stored
* @see CSVRow, CSVField
*
* **Example:**
* \snippet tests/test_read_csv.cpp CSVField Example
*
*/
CSV_INLINE bool CSVReader::read_row(CSVRow &row) {
while (true) {
if (this->records->empty()) {
if (this->records->is_waitable())
// Reading thread is currently active => wait for it to populate records
this->records->wait();
else if (this->parser->eof())
// End of file and no more records
return false;
else {
// Reading thread is not active => start another one
if (this->read_csv_worker.joinable())
this->read_csv_worker.join();
this->read_csv_worker = std::thread(&CSVReader::read_csv, this, internals::ITERATION_CHUNK_SIZE);
}
}
else if (this->records->front().size() != this->n_cols &&
this->_format.variable_column_policy != VariableColumnPolicy::KEEP) {
auto errored_row = this->records->pop_front();
if (this->_format.variable_column_policy == VariableColumnPolicy::THROW) {
if (errored_row.size() < this->n_cols)
throw std::runtime_error("Line too short " + internals::format_row(errored_row));
throw std::runtime_error("Line too long " + internals::format_row(errored_row));
}
}
else {
row = this->records->pop_front();
this->_n_rows++;
return true;
}
}
return false;
}
}
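
Review note: the delimiter guess performed by `calculate_score()` and `_guess_format()` above can be illustrated with a standalone sketch. This is not the library's code. The names `count_fields`, `sketch_score`, and `sketch_guess_delim` are hypothetical, and the sketch splits lines naively without honoring quotes, whereas the real implementation runs the full parser on the file head.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: number of fields in a line when split on `delim`.
// Assumption: quoting is ignored, unlike in the real parser.
inline std::size_t count_fields(const std::string& line, char delim) {
    if (line.empty()) return 0;
    std::size_t n = 1;
    for (char ch : line) {
        if (ch == delim) ++n;
    }
    return n;
}

struct SketchScore {
    double score;        // largest (row length * number of rows with that length)
    std::size_t header;  // index of the first row that has the winning length
};

inline SketchScore sketch_score(const std::string& head, char delim) {
    std::unordered_map<std::size_t, std::size_t> tally;       // row length -> count
    std::unordered_map<std::size_t, std::size_t> first_seen;  // row length -> first row index
    std::istringstream in(head);
    std::string line;
    for (std::size_t i = 0; std::getline(in, line); ++i) {
        const std::size_t len = count_fields(line, delim);
        if (len == 0) continue;                      // ignore zero-length rows
        if (tally[len]++ == 0) first_seen[len] = i;  // remember first occurrence
    }
    SketchScore best{0.0, 0};
    for (const auto& entry : tally) {
        const double s = static_cast<double>(entry.first * entry.second);
        if (s > best.score) {
            best.score = s;
            best.header = first_seen[entry.first];
        }
    }
    return best;
}

// Try every candidate delimiter and keep the one with the highest score.
inline char sketch_guess_delim(const std::string& head, const std::vector<char>& candidates) {
    assert(!candidates.empty());
    char best_delim = candidates.front();
    double best = 0.0;
    for (char cand : candidates) {
        const double s = sketch_score(head, cand).score;
        if (s > best) {
            best = s;
            best_delim = cand;
        }
    }
    return best_delim;
}
```

With a head of three `;`-separated rows of three fields each, the `;` candidate scores 9 while `,` scores 3, so `;` is chosen. A one-field preamble line before the real header loses to the wider rows, which is how the header row index is found.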

@ -0,0 +1,230 @@
/** @file
* @brief Defines functionality needed for basic CSV parsing
*/
#pragma once
#include <algorithm>
#include <deque>
#include <fstream>
#include <iterator>
#include <memory>
#include <mutex>
#include <thread>
#include <sstream>
#include <string>
#include <vector>
#include "../external/mio.hpp"
#include "basic_csv_parser.hpp"
#include "common.hpp"
#include "data_type.hpp"
#include "csv_format.hpp"
/** The all-encompassing namespace */
namespace csv {
/** Stuff that is generally not of interest to end-users */
namespace internals {
std::string format_row(const std::vector<std::string>& row, csv::string_view delim = ", ");
std::vector<std::string> _get_col_names( csv::string_view head, const CSVFormat format = CSVFormat::guess_csv());
struct GuessScore {
double score;
size_t header;
};
CSV_INLINE GuessScore calculate_score(csv::string_view head, const CSVFormat& format);
CSVGuessResult _guess_format(csv::string_view head, const std::vector<char>& delims = { ',', '|', '\t', ';', '^', '~' });
}
std::vector<std::string> get_col_names(
csv::string_view filename,
const CSVFormat format = CSVFormat::guess_csv());
/** Guess the delimiter used by a delimiter-separated values file */
CSVGuessResult guess_format(csv::string_view filename,
const std::vector<char>& delims = { ',', '|', '\t', ';', '^', '~' });
/** @class CSVReader
* @brief Main class for parsing CSVs from files and in-memory sources
*
* All rows are compared to the column names for length consistency
* - By default, rows that are too short or too long are dropped
* - Other behavior (throwing, or keeping such rows) can be selected through the
*   CSVFormat's VariableColumnPolicy
*/
class CSVReader {
public:
/**
* An input iterator capable of handling large files.
* @note Created by CSVReader::begin() and CSVReader::end().
*
* @par Iterating over a file
* @snippet tests/test_csv_iterator.cpp CSVReader Iterator 1
*
* @par Using with `<algorithm>` library
* @snippet tests/test_csv_iterator.cpp CSVReader Iterator 2
*/
class iterator {
public:
#ifndef DOXYGEN_SHOULD_SKIP_THIS
using value_type = CSVRow;
using difference_type = std::ptrdiff_t;
using pointer = CSVRow * ;
using reference = CSVRow & ;
using iterator_category = std::input_iterator_tag;
#endif
iterator() = default;
iterator(CSVReader* reader) : daddy(reader) {};
iterator(CSVReader*, CSVRow&&);
/** Access the CSVRow held by the iterator */
CONSTEXPR_14 reference operator*() { return this->row; }
/** Return a pointer to the CSVRow the iterator has stopped at */
CONSTEXPR_14 pointer operator->() { return &(this->row); }
iterator& operator++(); /**< Pre-increment iterator */
iterator operator++(int); /**< Post-increment iterator */
/** Returns true if iterators were constructed from the same CSVReader
* and point to the same row
*/
CONSTEXPR bool operator==(const iterator& other) const noexcept {
return (this->daddy == other.daddy) && (this->i == other.i);
}
CONSTEXPR bool operator!=(const iterator& other) const noexcept { return !operator==(other); }
private:
CSVReader * daddy = nullptr; // Pointer to parent
CSVRow row; // Current row
size_t i = 0; // Index of current row
};
/** @name Constructors
* Constructors for iterating over large files and parsing in-memory sources.
*/
///@{
CSVReader(csv::string_view filename, CSVFormat format = CSVFormat::guess_csv());
/** Allows parsing stream sources such as `std::stringstream` or `std::ifstream`
*
* @tparam TStream An input stream deriving from `std::istream`
* @note Currently this constructor requires special CSV dialects to be manually
* specified.
*/
template<typename TStream,
csv::enable_if_t<std::is_base_of<std::istream, TStream>::value, int> = 0>
CSVReader(TStream& source, CSVFormat format = CSVFormat()) : _format(format) {
using Parser = internals::StreamParser<TStream>;
if (!format.col_names.empty())
this->set_col_names(format.col_names);
this->parser = std::unique_ptr<Parser>(
new Parser(source, format, col_names)); // For C++11
this->initial_read();
}
///@}
CSVReader(const CSVReader&) = delete; // No copy constructor
CSVReader(CSVReader&&) = default; // Move constructor
CSVReader& operator=(const CSVReader&) = delete; // No copy assignment
CSVReader& operator=(CSVReader&& other) = default;
~CSVReader() {
if (this->read_csv_worker.joinable()) {
this->read_csv_worker.join();
}
}
/** @name Retrieving CSV Rows */
///@{
bool read_row(CSVRow &row);
iterator begin();
HEDLEY_CONST iterator end() const noexcept;
/** Returns true if we have reached end of file */
bool eof() const noexcept { return this->parser->eof(); };
///@}
/** @name CSV Metadata */
///@{
CSVFormat get_format() const;
std::vector<std::string> get_col_names() const;
int index_of(csv::string_view col_name) const;
///@}
/** @name CSV Metadata: Attributes */
///@{
/** Whether or not the file or stream contains valid CSV rows,
* not including the header.
*
* @note Gives an accurate answer regardless of when it is called.
*
*/
CONSTEXPR bool empty() const noexcept { return this->n_rows() == 0; }
/** Retrieves the number of rows that have been read so far */
CONSTEXPR size_t n_rows() const noexcept { return this->_n_rows; }
/** Whether or not the CSV was prefixed with a UTF-8 BOM */
bool utf8_bom() const noexcept { return this->parser->utf8_bom(); }
///@}
protected:
/**
* \defgroup csv_internal CSV Parser Internals
* @brief Internals of CSVReader. Only maintainers and those looking to
* extend the parser should read this.
* @{
*/
/** Sets this reader's column names and associated data */
void set_col_names(const std::vector<std::string>&);
/** @name CSV Settings **/
///@{
CSVFormat _format;
///@}
/** @name Parser State */
///@{
/** Pointer to an object containing column information */
internals::ColNamesPtr col_names = std::make_shared<internals::ColNames>();
/** Helper class which actually does the parsing */
std::unique_ptr<internals::IBasicCSVParser> parser = nullptr;
/** Queue of parsed CSV rows */
std::unique_ptr<RowCollection> records{new RowCollection(100)};
size_t n_cols = 0; /**< The number of columns in this CSV */
size_t _n_rows = 0; /**< How many rows (minus header) have been read so far */
/** @name Multi-Threaded File Reading Functions */
///@{
bool read_csv(size_t bytes = internals::ITERATION_CHUNK_SIZE);
///@}
/**@}*/
private:
/** Whether or not rows before header were trimmed */
bool header_trimmed = false;
/** @name Multi-Threaded File Reading: Flags and State */
///@{
std::thread read_csv_worker; /**< Worker thread for read_csv() */
///@}
/** Read initial chunk to get metadata */
void initial_read() {
this->read_csv_worker = std::thread(&CSVReader::read_csv, this, internals::ITERATION_CHUNK_SIZE);
this->read_csv_worker.join();
}
void trim_header();
};
}
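
Review note: `read_row()` compares each row's length with the column count and then acts according to the format's `VariableColumnPolicy`. The sketch below mimics that decision on plain vectors. `SketchPolicy`, `SketchRow`, and `sketch_filter_rows` are hypothetical stand-ins for illustration, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the three policies that read_row() distinguishes.
enum class SketchPolicy { THROW, IGNORE_ROW, KEEP };

using SketchRow = std::vector<std::string>;

// Returns the rows a reader would hand out for the given policy.
inline std::vector<SketchRow> sketch_filter_rows(const std::vector<SketchRow>& rows,
                                                 std::size_t n_cols,
                                                 SketchPolicy policy) {
    std::vector<SketchRow> out;
    for (const auto& row : rows) {
        if (row.size() != n_cols && policy != SketchPolicy::KEEP) {
            if (policy == SketchPolicy::THROW) {
                throw std::runtime_error(row.size() < n_cols ? "Line too short"
                                                             : "Line too long");
            }
            continue;  // IGNORE_ROW: silently drop the row
        }
        // Only KEEP lets rows with a mismatched length through.
        assert(policy == SketchPolicy::KEEP || row.size() == n_cols);
        out.push_back(row);
    }
    return out;
}
```

Dropping is the default in the vendored code because `variable_column_policy` is initialized to `IGNORE_ROW`. A caller that needs to detect malformed input would pick the throwing policy instead.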

@ -0,0 +1,63 @@
/** @file
* Defines an input iterator for csv::CSVReader
*/
#include "csv_reader.hpp"
namespace csv {
/** Return an iterator to the first row in the reader */
CSV_INLINE CSVReader::iterator CSVReader::begin() {
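if (this->records->empty()) {
// Join any previous worker first: assigning to a joinable std::thread calls std::terminate()
if (this->read_csv_worker.joinable())
this->read_csv_worker.join();
this->read_csv_worker = std::thread(&CSVReader::read_csv, this, internals::ITERATION_CHUNK_SIZE);
this->read_csv_worker.join();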
// Still empty => return end iterator
if (this->records->empty()) return this->end();
}
this->_n_rows++;
CSVReader::iterator ret(this, this->records->pop_front());
return ret;
}
/** A placeholder for the imaginary past-the-end row in a CSV.
* Attempting to dereference this will lead to bad things.
*/
CSV_INLINE HEDLEY_CONST CSVReader::iterator CSVReader::end() const noexcept {
return CSVReader::iterator();
}
/////////////////////////
// CSVReader::iterator //
/////////////////////////
CSV_INLINE CSVReader::iterator::iterator(CSVReader* _daddy, CSVRow&& _row) :
daddy(_daddy) {
row = std::move(_row);
}
/** Advance the iterator by one row. If this CSVReader has an
* associated file, then the iterator will lazily pull more data from
* that file until the end of file is reached.
*
* @note This iterator does **not** block the thread responsible for parsing CSV.
*
*/
CSV_INLINE CSVReader::iterator& CSVReader::iterator::operator++() {
if (!daddy->read_row(this->row)) {
this->daddy = nullptr; // this == end()
}
return *this;
}
/** Post-increment iterator */
CSV_INLINE CSVReader::iterator CSVReader::iterator::operator++(int) {
auto temp = *this;
if (!daddy->read_row(this->row)) {
this->daddy = nullptr; // this == end()
}
return temp;
}
}
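
Review note: the iterator above signals exhaustion by resetting its parent pointer, so that it compares equal to a default-constructed end iterator. A minimal standalone sketch of that pattern follows. `SketchReader` and `sketch_collect` are hypothetical, and rows are plain strings rather than `CSVRow` objects.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row source standing in for CSVReader.
class SketchReader {
public:
    explicit SketchReader(std::vector<std::string> rows) : rows_(std::move(rows)) {}

    // Like read_row(): returns false once no rows are left.
    bool read_row(std::string& out) {
        if (next_ >= rows_.size()) return false;
        out = rows_[next_++];
        return true;
    }

    class iterator {
    public:
        using value_type = std::string;
        using difference_type = std::ptrdiff_t;
        using pointer = std::string*;
        using reference = std::string&;
        using iterator_category = std::input_iterator_tag;

        iterator() = default;  // end iterator: no parent
        explicit iterator(SketchReader* parent) : parent_(parent) { advance(); }

        reference operator*() { return row_; }
        pointer operator->() { return &row_; }
        iterator& operator++() { advance(); return *this; }
        iterator operator++(int) { iterator tmp = *this; advance(); return tmp; }
        bool operator==(const iterator& other) const { return parent_ == other.parent_; }
        bool operator!=(const iterator& other) const { return !(*this == other); }

    private:
        // Pull the next row; drop the parent pointer when the source is exhausted.
        void advance() {
            assert(parent_ != nullptr);  // advancing the end iterator is a bug
            if (!parent_->read_row(row_)) parent_ = nullptr;
        }
        SketchReader* parent_ = nullptr;
        std::string row_;
    };

    iterator begin() { return iterator(this); }
    iterator end() { return iterator(); }

private:
    std::vector<std::string> rows_;
    std::size_t next_ = 0;
};

// Usage: a range-based for loop drains the reader exactly once.
inline std::vector<std::string> sketch_collect(SketchReader& reader) {
    std::vector<std::string> out;
    for (auto& row : reader) out.push_back(row);
    return out;
}
```

Because this is an input iterator, a second pass over the same reader yields nothing. The same single-pass behavior should be expected from `CSVReader::begin()`.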

@ -0,0 +1,276 @@
/** @file
* Defines the data type used for storing information about a CSV row
*/
#include <cassert>
#include <functional>
#include "csv_row.hpp"
namespace csv {
namespace internals {
CSV_INLINE RawCSVField& CSVFieldList::operator[](size_t n) const {
const size_t page_no = n / _single_buffer_capacity;
const size_t buffer_idx = (page_no < 1) ? n : n % _single_buffer_capacity;
return this->buffers[page_no][buffer_idx];
}
CSV_INLINE void CSVFieldList::allocate() {
buffers.push_back(std::unique_ptr<RawCSVField[]>(new RawCSVField[_single_buffer_capacity]));
_current_buffer_size = 0;
_back = buffers.back().get();
}
}
/** Return a CSVField object corresponding to the nth value in the row.
*
* @note This method performs bounds checking, and will throw an
* `std::runtime_error` if n is invalid.
*
* @complexity
* Constant, by calling csv::CSVRow::get_field()
*
*/
CSV_INLINE CSVField CSVRow::operator[](size_t n) const {
return CSVField(this->get_field(n));
}
/** Retrieve a value by its associated column name. If the column
* specified can't be found, a runtime error is thrown.
*
* @complexity
* Constant. This calls the other CSVRow::operator[]() after
* converting column names into indices using a hash table.
*
* @param[in] col_name The column to look for
*/
CSV_INLINE CSVField CSVRow::operator[](const std::string& col_name) const {
auto & col_names = this->data->col_names;
auto col_pos = col_names->index_of(col_name);
if (col_pos > -1) {
return this->operator[](col_pos);
}
throw std::runtime_error("Can't find a column named " + col_name);
}
CSV_INLINE CSVRow::operator std::vector<std::string>() const {
std::vector<std::string> ret;
for (size_t i = 0; i < size(); i++)
ret.push_back(std::string(this->get_field(i)));
return ret;
}
CSV_INLINE csv::string_view CSVRow::get_field(size_t index) const
{
using internals::ParseFlags;
if (index >= this->size())
throw std::runtime_error("Index out of bounds.");
const size_t field_index = this->fields_start + index;
auto& field = this->data->fields[field_index];
auto field_str = csv::string_view(this->data->data).substr(this->data_start + field.start);
if (field.has_double_quote) {
auto& value = this->data->double_quote_fields[field_index];
if (value.empty()) {
bool prev_ch_quote = false;
for (size_t i = 0; i < field.length; i++) {
if (this->data->parse_flags[field_str[i] + 128] == ParseFlags::QUOTE) {
if (prev_ch_quote) {
prev_ch_quote = false;
continue;
}
else {
prev_ch_quote = true;
}
}
value += field_str[i];
}
}
return csv::string_view(value);
}
return field_str.substr(0, field.length);
}
CSV_INLINE bool CSVField::try_parse_hex(int& parsedValue) {
size_t start = 0, end = 0;
// Trim out whitespace chars
for (; start < this->sv.size() && this->sv[start] == ' '; start++);
for (end = start; end < this->sv.size() && this->sv[end] != ' '; end++);
int value_ = 0;
size_t digits = (end - start);
// Reject empty input before computing digits - 1, which would wrap around for size_t
if (digits == 0) return false;
size_t base16_exponent = digits - 1;
for (const auto& ch : this->sv.substr(start, digits)) {
int digit = 0;
switch (ch) {
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
digit = static_cast<int>(ch - '0');
break;
case 'a':
case 'A':
digit = 10;
break;
case 'b':
case 'B':
digit = 11;
break;
case 'c':
case 'C':
digit = 12;
break;
case 'd':
case 'D':
digit = 13;
break;
case 'e':
case 'E':
digit = 14;
break;
case 'f':
case 'F':
digit = 15;
break;
default:
return false;
}
value_ += digit * (int)pow(16, (double)base16_exponent);
base16_exponent--;
}
parsedValue = value_;
return true;
}
CSV_INLINE bool CSVField::try_parse_decimal(long double& dVal, const char decimalSymbol) {
// If field has already been parsed to empty, no need to do it again:
if (this->_type == DataType::CSV_NULL)
return false;
// Not yet parsed or possibly parsed with other decimalSymbol
if (this->_type == DataType::UNKNOWN || this->_type == DataType::CSV_STRING || this->_type == DataType::CSV_DOUBLE)
this->_type = internals::data_type(this->sv, &this->value, decimalSymbol); // parse again
// Integral types are not affected by decimalSymbol and need not be parsed again
// Either we already had an integral type before, or we just got any numeric type now.
if (this->_type >= DataType::CSV_INT8 && this->_type <= DataType::CSV_DOUBLE) {
dVal = this->value;
return true;
}
// CSV_NULL or CSV_STRING, not numeric
return false;
}
#ifdef _MSC_VER
#pragma region CSVRow Iterator
#endif
/** Return an iterator pointing to the first field. */
CSV_INLINE CSVRow::iterator CSVRow::begin() const {
return CSVRow::iterator(this, 0);
}
/** Return an iterator pointing to just after the end of the CSVRow.
*
* @warning Attempting to dereference the end iterator results
* in dereferencing a null pointer.
*/
CSV_INLINE CSVRow::iterator CSVRow::end() const noexcept {
return CSVRow::iterator(this, (int)this->size());
}
CSV_INLINE CSVRow::reverse_iterator CSVRow::rbegin() const noexcept {
return std::reverse_iterator<CSVRow::iterator>(this->end());
}
CSV_INLINE CSVRow::reverse_iterator CSVRow::rend() const {
return std::reverse_iterator<CSVRow::iterator>(this->begin());
}
CSV_INLINE HEDLEY_NON_NULL(2)
CSVRow::iterator::iterator(const CSVRow* _reader, int _i)
: daddy(_reader), i(_i) {
if (_i < (int)this->daddy->size())
this->field = std::make_shared<CSVField>(
this->daddy->operator[](_i));
else
this->field = nullptr;
}
CSV_INLINE CSVRow::iterator::reference CSVRow::iterator::operator*() const {
return *(this->field.get());
}
CSV_INLINE CSVRow::iterator::pointer CSVRow::iterator::operator->() const {
return this->field;
}
CSV_INLINE CSVRow::iterator& CSVRow::iterator::operator++() {
// Pre-increment operator
this->i++;
if (this->i < (int)this->daddy->size())
this->field = std::make_shared<CSVField>(
this->daddy->operator[](i));
else // Reached the end of row
this->field = nullptr;
return *this;
}
CSV_INLINE CSVRow::iterator CSVRow::iterator::operator++(int) {
// Post-increment operator
auto temp = *this;
this->operator++();
return temp;
}
CSV_INLINE CSVRow::iterator& CSVRow::iterator::operator--() {
// Pre-decrement operator
this->i--;
this->field = std::make_shared<CSVField>(
this->daddy->operator[](this->i));
return *this;
}
CSV_INLINE CSVRow::iterator CSVRow::iterator::operator--(int) {
// Post-decrement operator
auto temp = *this;
this->operator--();
return temp;
}
CSV_INLINE CSVRow::iterator CSVRow::iterator::operator+(difference_type n) const {
// Allows for iterator arithmetic
return CSVRow::iterator(this->daddy, i + (int)n);
}
CSV_INLINE CSVRow::iterator CSVRow::iterator::operator-(difference_type n) const {
// Allows for iterator arithmetic
return CSVRow::iterator::operator+(-n);
}
#ifdef _MSC_VER
#pragma endregion CSVRow Iterator
#endif
}
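
Review note: for fields flagged with `has_double_quote`, `get_field()` collapses each doubled quote into a single quote character. The loop can be sketched on its own as below. `sketch_unescape_quotes` is a hypothetical name, the quote character is passed directly instead of being looked up in the parse flag table, and the input is assumed to be valid CSV field content in which embedded quotes are always doubled.

```cpp
#include <cassert>
#include <string>

// Collapse every pair of consecutive quote characters into one quote.
// `field` is the text between the enclosing quotes of a CSV field.
inline std::string sketch_unescape_quotes(const std::string& field, char quote = '"') {
    std::string value;
    bool prev_was_quote = false;
    for (char ch : field) {
        if (ch == quote) {
            if (prev_was_quote) {
                // Second quote of an escaped pair: skip it
                prev_was_quote = false;
                continue;
            }
            prev_was_quote = true;
        }
        value += ch;
    }
    // Unescaping only removes characters, so the result never grows.
    assert(value.size() <= field.size());
    return value;
}
```

For example, the field content `say ""hi""` becomes `say "hi"`. The vendored code caches the unescaped string per field index, so this work runs at most once per field.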

@ -0,0 +1,465 @@
/** @file
* Defines the data type used for storing information about a CSV row
*/
#pragma once
#include <cmath>
#include <deque>
#include <iterator>
#include <memory> // For CSVField
#include <limits> // For CSVField
#include <unordered_map>
#include <unordered_set>
#include <string>
#include <sstream>
#include <vector>
#include "common.hpp"
#include "data_type.hpp"
#include "col_names.hpp"
namespace csv {
namespace internals {
class IBasicCSVParser;
static const std::string ERROR_NAN = "Not a number.";
static const std::string ERROR_OVERFLOW = "Overflow error.";
static const std::string ERROR_FLOAT_TO_INT =
"Attempted to convert a floating point value to an integral type.";
static const std::string ERROR_NEG_TO_UNSIGNED = "Negative numbers cannot be converted to unsigned types.";
std::string json_escape_string(csv::string_view s) noexcept;
/** A barebones class used for describing CSV fields */
struct RawCSVField {
RawCSVField() = default;
RawCSVField(size_t _start, size_t _length, bool _double_quote = false) {
start = _start;
length = _length;
has_double_quote = _double_quote;
}
/** The start of the field, relative to the beginning of the row */
size_t start;
/** The length of the field, ignoring quote escape characters */
size_t length;
/** Whether or not the field contains an escaped quote */
bool has_double_quote;
};
/** A class used for efficiently storing RawCSVField objects and expanding as necessary
*
* @par Implementation
* This data structure stores RawCSVField in contiguous blocks. When more capacity
* is needed, a new block is allocated, but previous data stays put.
*
* @par Thread Safety
* This class may be safely read from multiple threads and written to from one,
* as long as the writing thread does not actively touch fields which are being
* read.
*/
class CSVFieldList {
public:
/** Construct a CSVFieldList which allocates blocks of a certain size */
CSVFieldList(size_t single_buffer_capacity = (size_t)(internals::PAGE_SIZE / sizeof(RawCSVField))) :
_single_buffer_capacity(single_buffer_capacity) {
this->allocate();
}
// No copy constructor
CSVFieldList(const CSVFieldList& other) = delete;
// CSVFieldList may be moved
CSVFieldList(CSVFieldList&& other) :
_single_buffer_capacity(other._single_buffer_capacity) {
for (auto&& buffer : other.buffers) {
this->buffers.emplace_back(std::move(buffer));
}
_current_buffer_size = other._current_buffer_size;
_back = other._back;
}
template <class... Args>
void emplace_back(Args&&... args) {
if (this->_current_buffer_size == this->_single_buffer_capacity) {
this->allocate();
}
*(_back++) = RawCSVField(std::forward<Args>(args)...);
_current_buffer_size++;
}
size_t size() const noexcept {
return this->_current_buffer_size + ((this->buffers.size() - 1) * this->_single_buffer_capacity);
}
RawCSVField& operator[](size_t n) const;
private:
const size_t _single_buffer_capacity;
/**
* Prefer std::deque over std::vector because it does not
* reallocate upon expansion, allowing pointers to its members
* to remain valid & avoiding potential race conditions when
* CSVFieldList is accessed simultaneously by a reading thread and
* a writing thread
*/
std::deque<std::unique_ptr<RawCSVField[]>> buffers = {};
/** Number of items in the current buffer */
size_t _current_buffer_size = 0;
/** Pointer to the current empty field */
RawCSVField* _back = nullptr;
/** Allocate a new page of memory */
void allocate();
};
/** A class for storing raw CSV data and associated metadata */
struct RawCSVData {
std::shared_ptr<void> _data = nullptr;
csv::string_view data = "";
internals::CSVFieldList fields;
std::unordered_set<size_t> has_double_quotes = {};
// TODO: Consider replacing with a more thread-safe structure
std::unordered_map<size_t, std::string> double_quote_fields = {};
internals::ColNamesPtr col_names = nullptr;
internals::ParseFlagMap parse_flags;
internals::WhitespaceMap ws_flags;
};
using RawCSVDataPtr = std::shared_ptr<RawCSVData>;
}
/**
* @class CSVField
* @brief Data type representing individual CSV values.
* CSVFields can be obtained by using CSVRow::operator[]
*/
class CSVField {
public:
/** Constructs a CSVField from a string_view */
constexpr explicit CSVField(csv::string_view _sv) noexcept : sv(_sv) { };
operator std::string() const {
return std::string("<CSVField> ") + std::string(this->sv);
}
/** Returns the value cast to the requested type, performing type checking first.
*
* \par Valid options for T
* - std::string or csv::string_view
* - signed integral types (signed char, short, int, long int, long long int)
* - floating point types (float, double, long double)
* - unsigned integral types (negative values are rejected)
*
* \par Invalid conversions
* - Converting non-numeric values to any numeric type
* - Converting floating point values to integers
* - Converting a large integer to a smaller type that will not hold it
*
* @note This method is capable of parsing scientific E-notation.
* See [this page](md_docs_source_scientific_notation.html)
* for more details.
*
* @throws std::runtime_error Thrown if an invalid conversion is performed.
*
* @warning Currently, conversions to floating point types are not
* checked for loss of precision
*
* @warning Any string_views returned are only guaranteed to be valid
* if the parent CSVRow is still alive. If you are concerned
* about object lifetimes, then grab a std::string or a
* numeric value.
*
*/
template<typename T = std::string> T get() {
IF_CONSTEXPR(std::is_arithmetic<T>::value) {
// Note: this->type() also converts the CSV value to float
if (this->type() <= DataType::CSV_STRING) {
throw std::runtime_error(internals::ERROR_NAN);
}
}
IF_CONSTEXPR(std::is_integral<T>::value) {
// Note: this->is_float() also converts the CSV value to float
if (this->is_float()) {
throw std::runtime_error(internals::ERROR_FLOAT_TO_INT);
}
IF_CONSTEXPR(std::is_unsigned<T>::value) {
if (this->value < 0) {
throw std::runtime_error(internals::ERROR_NEG_TO_UNSIGNED);
}
}
}
// Allow fallthrough from previous if branch
IF_CONSTEXPR(!std::is_floating_point<T>::value) {
IF_CONSTEXPR(std::is_unsigned<T>::value) {
// Quick hack to perform correct unsigned integer boundary checks
if (this->value > internals::get_uint_max<sizeof(T)>()) {
throw std::runtime_error(internals::ERROR_OVERFLOW);
}
}
else if (internals::type_num<T>() < this->_type) {
throw std::runtime_error(internals::ERROR_OVERFLOW);
}
}
return static_cast<T>(this->value);
}
/** Parse a hexadecimal value, returning false if the value is not hex. */
bool try_parse_hex(int& parsedValue);
/** Attempts to parse a decimal (or integer) value using the given symbol,
* returning `true` if the value is numeric.
*
* @note This method also updates this field's type
*
*/
bool try_parse_decimal(long double& dVal, const char decimalSymbol = '.');
/** Compares the contents of this field to a numeric value. If this
* field does not contain a numeric value, then all comparisons return
* false.
*
* @note Floating point values are considered equal if they are within
* `0.000001` of each other.
*
* @warning Multiple numeric comparisons involving the same field can
* be done more efficiently by calling the CSVField::get<>() method.
*
* @sa csv::CSVField::operator==(const char * other)
* @sa csv::CSVField::operator==(csv::string_view other)
*/
template<typename T>
CONSTEXPR_14 bool operator==(T other) const noexcept
{
static_assert(std::is_arithmetic<T>::value,
"T should be a numeric value.");
if (this->_type != DataType::UNKNOWN) {
if (this->_type == DataType::CSV_STRING) {
return false;
}
return internals::is_equal(value, static_cast<long double>(other), 0.000001L);
}
long double out = 0;
if (internals::data_type(this->sv, &out) == DataType::CSV_STRING) {
return false;
}
return internals::is_equal(out, static_cast<long double>(other), 0.000001L);
}
/** Return a string view over the field's contents */
CONSTEXPR csv::string_view get_sv() const noexcept { return this->sv; }
/** Returns true if field is an empty string or string of whitespace characters */
CONSTEXPR_14 bool is_null() noexcept { return type() == DataType::CSV_NULL; }
/** Returns true if field is a non-numeric, non-empty string */
CONSTEXPR_14 bool is_str() noexcept { return type() == DataType::CSV_STRING; }
/** Returns true if field is an integer or float */
CONSTEXPR_14 bool is_num() noexcept { return type() >= DataType::CSV_INT8; }
/** Returns true if field is an integer */
CONSTEXPR_14 bool is_int() noexcept {
return (type() >= DataType::CSV_INT8) && (type() <= DataType::CSV_INT64);
}
/** Returns true if field is a floating point value */
CONSTEXPR_14 bool is_float() noexcept { return type() == DataType::CSV_DOUBLE; };
/** Return the type of the underlying CSV data */
CONSTEXPR_14 DataType type() noexcept {
this->get_value();
return _type;
}
private:
long double value = 0; /**< Cached numeric value */
csv::string_view sv = ""; /**< A view over this field's text */
DataType _type = DataType::UNKNOWN; /**< Cached data type value */
CONSTEXPR_14 void get_value() noexcept {
/* Check to see if value has been cached previously, if not
* evaluate it
*/
if ((int)_type < 0) {
this->_type = internals::data_type(this->sv, &this->value);
}
}
};
/** Data structure for representing CSV rows */
class CSVRow {
public:
friend internals::IBasicCSVParser;
CSVRow() = default;
/** Construct a CSVRow from a RawCSVDataPtr */
CSVRow(internals::RawCSVDataPtr _data) : data(_data) {}
CSVRow(internals::RawCSVDataPtr _data, size_t _data_start, size_t _field_bounds)
: data(_data), data_start(_data_start), fields_start(_field_bounds) {}
/** Indicates whether row is empty or not */
CONSTEXPR bool empty() const noexcept { return this->size() == 0; }
/** Return the number of fields in this row */
CONSTEXPR size_t size() const noexcept { return row_length; }
/** @name Value Retrieval */
///@{
CSVField operator[](size_t n) const;
CSVField operator[](const std::string&) const;
std::string to_json(const std::vector<std::string>& subset = {}) const;
std::string to_json_array(const std::vector<std::string>& subset = {}) const;
/** Retrieve this row's associated column names */
std::vector<std::string> get_col_names() const {
return this->data->col_names->get_col_names();
}
/** Convert this CSVRow into a vector of strings.
* **Note**: This is a less efficient method of
* accessing data than using the [] operator.
*/
operator std::vector<std::string>() const;
///@}
/** A random access iterator over the contents of a CSV row.
* Each iterator points to a CSVField.
*/
class iterator {
public:
#ifndef DOXYGEN_SHOULD_SKIP_THIS
using value_type = CSVField;
using difference_type = int;
using pointer = std::shared_ptr<CSVField>;
using reference = CSVField & ;
using iterator_category = std::random_access_iterator_tag;
#endif
iterator(const CSVRow*, int i);
reference operator*() const;
pointer operator->() const;
iterator operator++(int);
iterator& operator++();
iterator operator--(int);
iterator& operator--();
iterator operator+(difference_type n) const;
iterator operator-(difference_type n) const;
/** Two iterators are equal if they point to the same field */
CONSTEXPR bool operator==(const iterator& other) const noexcept {
return this->i == other.i;
};
CONSTEXPR bool operator!=(const iterator& other) const noexcept { return !operator==(other); }
#ifndef NDEBUG
friend CSVRow;
#endif
private:
const CSVRow * daddy = nullptr; // Pointer to parent
std::shared_ptr<CSVField> field = nullptr; // Current field pointed at
int i = 0; // Index of current field
};
/** A reverse iterator over the contents of a CSVRow. */
using reverse_iterator = std::reverse_iterator<iterator>;
/** @name Iterators
* @brief Each iterator points to a CSVField object.
*/
///@{
iterator begin() const;
iterator end() const noexcept;
reverse_iterator rbegin() const noexcept;
reverse_iterator rend() const;
///@}
private:
/** Retrieve a string view corresponding to the specified index */
csv::string_view get_field(size_t index) const;
internals::RawCSVDataPtr data;
/** Where in RawCSVData.data we start */
size_t data_start = 0;
/** Where in the RawCSVDataPtr.fields array we start */
size_t fields_start = 0;
/** How many columns this row spans */
size_t row_length = 0;
};
#ifdef _MSC_VER
#pragma region CSVField::get Specializations
#endif
/** Retrieve this field's original string */
template<>
inline std::string CSVField::get<std::string>() {
return std::string(this->sv);
}
/** Retrieve a view over this field's string
*
* @warning This string_view is only guaranteed to be valid as long as this
* CSVRow is still alive.
*/
template<>
CONSTEXPR_14 csv::string_view CSVField::get<csv::string_view>() {
return this->sv;
}
/** Retrieve this field's value as a long double */
template<>
CONSTEXPR_14 long double CSVField::get<long double>() {
if (!is_num())
throw std::runtime_error(internals::ERROR_NAN);
return this->value;
}
#ifdef _MSC_VER
#pragma endregion CSVField::get Specializations
#endif
/** Compares the contents of this field to a string */
template<>
CONSTEXPR bool CSVField::operator==(const char * other) const noexcept
{
return this->sv == other;
}
/** Compares the contents of this field to a string */
template<>
CONSTEXPR bool CSVField::operator==(csv::string_view other) const noexcept
{
return this->sv == other;
}
}
inline std::ostream& operator << (std::ostream& os, csv::CSVField const& value) {
os << std::string(value);
return os;
}
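
CSVField evaluates its data type and numeric value on first access and then reuses the cached result. The following is a minimal sketch of that lazy-caching idea, not the library's implementation: `LazyField`, `FieldType`, and the `evaluations()` counter are hypothetical names, and classification is simplified to `std::strtold` (which also accepts inputs such as leading whitespace or "inf" that a real CSV classifier may reject).

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical types for illustration; not part of the csv library.
enum class FieldType { Unknown = -1, Null, String, Number };

class LazyField {
public:
    explicit LazyField(std::string text) : text_(std::move(text)) {}

    // Each accessor triggers classification at most once.
    FieldType type() { classify(); return type_; }
    bool is_num() { return type() == FieldType::Number; }
    long double value() { classify(); return value_; }

    // How many times the text was actually parsed (for demonstration).
    int evaluations() const { return evaluations_; }

private:
    void classify() {
        if (type_ != FieldType::Unknown) return;  // already cached
        ++evaluations_;
        if (text_.empty()) { type_ = FieldType::Null; return; }
        char* end = nullptr;
        const long double parsed = std::strtold(text_.c_str(), &end);
        // Numeric only if the whole text was consumed by the parser.
        if (end == text_.c_str() + text_.size()) {
            type_ = FieldType::Number;
            value_ = parsed;
        } else {
            type_ = FieldType::String;
        }
    }

    std::string text_;
    FieldType type_ = FieldType::Unknown;
    long double value_ = 0;
    int evaluations_ = 0;
};
```

Calling `is_num()`, `value()`, and `type()` in sequence on the same object parses the text only once; later calls return the cached members.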

@ -0,0 +1,262 @@
/** @file
* Implements JSON serialization abilities
*/
#include "csv_row.hpp"
namespace csv {
/*
The implementations for json_extra_space() and json_escape_string()
were modified from source code for JSON for Modern C++.
The respective license is below:
The code is licensed under the [MIT
License](http://opensource.org/licenses/MIT):
Copyright &copy; 2013-2015 Niels Lohmann.
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation files
(the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/
namespace internals {
/*!
@brief calculates the extra space to escape a JSON string
@param[in] s the string to escape
@return the number of characters required to escape string @a s
@complexity Linear in the length of string @a s.
*/
static std::size_t json_extra_space(csv::string_view& s) noexcept
{
std::size_t result = 0;
for (const auto& c : s)
{
switch (c)
{
case '"':
case '\\':
case '\b':
case '\f':
case '\n':
case '\r':
case '\t':
{
// from c (1 byte) to \x (2 bytes)
result += 1;
break;
}
default:
{
if (c >= 0x00 && c <= 0x1f)
{
// from c (1 byte) to \uxxxx (6 bytes)
result += 5;
}
break;
}
}
}
return result;
}
CSV_INLINE std::string json_escape_string(csv::string_view s) noexcept
{
const auto space = json_extra_space(s);
if (space == 0)
{
return std::string(s);
}
// create a result string of necessary size
size_t result_size = s.size() + space;
std::string result(result_size, '\\');
std::size_t pos = 0;
for (const auto& c : s)
{
switch (c)
{
// quotation mark (0x22)
case '"':
{
result[pos + 1] = '"';
pos += 2;
break;
}
// reverse solidus (0x5c)
case '\\':
{
// nothing to change
pos += 2;
break;
}
// backspace (0x08)
case '\b':
{
result[pos + 1] = 'b';
pos += 2;
break;
}
// formfeed (0x0c)
case '\f':
{
result[pos + 1] = 'f';
pos += 2;
break;
}
// newline (0x0a)
case '\n':
{
result[pos + 1] = 'n';
pos += 2;
break;
}
// carriage return (0x0d)
case '\r':
{
result[pos + 1] = 'r';
pos += 2;
break;
}
// horizontal tab (0x09)
case '\t':
{
result[pos + 1] = 't';
pos += 2;
break;
}
default:
{
if (c >= 0x00 && c <= 0x1f)
{
// print character c as \uxxxx
snprintf(&result[pos + 1], result_size - pos - 1, "u%04x", int(c));
pos += 6;
// overwrite trailing null character
result[pos] = '\\';
}
else
{
// all other characters are added as-is
result[pos++] = c;
}
break;
}
}
}
return result;
}
}
/** Convert a CSV row to a JSON object, i.e.
* `{"col1":"value1","col2":"value2"}`
*
* @note All strings are properly escaped. Numeric values are not quoted.
* @param[in] subset A subset of columns to contain in the JSON.
* Leave empty for original columns.
*/
CSV_INLINE std::string CSVRow::to_json(const std::vector<std::string>& subset) const {
std::vector<std::string> col_names = subset;
if (subset.empty()) {
col_names = this->data ? this->get_col_names() : std::vector<std::string>({});
}
const size_t _n_cols = col_names.size();
std::string ret = "{";
for (size_t i = 0; i < _n_cols; i++) {
auto& col = col_names[i];
auto field = this->operator[](col);
// TODO: Possible performance enhancements by caching escaped column names
ret += '"' + internals::json_escape_string(col) + "\":";
// Add quotes around strings but not numbers
if (field.is_num())
ret += internals::json_escape_string(field.get<csv::string_view>());
else
ret += '"' + internals::json_escape_string(field.get<csv::string_view>()) + '"';
// Do not add comma after last string
if (i + 1 < _n_cols)
ret += ',';
}
ret += '}';
return ret;
}
/** Convert a CSV row to a JSON array, i.e.
* `["value1","value2",...]`
*
* @note All strings are properly escaped. Numeric values are not quoted.
* @param[in] subset A subset of columns to contain in the JSON.
* Leave empty for all columns.
*/
CSV_INLINE std::string CSVRow::to_json_array(const std::vector<std::string>& subset) const {
std::vector<std::string> col_names = subset;
if (subset.empty())
col_names = this->data ? this->get_col_names() : std::vector<std::string>({});
const size_t _n_cols = col_names.size();
std::string ret = "[";
for (size_t i = 0; i < _n_cols; i++) {
auto field = this->operator[](col_names[i]);
// Add quotes around strings but not numbers
if (field.is_num())
ret += internals::json_escape_string(field.get<csv::string_view>());
else
ret += '"' + internals::json_escape_string(field.get<csv::string_view>()) + '"';
// Do not add comma after last string
if (i + 1 < _n_cols)
ret += ',';
}
ret += ']';
return ret;
}
}
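
The escaping routine in this file first computes the extra space needed and then fills a pre-sized buffer. The sketch below applies the same escaping rules in a simpler append-based form. It assumes C++17 `std::string_view`, and `escape_json` is a hypothetical name that is not part of the library.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper for illustration; not part of the csv library.
// Escapes text so that it can be placed between double quotes in JSON.
inline std::string escape_json(std::string_view s) {
    std::string out;
    out.reserve(s.size());
    for (const char c : s) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\b': out += "\\b";  break;
        case '\f': out += "\\f";  break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (static_cast<unsigned char>(c) <= 0x1f) {
                // Remaining control characters become \uXXXX (6 chars + '\0')
                char buf[7];
                std::snprintf(buf, sizeof buf, "\\u%04x",
                              static_cast<unsigned>(static_cast<unsigned char>(c)));
                out += buf;
            } else {
                out += c;  // everything else is copied unchanged
            }
            break;
        }
    }
    return out;
}
```

Appending trades the single up-front allocation for shorter code; the two-pass version avoids reallocation when many characters need escaping.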

@ -0,0 +1,267 @@
/** @file
* Calculates statistics from CSV files
*/
#include <string>
#include "csv_stat.hpp"
namespace csv {
/** Calculate statistics for an arbitrarily large file. When this constructor
* is called, CSVStat will process the entire file iteratively. Once finished,
* methods like get_mean(), get_counts(), etc... can be used to retrieve statistics.
*/
CSV_INLINE CSVStat::CSVStat(csv::string_view filename, CSVFormat format) :
reader(filename, format) {
this->calc();
}
/** Calculate statistics for a CSV stored in a std::stringstream */
CSV_INLINE CSVStat::CSVStat(std::stringstream& stream, CSVFormat format) :
reader(stream, format) {
this->calc();
}
/** Return current means */
CSV_INLINE std::vector<long double> CSVStat::get_mean() const {
std::vector<long double> ret;
for (size_t i = 0; i < this->get_col_names().size(); i++) {
ret.push_back(this->rolling_means[i]);
}
return ret;
}
/** Return current variances */
CSV_INLINE std::vector<long double> CSVStat::get_variance() const {
std::vector<long double> ret;
for (size_t i = 0; i < this->get_col_names().size(); i++) {
ret.push_back(this->rolling_vars[i]/(this->n[i] - 1));
}
return ret;
}
/** Return current mins */
CSV_INLINE std::vector<long double> CSVStat::get_mins() const {
std::vector<long double> ret;
for (size_t i = 0; i < this->get_col_names().size(); i++) {
ret.push_back(this->mins[i]);
}
return ret;
}
/** Return current maxes */
CSV_INLINE std::vector<long double> CSVStat::get_maxes() const {
std::vector<long double> ret;
for (size_t i = 0; i < this->get_col_names().size(); i++) {
ret.push_back(this->maxes[i]);
}
return ret;
}
/** Get counts for each column */
CSV_INLINE std::vector<CSVStat::FreqCount> CSVStat::get_counts() const {
std::vector<FreqCount> ret;
for (size_t i = 0; i < this->get_col_names().size(); i++) {
ret.push_back(this->counts[i]);
}
return ret;
}
/** Get data type counts for each column */
CSV_INLINE std::vector<CSVStat::TypeCount> CSVStat::get_dtypes() const {
std::vector<TypeCount> ret;
for (size_t i = 0; i < this->get_col_names().size(); i++) {
ret.push_back(this->dtypes[i]);
}
return ret;
}
CSV_INLINE void CSVStat::calc_chunk() {
/** Only create stats counters the first time **/
if (dtypes.empty()) {
/** Go through all records and calculate specified statistics */
for (size_t i = 0; i < this->get_col_names().size(); i++) {
dtypes.push_back({});
counts.push_back({});
rolling_means.push_back(0);
rolling_vars.push_back(0);
mins.push_back(NAN);
maxes.push_back(NAN);
n.push_back(0);
}
}
// Start threads
std::vector<std::thread> pool;
for (size_t i = 0; i < this->get_col_names().size(); i++)
pool.push_back(std::thread(&CSVStat::calc_worker, this, i));
// Block until done
for (auto& th : pool)
th.join();
this->records.clear();
}
CSV_INLINE void CSVStat::calc() {
constexpr size_t CALC_CHUNK_SIZE = 5000;
for (auto& row : reader) {
this->records.push_back(std::move(row));
/** Chunk rows */
if (this->records.size() == CALC_CHUNK_SIZE) {
calc_chunk();
}
}
if (!this->records.empty()) {
calc_chunk();
}
}
CSV_INLINE void CSVStat::calc_worker(const size_t &i) {
/** Worker thread for CSVStat::calc() which calculates statistics for one column.
*
* @param[in] i Column index
*/
auto current_record = this->records.begin();
for (size_t processed = 0; current_record != this->records.end(); processed++) {
if (current_record->size() == this->get_col_names().size()) {
auto current_field = (*current_record)[i];
// Optimization: Don't count() if there are too many distinct values in the first 1000 rows
if (processed < 1000 || this->counts[i].size() <= 500)
this->count(current_field, i);
this->dtype(current_field, i);
// Numeric Stuff
if (current_field.is_num()) {
long double x_n = current_field.get<long double>();
// This actually calculates mean AND variance
this->variance(x_n, i);
this->min_max(x_n, i);
}
}
else if (this->reader.get_format().get_variable_column_policy() == VariableColumnPolicy::THROW) {
throw std::runtime_error("Line has different length than the others " + internals::format_row(*current_record));
}
++current_record;
}
}
CSV_INLINE void CSVStat::dtype(CSVField& data, const size_t &i) {
/** Given a field, update the type counter
* @param[in] data Data observation
* @param[in] i The column index that should be updated
*/
auto type = data.type();
if (this->dtypes[i].find(type) !=
this->dtypes[i].end()) {
// Increment count
this->dtypes[i][type]++;
} else {
// Initialize count
this->dtypes[i].insert(std::make_pair(type, 1));
}
}
CSV_INLINE void CSVStat::count(CSVField& data, const size_t &i) {
/** Given a field, update the frequency counter
* @param[in] data Data observation
* @param[in] i The column index that should be updated
*/
auto item = data.get<std::string>();
if (this->counts[i].find(item) !=
this->counts[i].end()) {
// Increment count
this->counts[i][item]++;
} else {
// Initialize count
this->counts[i].insert(std::make_pair(item, 1));
}
}
CSV_INLINE void CSVStat::min_max(const long double &x_n, const size_t &i) {
/** Update current minimum and maximum
* @param[in] x_n Data observation
* @param[in] i The column index that should be updated
*/
if (std::isnan(this->mins[i]))
this->mins[i] = x_n;
if (std::isnan(this->maxes[i]))
this->maxes[i] = x_n;
if (x_n < this->mins[i])
this->mins[i] = x_n;
else if (x_n > this->maxes[i])
this->maxes[i] = x_n;
}
CSV_INLINE void CSVStat::variance(const long double &x_n, const size_t &i) {
/** Given an observation, update the rolling mean and variance of one column
* using Welford's Algorithm
* @param[in] x_n Data observation
* @param[in] i The column index that should be updated
*/
long double& current_rolling_mean = this->rolling_means[i];
long double& current_rolling_var = this->rolling_vars[i];
long double& current_n = this->n[i];
long double delta;
long double delta2;
current_n++;
if (current_n == 1) {
current_rolling_mean = x_n;
} else {
delta = x_n - current_rolling_mean;
current_rolling_mean += delta/current_n;
delta2 = x_n - current_rolling_mean;
current_rolling_var += delta*delta2;
}
}
/** Useful for uploading CSV files to SQL databases.
*
* Return a data type for each column such that every value in a column can be
* converted to the corresponding data type without data loss.
* @param[in] filename The CSV file
*
* \return A mapping of column names to csv::DataType enums
*/
CSV_INLINE std::unordered_map<std::string, DataType> csv_data_types(const std::string& filename) {
CSVStat stat(filename);
std::unordered_map<std::string, DataType> csv_dtypes;
auto col_names = stat.get_col_names();
auto temp = stat.get_dtypes();
for (size_t i = 0; i < stat.get_col_names().size(); i++) {
auto& col = temp[i];
auto& col_name = col_names[i];
if (col[DataType::CSV_STRING])
csv_dtypes[col_name] = DataType::CSV_STRING;
else if (col[DataType::CSV_INT64])
csv_dtypes[col_name] = DataType::CSV_INT64;
else if (col[DataType::CSV_INT32])
csv_dtypes[col_name] = DataType::CSV_INT32;
else if (col[DataType::CSV_INT16])
csv_dtypes[col_name] = DataType::CSV_INT16;
else if (col[DataType::CSV_INT8])
csv_dtypes[col_name] = DataType::CSV_INT8;
else
csv_dtypes[col_name] = DataType::CSV_DOUBLE;
}
return csv_dtypes;
}
}
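
CSVStat::variance() updates the mean and the sum of squared deviations one value at a time, and get_variance() divides that sum by n - 1. The following standalone sketch shows the same Welford update for a single column. `RunningStats` and its member names are hypothetical, and returning NaN for fewer than two values is an assumption of this sketch rather than library behavior.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical accumulator for illustration; not part of the csv library.
struct RunningStats {
    std::size_t n = 0;     // number of observations seen so far
    long double mean = 0;  // running mean
    long double m2 = 0;    // running sum of squared deviations from the mean

    // Welford's update: one pass, no need to store earlier observations.
    void add(long double x) {
        ++n;
        const long double delta = x - mean;   // deviation from the old mean
        mean += delta / n;
        const long double delta2 = x - mean;  // deviation from the new mean
        m2 += delta * delta2;
    }

    // Sample variance (divides by n - 1); NaN when undefined.
    long double sample_variance() const {
        return n > 1 ? m2 / (n - 1)
                     : std::numeric_limits<long double>::quiet_NaN();
    }
};
```

Because each update only needs the previous mean and sum, the statistics can be carried across chunks of rows without keeping the rows themselves.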

@ -0,0 +1,60 @@
/** @file
* Calculates statistics from CSV files
*/
#pragma once
#include <unordered_map>
#include <sstream>
#include <vector>
#include "csv_reader.hpp"
namespace csv {
/** Class for calculating statistics from CSV files and in-memory sources
*
* **Example**
* \include programs/csv_stats.cpp
*
*/
class CSVStat {
public:
using FreqCount = std::unordered_map<std::string, size_t>;
using TypeCount = std::unordered_map<DataType, size_t>;
std::vector<long double> get_mean() const;
std::vector<long double> get_variance() const;
std::vector<long double> get_mins() const;
std::vector<long double> get_maxes() const;
std::vector<FreqCount> get_counts() const;
std::vector<TypeCount> get_dtypes() const;
std::vector<std::string> get_col_names() const {
return this->reader.get_col_names();
}
CSVStat(csv::string_view filename, CSVFormat format = CSVFormat::guess_csv());
CSVStat(std::stringstream& source, CSVFormat format = CSVFormat());
private:
// An array of rolling averages
// Each index corresponds to the rolling mean for the column at said index
std::vector<long double> rolling_means;
std::vector<long double> rolling_vars;
std::vector<long double> mins;
std::vector<long double> maxes;
std::vector<FreqCount> counts;
std::vector<TypeCount> dtypes;
std::vector<long double> n;
// Statistic calculators
void variance(const long double&, const size_t&);
void count(CSVField&, const size_t&);
void min_max(const long double&, const size_t&);
void dtype(CSVField&, const size_t&);
void calc();
void calc_chunk();
void calc_worker(const size_t&);
CSVReader reader;
std::deque<CSVRow> records = {};
};
}

@ -0,0 +1,79 @@
#include <sstream>
#include <vector>
#include "csv_utility.hpp"
namespace csv {
/** Shorthand function for parsing an in-memory CSV string
*
* @return A collection of CSVRow objects
*
* @par Example
* @snippet tests/test_read_csv.cpp Parse Example
*/
CSV_INLINE CSVReader parse(csv::string_view in, CSVFormat format) {
// Copy exactly in.size() characters; a string_view need not be null-terminated
std::stringstream stream{std::string(in)};
return CSVReader(stream, format);
}
/** Parses a CSV string with no headers
*
* @return A collection of CSVRow objects
*/
CSV_INLINE CSVReader parse_no_header(csv::string_view in) {
CSVFormat format;
format.header_row(-1);
return parse(in, format);
}
/** Parse an RFC 4180 CSV string, returning a collection
* of CSVRow objects
*
* @par Example
* @snippet tests/test_read_csv.cpp Escaped Comma
*
*/
CSV_INLINE CSVReader operator ""_csv(const char* in, size_t n) {
return parse(csv::string_view(in, n));
}
/** A shorthand for csv::parse_no_header() */
CSV_INLINE CSVReader operator ""_csv_no_header(const char* in, size_t n) {
return parse_no_header(csv::string_view(in, n));
}
/**
* Return the position of a column in a CSV file, or CSV_NOT_FOUND if the column does not exist
*
* @param[in] filename Path to CSV file
* @param[in] col_name Column whose position we should resolve
* @param[in] format Format of the CSV file
*/
CSV_INLINE int get_col_pos(
csv::string_view filename,
csv::string_view col_name,
const CSVFormat& format) {
CSVReader reader(filename, format);
return reader.index_of(col_name);
}
/** Get basic information about a CSV file
* @include programs/csv_info.cpp
*/
CSV_INLINE CSVFileInfo get_file_info(const std::string& filename) {
CSVReader reader(filename);
CSVFormat format = reader.get_format();
for (auto it = reader.begin(); it != reader.end(); ++it);
CSVFileInfo info = {
filename,
reader.get_col_names(),
format.get_delim(),
reader.n_rows(),
reader.get_col_names().size()
};
return info;
}
}
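
The shorthand parsers in this file hand in-memory text to a std::stringstream. A string_view is not guaranteed to be null-terminated, so the length has to travel with the data when the stream is built. The sketch below illustrates this under the assumption of C++17 `std::string_view`; `make_stream` is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical helper for illustration; not part of the csv library.
inline std::stringstream make_stream(std::string_view in) {
    // std::string(in) copies exactly in.size() characters and does not
    // depend on a terminating '\0' following the viewed range.
    return std::stringstream(std::string(in));
}
```

With a view covering only the first seven characters of a longer buffer, the resulting stream holds just those seven characters, whereas constructing from `in.data()` alone would read on to the next null byte.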

@ -0,0 +1,38 @@
#pragma once
#include "common.hpp"
#include "csv_format.hpp"
#include "csv_reader.hpp"
#include "data_type.hpp"
#include <string>
#include <type_traits>
#include <unordered_map>
namespace csv {
/** Returned by get_file_info() */
struct CSVFileInfo {
std::string filename; /**< Filename */
std::vector<std::string> col_names; /**< CSV column names */
char delim; /**< Delimiting character */
size_t n_rows; /**< Number of rows in a file */
size_t n_cols; /**< Number of columns in a CSV */
};
/** @name Shorthand Parsing Functions
* @brief Convenience functions for parsing small strings
*/
///@{
CSVReader operator ""_csv(const char*, size_t);
CSVReader operator ""_csv_no_header(const char*, size_t);
CSVReader parse(csv::string_view in, CSVFormat format = CSVFormat());
CSVReader parse_no_header(csv::string_view in);
///@}
/** @name Utility Functions */
///@{
std::unordered_map<std::string, DataType> csv_data_types(const std::string&);
CSVFileInfo get_file_info(const std::string& filename);
int get_col_pos(csv::string_view filename, csv::string_view col_name,
const CSVFormat& format = CSVFormat::guess_csv());
///@}
}

@ -0,0 +1,412 @@
/** @file
* A standalone header file for writing delimiter-separated files
*/
#pragma once
#include <fstream>
#include <iostream>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>
#include "common.hpp"
#include "data_type.hpp"
namespace csv {
namespace internals {
static int DECIMAL_PLACES = 5;
/**
* Calculate the absolute value of a number
*/
template<typename T = int>
inline T csv_abs(T x) {
return abs(x);
}
template<>
inline int csv_abs(int x) {
return abs(x);
}
template<>
inline long int csv_abs(long int x) {
return labs(x);
}
template<>
inline long long int csv_abs(long long int x) {
return llabs(x);
}
template<>
inline float csv_abs(float x) {
return fabsf(x);
}
template<>
inline double csv_abs(double x) {
return fabs(x);
}
template<>
inline long double csv_abs(long double x) {
return fabsl(x);
}
/**
* Calculate the number of digits in a number
*/
template<
typename T,
csv::enable_if_t<std::is_arithmetic<T>::value, int> = 0
>
int num_digits(T x)
{
x = csv_abs(x);
int digits = 0;
while (x >= 1) {
x /= 10;
digits++;
}
return digits;
}
/** to_string() for unsigned integers */
template<typename T,
csv::enable_if_t<std::is_unsigned<T>::value, int> = 0>
inline std::string to_string(T value) {
std::string digits_reverse = "";
if (value == 0) return "0";
while (value > 0) {
digits_reverse += (char)('0' + (value % 10));
value /= 10;
}
return std::string(digits_reverse.rbegin(), digits_reverse.rend());
}
/** to_string() for signed integers */
template<
typename T,
csv::enable_if_t<std::is_integral<T>::value && std::is_signed<T>::value, int> = 0
>
inline std::string to_string(T value) {
if (value >= 0)
return to_string((size_t)value);
return "-" + to_string((size_t)(value * -1));
}
/** to_string() for floating point numbers */
template<
typename T,
csv::enable_if_t<std::is_floating_point<T>::value, int> = 0
>
inline std::string to_string(T value) {
#ifdef __clang__
return std::to_string(value);
#else
// TODO: Figure out why the below code doesn't work on clang
std::string result = "";
T integral_part;
T fractional_part = std::abs(std::modf(value, &integral_part));
integral_part = std::abs(integral_part);
// Integral part
if (value < 0) result = "-";
if (integral_part == 0) {
result += "0";
}
else {
for (int n_digits = num_digits(integral_part); n_digits > 0; n_digits --) {
int digit = (int)(std::fmod(integral_part, pow10(n_digits)) / pow10(n_digits - 1));
result += (char)('0' + digit);
}
}
// Decimal part
result += ".";
if (fractional_part > 0) {
fractional_part *= (T)(pow10(DECIMAL_PLACES));
for (int n_digits = DECIMAL_PLACES; n_digits > 0; n_digits--) {
int digit = (int)(std::fmod(fractional_part, pow10(n_digits)) / pow10(n_digits - 1));
result += (char)('0' + digit);
}
}
else {
result += "0";
}
return result;
#endif
}
}
/** Sets how many places after the decimal will be written for floating point numbers
*
* @param precision Number of decimal places
*/
#ifndef __clang___
inline static void set_decimal_places(int precision) {
internals::DECIMAL_PLACES = precision;
}
#endif
/** @name CSV Writing */
///@{
/**
* Class for writing delimiter separated values files
*
* To write formatted strings, one should
* -# Initialize a DelimWriter with respect to some output stream
* -# Use operator<<() to write containers of unformatted text, e.g. a std::vector<std::string>
*
* @tparam OutputStream The output stream, e.g. `std::ofstream`, `std::stringstream`
* @tparam Delim The delimiter character
* @tparam Quote The quote character
* @tparam Flush True: flush after every writing function,
* false: you need to flush explicitly if needed.
* In both cases the destructor will flush.
*
* @par Hint
* Use the aliases csv::CSVWriter<OutputStream> to write CSV
* formatted strings and csv::TSVWriter<OutputStream>
* to write tab separated strings
*
* @par Example w/ std::vector, std::deque, std::list
* @snippet test_write_csv.cpp CSV Writer Example
*
* @par Example w/ std::tuple
* @snippet test_write_csv.cpp CSV Writer Tuple Example
*/
template<class OutputStream, char Delim, char Quote, bool Flush>
class DelimWriter {
public:
/** Construct a DelimWriter over the specified output stream
*
* @param _out Stream to write to
* @param _quote_minimal Limit field quoting to only when necessary
*/
DelimWriter(OutputStream& _out, bool _quote_minimal = true)
: out(_out), quote_minimal(_quote_minimal) {};
/** Construct a DelimWriter over the file
*
* @param[out] filename File to write to
*/
DelimWriter(const std::string& filename) : DelimWriter(std::ifstream(filename)) {};
/** Destructor will flush remaining data
*
*/
~DelimWriter() {
out.flush();
}
/** Format a sequence of strings and write to CSV according to RFC 4180
*
* @warning This does not check to make sure row lengths are consistent
*
* @param[in] record Sequence of strings to be formatted
*
* @return The current DelimWriter instance (allowing for operator chaining)
*/
template<typename T, size_t Size>
DelimWriter& operator<<(const std::array<T, Size>& record) {
for (size_t i = 0; i < Size; i++) {
out << csv_escape(record[i]);
if (i + 1 != Size) out << Delim;
}
end_out();
return *this;
}
/** @copydoc operator<< */
template<typename... T>
DelimWriter& operator<<(const std::tuple<T...>& record) {
this->write_tuple<0, T...>(record);
return *this;
}
/**
* @tparam T A container such as std::vector, std::deque, or std::list
*
* @copydoc operator<<
*/
template<
typename T, typename Alloc, template <typename, typename> class Container,
// Avoid conflicting with tuples with two elements
csv::enable_if_t<std::is_class<Alloc>::value, int> = 0
>
DelimWriter& operator<<(const Container<T, Alloc>& record) {
const size_t ilen = record.size();
size_t i = 0;
for (const auto& field : record) {
out << csv_escape(field);
if (i + 1 != ilen) out << Delim;
i++;
}
end_out();
return *this;
}
/** Flushes the written data
*
*/
void flush() {
out.flush();
}
private:
template<
typename T,
csv::enable_if_t<
!std::is_convertible<T, std::string>::value
&& !std::is_convertible<T, csv::string_view>::value
, int> = 0
>
std::string csv_escape(T in) {
return internals::to_string(in);
}
template<
typename T,
csv::enable_if_t<
std::is_convertible<T, std::string>::value
|| std::is_convertible<T, csv::string_view>::value
, int> = 0
>
std::string csv_escape(T in) {
IF_CONSTEXPR(std::is_convertible<T, csv::string_view>::value) {
return _csv_escape(in);
}
return _csv_escape(std::string(in));
}
std::string _csv_escape(csv::string_view in) {
/** Format a string to be RFC 4180-compliant
* @param[in] in String to be CSV-formatted
* @note If the quote_minimal member is true, fields are only quoted
* when necessary. If false, everything is quoted.
*/
// Do we need a quote escape
bool quote_escape = false;
for (auto ch : in) {
if (ch == Quote || ch == Delim || ch == '\r' || ch == '\n') {
quote_escape = true;
break;
}
}
if (!quote_escape) {
if (quote_minimal) return std::string(in);
else {
std::string ret(1, Quote);
// Append exactly in.size() characters; the view may not be null-terminated
ret.append(in.data(), in.size());
ret += Quote;
return ret;
}
}
// Start initial quote escape sequence
std::string ret(1, Quote);
for (auto ch: in) {
if (ch == Quote) ret += std::string(2, Quote);
else ret += ch;
}
// Finish off quote escape
ret += Quote;
return ret;
}
/** Recursive template for writing std::tuples */
template<size_t Index = 0, typename... T>
typename std::enable_if<Index < sizeof...(T), void>::type write_tuple(const std::tuple<T...>& record) {
out << csv_escape(std::get<Index>(record));
IF_CONSTEXPR (Index + 1 < sizeof...(T)) out << Delim;
this->write_tuple<Index + 1>(record);
}
/** Base case for writing std::tuples */
template<size_t Index = 0, typename... T>
typename std::enable_if<Index == sizeof...(T), void>::type write_tuple(const std::tuple<T...>& record) {
(void)record;
end_out();
}
/** Ends a line in 'out' and flushes, if Flush is true.*/
void end_out() {
out << '\n';
IF_CONSTEXPR(Flush) out.flush();
}
OutputStream & out;
bool quote_minimal;
};
/** An alias for csv::DelimWriter for writing standard CSV files
*
* @sa csv::DelimWriter::operator<<()
*
* @note Use `csv::make_csv_writer()` to instantiate this class over
* an actual output stream.
*/
template<class OutputStream, bool Flush = true>
using CSVWriter = DelimWriter<OutputStream, ',', '"', Flush>;
/** Class for writing tab-separated values files
*
* @sa csv::DelimWriter::operator<<()
*
* @note Use `csv::make_tsv_writer()` to instantiate this class over
* an actual output stream.
*/
template<class OutputStream, bool Flush = true>
using TSVWriter = DelimWriter<OutputStream, '\t', '"', Flush>;
/** Return a csv::CSVWriter over the output stream */
template<class OutputStream>
inline CSVWriter<OutputStream> make_csv_writer(OutputStream& out, bool quote_minimal=true) {
return CSVWriter<OutputStream>(out, quote_minimal);
}
/** Return a buffered csv::CSVWriter over the output stream (does not auto flush) */
template<class OutputStream>
inline CSVWriter<OutputStream, false> make_csv_writer_buffered(OutputStream& out, bool quote_minimal=true) {
return CSVWriter<OutputStream, false>(out, quote_minimal);
}
/** Return a csv::TSVWriter over the output stream */
template<class OutputStream>
inline TSVWriter<OutputStream> make_tsv_writer(OutputStream& out, bool quote_minimal=true) {
return TSVWriter<OutputStream>(out, quote_minimal);
}
/** Return a buffered csv::TSVWriter over the output stream (does not auto flush) */
template<class OutputStream>
inline TSVWriter<OutputStream, false> make_tsv_writer_buffered(OutputStream& out, bool quote_minimal=true) {
return TSVWriter<OutputStream, false>(out, quote_minimal);
}
///@}
}
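
DelimWriter quotes a field when it contains the quote character, the delimiter, a carriage return, or a newline, and doubles any embedded quote characters. A minimal standalone sketch of that rule follows. It assumes C++17 `std::string_view` and covers only the minimal-quoting case; `escape_field` is a hypothetical name that is not part of the library.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper for illustration; not part of the csv library.
// Quotes a field only when needed, in the style of RFC 4180.
inline std::string escape_field(std::string_view in,
                                char delim = ',', char quote = '"') {
    bool needs_quotes = false;
    for (const char ch : in) {
        if (ch == quote || ch == delim || ch == '\r' || ch == '\n') {
            needs_quotes = true;
            break;
        }
    }
    if (!needs_quotes) return std::string(in);

    std::string out(1, quote);          // opening quote
    for (const char ch : in) {
        if (ch == quote) out += quote;  // double every embedded quote
        out += ch;
    }
    out += quote;                       // closing quote
    return out;
}
```

Passing `'\t'` as `delim` gives the tab-separated variant of the same rule.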

@ -0,0 +1,353 @@
/** @file
* @brief Implements data type parsing functionality
*/
#pragma once
#include <cmath>
#include <cctype>
#include <string>
#include <cassert>
#include "common.hpp"
namespace csv {
/** Enumerates the different CSV field types that are
* recognized by this library
*
* @note Overflowing integers will be stored and classified as doubles.
* @note Unlike previous releases, integer enums here are platform agnostic.
*/
enum class DataType {
UNKNOWN = -1,
CSV_NULL, /**< Empty string */
CSV_STRING, /**< Non-numeric string */
CSV_INT8, /**< 8-bit integer */
CSV_INT16, /**< 16-bit integer (short on MSVC/GCC) */
CSV_INT32, /**< 32-bit integer (int on MSVC/GCC) */
CSV_INT64, /**< 64-bit integer (long long on MSVC/GCC) */
CSV_BIGINT, /**< Value too big to fit in a 64-bit int */
CSV_DOUBLE /**< Floating point value */
};
static_assert(DataType::CSV_STRING < DataType::CSV_INT8, "String type should come before numeric types.");
static_assert(DataType::CSV_INT8 < DataType::CSV_INT64, "Smaller integer types should come before larger integer types.");
static_assert(DataType::CSV_INT64 < DataType::CSV_DOUBLE, "Integer types should come before floating point value types.");
namespace internals {
/** Compute 10 to the power of n */
template<typename T>
HEDLEY_CONST CONSTEXPR_14
long double pow10(const T& n) noexcept {
long double multiplicand = n > 0 ? 10 : 0.1,
ret = 1;
// Make all numbers positive
T iterations = n > 0 ? n : -n;
for (T i = 0; i < iterations; i++) {
ret *= multiplicand;
}
return ret;
}
/** Compute 10 to the power of n */
template<>
HEDLEY_CONST CONSTEXPR_14
long double pow10(const unsigned& n) noexcept {
long double multiplicand = n > 0 ? 10 : 0.1,
ret = 1;
for (unsigned i = 0; i < n; i++) {
ret *= multiplicand;
}
return ret;
}
#ifndef DOXYGEN_SHOULD_SKIP_THIS
/** Private size-indexed array mapping byte sizes to an integer size enum */
constexpr DataType int_type_arr[8] = {
DataType::CSV_INT8, // 1
DataType::CSV_INT16, // 2
DataType::UNKNOWN,
DataType::CSV_INT32, // 4
DataType::UNKNOWN,
DataType::UNKNOWN,
DataType::UNKNOWN,
DataType::CSV_INT64 // 8
};
template<typename T>
inline DataType type_num() {
static_assert(std::is_integral<T>::value, "T should be an integral type.");
static_assert(sizeof(T) <= 8, "Byte size must be no greater than 8.");
return int_type_arr[sizeof(T) - 1];
}
template<> inline DataType type_num<float>() { return DataType::CSV_DOUBLE; }
template<> inline DataType type_num<double>() { return DataType::CSV_DOUBLE; }
template<> inline DataType type_num<long double>() { return DataType::CSV_DOUBLE; }
template<> inline DataType type_num<std::nullptr_t>() { return DataType::CSV_NULL; }
template<> inline DataType type_num<std::string>() { return DataType::CSV_STRING; }
CONSTEXPR_14 DataType data_type(csv::string_view in, long double* const out = nullptr,
const char decimalsymbol = '.');
#endif
/** Given a byte size, return the largest number that can be stored in
* an integer of that size
*
* Note: Provides a platform-agnostic way of mapping names like "long int" to
* byte sizes
*/
template<size_t Bytes>
CONSTEXPR_14 long double get_int_max() {
static_assert(Bytes == 1 || Bytes == 2 || Bytes == 4 || Bytes == 8,
"Bytes must be 1, 2, 4, or 8.");
IF_CONSTEXPR (sizeof(signed char) == Bytes) {
return (long double)std::numeric_limits<signed char>::max();
}
IF_CONSTEXPR (sizeof(short) == Bytes) {
return (long double)std::numeric_limits<short>::max();
}
IF_CONSTEXPR (sizeof(int) == Bytes) {
return (long double)std::numeric_limits<int>::max();
}
IF_CONSTEXPR (sizeof(long int) == Bytes) {
return (long double)std::numeric_limits<long int>::max();
}
IF_CONSTEXPR (sizeof(long long int) == Bytes) {
return (long double)std::numeric_limits<long long int>::max();
}
HEDLEY_UNREACHABLE();
}
/** Given a byte size, return the largest number that can be stored in
* an unsigned integer of that size
*/
template<size_t Bytes>
CONSTEXPR_14 long double get_uint_max() {
static_assert(Bytes == 1 || Bytes == 2 || Bytes == 4 || Bytes == 8,
"Bytes must be 1, 2, 4, or 8.");
IF_CONSTEXPR(sizeof(unsigned char) == Bytes) {
return (long double)std::numeric_limits<unsigned char>::max();
}
IF_CONSTEXPR(sizeof(unsigned short) == Bytes) {
return (long double)std::numeric_limits<unsigned short>::max();
}
IF_CONSTEXPR(sizeof(unsigned int) == Bytes) {
return (long double)std::numeric_limits<unsigned int>::max();
}
IF_CONSTEXPR(sizeof(unsigned long int) == Bytes) {
return (long double)std::numeric_limits<unsigned long int>::max();
}
IF_CONSTEXPR(sizeof(unsigned long long int) == Bytes) {
return (long double)std::numeric_limits<unsigned long long int>::max();
}
HEDLEY_UNREACHABLE();
}
/** Largest number that can be stored in an 8-bit integer */
CONSTEXPR_VALUE_14 long double CSV_INT8_MAX = get_int_max<1>();
/** Largest number that can be stored in a 16-bit integer */
CONSTEXPR_VALUE_14 long double CSV_INT16_MAX = get_int_max<2>();
/** Largest number that can be stored in a 32-bit integer */
CONSTEXPR_VALUE_14 long double CSV_INT32_MAX = get_int_max<4>();
/** Largest number that can be stored in a 64-bit integer */
CONSTEXPR_VALUE_14 long double CSV_INT64_MAX = get_int_max<8>();
/** Largest number that can be stored in an 8-bit unsigned integer */
CONSTEXPR_VALUE_14 long double CSV_UINT8_MAX = get_uint_max<1>();
/** Largest number that can be stored in a 16-bit unsigned integer */
CONSTEXPR_VALUE_14 long double CSV_UINT16_MAX = get_uint_max<2>();
/** Largest number that can be stored in a 32-bit unsigned integer */
CONSTEXPR_VALUE_14 long double CSV_UINT32_MAX = get_uint_max<4>();
/** Largest number that can be stored in a 64-bit unsigned integer */
CONSTEXPR_VALUE_14 long double CSV_UINT64_MAX = get_uint_max<8>();
/** Given the start of what is possibly the exponential part of a number
* written in scientific notation, parse the exponent and scale the
* coefficient by it
*/
HEDLEY_PRIVATE CONSTEXPR_14
DataType _process_potential_exponential(
csv::string_view exponential_part,
const long double& coeff,
long double * const out) {
long double exponent = 0;
auto result = data_type(exponential_part, &exponent);
// Exponents in scientific notation should not be decimal numbers
if (result >= DataType::CSV_INT8 && result < DataType::CSV_DOUBLE) {
if (out) *out = coeff * pow10(exponent);
return DataType::CSV_DOUBLE;
}
return DataType::CSV_STRING;
}
/** Given the absolute value of an integer, determine what numeric type
* it fits in
*/
HEDLEY_PRIVATE HEDLEY_PURE CONSTEXPR_14
DataType _determine_integral_type(const long double& number) noexcept {
// We can assume number is always non-negative
assert(number >= 0);
if (number <= internals::CSV_INT8_MAX)
return DataType::CSV_INT8;
else if (number <= internals::CSV_INT16_MAX)
return DataType::CSV_INT16;
else if (number <= internals::CSV_INT32_MAX)
return DataType::CSV_INT32;
else if (number <= internals::CSV_INT64_MAX)
return DataType::CSV_INT64;
else // Conversion to long long will cause an overflow
return DataType::CSV_BIGINT;
}
/** Distinguishes numeric from other text values. Used by various
* type casting functions, like csv_parser::CSVReader::read_row()
*
* #### Rules
* - Leading and trailing whitespace ("padding") ignored
* - A string of just whitespace is NULL
*
* @param[in] in String value to be examined
* @param[out] out Pointer to long double where results of numeric parsing
* get stored
* @param[in] decimalSymbol the character separating integral and decimal part,
* defaults to '.' if omitted
*/
CONSTEXPR_14
DataType data_type(csv::string_view in, long double* const out, const char decimalSymbol) {
// Empty string --> NULL
if (in.size() == 0)
return DataType::CSV_NULL;
bool ws_allowed = true,
dot_allowed = true,
digit_allowed = true,
is_negative = false,
has_digit = false,
prob_float = false;
unsigned places_after_decimal = 0;
long double integral_part = 0,
decimal_part = 0;
for (size_t i = 0, ilen = in.size(); i < ilen; i++) {
const char& current = in[i];
switch (current) {
case ' ':
if (!ws_allowed) {
if (isdigit(in[i - 1])) {
digit_allowed = false;
ws_allowed = true;
}
else {
// Ex: '510 123 4567'
return DataType::CSV_STRING;
}
}
break;
case '+':
if (!ws_allowed) {
return DataType::CSV_STRING;
}
break;
case '-':
if (!ws_allowed) {
// Ex: '510-123-4567'
return DataType::CSV_STRING;
}
is_negative = true;
break;
// case decimalSymbol: not allowed because decimalSymbol is not a literal,
// it is handled in the default block
case 'e':
case 'E':
// Process scientific notation
if (prob_float || (i && i + 1 < ilen && isdigit(in[i - 1]))) {
size_t exponent_start_idx = i + 1;
prob_float = true;
// Strip out plus sign
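// Guard the lookahead: input such as "1.5e" ends right after the 'e'
if (exponent_start_idx < ilen && in[exponent_start_idx] == '+') {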
exponent_start_idx++;
}
return _process_potential_exponential(
in.substr(exponent_start_idx),
is_negative ? -(integral_part + decimal_part) : integral_part + decimal_part,
out
);
}
return DataType::CSV_STRING;
break;
default:
short digit = static_cast<short>(current - '0');
if (digit >= 0 && digit <= 9) {
// Process digit
has_digit = true;
if (!digit_allowed)
return DataType::CSV_STRING;
else if (ws_allowed) // Ex: '510 456'
ws_allowed = false;
// Build current number
if (prob_float)
decimal_part += digit / pow10(++places_after_decimal);
else
integral_part = (integral_part * 10) + digit;
}
// case decimalSymbol: not allowed because decimalSymbol is not a literal.
else if (dot_allowed && current == decimalSymbol) {
dot_allowed = false;
prob_float = true;
}
else {
return DataType::CSV_STRING;
}
}
}
// No non-numeric/non-whitespace characters found
if (has_digit) {
long double number = integral_part + decimal_part;
if (out) {
*out = is_negative ? -number : number;
}
return prob_float ? DataType::CSV_DOUBLE : _determine_integral_type(number);
}
// Just whitespace
return DataType::CSV_NULL;
}
}
}
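
The classification rules documented for `data_type()` above can be sketched in isolation. The following is a minimal, simplified illustration and not the library's implementation: the names `Kind` and `classify` are hypothetical, scientific notation and custom decimal symbols are deliberately left out, and a lone sign is treated as a string for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical, simplified stand-in for the DataType enum above.
enum class Kind { Null, String, Int8, Int16, Int32, Int64, BigInt, Double };

// Classify text roughly along the documented rules: padding is ignored,
// whitespace-only input is Null, a single '.' makes the value a Double,
// and integers are bucketed by the smallest signed type whose maximum
// is not exceeded by the absolute value.
Kind classify(const std::string& in) {
    const std::size_t begin = in.find_first_not_of(' ');
    if (begin == std::string::npos)
        return Kind::Null;                       // empty or only spaces
    const std::size_t end = in.find_last_not_of(' ') + 1;

    std::size_t i = begin;
    if (in[i] == '+' || in[i] == '-')
        ++i;                                     // optional leading sign

    bool has_digit = false;
    bool has_dot = false;
    long double magnitude = 0;                   // absolute integral value
    for (; i < end; ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (std::isdigit(c)) {
            has_digit = true;
            if (!has_dot)
                magnitude = magnitude * 10 + (c - '0');
        } else if (c == '.' && !has_dot) {
            has_dot = true;
        } else {
            return Kind::String;                 // e.g. "510-123-4567"
        }
    }
    if (!has_digit)
        return Kind::String;                     // simplification, see lead-in
    if (has_dot)
        return Kind::Double;
    if (magnitude <= std::numeric_limits<std::int8_t>::max())
        return Kind::Int8;
    if (magnitude <= std::numeric_limits<std::int16_t>::max())
        return Kind::Int16;
    if (magnitude <= std::numeric_limits<std::int32_t>::max())
        return Kind::Int32;
    if (magnitude <= static_cast<long double>(std::numeric_limits<std::int64_t>::max()))
        return Kind::Int64;
    return Kind::BigInt;                         // would overflow 64 bits
}
```

For instance, `classify(" 42 ")` yields `Kind::Int8`, while `classify("510-123-4567")` yields `Kind::String` because the second `-` appears after digits have started.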


@ -0,0 +1,81 @@
[
{
"Id": "graphs_csv",
"Name": "Vince's CSV Parser",
"QDocModule": "qtdoc",
"QtUsage": "Used to read a CSV file in the graphs_csv example.",
"QtParts": [ "examples" ],
"Path": "csv-parser",
"Description": "CSV parser with simple, intuitive syntax.",
"Version": "2.3.0",
"DownloadLocation": "https://github.com/vincentlaucsb/csv-parser/archive/refs/tags/2.3.0.zip",
"Homepage": "https://github.com/vincentlaucsb/csv-parser",
"License": "MIT License",
"LicenseId": "MIT",
"Copyright": "Copyright (c) 2017-2019 Vincent La"
},
{
"Id": "graphs_csv_hedley",
"Name": "Hedley",
"QDocModule": "qtdoc",
"QtUsage": "Used to read a CSV file in the graphs_csv example.",
"QtParts": [ "examples" ],
"Path": "csv-parser",
"Files": [ "include/external/hedley.h" ],
"Description": "Hedley is a single C/C++ header you can include in your project to enable compiler-specific features while retaining compatibility with all compilers.",
"Version": "v9",
"DownloadLocation": "https://github.com/nemequ/hedley/releases/tag/v9",
"Homepage": "https://nemequ.github.io/hedley/",
"License": "Creative Commons Zero v1.0 Universal",
"LicenseId": "CC0-1.0",
"Copyright": "Evan Nemerson <evan@nemerson.com>"
},
{
"Id": "graphs_csv_mio",
"Name": "Mio",
"QDocModule": "qtdoc",
"QtUsage": "Used to read a CSV file in the graphs_csv example.",
"QtParts": [ "examples" ],
"Path": "csv-parser",
"Files": [ "include/external/mio.hpp" ],
"Description": "An easy to use header-only cross-platform C++11 memory mapping library",
"Homepage": "https://github.com/vimpunk/mio",
"License": "MIT License",
"LicenseId": "MIT",
"Copyright": "Copyright 2017 https://github.com/mandreyel"
},
{
"Id": "graphs_csv_string_view",
"Name": "string_view lite",
"QDocModule": "qtdoc",
"QtUsage": "Used to read a CSV file in the graphs_csv example.",
"QtParts": [ "examples" ],
"Path": "csv-parser",
"Files": [ "include/external/string_view.hpp" ],
"Description": "A single-file header-only version of a C++17-like string_view for C++98, C++11 and later",
"Homepage": "https://github.com/martinmoene/string-view-lite",
"License": "Boost Software License 1.0",
"LicenseId": "BSL-1.0",
"Copyright": "Copyright 2017-2019 by Martin Moene"
},
{
"Id": "graphs_csv_row_json",
"Name": "JSON for Modern C++",
"QDocModule": "qtdoc",
"QtUsage": "Used to read a CSV file in the graphs_csv example.",
"QtParts": [ "examples" ],
"Path": "csv-parser",
"Files": [ "include/internal/csv_row_json.cpp" ],
"Homepage": "https://json.nlohmann.me/",
"License": "MIT License",
"LicenseId": "MIT",
"Copyright": "Copyright © 2013-2015 Niels Lohmann."
}
]


@ -0,0 +1,64 @@
cmake_minimum_required(VERSION 3.16)
project(qtgraphscsv VERSION 0.1 LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)
add_subdirectory(3rdparty)
find_package(Qt6 6.8 REQUIRED COMPONENTS Quick Graphs)
qt_standard_project_setup(REQUIRES 6.8)
qt_add_executable(appqtgraphscsv
main.cpp
)
set_source_files_properties(Units.qml
PROPERTIES QT_QML_SINGLETON_TYPE TRUE)
qt_add_qml_module(appqtgraphscsv
URI qtgraphscsv
VERSION 1.0
QML_FILES
Main.qml
Units.qml
components/CustomTableView.qml
components/Graph.qml
components/LegendItem.qml
components/HorizontalHeaderDelegate.qml
components/VerticalHeaderDelegate.qml
SOURCES
datamodel.cpp datamodel.h
)
qt_add_resources(appqtgraphscsv "data"
PREFIX
"/data"
BASE
"data"
FILES
"data/medals.csv"
)
set_target_properties(appqtgraphscsv PROPERTIES
MACOSX_BUNDLE_GUI_IDENTIFIER com.example.demos.appqtgraphscsv
MACOSX_BUNDLE_BUNDLE_VERSION ${PROJECT_VERSION}
MACOSX_BUNDLE_SHORT_VERSION_STRING ${PROJECT_VERSION_MAJOR}.${PROJECT_VERSION_MINOR}
MACOSX_BUNDLE TRUE
WIN32_EXECUTABLE TRUE
)
target_link_libraries(appqtgraphscsv
PRIVATE
Qt6::Quick
Qt6::Graphs
csv
)
include(GNUInstallDirs)
install(TARGETS appqtgraphscsv
BUNDLE DESTINATION .
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
)


@ -0,0 +1,224 @@
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR BSD-3-Clause
pragma ComponentBehavior: Bound
import QtQuick
import QtQuick.Controls
import QtQuick.Controls.Basic
import QtQml.Models
import QtGraphs
import qtgraphscsv
Window {
id: mainWindow
width: 1200
height: 720
visible: true
title: qsTr("QtGraphs with CSV Demo")
Binding {
target: Units
property: "window"
value: mainWindow
}
property var seriesBaseColors: ["#99A500", "#361EAB", "#1F9B5D", "#BC7A19", "#B00F36", "#5A1FAA", "#077947", "#4B523C"]
property var allSeriesColors: ["#99A500", "#361EAB", "#1F9B5D", "#BC7A19", "#B00F36", "#5A1FAA", "#077947", "#4B523C"]
property var activeTheme: themes.lightTheme
Item {
id: themes
property alias lightTheme: lightTheme
property alias graphsLightTheme: lightTheme.graphsTheme
property alias darkTheme: darkTheme
property alias graphsDarkTheme: darkTheme.graphsTheme
Item {
id: lightTheme
property color primaryTextColor: "#2D2D2D"
property color secondaryTextColor: "#595959"
property color backgroundColor: "#FCFCFC"
property color selectionColor: "#E3E3E3"
property color borderColor: "#CDCDCD"
property color legendLabelTextColor: "#FCFCFC"
property alias graphsTheme: graphsLightTheme
GraphsTheme {
id: graphsLightTheme
backgroundColor: "transparent"
colorScheme: GraphsTheme.ColorScheme.Light
labelTextColor: lightTheme.primaryTextColor
}
}
Item {
id: darkTheme
property color primaryTextColor: "#F2F2F2"
property color secondaryTextColor: "#AEAEAE"
property color backgroundColor: "#1F1F1F"
property color selectionColor: "#353535"
property color borderColor: "#3F3F3F"
property color legendLabelTextColor: "#FCFCFC"
property alias graphsTheme: graphsDarkTheme
GraphsTheme {
id: graphsDarkTheme
backgroundColor: "transparent"
colorScheme: GraphsTheme.ColorScheme.Dark
labelTextColor: darkTheme.primaryTextColor
}
}
Component.onCompleted: {
// Generate more colors for the series
for (var i = 0; i < mainWindow.seriesBaseColors.length; ++i) {
mainWindow.allSeriesColors[i + 8] = Qt.lighter(mainWindow.seriesBaseColors[i]);
mainWindow.allSeriesColors[i + 16] = Qt.darker(mainWindow.seriesBaseColors[i]);
mainWindow.allSeriesColors[i + 24] = Qt.lighter(Qt.lighter(mainWindow.seriesBaseColors[i]));
mainWindow.allSeriesColors[i + 32] = Qt.darker(Qt.darker(mainWindow.seriesBaseColors[i]));
}
graphsDarkTheme.seriesColors = mainWindow.allSeriesColors;
graphsLightTheme.seriesColors = mainWindow.allSeriesColors;
}
}
Rectangle {
id: background
property alias dataModel: dataView.dataModel
anchors.fill: parent
color: activeTheme.backgroundColor
Item {
id: dataView
property alias dataModel: dataModel
anchors.top: background.top
anchors.bottom: background.bottom
anchors.left: background.left
implicitWidth: 400 * Units.px
implicitHeight: 719 * Units.px
CustomTableView {
id: customTableView
implicitHeight: 320 * Units.px
tableViewModel: dataModel
horizontalHeaderViewModel: hHeaderModel
anchors.top: dataView.top
anchors.topMargin: 40
anchors.left: dataView.left
anchors.right: dataView.right
anchors.leftMargin: 24 * Units.px
anchors.bottom: dataView.bottom
anchors.bottomMargin: 294 * Units.px
selectionColor: activeTheme.selectionColor
backgroundColor: activeTheme.backgroundColor
primaryTextColor: activeTheme.primaryTextColor
secondaryTextColor: activeTheme.secondaryTextColor
borderColor: activeTheme.borderColor
title: qsTr("Medals")
Connections {
target: customTableView.dataSelectionModel
function onSelectionChanged(selected, deselected) {
if (customTableView.dataSelectionModel.selectedIndexes.length < 1) {
graphView.graphsItem.clearGraph();
return;
}
const first = customTableView.mainTableView.cellAtIndex(customTableView.dataSelectionModel.selectedIndexes[0]);
const last = customTableView.mainTableView.cellAtIndex(
customTableView.dataSelectionModel.selectedIndexes[
customTableView.dataSelectionModel.selectedIndexes.length - 1]);
graphView.graphsItem.updateModelMapper(first, last);
const firstCategoryIndex = graphView.graphsItem.modelMapper.first;
// The second argument is a count of categories, not the index of the last one
const categoryCount = graphView.graphsItem.modelMapper.count;
graphView.graphsItem.categoryAxis.categories = customTableView.extractBarSetGategories(
firstCategoryIndex, categoryCount);
}
}
}
SelectionRectangle {
id: selectionRectangle
target: customTableView.mainTableView
}
ListModel {
id: hHeaderModel
}
LegendItem {
id: legendItem
anchors.top: customTableView.bottom
anchors.left: dataView.left
anchors.leftMargin: 24 * Units.px
anchors.topMargin: 70 * Units.px
labelTextColor: activeTheme.legendLabelTextColor
titleTextColor: activeTheme.secondaryTextColor
series: graphView.graphsItem.series
}
CsvDataModel {
id: dataModel
onModelReset: () => {
const first = graphView.graphsItem.modelMapper.first;
const count = graphView.graphsItem.modelMapper.count;
graphView.graphsItem.categoryAxis.categories = customTableView.extractBarSetGategories(first, count);
customTableView.fillHorizontalHeaderModel(dataModel.columnCount());
}
Component.onCompleted: () => {
dataModel.readCsv("qrc:/data/medals.csv");
}
}
}
Item {
id: graphView
property alias graphsItem: graphsItem
anchors.left: dataView.right
anchors.right: background.right
anchors.top: background.top
anchors.bottom: background.bottom
width: 800 * Units.px
Graph {
id: graphsItem
anchors.fill: graphView
chartView.marginLeft: 71 * Units.px
chartView.marginRight: 43 * Units.px
chartView.marginTop: 47 * Units.px
chartView.marginBottom: 80 * Units.px
theme: activeTheme.graphsTheme
labelDelegateTextColor: activeTheme.secondaryTextColor
}
}
}
Component.onCompleted: () => {
switch (Application.styleHints.colorScheme) {
case Qt.Light:
activeTheme = themes.lightTheme;
break;
case Qt.Dark:
activeTheme = themes.darkTheme;
break;
}
graphView.graphsItem.modelMapper.model = background.dataModel;
}
}


@ -0,0 +1,17 @@
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR BSD-3-Clause
pragma Singleton
import QtQuick
Item {
property var window
readonly property real px: {
if (window === undefined)
return 1;
const win = window;
return Math.max(win.width, win.height) / win.width;
}
}


@ -0,0 +1,184 @@
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR BSD-3-Clause
pragma ComponentBehavior: Bound
import QtQuick
import QtQuick.Controls
import QtQuick.Controls.Basic
import qtgraphscsv
Item {
id: tableviewItem
property alias horizontalHeaderview: hHeaderView
property alias verticalalHeaderview: vHeaderView
property alias mainTableView: tv
property alias tableViewModel: tv.model
property alias horizontalHeaderViewModel: hHeaderView.model
property alias title: titleLabel.text
property alias dataSelectionModel: tv.dataSelectionModel
property color borderColor
property color primaryTextColor
property color secondaryTextColor
property color selectionColor: "#E3E3E3"
property color backgroundColor: "#FCFCFC"
property color scrollbarBackgroundColor: "#BEBEBE"
readonly property font titleFont: ({
family: "Inter",
weight: 700 * Units.px,
pixelSize: 16 * Units.px,
letterSpacing: 0 * Units.px,
bold: false
})
function extractBarSetGategories(first, count) {
var categories = [];
const last = first + count;
for (var i = first; i < last; ++i)
categories.push(tv.model.headerData(i, Qt.Horizontal, Qt.DisplayRole) + " medals");
return categories;
}
function fillHorizontalHeaderModel(rowLength) {
hHeaderView.model.clear();
for (var i = 0; i < rowLength; ++i) {
var h = tv.model.headerData(i, Qt.Horizontal, Qt.DisplayRole);
hHeaderView.model.append({
"display": h
});
}
}
Text {
id: titleLabel
text: ""
font: titleFont
color: tableviewItem.primaryTextColor
width: 132 * Units.px
height: 16 * Units.px
anchors.top: tableviewItem.top
anchors.left: vHeaderView.left
anchors.right: tableviewItem.right
anchors.rightMargin: 236 * Units.px
anchors.leftMargin: 8 * Units.px
verticalAlignment: Text.AlignBottom
}
VerticalHeaderView {
id: vHeaderView
implicitWidth: 91 * Units.px
implicitHeight: 320 * Units.px
clip: true
anchors.top: hHeaderView.bottom
anchors.bottom: tv.bottom
boundsBehavior: Flickable.StopAtBounds
syncView: tv
delegate: VerticalHeaderDelegate {
textColor: tableviewItem.primaryTextColor
borderColor: tableviewItem.borderColor
}
}
HorizontalHeaderView {
id: hHeaderView
implicitWidth: 250 * Units.px
anchors.top: titleLabel.bottom
anchors.topMargin: 25 * Units.px
anchors.left: vHeaderView.right
interactive: false
columnWidthProvider: column => {
return (column ? tv.implicitColumnWidth(column) : 0);
}
delegate: HorizontalHeaderDelegate {
textColor: tableviewItem.secondaryTextColor
borderColor: tableviewItem.borderColor
}
}
TableView {
id: tv
property alias dataSelectionModel: dataSelectionModel
anchors.top: hHeaderView.bottom
anchors.left: vHeaderView.right
implicitWidth: 255 * Units.px
implicitHeight: 310 * Units.px
reuseItems: false
clip: true
selectionBehavior: TableView.SelectCells
selectionMode: TableView.ContiguousSelection
interactive: false
columnWidthProvider: column => {
if (column === 0)
return 0;
}
rowHeightProvider: row => {
if (row === 0)
return 0;
}
ScrollBar.vertical: MyScrollBar {
parent: tv.parent
anchors.top: tv.top
anchors.bottom: tv.bottom
anchors.left: tv.right
anchors.leftMargin: 16 * Units.px
}
keyNavigationEnabled: false
selectionModel: ItemSelectionModel {
id: dataSelectionModel
}
delegate: Rectangle {
id: delegateRectangle
implicitHeight: 31 * Units.px
implicitWidth: 63 * Units.px
required property bool selected
required property bool current
required property string display
border.width: 1 * Units.px
border.color: tableviewItem.borderColor
color: selected ? tableviewItem.selectionColor : tableviewItem.backgroundColor
Text {
leftPadding: 38
topPadding: 6
bottomPadding: 5
rightPadding: 11
color: tableviewItem.primaryTextColor
text: delegateRectangle.display
}
}
}
component MyScrollBar: ScrollBar {
id: scrollBar
background: Rectangle {
implicitWidth: 8 * Units.px
implicitHeight: 320 * Units.px
radius: 8 * Units.px
color: tableviewItem.selectionColor
}
contentItem: Rectangle {
implicitWidth: 8 * Units.px
implicitHeight: 91 * Units.px
color: tableviewItem.scrollbarBackgroundColor
radius: 8 * Units.px
}
}
}


@ -0,0 +1,134 @@
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR BSD-3-Clause
pragma ComponentBehavior: Bound
import QtQuick
import QtQuick.Effects
import QtGraphs
import qtgraphscsv
Item {
id: graphsItem
property alias series: chartView.barSeries
property alias chartView: chartView
property alias modelMapper: chartView.modelMapper
property alias categoryAxis: chartView.categoryAxis
property alias theme: chartView.theme
property alias labelDelegateTextColor: chartView.labelTextColor
readonly property font graphCategoryFont: ({
"family": "Inter",
"weight": 600 * Units.px,
"pixelSize": 14 * Units.px,
"letterSpacing": 0 * Units.px,
"bold": false
})
function updateModelMapper(first, last) {
chartView.modelMapper.firstBarSetSection = first.y;
chartView.modelMapper.lastBarSetSection = last.y;
chartView.modelMapper.first = first.x;
chartView.modelMapper.count = (last.x - first.x) + 1;
}
function clearGraph() {
series.clear();
categoryAxis.clear();
}
GraphsView {
id: chartView
anchors.fill: graphsItem
property alias modelMapper: barModelMapper
property alias barSeries: barSeries
property alias categoryAxis: categoryAxis
property alias barRadius: barSeries.radius
property alias barBlur: barSeries.blur
property alias opa: barSeries.opa
property alias sizeFactor: barSeries.sizeFactor
property color labelTextColor: "green"
marginLeft: 71 * Units.px
marginRight: 43 * Units.px
marginTop: 47 * Units.px
marginBottom: 80 * Units.px
axisX: BarCategoryAxis {
id: categoryAxis
subGridVisible: false
property color labelDelegateTextColor: chartView.labelTextColor
labelDelegate: Item {
id: labelItem
property string text
property color labelTextColor: categoryAxis.labelDelegateTextColor
Text {
id: labelDelegate
anchors.centerIn: parent
font: graphCategoryFont
text: labelItem.text
color: labelItem.labelTextColor
}
}
}
axisY: ValueAxis {
id: axisY
max: 35
min: 0
subTickCount: 4
tickInterval: 5
}
BarSeries {
id: barSeries
property real radius: 20 * Units.px
property real blur: 15
property real sizeFactor: 0.6
property real opa: 0.4
barDelegate: Item {
id: customBar
property color barColor
RectangularShadow {
id: shadowEffect
anchors.horizontalCenter: barRectangle.horizontalCenter
anchors.bottom: barRectangle.bottom
blur: barSeries.blur
opacity: barSeries.opa
width: barSeries.sizeFactor * barRectangle.width * Units.px
height: 0.9 * barRectangle.height * Units.px
cached: true
}
Rectangle {
id: barRectangle
anchors.fill: parent
color: customBar.barColor
opacity: 0.9
width: 24 * Units.px
topLeftRadius: barSeries.radius
topRightRadius: barSeries.radius
}
}
}
BarModelMapper {
id: barModelMapper
series: barSeries
firstBarSetSection: -1
lastBarSetSection: -1
first: -1
count: -1
orientation: Qt.Horizontal
}
}
}


@ -0,0 +1,42 @@
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR BSD-3-Clause
pragma ComponentBehavior: Bound
import QtQuick
import qtgraphscsv
Rectangle {
id: horizontalHeaderViewDelegate
property color textColor
property color borderColor
required property string display
readonly property font horizontalTitleFont: ({
family: "Inter",
weight: 600 * Units.px,
pixelSize: 12 * Units.px,
letterSpacing: 0 * Units.px,
bold: false
})
color: "transparent"
implicitHeight: 31 * Units.px
implicitWidth: 63 * Units.px
Text {
id: txv
anchors.right: horizontalHeaderViewDelegate.right
anchors.rightMargin: 2 * Units.px
anchors.bottom: horizontalHeaderViewDelegate.bottom
anchors.bottomMargin: 10 * Units.px
color: textColor
text: horizontalHeaderViewDelegate.display
font: horizontalHeaderViewDelegate.horizontalTitleFont
horizontalAlignment: Text.AlignRight
verticalAlignment: Text.AlignBottom
elide: Text.ElideRight
}
}


@ -0,0 +1,93 @@
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR BSD-3-Clause
pragma ComponentBehavior: Bound
import QtQuick
import QtGraphs
import qtgraphscsv
Item {
id: legendItem
property BarSeries series
property color labelTextColor
property color titleTextColor
readonly property font titleFont: ({
"family": "Inter",
"weight": 600 * Units.px,
"pixelSize": 12 * Units.px,
"letterSpacing": 0 * Units.px,
"bold": false
})
readonly property font labelFont: ({
"family": "Inter",
"weight": 600 * Units.px,
"pixelSize": 10 * Units.px,
"letterSpacing": 0 * Units.px,
"bold": false
})
Column {
id: topLayout
spacing: 15 * Units.px
Text {
id: title
color: titleTextColor
verticalAlignment: Text.AlignVCenter
text: qsTr("Selected")
}
Flickable {
id: selectionFlickable
width: contentWidth
height: 200 * Units.px
contentWidth: selectionLabels.width
contentHeight: selectionLabels.height
clip: true
Flow {
id: selectionLabels
spacing: 12 * Units.px
width: 240 * Units.px
Repeater {
id: labelRepeater
model: legendItem.series ? legendItem.series.legendData.length : 0
Rectangle {
id: legend1
required property int index
height: 20 * Units.px
width: text1.width * Units.px
radius: 4 * Units.px
color: legendItem.series.legendData[index].color
Text {
id: text1
topPadding: 4 * Units.px
bottomPadding: 4 * Units.px
leftPadding: 6 * Units.px
rightPadding: 6 * Units.px
horizontalAlignment: Text.AlignHCenter
color: legendItem.labelTextColor
font: legendItem.labelFont
text: legendItem.series.legendData[index].label
}
}
}
}
}
}
}


@ -0,0 +1,34 @@
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR BSD-3-Clause
pragma ComponentBehavior: Bound
import QtQuick
import qtgraphscsv
Rectangle {
id: verticalViewDelegate
required property string display
property color textColor
property color borderColor
color: "transparent"
implicitHeight: 31 * Units.px
implicitWidth: 91 * Units.px
border.color: borderColor
border.width: 1 * Units.px
Text {
id: txv
topPadding: 5 * Units.px
rightPadding: 25 * Units.px
leftPadding: 8 * Units.px
bottomPadding: 6 * Units.px
color: verticalViewDelegate.textColor
text: verticalViewDelegate.display
verticalAlignment: Text.AlignBottom
elide: Text.ElideRight
}
}


@ -0,0 +1,94 @@
Team,Gold,Silver,Bronze,Total
Argentina,0,1,2,3
Armenia,0,2,2,4
Australia,17,7,22,46
Austria,1,1,5,7
Azerbaijan,0,3,4,7
Bahamas,2,0,0,2
Bahrain,0,1,0,1
Belarus,1,3,3,7
Belgium,3,1,3,7
Bermuda,1,0,0,1
Botswana,0,0,1,1
Brazil,7,6,8,21
Bulgaria,3,1,2,6
Burkina Faso,0,0,1,1
Canada,7,6,11,24
China,38,32,18,88
Chinese Taipei,2,4,6,12
Colombia,0,4,1,5
Cote d'Ivoir,0,0,1,1
Croatia,3,3,2,8
Cuba,7,3,5,15
Czech Republic,4,4,3,11
Denmark,3,4,4,11
Dominican Republic,0,3,2,5
Ecuador,2,1,0,3
Egypt,1,1,4,6
Estonia,1,0,1,2
Ethiopia,1,1,2,4
Fiji,1,0,1,2
Finland,0,0,2,2
France,10,12,11,33
Georgia,2,5,1,8
Germany,10,11,16,37
Ghana,0,0,1,1
Great Britain,22,21,22,65
Greece,2,1,1,4
Grenada,0,0,1,1
"Hong Kong, China",1,2,3,6
Hungary,6,7,7,20
India,1,2,4,7
Indonesia,1,1,3,5
Ireland,2,0,2,4
Islamic Republic of Iran,3,2,2,7
Israel,2,0,2,4
Italy,10,10,20,40
Jamaica,4,1,4,9
Japan,27,14,17,58
Jordan,0,1,1,2
Kazakhstan,0,0,8,8
Kenya,4,4,2,10
Kosovo,2,0,0,2
Kuwait,0,0,1,1
Kyrgyzstan,0,2,1,3
Latvia,1,0,1,2
Lithuania,0,1,0,1
Malaysia,0,1,1,2
Mexico,0,0,4,4
Mongolia,0,1,3,4
Morocco,1,0,0,1
Namibia,0,1,0,1
Netherlands,10,12,14,36
New Zealand,7,6,7,20
Nigeria,0,1,1,2
North Macedonia,0,1,0,1
Norway,4,2,2,8
Philippines,1,2,1,4
Poland,4,5,5,14
Portugal,1,1,2,4
Puerto Rico,1,0,0,1
Qatar,2,0,1,3
Republic of Korea,6,4,10,20
Republic of Moldova,0,0,1,1
ROC,20,28,23,71
Romania,1,3,0,4
San Marino,0,1,2,3
Saudi Arabia,0,1,0,1
Serbia,3,1,5,9
Slovakia,1,2,1,4
Slovenia,3,1,1,5
South Africa,1,2,0,3
Spain,3,8,6,17
Sweden,3,6,0,9
Switzerland,3,4,6,13
Syrian Arab Republic,0,0,1,1
Thailand,1,0,1,2
Tunisia,1,1,0,2
Turkey,2,2,9,13
Turkmenistan,0,1,0,1
Uganda,2,1,1,4
Ukraine,1,6,12,19
United States of America,39,41,33,113
Uzbekistan,3,0,2,5
Venezuela,1,3,0,4
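
The quoted `"Hong Kong, China"` row in the data above shows why splitting on every comma is not enough and a real CSV reader is used. A minimal sketch of quote-aware splitting follows, assuming RFC 4180-style double quotes with `""` as an escaped quote; `splitCsvLine` is a hypothetical helper for illustration and not part of the csv-parser library or the demo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split one CSV record into fields, honouring
// double-quoted fields and "" as an escaped quote inside them.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    current += '"';              // escaped quote
                    ++i;
                } else {
                    in_quotes = false;           // closing quote
                }
            } else {
                current += c;                    // commas are literal here
            }
        } else if (c == '"') {
            in_quotes = true;                    // opening quote
        } else if (c == ',') {
            fields.push_back(current);           // field separator
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);                   // last field
    return fields;
}
```

Applied to the row `"Hong Kong, China",1,2,3,6`, this returns five fields with the team name kept whole, whereas a plain comma split would produce six.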


@ -0,0 +1,116 @@
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR BSD-3-Clause
#include "datamodel.h"
#include "internal/common.hpp"
#include <csv.hpp>
#include <QFileInfo>
#include <QtQml>
#include <qlogging.h>
#include <sstream>
#include <charconv>
#include <string_view>
DataModel::DataModel(QObject *parent) : QAbstractTableModel(parent) { }
DataModel::~DataModel() { }
int DataModel::columnCount(const QModelIndex &parent) const
{
return m_csvData.count() ? m_csvData.at(0).count() : 0;
}
QVariant DataModel::data(const QModelIndex &index, int role) const
{
if (role == Qt::DisplayRole) {
if (index.row() < rowCount() && index.column() < columnCount())
return m_csvData.at(index.row()).at(index.column());
}
return QVariant();
}
QVariant DataModel::headerData(int section, Qt::Orientation orientation, int role) const
{
if (role != Qt::DisplayRole || section < 0)
return QVariant();
if (orientation == Qt::Horizontal) {
// Horizontal headers come from the top row.
if (section < columnCount())
return m_csvData.at(0).at(section);
} else if (section < rowCount() && columnCount() > 0) {
// Vertical headers come from the leftmost column.
return m_csvData.at(section).at(0);
}
return QVariant();
}
QHash<int, QByteArray> DataModel::roleNames() const
{
QHash<int, QByteArray> roles = QAbstractItemModel::roleNames();
roles[CustomRoles::Background] = "background";
return roles;
}
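static int tryConvertToInt(std::string_view team, std::string_view fieldName, std::string_view field)
{
int value = -1;
const auto [ptr, ec] = std::from_chars(field.data(), field.data() + field.size(), value);
// Test the error code rather than the value: a field that really contains "-1"
// is valid input, and on failure from_chars leaves value untouched.
if (ec != std::errc{}) {
// string_view is not guaranteed to be null-terminated, so print with an explicit length.
qWarning("%.*s: error in %.*s field", static_cast<int>(team.size()), team.data(),
static_cast<int>(fieldName.size()), fieldName.data());
}
return value;
}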
void DataModel::readCsv(const QUrl &csvFile)
{
const auto context = qmlContext(this);
const auto resolvedUrl = context ? context->resolvedUrl(csvFile) : csvFile;
QFile file(QQmlFile::urlToLocalFileOrQrc(resolvedUrl));
if (!file.open(QIODeviceBase::ReadOnly)) {
qWarning("Could not open %s for reading", qUtf8Printable(csvFile.toString()));
return;
}
// Start the reset only once the file is open, so that the early return above
// cannot leave beginResetModel() without a matching endResetModel().
QAbstractItemModel::beginResetModel();
m_csvData.clear();
std::stringstream ss(file.readAll().toStdString());
csv::CSVReader reader(ss);
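auto headers = reader.get_col_names();
// The row loop below indexes five columns; stop here instead of letting headers.at() throw.
if (headers.size() < 5) {
qWarning("Expected at least 5 columns in %s", qUtf8Printable(csvFile.toString()));
QAbstractItemModel::endResetModel();
return;
}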
QList<QVariant> headersList;
headersList.reserve(headers.size());
for (const std::string &header : headers)
headersList.append(QString::fromStdString(header));
if (headersList.count() > 0)
m_csvData.push_back(headersList);
for (const csv::CSVRow &csvRow : reader) {
csv::CSVField teamField = csvRow[headers.at(0)];
csv::CSVField goldField = csvRow[headers.at(1)];
csv::CSVField silverField = csvRow[headers.at(2)];
csv::CSVField bronzeField = csvRow[headers.at(3)];
csv::CSVField totalField = csvRow[headers.at(4)];
QList<QVariant> row;
row.resize(5);
const auto team = teamField.get();
row[0] = QString::fromStdString(team);
row[1] = tryConvertToInt(team, headers.at(1), goldField.get());
row[2] = tryConvertToInt(team, headers.at(2), silverField.get());
row[3] = tryConvertToInt(team, headers.at(3), bronzeField.get());
row[4] = tryConvertToInt(team, headers.at(4), totalField.get());
m_csvData.push_back(row);
}
QAbstractItemModel::endResetModel();
}
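
Note on integer parsing: readCsv() converts the medal columns with std::from_chars. The block below is a minimal standalone sketch of checked parsing with that function, not the demo's code. The helper name parseIntField is hypothetical, and rejecting trailing characters is an assumption made for this sketch.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not part of the demo): parse an entire field as an int.
// Returns std::nullopt when the field is empty, malformed, or out of range.
std::optional<int> parseIntField(std::string_view field)
{
    int value = 0;
    const char *first = field.data();
    const char *last = first + field.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    // ec reports invalid input or overflow; ptr != last means that
    // unparsed characters follow the number (an assumption: treat as error).
    if (ec != std::errc{} || ptr != last)
        return std::nullopt;
    return value;
}
```

Returning std::optional keeps every int value, including -1, available as a legitimate result, so the caller does not need a sentinel to detect failure.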

@ -0,0 +1,40 @@
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR BSD-3-Clause
#ifndef DATAMODEL_H
#define DATAMODEL_H
#include <QAbstractTableModel>
#include <QtQmlIntegration>
#include <QHash>
#include <QList>
#include <QUrl>
#include <QVariant>
class DataModel : public QAbstractTableModel
{
Q_OBJECT
QML_NAMED_ELEMENT(CsvDataModel)
public:
explicit DataModel(QObject *parent = nullptr);
~DataModel() override;
enum CustomRoles { Background = Qt::UserRole + 1 };
int rowCount(const QModelIndex &parent = QModelIndex()) const override { return m_csvData.count(); }
int columnCount(const QModelIndex &parent = QModelIndex()) const override;
QVariant data(const QModelIndex &index, int role = Qt::DisplayRole) const override;
QVariant headerData(int section, Qt::Orientation orientation,
int role = Qt::DisplayRole) const override;
QHash<int, QByteArray> roleNames() const override;
Qt::ItemFlags flags(const QModelIndex &index) const override
{
return QAbstractItemModel::flags(index);
}
Q_INVOKABLE void readCsv(const QUrl &csvFile);
private:
QList<QList<QVariant>> m_csvData;
QString m_csvFile;
};
#endif

Binary file not shown.

@ -0,0 +1,48 @@
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR GFDL-1.3-no-invariants-only
/*!
\title Graphs with CSV Data
\examplecategory {Data Visualization}
\ingroup qtquickdemos
\example demos/graphs_csv
\brief How to visualize data from a CSV file in Qt Graphs.
\meta {tag} {demo,quick,graphs}
\meta {docdependencies} {QtGraphs}
\borderedimage qtquick-demo-graphs-csv.png
The \b {Graphs with CSV Data} example shows how to display data from a CSV file in
a 2D bar chart. The application reads the CSV file using a third-party CSV parser
and inserts the data into a custom model that inherits from
\l QAbstractTableModel. Once the model is populated, its leftmost
column holds the vertical header data and its top row holds the horizontal
header data.
The selected third-party library is not aware of \l{The Qt Resource System}{the Qt Resource System},
so it cannot open the CSV file from a resource path. The application must therefore
load the file's contents itself before passing them to the library. The CSV library in
this example accepts its input as a file path, a \c {std::fstream}, or
a \c {std::stringstream}. A resource path is of no use to the library, and
\l QFile cannot provide a \c {std::fstream}, which leaves
\c {std::stringstream} as the only option. The whole file is read into a string with
\l QFile::readAll(), and that string is then used to construct
a \c {std::stringstream}.
In the application window, a table view presents the data from the model.
From the table view, a user can select a subsection of data that is
then displayed in the bar graph.
Because the bar series doesn't modify the category axis labels, updating the series
doesn't update the axis labels. Instead, a JavaScript function extracts the label names
from the model's leftmost column and assigns them to
the category axis's labels property.
\quotefromfile demos/graphs_csv/components/CustomTableView.qml
\skipto extractBarSetGategories
\printuntil }
\include examples-run.qdocinc
\sa {QML Applications}
*/
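
Note on the in-memory approach described in the documentation above: the sketch below shows the general idea of parsing CSV text that is already in memory through a std::stringstream, using only the standard library. The function parseCsvFromMemory is hypothetical, and the naive comma splitting is a stand-in for the third-party parser; it does not handle quoted fields or trailing empty fields.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the CSV library: 'contents' plays the role of the
// bytes that QFile::readAll() would return for the resource file.
std::vector<std::vector<std::string>> parseCsvFromMemory(const std::string &contents)
{
    std::stringstream stream(contents); // the stream reads from memory, not from disk
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(stream, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back(); // tolerate Windows line endings
        if (line.empty())
            continue; // skip blank lines
        std::vector<std::string> fields;
        std::stringstream lineStream(line);
        std::string field;
        while (std::getline(lineStream, field, ','))
            fields.push_back(field);
        rows.push_back(std::move(fields));
    }
    return rows;
}
```

Because the parser only sees a stream, the same code path works whether the text came from a Qt resource, a local file, or a network reply.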

@ -0,0 +1,21 @@
// Copyright (C) 2025 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR BSD-3-Clause
#include <QGuiApplication>
#include <QQmlApplicationEngine>
int main(int argc, char *argv[])
{
QGuiApplication app(argc, argv);
QQmlApplicationEngine engine;
QObject::connect(
&engine,
&QQmlApplicationEngine::objectCreationFailed,
&app,
[]() { QCoreApplication::exit(-1); },
Qt::QueuedConnection);
engine.loadFromModule("qtgraphscsv", "Main");
return app.exec();
}